Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* Re: Re: [PATCH 1/2] cgroup/cpuset: Make cpuset hotplug processing synchronous
From: Michal Koutný @ 2024-04-03 14:54 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner,
	Thomas Gleixner, Peter Zijlstra, Rafael J. Wysocki, Len Brown,
	Pavel Machek, Shuah Khan, linux-kernel, cgroups, linux-pm,
	linux-kselftest, Frederic Weisbecker, Paul E. McKenney,
	Ingo Molnar, Anna-Maria Behnsen, Alex Shi, Vincent Guittot,
	Barry Song
In-Reply-To: <xhsmhedbmbjz5.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

[-- Attachment #1: Type: text/plain, Size: 846 bytes --]

On Wed, Apr 03, 2024 at 04:26:38PM +0200, Valentin Schneider <vschneid@redhat.com> wrote:
> Also, I gave Michal's patch a try and it looks like it's introducing a

Thank you.

>   cgroup_threadgroup_rwsem -> cpuset_mutex
> ordering from
>   cgroup_transfer_tasks_locked()
>   `\
>     percpu_down_write(&cgroup_threadgroup_rwsem);
>     cgroup_migrate()
>     `\
>       cgroup_migrate_execute()
>       `\
>         ss->can_attach() // cpuset_can_attach()
>         `\
>           mutex_lock(&cpuset_mutex);
> 
> which is invalid, see below.

_This_ should be the right order (cpuset_mutex inside
cgroup_threadgroup_rwsem), at least in my mental model. Thus I missed
that cpuset_mutex must have been taken somewhere higher up in the
hotplug code (CPU 0 in the lockdep dump, I can't easily see where from)
:-/.

Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH V2 0/2] PM: wakeup: make device_wakeup_disable return void
From: Rafael J. Wysocki @ 2024-04-03 14:53 UTC (permalink / raw)
  To: Dhruva Gole
  Cc: Rafael J . Wysocki, Tony Lindgren, Len Brown, Pavel Machek,
	Greg Kroah-Hartman, Viresh Kumar, theo.lebrun, linux-pm,
	linux-kernel, linux-mmc
In-Reply-To: <20240318151631.1621881-1-d-gole@ti.com>

On Mon, Mar 18, 2024 at 4:17 PM Dhruva Gole <d-gole@ti.com> wrote:
>
> This is a follow up patch based on discussions with Rafael[0] on a previous
> patch I sent to propagate return value from device_wakeup_disable
> further upward inside device_init_wakeup
>
> However, it doesn't seem like today any return values from
> device_wakeup_disable are very useful to the caller.
>
> I could only spot one caller of this function that was actually
> propagating the return value upward other than the PM core calls. I have
> tried to update sdhci-pci-core to work with the new changes
>
> [0] https://lore.kernel.org/all/CAJZ5v0jbHwiZemtNAoM-jmgB_58VqmKUkqv4P7qrPkxWzBzMyQ@mail.gmail.com/
>
> Changelog:
>
> v1 --> v2:
> * Squashed the mmc fix into first patch [Rafael]
>
> Dhruva Gole (2):
>   PM: wakeup: make device_wakeup_disable return void
>   PM: wakeup: Remove unnecessary else from device_init_wakeup
>
>  drivers/base/power/wakeup.c       | 11 +++++++----
>  drivers/mmc/host/sdhci-pci-core.c |  2 +-
>  include/linux/pm_wakeup.h         | 12 +++++-------
>  3 files changed, 13 insertions(+), 12 deletions(-)
>
> --

Both patches applied as 6.10 material, thanks!

^ permalink raw reply

* Re: [PATCH 1/2] cgroup/cpuset: Make cpuset hotplug processing synchronous
From: Waiman Long @ 2024-04-03 14:47 UTC (permalink / raw)
  To: Valentin Schneider, Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Thomas Gleixner,
	Peter Zijlstra, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Shuah Khan, linux-kernel, cgroups, linux-pm, linux-kselftest,
	Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Anna-Maria Behnsen, Alex Shi, Vincent Guittot, Barry Song
In-Reply-To: <xhsmhedbmbjz5.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

On 4/3/24 10:26, Valentin Schneider wrote:
> On 03/04/24 09:38, Waiman Long wrote:
>> On 4/3/24 08:02, Michal Koutný wrote:
>>> On Tue, Apr 02, 2024 at 11:30:11AM -0400, Waiman Long <longman@redhat.com> wrote:
>>>> Yes, there is a potential that a cpus_read_lock() may be called leading to
>>>> deadlock. So unless we reverse the current cgroup_mutex --> cpu_hotplug_lock
>>>> ordering, it is not safe to call cgroup_transfer_tasks() directly.
>>> I see that cgroup_transfer_tasks() has the only user -- cpuset. What
>>> about bending it for the specific use like:
>>>
>>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>>> index 34aaf0e87def..64deb7212c5c 100644
>>> --- a/include/linux/cgroup.h
>>> +++ b/include/linux/cgroup.h
>>> @@ -109,7 +109,7 @@ struct cgroup *cgroup_get_from_fd(int fd);
>>>    struct cgroup *cgroup_v1v2_get_from_fd(int fd);
>>>
>>>    int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
>>> -int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
>>> +int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from);
>>>
>>>    int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
>>>    int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
>>> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
>>> index 520a11cb12f4..f97025858c7a 100644
>>> --- a/kernel/cgroup/cgroup-v1.c
>>> +++ b/kernel/cgroup/cgroup-v1.c
>>> @@ -91,7 +91,8 @@ EXPORT_SYMBOL_GPL(cgroup_attach_task_all);
>>>     *
>>>     * Return: %0 on success or a negative errno code on failure
>>>     */
>>> -int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>>> +int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from)
>>>    {
>>>       DEFINE_CGROUP_MGCTX(mgctx);
>>>       struct cgrp_cset_link *link;
>>> @@ -106,9 +106,11 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>>>       if (ret)
>>>               return ret;
>>>
>>> -	cgroup_lock();
>>> -
>>> -	cgroup_attach_lock(true);
>>> +	/* The locking rules serve specific purpose of v1 cpuset hotplug
>>> +	 * migration, see hotplug_update_tasks_legacy() and
>>> +	 * cgroup_attach_lock() */
>>> +	lockdep_assert_held(&cgroup_mutex);
>>> +	lockdep_assert_cpus_held();
>>> +	percpu_down_write(&cgroup_threadgroup_rwsem);
>>>
>>>       /* all tasks in @from are being moved, all csets are source */
>>>       spin_lock_irq(&css_set_lock);
>>> @@ -144,8 +146,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>>>       } while (task && !ret);
>>>    out_err:
>>>       cgroup_migrate_finish(&mgctx);
>>> -	cgroup_attach_unlock(true);
>>> -	cgroup_unlock();
>>> +	percpu_up_write(&cgroup_threadgroup_rwsem);
>>>       return ret;
>>>    }
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 13d27b17c889..94fb8b26f038 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -4331,7 +4331,7 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
>>>                       nodes_empty(parent->mems_allowed))
>>>               parent = parent_cs(parent);
>>>
>>> -	if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
>>> +	if (cgroup_transfer_tasks_locked(parent->css.cgroup, cs->css.cgroup)) {
>>>               pr_err("cpuset: failed to transfer tasks out of empty cpuset ");
>>>               pr_cont_cgroup_name(cs->css.cgroup);
>>>               pr_cont("\n");
>>> @@ -4376,21 +4376,9 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
>>>
>>>       /*
>>>        * Move tasks to the nearest ancestor with execution resources,
>>> -	 * This is full cgroup operation which will also call back into
>>> -	 * cpuset. Execute it asynchronously using workqueue.
>>>        */
>>> -	if (is_empty && css_tryget_online(&cs->css)) {
>>> -		struct cpuset_remove_tasks_struct *s;
>>> -
>>> -		s = kzalloc(sizeof(*s), GFP_KERNEL);
>>> -		if (WARN_ON_ONCE(!s)) {
>>> -			css_put(&cs->css);
>>> -			return;
>>> -		}
>>> -
>>> -		s->cs = cs;
>>> -		INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
>>> -		schedule_work(&s->work);
>>> +	if (is_empty)
>>> +		remove_tasks_in_empty_cpuset(cs);
>>>       }
>>>    }
>>>
>> It still won't work because of the possibility of mutiple tasks
>> involving in a circular locking dependency. The hotplug thread always
>> acquire the cpu_hotplug_lock first before acquiring cpuset_mutex or
>> cgroup_mtuex in this case (cpu_hotplug_lock --> cgroup_mutex). Other
>> tasks calling into cgroup code will acquire the pair in the order
>> cgroup_mutex --> cpu_hotplug_lock. This may lead to a deadlock if these
>> 2 locking sequences happen at the same time. Lockdep will certainly
>> spill out a splat because of this.
>> So unless we change all the relevant
>> cgroup code to the new cpu_hotplug_lock --> cgroup_mutex locking order,
>> the hotplug code can't call cgroup_transfer_tasks() directly.
>>
> IIUC that was Thomas' suggestion [1], but I can't tell yet how bad it would
> be to change cgroup_lock() to also do a cpus_read_lock().

Changing the locking order is certainly doable. I have taken a cursory 
look at it and at least the following files need to be changed:

  kernel/bpf/cgroup.c
  kernel/cgroup/cgroup.c
  kernel/cgroup/legacy_freezer.c
  mm/memcontrol.c

That requires a lot more testing to make sure that there won't be a 
regression. Given that the call to cgroup_transfer_tasks() should be 
rare these days as it will only apply in the case of cgroup v1 under 
certain condtion, I am not sure this requirement justifies making such 
extensive changes. So I kind of defer it until we reach a consensus that 
it is the right thing to do.

Cheers,
Longman

>
> Also, I gave Michal's patch a try and it looks like it's introducing a
>    cgroup_threadgroup_rwsem -> cpuset_mutex
> ordering from
>    cgroup_transfer_tasks_locked()
>    `\
>      percpu_down_write(&cgroup_threadgroup_rwsem);
>      cgroup_migrate()
>      `\
>        cgroup_migrate_execute()
>        `\
>          ss->can_attach() // cpuset_can_attach()
>          `\
>            mutex_lock(&cpuset_mutex);
>
> which is invalid, see below.
>
> [1]: https://lore.kernel.org/lkml/87cyrfe7a3.ffs@tglx/
>
> [   77.627915] WARNING: possible circular locking dependency detected
> [   77.628419] 6.9.0-rc1-00042-g844178b616c7-dirty #23 Tainted: G        W
> [   77.629035] ------------------------------------------------------
> [   77.629548] cpuhp/2/24 is trying to acquire lock:
> [   77.629946] ffffffff82d680b0 (cgroup_threadgroup_rwsem){++++}-{0:0}, at: cgroup_transfer_tasks_locked+0x123/0x450
> [   77.630851]
> [   77.630851] but task is already holding lock:
> [   77.631397] ffffffff82d6c088 (cpuset_mutex){+.+.}-{3:3}, at: cpuset_update_active_cpus+0x352/0xca0
> [   77.632169]
> [   77.632169] which lock already depends on the new lock.
> [   77.632169]
> [   77.632891]
> [   77.632891] the existing dependency chain (in reverse order) is:
> [   77.633521]
> [   77.633521] -> #1 (cpuset_mutex){+.+.}-{3:3}:
> [   77.634028]        lock_acquire+0xc0/0x2d0
> [   77.634393]        __mutex_lock+0xaa/0x710
> [   77.634751]        cpuset_can_attach+0x6d/0x2c0
> [   77.635146]        cgroup_migrate_execute+0x6f/0x520
> [   77.635565]        cgroup_attach_task+0x2e2/0x450
> [   77.635989]        __cgroup1_procs_write.isra.0+0xfd/0x150
> [   77.636440]        kernfs_fop_write_iter+0x127/0x1c0
> [   77.636917]        vfs_write+0x2b0/0x540
> [   77.637330]        ksys_write+0x70/0xf0
> [   77.637707]        int80_emulation+0xf8/0x1b0
> [   77.638183]        asm_int80_emulation+0x1a/0x20
> [   77.638636]
> [   77.638636] -> #0 (cgroup_threadgroup_rwsem){++++}-{0:0}:
> [   77.639321]        check_prev_add+0xeb/0xb20
> [   77.639751]        __lock_acquire+0x12ac/0x16d0
> [   77.640345]        lock_acquire+0xc0/0x2d0
> [   77.640903]        percpu_down_write+0x33/0x260
> [   77.641347]        cgroup_transfer_tasks_locked+0x123/0x450
> [   77.641826]        cpuset_update_active_cpus+0x782/0xca0
> [   77.642268]        sched_cpu_deactivate+0x1ad/0x1d0
> [   77.642677]        cpuhp_invoke_callback+0x16b/0x6b0
> [   77.643098]        cpuhp_thread_fun+0x1ba/0x240
> [   77.643488]        smpboot_thread_fn+0xd8/0x1d0
> [   77.643873]        kthread+0xce/0x100
> [   77.644209]        ret_from_fork+0x2f/0x50
> [   77.644626]        ret_from_fork_asm+0x1a/0x30
> [   77.645084]
> [   77.645084] other info that might help us debug this:
> [   77.645084]
> [   77.645829]  Possible unsafe locking scenario:
> [   77.645829]
> [   77.646356]        CPU0                    CPU1
> [   77.646748]        ----                    ----
> [   77.647143]   lock(cpuset_mutex);
> [   77.647529]                                lock(cgroup_threadgroup_rwsem);
> [   77.648193]                                lock(cpuset_mutex);
> [   77.648767]   lock(cgroup_threadgroup_rwsem);
> [   77.649216]
> [   77.649216]  *** DEADLOCK ***
>


^ permalink raw reply

* Re: [PATCH 27/34] cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]
From: Rafael J. Wysocki @ 2024-04-03 14:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Arnd Bergmann, Doug Smythies, Zhenguo Yao,
	Tero Kristo, Jiri Slaby (SUSE), linux-pm
In-Reply-To: <20240403080702.3509288-28-arnd@kernel.org>

On Wed, Apr 3, 2024 at 10:11 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> The reference to this variable is hidden in an #ifdef:
>
> drivers/cpufreq/intel_pstate.c:2440:32: error: 'intel_pstate_cpu_oob_ids' defined but not used [-Werror=unused-const-variable=]
>
> Use the same check around the definition.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/cpufreq/intel_pstate.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index dbbf299f4219..29ce9edc6f68 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2437,6 +2437,7 @@ static const struct x86_cpu_id intel_pstate_cpu_ids[] = {
>  };
>  MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids);
>
> +#ifdef CONFIG_ACPI
>  static const struct x86_cpu_id intel_pstate_cpu_oob_ids[] __initconst = {
>         X86_MATCH(BROADWELL_D,          core_funcs),
>         X86_MATCH(BROADWELL_X,          core_funcs),
> @@ -2445,6 +2446,7 @@ static const struct x86_cpu_id intel_pstate_cpu_oob_ids[] __initconst = {
>         X86_MATCH(SAPPHIRERAPIDS_X,     core_funcs),
>         {}
>  };
> +#endif
>
>  static const struct x86_cpu_id intel_pstate_cpu_ee_disable_ids[] = {
>         X86_MATCH(KABYLAKE,             core_funcs),
> --

Applied as 6.10 material, thanks!

^ permalink raw reply

* Re: [PATCH v2 1/3] thermal: gov_power_allocator: Allow binding without cooling devices
From: Rafael J. Wysocki @ 2024-04-03 14:41 UTC (permalink / raw)
  To: Lukasz Luba, Nikita Travkin
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-pm, Daniel Lezcano,
	linux-kernel, Zhang Rui
In-Reply-To: <d048f863-05d1-4aeb-8904-4c09ecbe55d9@arm.com>

On Wed, Apr 3, 2024 at 2:44 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 4/3/24 12:31, Nikita Travkin via B4 Relay wrote:
> > From: Nikita Travkin <nikita@trvn.ru>
> >
> > IPA was recently refactored to split out memory allocation into a
> > separate funciton. That funciton was made to return -EINVAL if there is
> > zero power_actors and thus no memory to allocate. This causes IPA to
> > fail probing when the thermal zone has no attached cooling devices.
> >
> > Since cooling devices can attach after the thermal zone is created and
> > the governer is attached to it, failing probe due to the lack of cooling
> > devices is incorrect.
> >
> > Change the allocate_actors_buffer() to return success when there is no
> > cooling devices present.
> >
> > Fixes: 912e97c67cc3 ("thermal: gov_power_allocator: Move memory allocation out of throttle()")
> > Signed-off-by: Nikita Travkin <nikita@trvn.ru>
> > ---
> >   drivers/thermal/gov_power_allocator.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
> > index 1b17dc4c219c..ec637071ef1f 100644
> > --- a/drivers/thermal/gov_power_allocator.c
> > +++ b/drivers/thermal/gov_power_allocator.c
> > @@ -606,7 +606,7 @@ static int allocate_actors_buffer(struct power_allocator_params *params,
> >
> >       /* There might be no cooling devices yet. */
> >       if (!num_actors) {
> > -             ret = -EINVAL;
> > +             ret = 0;
> >               goto clean_state;
> >       }
> >
> >
>
> LGTM
>
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

Applied as 6.9-rc material along with the [2/3], thanks!

^ permalink raw reply

* Re: [PATCH 1/2] cgroup/cpuset: Make cpuset hotplug processing synchronous
From: Valentin Schneider @ 2024-04-03 14:26 UTC (permalink / raw)
  To: Waiman Long, Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Thomas Gleixner,
	Peter Zijlstra, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Shuah Khan, linux-kernel, cgroups, linux-pm, linux-kselftest,
	Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Anna-Maria Behnsen, Alex Shi, Vincent Guittot, Barry Song
In-Reply-To: <7e62b37d-6c9c-4e55-a01a-175695475cb5@redhat.com>

On 03/04/24 09:38, Waiman Long wrote:
> On 4/3/24 08:02, Michal Koutný wrote:
>> On Tue, Apr 02, 2024 at 11:30:11AM -0400, Waiman Long <longman@redhat.com> wrote:
>>> Yes, there is a potential that a cpus_read_lock() may be called leading to
>>> deadlock. So unless we reverse the current cgroup_mutex --> cpu_hotplug_lock
>>> ordering, it is not safe to call cgroup_transfer_tasks() directly.
>> I see that cgroup_transfer_tasks() has the only user -- cpuset. What
>> about bending it for the specific use like:
>>
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index 34aaf0e87def..64deb7212c5c 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -109,7 +109,7 @@ struct cgroup *cgroup_get_from_fd(int fd);
>>   struct cgroup *cgroup_v1v2_get_from_fd(int fd);
>>
>>   int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
>> -int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
>> +int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from);
>>
>>   int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
>>   int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
>> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
>> index 520a11cb12f4..f97025858c7a 100644
>> --- a/kernel/cgroup/cgroup-v1.c
>> +++ b/kernel/cgroup/cgroup-v1.c
>> @@ -91,7 +91,8 @@ EXPORT_SYMBOL_GPL(cgroup_attach_task_all);
>>    *
>>    * Return: %0 on success or a negative errno code on failure
>>    */
>> -int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>> +int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from)
>>   {
>>      DEFINE_CGROUP_MGCTX(mgctx);
>>      struct cgrp_cset_link *link;
>> @@ -106,9 +106,11 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>>      if (ret)
>>              return ret;
>>
>> -	cgroup_lock();
>> -
>> -	cgroup_attach_lock(true);
>> +	/* The locking rules serve specific purpose of v1 cpuset hotplug
>> +	 * migration, see hotplug_update_tasks_legacy() and
>> +	 * cgroup_attach_lock() */
>> +	lockdep_assert_held(&cgroup_mutex);
>> +	lockdep_assert_cpus_held();
>> +	percpu_down_write(&cgroup_threadgroup_rwsem);
>>
>>      /* all tasks in @from are being moved, all csets are source */
>>      spin_lock_irq(&css_set_lock);
>> @@ -144,8 +146,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>>      } while (task && !ret);
>>   out_err:
>>      cgroup_migrate_finish(&mgctx);
>> -	cgroup_attach_unlock(true);
>> -	cgroup_unlock();
>> +	percpu_up_write(&cgroup_threadgroup_rwsem);
>>      return ret;
>>   }
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 13d27b17c889..94fb8b26f038 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -4331,7 +4331,7 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
>>                      nodes_empty(parent->mems_allowed))
>>              parent = parent_cs(parent);
>>
>> -	if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
>> +	if (cgroup_transfer_tasks_locked(parent->css.cgroup, cs->css.cgroup)) {
>>              pr_err("cpuset: failed to transfer tasks out of empty cpuset ");
>>              pr_cont_cgroup_name(cs->css.cgroup);
>>              pr_cont("\n");
>> @@ -4376,21 +4376,9 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
>>
>>      /*
>>       * Move tasks to the nearest ancestor with execution resources,
>> -	 * This is full cgroup operation which will also call back into
>> -	 * cpuset. Execute it asynchronously using workqueue.
>>       */
>> -	if (is_empty && css_tryget_online(&cs->css)) {
>> -		struct cpuset_remove_tasks_struct *s;
>> -
>> -		s = kzalloc(sizeof(*s), GFP_KERNEL);
>> -		if (WARN_ON_ONCE(!s)) {
>> -			css_put(&cs->css);
>> -			return;
>> -		}
>> -
>> -		s->cs = cs;
>> -		INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
>> -		schedule_work(&s->work);
>> +	if (is_empty)
>> +		remove_tasks_in_empty_cpuset(cs);
>>      }
>>   }
>>
>
> It still won't work because of the possibility of mutiple tasks
> involving in a circular locking dependency. The hotplug thread always
> acquire the cpu_hotplug_lock first before acquiring cpuset_mutex or
> cgroup_mtuex in this case (cpu_hotplug_lock --> cgroup_mutex). Other
> tasks calling into cgroup code will acquire the pair in the order
> cgroup_mutex --> cpu_hotplug_lock. This may lead to a deadlock if these
> 2 locking sequences happen at the same time. Lockdep will certainly
> spill out a splat because of this.

> So unless we change all the relevant
> cgroup code to the new cpu_hotplug_lock --> cgroup_mutex locking order,
> the hotplug code can't call cgroup_transfer_tasks() directly.
>

IIUC that was Thomas' suggestion [1], but I can't tell yet how bad it would
be to change cgroup_lock() to also do a cpus_read_lock().

Also, I gave Michal's patch a try and it looks like it's introducing a
  cgroup_threadgroup_rwsem -> cpuset_mutex
ordering from
  cgroup_transfer_tasks_locked()
  `\
    percpu_down_write(&cgroup_threadgroup_rwsem);
    cgroup_migrate()
    `\
      cgroup_migrate_execute()
      `\
        ss->can_attach() // cpuset_can_attach()
        `\
          mutex_lock(&cpuset_mutex);

which is invalid, see below.

[1]: https://lore.kernel.org/lkml/87cyrfe7a3.ffs@tglx/

[   77.627915] WARNING: possible circular locking dependency detected
[   77.628419] 6.9.0-rc1-00042-g844178b616c7-dirty #23 Tainted: G        W
[   77.629035] ------------------------------------------------------
[   77.629548] cpuhp/2/24 is trying to acquire lock:
[   77.629946] ffffffff82d680b0 (cgroup_threadgroup_rwsem){++++}-{0:0}, at: cgroup_transfer_tasks_locked+0x123/0x450
[   77.630851]
[   77.630851] but task is already holding lock:
[   77.631397] ffffffff82d6c088 (cpuset_mutex){+.+.}-{3:3}, at: cpuset_update_active_cpus+0x352/0xca0
[   77.632169]
[   77.632169] which lock already depends on the new lock.
[   77.632169]
[   77.632891]
[   77.632891] the existing dependency chain (in reverse order) is:
[   77.633521]
[   77.633521] -> #1 (cpuset_mutex){+.+.}-{3:3}:
[   77.634028]        lock_acquire+0xc0/0x2d0
[   77.634393]        __mutex_lock+0xaa/0x710
[   77.634751]        cpuset_can_attach+0x6d/0x2c0
[   77.635146]        cgroup_migrate_execute+0x6f/0x520
[   77.635565]        cgroup_attach_task+0x2e2/0x450
[   77.635989]        __cgroup1_procs_write.isra.0+0xfd/0x150
[   77.636440]        kernfs_fop_write_iter+0x127/0x1c0
[   77.636917]        vfs_write+0x2b0/0x540
[   77.637330]        ksys_write+0x70/0xf0
[   77.637707]        int80_emulation+0xf8/0x1b0
[   77.638183]        asm_int80_emulation+0x1a/0x20
[   77.638636]
[   77.638636] -> #0 (cgroup_threadgroup_rwsem){++++}-{0:0}:
[   77.639321]        check_prev_add+0xeb/0xb20
[   77.639751]        __lock_acquire+0x12ac/0x16d0
[   77.640345]        lock_acquire+0xc0/0x2d0
[   77.640903]        percpu_down_write+0x33/0x260
[   77.641347]        cgroup_transfer_tasks_locked+0x123/0x450
[   77.641826]        cpuset_update_active_cpus+0x782/0xca0
[   77.642268]        sched_cpu_deactivate+0x1ad/0x1d0
[   77.642677]        cpuhp_invoke_callback+0x16b/0x6b0
[   77.643098]        cpuhp_thread_fun+0x1ba/0x240
[   77.643488]        smpboot_thread_fn+0xd8/0x1d0
[   77.643873]        kthread+0xce/0x100
[   77.644209]        ret_from_fork+0x2f/0x50
[   77.644626]        ret_from_fork_asm+0x1a/0x30
[   77.645084]
[   77.645084] other info that might help us debug this:
[   77.645084]
[   77.645829]  Possible unsafe locking scenario:
[   77.645829]
[   77.646356]        CPU0                    CPU1
[   77.646748]        ----                    ----
[   77.647143]   lock(cpuset_mutex);
[   77.647529]                                lock(cgroup_threadgroup_rwsem);
[   77.648193]                                lock(cpuset_mutex);
[   77.648767]   lock(cgroup_threadgroup_rwsem);
[   77.649216]
[   77.649216]  *** DEADLOCK ***


^ permalink raw reply

* Re: [PATCH 1/2] cgroup/cpuset: Make cpuset hotplug processing synchronous
From: Waiman Long @ 2024-04-03 13:38 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Thomas Gleixner,
	Peter Zijlstra, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Shuah Khan, linux-kernel, cgroups, linux-pm, linux-kselftest,
	Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Valentin Schneider, Anna-Maria Behnsen, Alex Shi, Vincent Guittot,
	Barry Song
In-Reply-To: <u3naomgv34t5rnc7pmyy4zjppgf36skeo45orss2xnqcvtrcez@m74tsl2ws76f>

On 4/3/24 08:02, Michal Koutný wrote:
> On Tue, Apr 02, 2024 at 11:30:11AM -0400, Waiman Long <longman@redhat.com> wrote:
>> Yes, there is a potential that a cpus_read_lock() may be called leading to
>> deadlock. So unless we reverse the current cgroup_mutex --> cpu_hotplug_lock
>> ordering, it is not safe to call cgroup_transfer_tasks() directly.
> I see that cgroup_transfer_tasks() has the only user -- cpuset. What
> about bending it for the specific use like:
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 34aaf0e87def..64deb7212c5c 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -109,7 +109,7 @@ struct cgroup *cgroup_get_from_fd(int fd);
>   struct cgroup *cgroup_v1v2_get_from_fd(int fd);
>   
>   int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
> -int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
> +int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from);
>   
>   int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
>   int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 520a11cb12f4..f97025858c7a 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -91,7 +91,8 @@ EXPORT_SYMBOL_GPL(cgroup_attach_task_all);
>    *
>    * Return: %0 on success or a negative errno code on failure
>    */
> -int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
> +int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from)
>   {
>   	DEFINE_CGROUP_MGCTX(mgctx);
>   	struct cgrp_cset_link *link;
> @@ -106,9 +106,11 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>   	if (ret)
>   		return ret;
>   
> -	cgroup_lock();
> -
> -	cgroup_attach_lock(true);
> +	/* The locking rules serve specific purpose of v1 cpuset hotplug
> +	 * migration, see hotplug_update_tasks_legacy() and
> +	 * cgroup_attach_lock() */
> +	lockdep_assert_held(&cgroup_mutex);
> +	lockdep_assert_cpus_held();
> +	percpu_down_write(&cgroup_threadgroup_rwsem);
>   
>   	/* all tasks in @from are being moved, all csets are source */
>   	spin_lock_irq(&css_set_lock);
> @@ -144,8 +146,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
>   	} while (task && !ret);
>   out_err:
>   	cgroup_migrate_finish(&mgctx);
> -	cgroup_attach_unlock(true);
> -	cgroup_unlock();
> +	percpu_up_write(&cgroup_threadgroup_rwsem);
>   	return ret;
>   }
>   
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 13d27b17c889..94fb8b26f038 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4331,7 +4331,7 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
>   			nodes_empty(parent->mems_allowed))
>   		parent = parent_cs(parent);
>   
> -	if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
> +	if (cgroup_transfer_tasks_locked(parent->css.cgroup, cs->css.cgroup)) {
>   		pr_err("cpuset: failed to transfer tasks out of empty cpuset ");
>   		pr_cont_cgroup_name(cs->css.cgroup);
>   		pr_cont("\n");
> @@ -4376,21 +4376,9 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
>   
>   	/*
>   	 * Move tasks to the nearest ancestor with execution resources,
> -	 * This is full cgroup operation which will also call back into
> -	 * cpuset. Execute it asynchronously using workqueue.
>   	 */
> -	if (is_empty && css_tryget_online(&cs->css)) {
> -		struct cpuset_remove_tasks_struct *s;
> -
> -		s = kzalloc(sizeof(*s), GFP_KERNEL);
> -		if (WARN_ON_ONCE(!s)) {
> -			css_put(&cs->css);
> -			return;
> -		}
> -
> -		s->cs = cs;
> -		INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
> -		schedule_work(&s->work);
> +	if (is_empty)
> +		remove_tasks_in_empty_cpuset(cs);
>   	}
>   }
>   

It still won't work because of the possibility of mutiple tasks 
involving in a circular locking dependency. The hotplug thread always 
acquire the cpu_hotplug_lock first before acquiring cpuset_mutex or 
cgroup_mtuex in this case (cpu_hotplug_lock --> cgroup_mutex). Other 
tasks calling into cgroup code will acquire the pair in the order 
cgroup_mutex --> cpu_hotplug_lock. This may lead to a deadlock if these 
2 locking sequences happen at the same time. Lockdep will certainly 
spill out a splat because of this. So unless we change all the relevant 
cgroup code to the new cpu_hotplug_lock --> cgroup_mutex locking order, 
the hotplug code can't call cgroup_transfer_tasks() directly.

Cheers,
Longman


^ permalink raw reply

* Re: kernel/sched/core.c:961:15: error: incompatible pointer to integer conversion passing 'typeof
From: Vincent Guittot @ 2024-04-03 13:26 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: open list, Linux Regressions, lkft-triage, clang-built-linux,
	Linux PM, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Nathan Chancellor, Nick Desaulniers, linux-riscv, Paul Walmsley,
	Palmer Dabbelt, Albert Ou
In-Reply-To: <CA+G9fYtj3aBdRreBmKZDQApEe2x8mugycPgN+_J5ebJzXDEq4g@mail.gmail.com>

Hi Naresh,

Adding riscv people

On Wed, 3 Apr 2024 at 09:38, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> The riscv clang-17 defconfig build failed due to following warnings / errors
> on the Linux next-20240402.

Could you confirm that there is no problem with other arch and/or
other toolchain ?

>
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>
> riscv:
>   build:
>     * clang-17-lkftconfig - Failed
>     * rv32-clang-17-defconfig - Failed
>     * clang-17-tinyconfig - Failed
>     * rv32-clang-17-tinyconfig - Failed
>     * clang-17-defconfig - Failed
>     * clang-17-allnoconfig - Failed
>     * rv32-clang-17-allnoconfig - Failed
>
> Build log:
> -------
> kernel/sched/core.c:961:15: error: incompatible pointer to integer
> conversion passing 'typeof (*((__ai_ptr)))' (aka 'struct wake_q_node
> *') to parameter of type 'uintptr_t' (aka 'unsigned long')
> [-Wint-conversion]
>   961 |         if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
>       |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no recent change on this code. Could it be a change in
cmpxchg_relaxed ?

>
> Steps to reproduce:
> ---------
> # tuxmake --runtime podman --target-arch riscv --toolchain clang-17
> --kconfig defconfig LLVM=1 LLVM_IAS=1
>
> Links:
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240402/testrun/23264917/suite/build/test/clang-17-defconfig/details/
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240402/testrun/23264917/suite/build/test/clang-17-defconfig/log
>
> --
> Linaro LKFT
> https://lkft.linaro.org

^ permalink raw reply

* Re: [PATCH v2 3/3] thermal: gov_power_allocator: Suppress sustainable_power warning without trip_points
From: Nikita Travkin @ 2024-04-03 13:05 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-pm, Zhang Rui,
	Daniel Lezcano, linux-kernel
In-Reply-To: <187a3acb-d4a8-41e6-822c-f901a693aae1@arm.com>

ср, 3 апр. 2024 г. в 17:52, Lukasz Luba <lukasz.luba@arm.com>:
>
>
>
> On 4/3/24 12:31, Nikita Travkin via B4 Relay wrote:
> > From: Nikita Travkin <nikita@trvn.ru>
> >
> > IPA warns if the thermal zone it was attached to doesn't define
> > sustainable_power value. In some cases though IPA may be bound to an
> > "empty" TZ, in which case the lack of sustainable_power doesn't matter.
> >
> > Suppress the warning in case when IPA is bound to an empty TZ to make it
> > easier to see the warnings that actually matter.
> >
> > Signed-off-by: Nikita Travkin <nikita@trvn.ru>
> > ---
> >
> > I've decided to add this along to supress those warnings for some TZ on
> > sc7180. Feel free to drop this patch if you think the warning should
> > always appear.
>
> That warning should stay, since in the development or integration phase
> quite a lot of stuff is missing. This will warn that there is an issue.
> The case with 'empty' TZ is an exception only to 'work' with IPA.
>

Yes, that's understandable, though by suppressing those I could
actually see the few actual warnings for TZ with cooling devices
and no value, which I couldn't see before because it looked like
"all of them" have the warning.

In any case, as I said, I'm fine with this not being applied :)

Thanks for your review!
Nikita

> Thanks for the patches!
>
> Regards,
> Lukasz
>
>
> > ---
> >   drivers/thermal/gov_power_allocator.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
> > index e25e48d76aa7..05a40f6b5928 100644
> > --- a/drivers/thermal/gov_power_allocator.c
> > +++ b/drivers/thermal/gov_power_allocator.c
> > @@ -704,7 +704,7 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
> >               params->allocated_tzp = true;
> >       }
> >
> > -     if (!tz->tzp->sustainable_power)
> > +     if (!tz->tzp->sustainable_power && params->trip_max)
> >               dev_warn(&tz->device, "power_allocator: sustainable_power will be estimated\n");
> >       else
> >               params->sustainable_power = tz->tzp->sustainable_power;
> >

^ permalink raw reply

* Re: [PATCH v2 3/3] thermal: gov_power_allocator: Suppress sustainable_power warning without trip_points
From: Lukasz Luba @ 2024-04-03 12:52 UTC (permalink / raw)
  To: Nikita Travkin
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-pm, Zhang Rui,
	Daniel Lezcano, linux-kernel
In-Reply-To: <20240403-gpa-no-cooling-devs-v2-3-79bdd8439449@trvn.ru>



On 4/3/24 12:31, Nikita Travkin via B4 Relay wrote:
> From: Nikita Travkin <nikita@trvn.ru>
> 
> IPA warns if the thermal zone it was attached to doesn't define
> sustainable_power value. In some cases though IPA may be bound to an
> "empty" TZ, in which case the lack of sustainable_power doesn't matter.
> 
> Suppress the warning in case when IPA is bound to an empty TZ to make it
> easier to see the warnings that actually matter.
> 
> Signed-off-by: Nikita Travkin <nikita@trvn.ru>
> ---
> 
> I've decided to add this along to supress those warnings for some TZ on
> sc7180. Feel free to drop this patch if you think the warning should
> always appear.

That warning should stay, since in the development or integration phase
quite a lot of stuff is missing. This will warn that there is an issue.
The case with 'empty' TZ is an exception only to 'work' with IPA.

Thanks for the patches!

Regards,
Lukasz


> ---
>   drivers/thermal/gov_power_allocator.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
> index e25e48d76aa7..05a40f6b5928 100644
> --- a/drivers/thermal/gov_power_allocator.c
> +++ b/drivers/thermal/gov_power_allocator.c
> @@ -704,7 +704,7 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
>   		params->allocated_tzp = true;
>   	}
>   
> -	if (!tz->tzp->sustainable_power)
> +	if (!tz->tzp->sustainable_power && params->trip_max)
>   		dev_warn(&tz->device, "power_allocator: sustainable_power will be estimated\n");
>   	else
>   		params->sustainable_power = tz->tzp->sustainable_power;
> 

^ permalink raw reply

* Re: [PATCH v2 2/3] thermal: gov_power_allocator: Allow binding without trip points
From: Lukasz Luba @ 2024-04-03 12:48 UTC (permalink / raw)
  To: Nikita Travkin
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Daniel Lezcano, linux-pm,
	linux-kernel, Zhang Rui
In-Reply-To: <20240403-gpa-no-cooling-devs-v2-2-79bdd8439449@trvn.ru>



On 4/3/24 12:31, Nikita Travkin via B4 Relay wrote:
> From: Nikita Travkin <nikita@trvn.ru>
> 
> IPA probe function was recently refactored to perform extra error checks
> and make sure the thermal zone has trip points necessary for the IPA
> operation. With this change, if a thermal zone is probed such that it
> has no trip points that IPA can use, IPA will fail and the TZ won't be
> created. This is the case if a platform defines a TZ without cooling
> devices and only with "hot"/"critical" trip points, often found on some
> Qualcomm devices [1].
> 
> Documentation across IPA code (notably get_governor_trips() kerneldoc)
> suggests that IPA is supposed to handle such TZ even if it won't
> actually do anything.
> 
> This commit partially reverts the previous change to allow IPA to bind
> to such "empty" thermal zones.
> 
> [1] arch/arm64/boot/dts/qcom/sc7180.dtsi#n4776
> 
> Fixes: e83747c2f8e3 ("thermal: gov_power_allocator: Set up trip points earlier")
> Signed-off-by: Nikita Travkin <nikita@trvn.ru>
> ---
>   drivers/thermal/gov_power_allocator.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
> index ec637071ef1f..e25e48d76aa7 100644
> --- a/drivers/thermal/gov_power_allocator.c
> +++ b/drivers/thermal/gov_power_allocator.c
> @@ -679,11 +679,6 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
>   		return -ENOMEM;
>   
>   	get_governor_trips(tz, params);
> -	if (!params->trip_max) {
> -		dev_warn(&tz->device, "power_allocator: missing trip_max\n");
> -		kfree(params);
> -		return -EINVAL;
> -	}
>   
>   	ret = check_power_actors(tz, params);
>   	if (ret < 0) {
> @@ -714,9 +709,10 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
>   	else
>   		params->sustainable_power = tz->tzp->sustainable_power;
>   
> -	estimate_pid_constants(tz, tz->tzp->sustainable_power,
> -			       params->trip_switch_on,
> -			       params->trip_max->temperature);
> +	if (params->trip_max)
> +		estimate_pid_constants(tz, tz->tzp->sustainable_power,
> +				       params->trip_switch_on,
> +				       params->trip_max->temperature);
>   
>   	reset_pid_controller(params);
>   
> 

LGTM

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply

* Re: [PATCH v2 1/3] thermal: gov_power_allocator: Allow binding without cooling devices
From: Lukasz Luba @ 2024-04-03 12:43 UTC (permalink / raw)
  To: Nikita Travkin
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-pm, Daniel Lezcano,
	linux-kernel, Zhang Rui
In-Reply-To: <20240403-gpa-no-cooling-devs-v2-1-79bdd8439449@trvn.ru>



On 4/3/24 12:31, Nikita Travkin via B4 Relay wrote:
> From: Nikita Travkin <nikita@trvn.ru>
> 
> IPA was recently refactored to split out memory allocation into a
> separate funciton. That funciton was made to return -EINVAL if there is
> zero power_actors and thus no memory to allocate. This causes IPA to
> fail probing when the thermal zone has no attached cooling devices.
> 
> Since cooling devices can attach after the thermal zone is created and
> the governer is attached to it, failing probe due to the lack of cooling
> devices is incorrect.
> 
> Change the allocate_actors_buffer() to return success when there is no
> cooling devices present.
> 
> Fixes: 912e97c67cc3 ("thermal: gov_power_allocator: Move memory allocation out of throttle()")
> Signed-off-by: Nikita Travkin <nikita@trvn.ru>
> ---
>   drivers/thermal/gov_power_allocator.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
> index 1b17dc4c219c..ec637071ef1f 100644
> --- a/drivers/thermal/gov_power_allocator.c
> +++ b/drivers/thermal/gov_power_allocator.c
> @@ -606,7 +606,7 @@ static int allocate_actors_buffer(struct power_allocator_params *params,
>   
>   	/* There might be no cooling devices yet. */
>   	if (!num_actors) {
> -		ret = -EINVAL;
> +		ret = 0;
>   		goto clean_state;
>   	}
>   
> 

LGTM

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply

* Re: [PATCH v4 0/4] Update Energy Model after chip binning adjusted voltages
From: Lukasz Luba @ 2024-04-03 12:36 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-arm-kernel, sboyd, rafael, linux-pm, nm, linux-samsung-soc,
	daniel.lezcano, viresh.kumar, krzysztof.kozlowski, alim.akhtar,
	m.szyprowski, mhiramat, linux-kernel
In-Reply-To: <045fa6db-4f76-46aa-85ba-c9e698c7e390@arm.com>

Hi Dietmar,

On 4/3/24 13:07, Dietmar Eggemann wrote:
> On 02/04/2024 17:58, Lukasz Luba wrote:
>> Hi all,
>>
>> This is a follow-up patch aiming to add EM modification due to chip binning.
>> The first RFC and the discussion can be found here [1].
>>
>> It uses Exynos chip driver code as a 1st user. The EM framework has been
>> extended to handle this use case easily, when the voltage has been changed
>> after setup. On my Odroid-xu4 in some OPPs I can observe ~20% power difference.
>> According to that data in driver tables it could be up to ~29%.
>>
>> This chip binning is applicable to a lot of SoCs, so the EM framework should
>> make it easy to update. It uses the existing OPP and DT information to
>> re-calculate the new power values.
>>
>> It has dependency on Exynos SoC driver tree.
>>
>> Changes:
>> v4:
>> - added asterisk in the comment section (test robot)
>> - change the patch 2/4 header name and use 'Refactor'
>> v3:
>> - updated header description patch 2/4 (Dietmar)
>> - removed 2 sentences from comment and adjusted in patch 3/4 (Dietmar)
>> - patch 4/4 re-phrased code comment (Dietmar)
>> - collected tags (Krzysztof, Viresh)
>> v2:
>> - removed 'ret' from error message which wasn't initialized (Christian)
>> v1:
>> - exported the OPP calculation function from the OPP/OF so it can be
>>    used from EM fwk (Viresh)
>> - refactored EM updating function to re-use common code
>> - added new EM function which can be used by chip device drivers which
>>    modify the voltage in OPPs
>> RFC is at [1]
>>
>> Regards,
>> Lukasz Luba
>>
>> [1] https://lore.kernel.org/lkml/20231220110339.1065505-1-lukasz.luba@arm.com/
>>
>> Lukasz Luba (4):
>>    OPP: OF: Export dev_opp_pm_calc_power() for usage from EM
>>    PM: EM: Refactor em_adjust_new_capacity()
>>    PM: EM: Add em_dev_update_chip_binning()
>>    soc: samsung: exynos-asv: Update Energy Model after adjusting voltage
>>
>>   drivers/opp/of.c                 |  17 +++--
>>   drivers/soc/samsung/exynos-asv.c |  11 +++-
>>   include/linux/energy_model.h     |   5 ++
>>   include/linux/pm_opp.h           |   8 +++
>>   kernel/power/energy_model.c      | 106 +++++++++++++++++++++++++------
>>   5 files changed, 122 insertions(+), 25 deletions(-)
> 
> LGTM.
> 
> Just two very minor things which I mentioned in the individual patches.
> 
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> 
> 
> 

Thank you for the review. I will send the v5 with those.

Regards,
Lukasz

^ permalink raw reply

* Re: [PATCH v4 4/4] soc: samsung: exynos-asv: Update Energy Model after adjusting voltage
From: Dietmar Eggemann @ 2024-04-03 12:07 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: linux-arm-kernel, sboyd, nm, linux-samsung-soc, daniel.lezcano,
	viresh.kumar, krzysztof.kozlowski, alim.akhtar, m.szyprowski,
	mhiramat
In-Reply-To: <20240402155822.505491-5-lukasz.luba@arm.com>

On 02/04/2024 17:58, Lukasz Luba wrote:

[...]

> @@ -97,9 +98,17 @@ static int exynos_asv_update_opps(struct exynos_asv *asv)
>  			last_opp_table = opp_table;
>  
>  			ret = exynos_asv_update_cpu_opps(asv, cpu);
> -			if (ret < 0)
> +			if (!ret) {
> +				/*
> +				 * When the voltage for OPPs could be changed,
> +				 * make sure to update the EM power values, to
> +				 * reflect the reality and not use stale data.
> +				 */

Maybe shorter?

                                /*
                                 * Update EM power values since OPP
                                 * voltage values may have changed.
                                 */

[...]

^ permalink raw reply

* Re: [PATCH v4 1/4] OPP: OF: Export dev_opp_pm_calc_power() for usage from EM
From: Dietmar Eggemann @ 2024-04-03 12:07 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: linux-arm-kernel, sboyd, nm, linux-samsung-soc, daniel.lezcano,
	viresh.kumar, krzysztof.kozlowski, alim.akhtar, m.szyprowski,
	mhiramat
In-Reply-To: <20240402155822.505491-2-lukasz.luba@arm.com>

On 02/04/2024 17:58, Lukasz Luba wrote:

[...]

> @@ -539,6 +541,12 @@ static inline void dev_pm_opp_of_unregister_em(struct device *dev)
>  {
>  }
>  
> +static inline int dev_pm_opp_calc_power(struct device *dev, unsigned long *uW,
> +			  unsigned long *kHz)

This looks like a weird alignment comparing to the adjacent functions.

[...]

^ permalink raw reply

* Re: [PATCH v4 0/4] Update Energy Model after chip binning adjusted voltages
From: Dietmar Eggemann @ 2024-04-03 12:07 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm, rafael
  Cc: linux-arm-kernel, sboyd, nm, linux-samsung-soc, daniel.lezcano,
	viresh.kumar, krzysztof.kozlowski, alim.akhtar, m.szyprowski,
	mhiramat
In-Reply-To: <20240402155822.505491-1-lukasz.luba@arm.com>

On 02/04/2024 17:58, Lukasz Luba wrote:
> Hi all,
> 
> This is a follow-up patch aiming to add EM modification due to chip binning.
> The first RFC and the discussion can be found here [1].
> 
> It uses Exynos chip driver code as a 1st user. The EM framework has been
> extended to handle this use case easily, when the voltage has been changed
> after setup. On my Odroid-xu4 in some OPPs I can observe ~20% power difference.
> According to that data in driver tables it could be up to ~29%.
> 
> This chip binning is applicable to a lot of SoCs, so the EM framework should
> make it easy to update. It uses the existing OPP and DT information to
> re-calculate the new power values.
> 
> It has dependency on Exynos SoC driver tree.
> 
> Changes:
> v4:
> - added asterisk in the comment section (test robot)
> - change the patch 2/4 header name and use 'Refactor'
> v3:
> - updated header description patch 2/4 (Dietmar)
> - removed 2 sentences from comment and adjusted in patch 3/4 (Dietmar)
> - patch 4/4 re-phrased code comment (Dietmar)
> - collected tags (Krzysztof, Viresh)
> v2:
> - removed 'ret' from error message which wasn't initialized (Christian)
> v1:
> - exported the OPP calculation function from the OPP/OF so it can be
>   used from EM fwk (Viresh)
> - refactored EM updating function to re-use common code
> - added new EM function which can be used by chip device drivers which
>   modify the voltage in OPPs
> RFC is at [1]
> 
> Regards,
> Lukasz Luba
> 
> [1] https://lore.kernel.org/lkml/20231220110339.1065505-1-lukasz.luba@arm.com/
> 
> Lukasz Luba (4):
>   OPP: OF: Export dev_opp_pm_calc_power() for usage from EM
>   PM: EM: Refactor em_adjust_new_capacity()
>   PM: EM: Add em_dev_update_chip_binning()
>   soc: samsung: exynos-asv: Update Energy Model after adjusting voltage
> 
>  drivers/opp/of.c                 |  17 +++--
>  drivers/soc/samsung/exynos-asv.c |  11 +++-
>  include/linux/energy_model.h     |   5 ++
>  include/linux/pm_opp.h           |   8 +++
>  kernel/power/energy_model.c      | 106 +++++++++++++++++++++++++------
>  5 files changed, 122 insertions(+), 25 deletions(-)

LGTM.

Just two very minor things which I mentioned in the individual patches.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>




^ permalink raw reply

* Re: Re: [PATCH 1/2] cgroup/cpuset: Make cpuset hotplug processing synchronous
From: Michal Koutný @ 2024-04-03 12:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Thomas Gleixner,
	Peter Zijlstra, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Shuah Khan, linux-kernel, cgroups, linux-pm, linux-kselftest,
	Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Valentin Schneider, Anna-Maria Behnsen, Alex Shi, Vincent Guittot,
	Barry Song
In-Reply-To: <548efd52-e45f-41fa-a477-bc5112d7b00c@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3683 bytes --]

On Tue, Apr 02, 2024 at 11:30:11AM -0400, Waiman Long <longman@redhat.com> wrote:
> Yes, there is a potential that a cpus_read_lock() may be called leading to
> deadlock. So unless we reverse the current cgroup_mutex --> cpu_hotplug_lock
> ordering, it is not safe to call cgroup_transfer_tasks() directly.

I see that cgroup_transfer_tasks() has the only user -- cpuset. What
about bending it for the specific use like:

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 34aaf0e87def..64deb7212c5c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -109,7 +109,7 @@ struct cgroup *cgroup_get_from_fd(int fd);
 struct cgroup *cgroup_v1v2_get_from_fd(int fd);
 
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
-int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
+int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from);
 
 int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
 int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 520a11cb12f4..f97025858c7a 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -91,7 +91,8 @@ EXPORT_SYMBOL_GPL(cgroup_attach_task_all);
  *
  * Return: %0 on success or a negative errno code on failure
  */
-int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
+int cgroup_transfer_tasks_locked(struct cgroup *to, struct cgroup *from)
 {
 	DEFINE_CGROUP_MGCTX(mgctx);
 	struct cgrp_cset_link *link;
@@ -106,9 +106,11 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	if (ret)
 		return ret;
 
-	cgroup_lock();
-
-	cgroup_attach_lock(true);
+	/* The locking rules serve specific purpose of v1 cpuset hotplug
+	 * migration, see hotplug_update_tasks_legacy() and
+	 * cgroup_attach_lock() */
+	lockdep_assert_held(&cgroup_mutex);
+	lockdep_assert_cpus_held();
+	percpu_down_write(&cgroup_threadgroup_rwsem);
 
 	/* all tasks in @from are being moved, all csets are source */
 	spin_lock_irq(&css_set_lock);
@@ -144,8 +146,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	} while (task && !ret);
 out_err:
 	cgroup_migrate_finish(&mgctx);
-	cgroup_attach_unlock(true);
-	cgroup_unlock();
+	percpu_up_write(&cgroup_threadgroup_rwsem);
 	return ret;
 }
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 13d27b17c889..94fb8b26f038 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4331,7 +4331,7 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 			nodes_empty(parent->mems_allowed))
 		parent = parent_cs(parent);
 
-	if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
+	if (cgroup_transfer_tasks_locked(parent->css.cgroup, cs->css.cgroup)) {
 		pr_err("cpuset: failed to transfer tasks out of empty cpuset ");
 		pr_cont_cgroup_name(cs->css.cgroup);
 		pr_cont("\n");
@@ -4376,21 +4376,9 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 
 	/*
 	 * Move tasks to the nearest ancestor with execution resources,
-	 * This is full cgroup operation which will also call back into
-	 * cpuset. Execute it asynchronously using workqueue.
 	 */
-	if (is_empty && css_tryget_online(&cs->css)) {
-		struct cpuset_remove_tasks_struct *s;
-
-		s = kzalloc(sizeof(*s), GFP_KERNEL);
-		if (WARN_ON_ONCE(!s)) {
-			css_put(&cs->css);
-			return;
-		}
-
-		s->cs = cs;
-		INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
-		schedule_work(&s->work);
+	if (is_empty)
+		remove_tasks_in_empty_cpuset(cs);
 	}
 }
 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply related

* [PATCH v2 2/3] thermal: gov_power_allocator: Allow binding without trip points
From: Nikita Travkin via B4 Relay @ 2024-04-03 11:31 UTC (permalink / raw)
  To: Lukasz Luba, Rafael J. Wysocki, Daniel Lezcano, Zhang Rui
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Nikita Travkin,
	Nikita Travkin
In-Reply-To: <20240403-gpa-no-cooling-devs-v2-0-79bdd8439449@trvn.ru>

From: Nikita Travkin <nikita@trvn.ru>

IPA probe function was recently refactored to perform extra error checks
and make sure the thermal zone has trip points necessary for the IPA
operation. With this change, if a thermal zone is probed such that it
has no trip points that IPA can use, IPA will fail and the TZ won't be
created. This is the case if a platform defines a TZ without cooling
devices and only with "hot"/"critical" trip points, often found on some
Qualcomm devices [1].

Documentation across IPA code (notably get_governor_trips() kerneldoc)
suggests that IPA is supposed to handle such TZ even if it won't
actually do anything.

This commit partially reverts the previous change to allow IPA to bind
to such "empty" thermal zones.

[1] arch/arm64/boot/dts/qcom/sc7180.dtsi#n4776

Fixes: e83747c2f8e3 ("thermal: gov_power_allocator: Set up trip points earlier")
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
---
 drivers/thermal/gov_power_allocator.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index ec637071ef1f..e25e48d76aa7 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -679,11 +679,6 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
 		return -ENOMEM;
 
 	get_governor_trips(tz, params);
-	if (!params->trip_max) {
-		dev_warn(&tz->device, "power_allocator: missing trip_max\n");
-		kfree(params);
-		return -EINVAL;
-	}
 
 	ret = check_power_actors(tz, params);
 	if (ret < 0) {
@@ -714,9 +709,10 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
 	else
 		params->sustainable_power = tz->tzp->sustainable_power;
 
-	estimate_pid_constants(tz, tz->tzp->sustainable_power,
-			       params->trip_switch_on,
-			       params->trip_max->temperature);
+	if (params->trip_max)
+		estimate_pid_constants(tz, tz->tzp->sustainable_power,
+				       params->trip_switch_on,
+				       params->trip_max->temperature);
 
 	reset_pid_controller(params);
 

-- 
2.44.0



^ permalink raw reply related

* [PATCH v2 3/3] thermal: gov_power_allocator: Suppress sustainable_power warning without trip_points
From: Nikita Travkin via B4 Relay @ 2024-04-03 11:31 UTC (permalink / raw)
  To: Lukasz Luba, Rafael J. Wysocki, Daniel Lezcano, Zhang Rui
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Nikita Travkin,
	Nikita Travkin
In-Reply-To: <20240403-gpa-no-cooling-devs-v2-0-79bdd8439449@trvn.ru>

From: Nikita Travkin <nikita@trvn.ru>

IPA warns if the thermal zone it was attached to doesn't define
sustainable_power value. In some cases though IPA may be bound to an
"empty" TZ, in which case the lack of sustainable_power doesn't matter.

Suppress the warning in case when IPA is bound to an empty TZ to make it
easier to see the warnings that actually matter.

Signed-off-by: Nikita Travkin <nikita@trvn.ru>
---

I've decided to add this along to supress those warnings for some TZ on
sc7180. Feel free to drop this patch if you think the warning should
always appear.
---
 drivers/thermal/gov_power_allocator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index e25e48d76aa7..05a40f6b5928 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -704,7 +704,7 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
 		params->allocated_tzp = true;
 	}
 
-	if (!tz->tzp->sustainable_power)
+	if (!tz->tzp->sustainable_power && params->trip_max)
 		dev_warn(&tz->device, "power_allocator: sustainable_power will be estimated\n");
 	else
 		params->sustainable_power = tz->tzp->sustainable_power;

-- 
2.44.0



^ permalink raw reply related

* [PATCH v2 0/3] gov_power_allocator: Allow binding before cooling devices
From: Nikita Travkin via B4 Relay @ 2024-04-03 11:31 UTC (permalink / raw)
  To: Lukasz Luba, Rafael J. Wysocki, Daniel Lezcano, Zhang Rui
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Nikita Travkin,
	Nikita Travkin

Recent changes in IPA made it fail probing if the TZ has no cooling
devices attached on probe or no trip points defined.

This series restores prior behavior to:

- allow IPA to probe before cooling devices have attached;
- allow IPA to probe when the TZ has no passive/active trip points.

I've noticed that all thermal zones fail probing with -EINVAL on my
sc7180 based Acer Aspire 1 since 6.8. This series allows me to bring
them back.

Additionally there is a commit that supresses the "sustainable_power
will be estimated" warning on TZ that have no trip points (and thus IPA
will not be able to do anything for them anyway). This allowed me to
notice that some of the TZ with cooling_devices on my platform actually
lack the sustainable_power value.

Signed-off-by: Nikita Travkin <nikita@trvn.ru>
---
Changes in v2:
- Split to two changes (Lukasz)
- Return 0 in allocate_actors_buffer() instead of suppressing -EINVAL
  (Lukasz)
- Add a change to supress "sustainable_power will be estimated" warning
  on "empty" TZ
- Link to v1: https://lore.kernel.org/r/20240321-gpa-no-cooling-devs-v1-1-5c9e0ef2062e@trvn.ru

---
Nikita Travkin (3):
      thermal: gov_power_allocator: Allow binding without cooling devices
      thermal: gov_power_allocator: Allow binding without trip points
      thermal: gov_power_allocator: Suppress sustainable_power warning without trip_points

 drivers/thermal/gov_power_allocator.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)
---
base-commit: 727900b675b749c40ba1f6669c7ae5eb7eb8e837
change-id: 20240321-gpa-no-cooling-devs-c79ee3288325

Best regards,
-- 
Nikita Travkin <nikita@trvn.ru>

^ permalink raw reply

* [PATCH v2 1/3] thermal: gov_power_allocator: Allow binding without cooling devices
From: Nikita Travkin via B4 Relay @ 2024-04-03 11:31 UTC (permalink / raw)
  To: Lukasz Luba, Rafael J. Wysocki, Daniel Lezcano, Zhang Rui
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Nikita Travkin,
	Nikita Travkin
In-Reply-To: <20240403-gpa-no-cooling-devs-v2-0-79bdd8439449@trvn.ru>

From: Nikita Travkin <nikita@trvn.ru>

IPA was recently refactored to split out memory allocation into a
separate funciton. That funciton was made to return -EINVAL if there is
zero power_actors and thus no memory to allocate. This causes IPA to
fail probing when the thermal zone has no attached cooling devices.

Since cooling devices can attach after the thermal zone is created and
the governer is attached to it, failing probe due to the lack of cooling
devices is incorrect.

Change the allocate_actors_buffer() to return success when there is no
cooling devices present.

Fixes: 912e97c67cc3 ("thermal: gov_power_allocator: Move memory allocation out of throttle()")
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
---
 drivers/thermal/gov_power_allocator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 1b17dc4c219c..ec637071ef1f 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -606,7 +606,7 @@ static int allocate_actors_buffer(struct power_allocator_params *params,

 	/* There might be no cooling devices yet. */
 	if (!num_actors) {
-		ret = -EINVAL;
+		ret = 0;
 		goto clean_state;
 	}

-- 
2.44.0

^ permalink raw reply related

* Re: [PATCH v7 3/5] clk: qcom: common: Add interconnect clocks support
From: Dmitry Baryshkov @ 2024-04-03 11:06 UTC (permalink / raw)
  To: Varadarajan Narayanan
  Cc: andersson, konrad.dybcio, mturquette, sboyd, robh, krzk+dt,
	conor+dt, djakov, quic_anusha, linux-arm-msm, linux-clk,
	devicetree, linux-kernel, linux-pm
In-Reply-To: <20240403104220.1092431-4-quic_varada@quicinc.com>

On Wed, 3 Apr 2024 at 13:42, Varadarajan Narayanan
<quic_varada@quicinc.com> wrote:
>
> Unlike MSM platforms that manage NoC related clocks and scaling
> from RPM, IPQ SoCs dont involve RPM in managing NoC related
> clocks and there is no NoC scaling.
>
> However, there is a requirement to enable some NoC interface
> clocks for accessing the peripheral controllers present on
> these NoCs. Though exposing these as normal clocks would work,
> having a minimalistic interconnect driver to handle these clocks
> would make it consistent with other Qualcomm platforms resulting
> in common code paths. This is similar to msm8996-cbf's usage of
> icc-clk framework.
>
> Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com>
> ---
> v7: Restore clk_get
> v6: first_id -> icc_first_node_id
>     Remove clock get so that the peripheral that uses the clock
>     can do the clock get
> v5: Split changes in common.c to separate patch
>     Fix error handling
>     Use devm_icc_clk_register instead of icc_clk_register
> v4: Use clk_hw instead of indices
>     Do icc register in qcom_cc_probe() call stream
>     Add icc clock info to qcom_cc_desc structure
> v3: Use indexed identifiers here to avoid confusion
>     Fix error messages and move to common.c
> v2: Move DTS to separate patch
>     Update commit log
>     Auto select CONFIG_INTERCONNECT & CONFIG_INTERCONNECT_CLK to fix build error
> ---
>  drivers/clk/qcom/common.c | 31 ++++++++++++++++++++++++++++++-
>  drivers/clk/qcom/common.h |  3 +++
>  2 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/clk/qcom/common.c b/drivers/clk/qcom/common.c
> index 8b6080eb43a7..fa4ec89c04c4 100644
> --- a/drivers/clk/qcom/common.c
> +++ b/drivers/clk/qcom/common.c
> @@ -8,6 +8,7 @@
>  #include <linux/regmap.h>
>  #include <linux/platform_device.h>
>  #include <linux/clk-provider.h>
> +#include <linux/interconnect-clk.h>
>  #include <linux/reset-controller.h>
>  #include <linux/of.h>
>
> @@ -252,6 +253,34 @@ static struct clk_hw *qcom_cc_clk_hw_get(struct of_phandle_args *clkspec,
>         return cc->rclks[idx] ? &cc->rclks[idx]->hw : NULL;
>  }
>
> +static int qcom_cc_icc_register(struct device *dev,
> +                               const struct qcom_cc_desc *desc)
> +{
> +       struct icc_clk_data *icd;
> +       int i;
> +
> +       if (!IS_ENABLED(CONFIG_INTERCONNECT_CLK))
> +               return 0;
> +
> +       if (!desc->icc_hws)
> +               return 0;
> +
> +       icd = devm_kcalloc(dev, desc->num_icc_hws, sizeof(*icd), GFP_KERNEL);
> +       if (!icd)
> +               return -ENOMEM;
> +
> +       for (i = 0; i < desc->num_icc_hws; i++) {
> +               icd[i].clk = devm_clk_hw_get_clk(dev, desc->icc_hws[i], "icc");
> +               if (!icd[i].clk)
> +                       return dev_err_probe(dev, -ENOENT,
> +                                            "(%d) clock entry is null\n", i);
> +               icd[i].name = clk_hw_get_name(desc->icc_hws[i]);
> +       }
> +
> +       return devm_icc_clk_register(dev, desc->icc_first_node_id,
> +                                                    desc->num_icc_hws, icd);
> +}
> +
>  int qcom_cc_really_probe(struct platform_device *pdev,
>                          const struct qcom_cc_desc *desc, struct regmap *regmap)
>  {
> @@ -327,7 +356,7 @@ int _qcom_cc_really_probe(struct device *dev,
>         if (ret)
>                 return ret;
>
> -       return 0;
> +       return qcom_cc_icc_register(dev, desc);
>  }
>  EXPORT_SYMBOL_GPL(_qcom_cc_really_probe);
>
> diff --git a/drivers/clk/qcom/common.h b/drivers/clk/qcom/common.h
> index 8657257d56d3..43073d2ef32a 100644
> --- a/drivers/clk/qcom/common.h
> +++ b/drivers/clk/qcom/common.h
> @@ -29,6 +29,9 @@ struct qcom_cc_desc {
>         size_t num_gdscs;
>         struct clk_hw **clk_hws;
>         size_t num_clk_hws;
> +       struct clk_hw **icc_hws;

Still we are passing hws here. We already have all the hws in a
different array. Can we just pass the indices?

> +       size_t num_icc_hws;
> +       unsigned int icc_first_node_id;
>  };
>
>  /**
> --
> 2.34.1
>


-- 
With best wishes
Dmitry

^ permalink raw reply

* Re: [PATCH v7 2/5] interconnect: icc-clk: Add devm_icc_clk_register
From: Dmitry Baryshkov @ 2024-04-03 11:04 UTC (permalink / raw)
  To: Varadarajan Narayanan
  Cc: andersson, konrad.dybcio, mturquette, sboyd, robh, krzk+dt,
	conor+dt, djakov, quic_anusha, linux-arm-msm, linux-clk,
	devicetree, linux-kernel, linux-pm
In-Reply-To: <20240403104220.1092431-3-quic_varada@quicinc.com>

On Wed, 3 Apr 2024 at 13:42, Varadarajan Narayanan
<quic_varada@quicinc.com> wrote:
>
> Wrap icc_clk_register to create devm_icc_clk_register to be
> able to release the resources properly.
>
> Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com>
> ---
> v7: Simplify devm_icc_clk_register implementation as suggested in review
> v5: Introduced devm_icc_clk_register
> ---
>  drivers/interconnect/icc-clk.c   | 18 ++++++++++++++++++
>  include/linux/interconnect-clk.h |  2 ++
>  2 files changed, 20 insertions(+)

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

-- 
With best wishes
Dmitry

^ permalink raw reply

* Re: [PATCH v6 3/6] interconnect: icc-clk: Add devm_icc_clk_register
From: Varadarajan Narayanan @ 2024-04-03 10:45 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: andersson, konrad.dybcio, mturquette, sboyd, robh,
	krzysztof.kozlowski+dt, conor+dt, djakov, quic_anusha,
	linux-arm-msm, linux-clk, devicetree, linux-kernel, linux-pm
In-Reply-To: <CAA8EJprN3TuMF-v5PeFW_JUKk+a+MxB7poccZbi9biZNniRnTQ@mail.gmail.com>

On Tue, Apr 02, 2024 at 03:12:04PM +0300, Dmitry Baryshkov wrote:
> On Tue, 2 Apr 2024 at 14:23, Varadarajan Narayanan
> <quic_varada@quicinc.com> wrote:
> >
> > On Tue, Apr 02, 2024 at 02:16:56PM +0300, Dmitry Baryshkov wrote:
> > > On Tue, 2 Apr 2024 at 14:02, Varadarajan Narayanan
> > > <quic_varada@quicinc.com> wrote:
> > > >
> > > > On Tue, Apr 02, 2024 at 01:48:08PM +0300, Dmitry Baryshkov wrote:
> > > > > On Tue, 2 Apr 2024 at 13:40, Dmitry Baryshkov
> > > > > <dmitry.baryshkov@linaro.org> wrote:
> > > > > >
> > > > > > On Tue, 2 Apr 2024 at 13:34, Varadarajan Narayanan
> > > > > > <quic_varada@quicinc.com> wrote:
> > > > > > >
> > > > > > > Wrap icc_clk_register to create devm_icc_clk_register to be
> > > > > > > able to release the resources properly.
> > > > > > >
> > > > > > > Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com>
> > > > > > > ---
> > > > > > > v5: Introduced devm_icc_clk_register
> > > > > > > ---
> > > > > > >  drivers/interconnect/icc-clk.c   | 29 +++++++++++++++++++++++++++++
> > > > > > >  include/linux/interconnect-clk.h |  4 ++++
> > > > > > >  2 files changed, 33 insertions(+)
> > > > > >
> > > > > > Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > > >
> > > > > Wait. Actually,
> > > > >
> > > > > Unreviewed-by: me
> > > > >
> > > > > Please return int from devm_icc_clk_register instead of returning the pointer.
> > > >
> > > > Wouldn't returning int break the general assumption that
> > > > devm_foo(), returns the same type as foo(). For example
> > > > devm_clk_hw_get_clk and clk_hw_get_clk return struct clk *?
> > >
> > > Not always. The only reason to return icc_provider was to make it
> > > possible to destroy it. With devres-managed function you don't have to
> > > do anything.
> >
> > Ok. Will change as follows
> >
> >         return prov; -> return PTR_ERR_OR_ZERO(prov);
> >
>
> I think the code might become simpler if you first allocate the ICC
> provider and then just 'return devm_add_action_or_reset(dev,
> your_icc_clk_release, provider)'

Have posted v7 incorporating these and other feedback.
Please review.

Thanks
Varada

^ permalink raw reply

* Re: [PATCH v6 1/6] dt-bindings: interconnect: Add Qualcomm IPQ9574 support
From: Varadarajan Narayanan @ 2024-04-03 10:44 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: andersson, konrad.dybcio, mturquette, sboyd, robh,
	krzysztof.kozlowski+dt, conor+dt, djakov, dmitry.baryshkov,
	quic_anusha, linux-arm-msm, linux-clk, devicetree, linux-kernel,
	linux-pm
In-Reply-To: <1a6fccd3-452c-423f-b73e-cef5f9d01633@linaro.org>

On Wed, Apr 03, 2024 at 09:09:15AM +0200, Krzysztof Kozlowski wrote:
> On 02/04/2024 12:34, Varadarajan Narayanan wrote:
> > +#define ICC_NSSNOC_NSSCC	10
> > +#define ICC_NSSNOC_SNOC_0	11
> > +#define ICC_NSSNOC_SNOC_1	12
> > +#define ICC_NSSNOC_PCNOC_1	13
> > +#define ICC_NSSNOC_QOSGEN_REF	14
> > +#define ICC_NSSNOC_TIMEOUT_REF	15
> > +#define ICC_NSSNOC_XO_DCD	16
> > +#define ICC_NSSNOC_ATB		17
> > +#define ICC_MEM_NOC_NSSNOC	18
> > +#define ICC_NSSNOC_MEMNOC	19
> > +#define ICC_NSSNOC_MEM_NOC_1	20
> > +
> > +#define ICC_NSSNOC_PPE		0
> > +#define ICC_NSSNOC_PPE_CFG	1
> > +#define ICC_NSSNOC_NSS_CSR	2
> > +#define ICC_NSSNOC_IMEM_QSB	3
> > +#define ICC_NSSNOC_IMEM_AHB	4
> > +
> > +#define MASTER(x)	((ICC_ ## x) * 2)
> > +#define SLAVE(x)	(MASTER(x) + 1)
>
> You already received comment to make your bindings consistent with other
> Qualcomm bindings. Now you repeat the same mistake.
>
> No, that is neither consistent nor greppble.

Sorry. Have restored the naming and posted v7.
Kindly take a look.

Thanks
Varada

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox