* [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register()
@ 2022-07-07 6:21 Liang He
2022-07-15 17:14 ` Rafael J. Wysocki
0 siblings, 1 reply; 6+ messages in thread
From: Liang He @ 2022-07-07 6:21 UTC (permalink / raw)
To: rafael, daniel.lezcano, amitk, rui.zhang, linux-pm, windhl
For each new reference of 'device_node', we should increase its
refcount. Otherwise, there will be premature free.
For example, in drivers\thermal\tegra\soctherm.c, the function
soctherm_init_hw_throt_cdev() will use for_each_child_of_node() to
iterate its child device_node which will be then passed into
__thermal_cooling_device_register(). As for_each_xxx OF APIs will
automatically increase and decrease the refcount of 'device_node',
we should use additional of_node_get() to record the new refernece.
NOTE, we should also call the corresponding of_node_put() in fail path
or when the *_unregister() function is called.
Fixes: a116b5d44f14 ("thermal: core: introduce thermal_of_cooling_device_register")
Signed-off-by: Liang He <windhl@126.com>
---
I cannot confirm, in *_unregister, we should put of_node_put() in or
out of the *_lock/*_unlock functions. Please check it carefully.
drivers/thermal/thermal_core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index cdc0552e8c42..c459e2958b7b 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -919,7 +919,7 @@ __thermal_cooling_device_register(struct device_node *np,
mutex_init(&cdev->lock);
INIT_LIST_HEAD(&cdev->thermal_instances);
- cdev->np = np;
+ cdev->np = of_node_get(np);
cdev->ops = ops;
cdev->updated = false;
cdev->device.class = &thermal_class;
@@ -947,6 +947,7 @@ __thermal_cooling_device_register(struct device_node *np,
return cdev;
out_kfree_type:
+ of_node_put(cdev->np);
thermal_cooling_device_destroy_sysfs(cdev);
kfree(cdev->type);
put_device(&cdev->device);
@@ -1111,6 +1112,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
mutex_unlock(&thermal_list_lock);
+ of_node_put(cdev->np);
ida_simple_remove(&thermal_cdev_ida, cdev->id);
device_del(&cdev->device);
thermal_cooling_device_destroy_sysfs(cdev);
--
2.25.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register()
2022-07-07 6:21 [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register() Liang He
@ 2022-07-15 17:14 ` Rafael J. Wysocki
2022-07-16 2:00 ` Liang He
2022-07-16 22:03 ` Daniel Lezcano
0 siblings, 2 replies; 6+ messages in thread
From: Rafael J. Wysocki @ 2022-07-15 17:14 UTC (permalink / raw)
To: Liang He, Daniel Lezcano
Cc: Rafael J. Wysocki, Amit Kucheria, Zhang, Rui, Linux PM
On Thu, Jul 7, 2022 at 8:21 AM Liang He <windhl@126.com> wrote:
>
> For each new reference of 'device_node', we should increase its
> refcount. Otherwise, there will be premature free.
>
> For example, in drivers\thermal\tegra\soctherm.c, the function
> soctherm_init_hw_throt_cdev() will use for_each_child_of_node() to
> iterate its child device_node which will be then passed into
> __thermal_cooling_device_register(). As for_each_xxx OF APIs will
> automatically increase and decrease the refcount of 'device_node',
> we should use additional of_node_get() to record the new refernece.
reference
>
> NOTE, we should also call the corresponding of_node_put() in fail path
> or when the *_unregister() function is called.
The NOTE in capitals above is somewhat confusing. I would just say
"Accordingly, the corresponding of_node_put() needs to be run in the
error code path and on cooling device unregistration."
>
> Fixes: a116b5d44f14 ("thermal: core: introduce thermal_of_cooling_device_register")
> Signed-off-by: Liang He <windhl@126.com>
> ---
> I cannot confirm, in *_unregister, we should put of_node_put() in or
> out of the *_lock/*_unlock functions. Please check it carefully.
This doesn't matter too much AFAICS.
Please note that the of_node_put() can still "leak" into the critical
section through the "unlock" operation, because the latter is not a
full memory barrier.
Moreover, dropping the reference means that the object in question
won't be used any more by the holder of that reference and it is no
reason I can see why it would be necessary to hold the lock while
doing that.
> drivers/thermal/thermal_core.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index cdc0552e8c42..c459e2958b7b 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -919,7 +919,7 @@ __thermal_cooling_device_register(struct device_node *np,
>
> mutex_init(&cdev->lock);
> INIT_LIST_HEAD(&cdev->thermal_instances);
> - cdev->np = np;
> + cdev->np = of_node_get(np);
> cdev->ops = ops;
> cdev->updated = false;
> cdev->device.class = &thermal_class;
> @@ -947,6 +947,7 @@ __thermal_cooling_device_register(struct device_node *np,
> return cdev;
>
> out_kfree_type:
> + of_node_put(cdev->np);
> thermal_cooling_device_destroy_sysfs(cdev);
> kfree(cdev->type);
> put_device(&cdev->device);
> @@ -1111,6 +1112,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>
> mutex_unlock(&thermal_list_lock);
>
> + of_node_put(cdev->np);
Could this be done right before the
thermal_cooling_device_destroy_sysfs() below? Then the sequence would
be completely analogous to the error code path above.
> ida_simple_remove(&thermal_cdev_ida, cdev->id);
> device_del(&cdev->device);
> thermal_cooling_device_destroy_sysfs(cdev);
> --
Overall, this looks like a genuine fix to me.
Daniel, what do you think?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re:Re: [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register()
2022-07-15 17:14 ` Rafael J. Wysocki
@ 2022-07-16 2:00 ` Liang He
2022-07-16 22:03 ` Daniel Lezcano
1 sibling, 0 replies; 6+ messages in thread
From: Liang He @ 2022-07-16 2:00 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Daniel Lezcano, Amit Kucheria, Zhang, Rui, Linux PM
At 2022-07-16 01:14:31, "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>On Thu, Jul 7, 2022 at 8:21 AM Liang He <windhl@126.com> wrote:
>>
>> For each new reference of 'device_node', we should increase its
>> refcount. Otherwise, there will be premature free.
>>
>> For example, in drivers\thermal\tegra\soctherm.c, the function
>> soctherm_init_hw_throt_cdev() will use for_each_child_of_node() to
>> iterate its child device_node which will be then passed into
>> __thermal_cooling_device_register(). As for_each_xxx OF APIs will
>> automatically increase and decrease the refcount of 'device_node',
>> we should use additional of_node_get() to record the new refernece.
>
>reference
Thanks!
>
>>
>> NOTE, we should also call the corresponding of_node_put() in fail path
>> or when the *_unregister() function is called.
>
>The NOTE in capitals above is somewhat confusing. I would just say
>"Accordingly, the corresponding of_node_put() needs to be run in the
>error code path and on cooling device unregistration."
>
Thanks, I will change that in new version.
>>
>> Fixes: a116b5d44f14 ("thermal: core: introduce thermal_of_cooling_device_register")
>> Signed-off-by: Liang He <windhl@126.com>
>> ---
>> I cannot confirm, in *_unregister, we should put of_node_put() in or
>> out of the *_lock/*_unlock functions. Please check it carefully.
>
>This doesn't matter too much AFAICS.
>
>Please note that the of_node_put() can still "leak" into the critical
>section through the "unlock" operation, because the latter is not a
>full memory barrier.
>
>Moreover, dropping the reference means that the object in question
>won't be used any more by the holder of that reference and it is no
>reason I can see why it would be necessary to hold the lock while
>doing that.
>
>> drivers/thermal/thermal_core.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index cdc0552e8c42..c459e2958b7b 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -919,7 +919,7 @@ __thermal_cooling_device_register(struct device_node *np,
>>
>> mutex_init(&cdev->lock);
>> INIT_LIST_HEAD(&cdev->thermal_instances);
>> - cdev->np = np;
>> + cdev->np = of_node_get(np);
>> cdev->ops = ops;
>> cdev->updated = false;
>> cdev->device.class = &thermal_class;
>> @@ -947,6 +947,7 @@ __thermal_cooling_device_register(struct device_node *np,
>> return cdev;
>>
>> out_kfree_type:
>> + of_node_put(cdev->np);
>> thermal_cooling_device_destroy_sysfs(cdev);
>> kfree(cdev->type);
>> put_device(&cdev->device);
>> @@ -1111,6 +1112,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>>
>> mutex_unlock(&thermal_list_lock);
>>
>> + of_node_put(cdev->np);
>
>Could this be done right before the
>thermal_cooling_device_destroy_sysfs() below? Then the sequence would
>be completely analogous to the error code path above.
>
That souds good, I will change it in new version.
>> ida_simple_remove(&thermal_cdev_ida, cdev->id);
>> device_del(&cdev->device);
>> thermal_cooling_device_destroy_sysfs(cdev);
>> --
>
>Overall, this looks like a genuine fix to me.
>
>Daniel, what do you think?
I will also wait Daniel's reponse before I send new version.
Thanks again,
Liang
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register()
2022-07-15 17:14 ` Rafael J. Wysocki
2022-07-16 2:00 ` Liang He
@ 2022-07-16 22:03 ` Daniel Lezcano
2022-07-17 2:57 ` Liang He
1 sibling, 1 reply; 6+ messages in thread
From: Daniel Lezcano @ 2022-07-16 22:03 UTC (permalink / raw)
To: Rafael J. Wysocki, Liang He; +Cc: Amit Kucheria, Zhang, Rui, Linux PM
On 15/07/2022 19:14, Rafael J. Wysocki wrote:
> On Thu, Jul 7, 2022 at 8:21 AM Liang He <windhl@126.com> wrote:
[ ... ]
>> - cdev->np = np;
>> + cdev->np = of_node_get(np);
>> cdev->ops = ops;
>> cdev->updated = false;
>> cdev->device.class = &thermal_class;
>> @@ -947,6 +947,7 @@ __thermal_cooling_device_register(struct device_node *np,
>> return cdev;
>>
>> out_kfree_type:
>> + of_node_put(cdev->np);
>> thermal_cooling_device_destroy_sysfs(cdev);
>> kfree(cdev->type);
>> put_device(&cdev->device);
>> @@ -1111,6 +1112,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>>
>> mutex_unlock(&thermal_list_lock);
>>
>> + of_node_put(cdev->np);
>
> Could this be done right before the
> thermal_cooling_device_destroy_sysfs() below? Then the sequence would
> be completely analogous to the error code path above.
>
>> ida_simple_remove(&thermal_cdev_ida, cdev->id);
>> device_del(&cdev->device);
>> thermal_cooling_device_destroy_sysfs(cdev);
>> --
>
> Overall, this looks like a genuine fix to me.
>
> Daniel, what do you think?
Yes, the of_node_put() is often missing when there is the for_each_xxx
OF API. But here the cdev->np is only used to compare pointers so used
as an identifier, not de-referenced just comparing the addresses.
It is used to bind a thermal zone with a cooling device:
thermal_of.c: if (tcbp->cooling_device == cdev->np) {
thermal_of.c: if (tcbp->cooling_device == cdev->np) {
That is probably why no refcount was taken on this device node.
Moreover, this will go away with the series reworking the thermal-of I
sent last week
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re:Re: [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register()
2022-07-16 22:03 ` Daniel Lezcano
@ 2022-07-17 2:57 ` Liang He
2022-07-17 7:06 ` Daniel Lezcano
0 siblings, 1 reply; 6+ messages in thread
From: Liang He @ 2022-07-17 2:57 UTC (permalink / raw)
To: Daniel Lezcano; +Cc: Rafael J. Wysocki, Amit Kucheria, Zhang, Rui, Linux PM
At 2022-07-17 06:03:46, "Daniel Lezcano" <daniel.lezcano@linaro.org> wrote:
>On 15/07/2022 19:14, Rafael J. Wysocki wrote:
>> On Thu, Jul 7, 2022 at 8:21 AM Liang He <windhl@126.com> wrote:
>
>[ ... ]
>
>>> - cdev->np = np;
>>> + cdev->np = of_node_get(np);
>>> cdev->ops = ops;
>>> cdev->updated = false;
>>> cdev->device.class = &thermal_class;
>>> @@ -947,6 +947,7 @@ __thermal_cooling_device_register(struct device_node *np,
>>> return cdev;
>>>
>>> out_kfree_type:
>>> + of_node_put(cdev->np);
>>> thermal_cooling_device_destroy_sysfs(cdev);
>>> kfree(cdev->type);
>>> put_device(&cdev->device);
>>> @@ -1111,6 +1112,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>>>
>>> mutex_unlock(&thermal_list_lock);
>>>
>>> + of_node_put(cdev->np);
>>
>> Could this be done right before the
>> thermal_cooling_device_destroy_sysfs() below? Then the sequence would
>> be completely analogous to the error code path above.
>>
>>> ida_simple_remove(&thermal_cdev_ida, cdev->id);
>>> device_del(&cdev->device);
>>> thermal_cooling_device_destroy_sysfs(cdev);
>>> --
>>
>> Overall, this looks like a genuine fix to me.
>>
>> Daniel, what do you think?
>
>Yes, the of_node_put() is often missing when there is the for_each_xxx
>OF API. But here the cdev->np is only used to compare pointers so used
>as an identifier, not de-referenced just comparing the addresses.
>
Thanks, this is a good lesson that explains when there is no need
to refcount new reference.
So I think there is also no need to patch anything, right?
Thanks again,
Liang
>It is used to bind a thermal zone with a cooling device:
>
>thermal_of.c: if (tcbp->cooling_device == cdev->np) {
>thermal_of.c: if (tcbp->cooling_device == cdev->np) {
>
>That is probably why no refcount was taken on this device node.
>
>Moreover, this will go away with the series reworking the thermal-of I
>sent last week
>
>>
>--
><http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
>Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
><http://twitter.com/#!/linaroorg> Twitter |
><http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register()
2022-07-17 2:57 ` Liang He
@ 2022-07-17 7:06 ` Daniel Lezcano
0 siblings, 0 replies; 6+ messages in thread
From: Daniel Lezcano @ 2022-07-17 7:06 UTC (permalink / raw)
To: Liang He; +Cc: Rafael J. Wysocki, Amit Kucheria, Zhang, Rui, Linux PM
On 17/07/2022 04:57, Liang He wrote:
[ ... ]
>> Yes, the of_node_put() is often missing when there is the for_each_xxx
>> OF API. But here the cdev->np is only used to compare pointers so used
>> as an identifier, not de-referenced just comparing the addresses.
>
>>
>
> Thanks, this is a good lesson that explains when there is no need
> to refcount new reference.
> So I think there is also no need to patch anything, right?
Right, no need a change for this.
Thanks anyway for tracking down the refcount in the code
[ ... ]
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-07-17 7:06 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-07 6:21 [PATCH] thermal/core: Fix refcount bugs in __thermal_cooling_device_register() Liang He
2022-07-15 17:14 ` Rafael J. Wysocki
2022-07-16 2:00 ` Liang He
2022-07-16 22:03 ` Daniel Lezcano
2022-07-17 2:57 ` Liang He
2022-07-17 7:06 ` Daniel Lezcano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox