linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
@ 2025-05-29 14:46 ` Gabor Juhos
  2025-05-30  9:16   ` Bryan O'Donoghue
  2025-06-18 12:43   ` Johan Hovold
  0 siblings, 2 replies; 8+ messages in thread
From: Gabor Juhos @ 2025-05-29 14:46 UTC (permalink / raw)
  To: Georgi Djakov, Raviteja Laggyshetty
  Cc: linux-pm, linux-arm-msm, linux-kernel, Gabor Juhos

The 'icc_bw_lock' mutex is introduced in commit af42269c3523
("interconnect: Fix locking for runpm vs reclaim") in order
to decouple serialization of bw aggregation from codepaths
that require memory allocation.

However commit d30f83d278a9 ("interconnect: core: Add dynamic
id allocation support") added a devm_kasprintf() call into a
path protected by the 'icc_bw_lock' which causes this lockdep
warning (at least on the IPQ9574 platform):

    ======================================================
    WARNING: possible circular locking dependency detected
    6.15.0-next-20250529 #0 Not tainted
    ------------------------------------------------------
    swapper/0/1 is trying to acquire lock:
    ffffffc081df57d8 (icc_bw_lock){+.+.}-{4:4}, at: icc_init+0x8/0x108

    but task is already holding lock:
    ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (fs_reclaim){+.+.}-{0:0}:
           fs_reclaim_acquire+0x7c/0xb8
           slab_alloc_node.isra.0+0x48/0x188
           __kmalloc_node_track_caller_noprof+0xa4/0x2b8
           devm_kmalloc+0x5c/0x138
           devm_kvasprintf+0x6c/0xb8
           devm_kasprintf+0x50/0x68
           icc_node_add+0xbc/0x160
           icc_clk_register+0x15c/0x230
           devm_icc_clk_register+0x20/0x90
           qcom_cc_really_probe+0x320/0x338
           nss_cc_ipq9574_probe+0xac/0x1e8
           platform_probe+0x70/0xd0
           really_probe+0xdc/0x3b8
           __driver_probe_device+0x94/0x178
           driver_probe_device+0x48/0xf0
           __driver_attach+0x13c/0x208
           bus_for_each_dev+0x6c/0xb8
           driver_attach+0x2c/0x40
           bus_add_driver+0x100/0x250
           driver_register+0x68/0x138
           __platform_driver_register+0x2c/0x40
           nss_cc_ipq9574_driver_init+0x24/0x38
           do_one_initcall+0x88/0x340
           kernel_init_freeable+0x2ac/0x4f8
           kernel_init+0x28/0x1e8
           ret_from_fork+0x10/0x20

    -> #0 (icc_bw_lock){+.+.}-{4:4}:
           __lock_acquire+0x1348/0x2090
           lock_acquire+0x108/0x2d8
           icc_init+0x50/0x108
           do_one_initcall+0x88/0x340
           kernel_init_freeable+0x2ac/0x4f8
           kernel_init+0x28/0x1e8
           ret_from_fork+0x10/0x20

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(fs_reclaim);
                                   lock(icc_bw_lock);
                                   lock(fs_reclaim);
      lock(icc_bw_lock);

     *** DEADLOCK ***

    1 lock held by swapper/0/1:
     #0: ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108

    stack backtrace:
    CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.15.0-next-20250529 #0 NONE
    Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C7 (DT)
    Call trace:
     show_stack+0x20/0x38 (C)
     dump_stack_lvl+0x90/0xd0
     dump_stack+0x18/0x28
     print_circular_bug+0x334/0x448
     check_noncircular+0x12c/0x140
     __lock_acquire+0x1348/0x2090
     lock_acquire+0x108/0x2d8
     icc_init+0x50/0x108
     do_one_initcall+0x88/0x340
     kernel_init_freeable+0x2ac/0x4f8
     kernel_init+0x28/0x1e8
     ret_from_fork+0x10/0x20

Move the memory allocation part of the code outside of the protected
path to eliminate the warning. Also add a note about why it is moved
to there,

Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
---
 drivers/interconnect/core.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
 		return;
 
 	mutex_lock(&icc_lock);
+
+	if (node->id >= ICC_DYN_ID_START) {
+		/*
+		 * Memory allocation must be done outside of codepaths
+		 * protected by icc_bw_lock.
+		 */
+		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
+					    node->name, dev_name(provider->dev));
+	}
+
 	mutex_lock(&icc_bw_lock);
 
 	node->provider = provider;
@@ -1038,10 +1048,6 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
 	node->avg_bw = node->init_avg;
 	node->peak_bw = node->init_peak;
 
-	if (node->id >= ICC_DYN_ID_START)
-		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
-					    node->name, dev_name(provider->dev));
-
 	if (node->avg_bw || node->peak_bw) {
 		if (provider->pre_aggregate)
 			provider->pre_aggregate(node);

---
base-commit: 5fed7fe33c2cd7104fc87b7bc699a7be892befa2
change-id: 20250529-icc-bw-lockdep-ed030d892a19

Best regards,
-- 
Gabor Juhos <j4g8y7@gmail.com>


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
  2025-05-29 14:46 ` [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held Gabor Juhos
@ 2025-05-30  9:16   ` Bryan O'Donoghue
  2025-06-03  9:15     ` Gabor Juhos
  2025-06-18 12:43   ` Johan Hovold
  1 sibling, 1 reply; 8+ messages in thread
From: Bryan O'Donoghue @ 2025-05-30  9:16 UTC (permalink / raw)
  To: Gabor Juhos, Georgi Djakov, Raviteja Laggyshetty
  Cc: linux-pm, linux-arm-msm, linux-kernel

On 29/05/2025 15:46, Gabor Juhos wrote:
> The 'icc_bw_lock' mutex is introduced in commit af42269c3523
> ("interconnect: Fix locking for runpm vs reclaim") in order
> to decouple serialization of bw aggregation from codepaths
> that require memory allocation.
> 
> However commit d30f83d278a9 ("interconnect: core: Add dynamic
> id allocation support") added a devm_kasprintf() call into a
> path protected by the 'icc_bw_lock' which causes this lockdep
> warning (at least on the IPQ9574 platform):

Missing a Fixes tag.

> 
>      ======================================================
>      WARNING: possible circular locking dependency detected
>      6.15.0-next-20250529 #0 Not tainted
>      ------------------------------------------------------
>      swapper/0/1 is trying to acquire lock:
>      ffffffc081df57d8 (icc_bw_lock){+.+.}-{4:4}, at: icc_init+0x8/0x108
> 
>      but task is already holding lock:
>      ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108
> 
>      which lock already depends on the new lock.
> 
>      the existing dependency chain (in reverse order) is:
> 
>      -> #1 (fs_reclaim){+.+.}-{0:0}:
>             fs_reclaim_acquire+0x7c/0xb8
>             slab_alloc_node.isra.0+0x48/0x188
>             __kmalloc_node_track_caller_noprof+0xa4/0x2b8
>             devm_kmalloc+0x5c/0x138
>             devm_kvasprintf+0x6c/0xb8
>             devm_kasprintf+0x50/0x68
>             icc_node_add+0xbc/0x160
>             icc_clk_register+0x15c/0x230
>             devm_icc_clk_register+0x20/0x90
>             qcom_cc_really_probe+0x320/0x338
>             nss_cc_ipq9574_probe+0xac/0x1e8
>             platform_probe+0x70/0xd0
>             really_probe+0xdc/0x3b8
>             __driver_probe_device+0x94/0x178
>             driver_probe_device+0x48/0xf0
>             __driver_attach+0x13c/0x208
>             bus_for_each_dev+0x6c/0xb8
>             driver_attach+0x2c/0x40
>             bus_add_driver+0x100/0x250
>             driver_register+0x68/0x138
>             __platform_driver_register+0x2c/0x40
>             nss_cc_ipq9574_driver_init+0x24/0x38
>             do_one_initcall+0x88/0x340
>             kernel_init_freeable+0x2ac/0x4f8
>             kernel_init+0x28/0x1e8
>             ret_from_fork+0x10/0x20
> 
>      -> #0 (icc_bw_lock){+.+.}-{4:4}:
>             __lock_acquire+0x1348/0x2090
>             lock_acquire+0x108/0x2d8
>             icc_init+0x50/0x108
>             do_one_initcall+0x88/0x340
>             kernel_init_freeable+0x2ac/0x4f8
>             kernel_init+0x28/0x1e8
>             ret_from_fork+0x10/0x20
> 
>      other info that might help us debug this:
> 
>       Possible unsafe locking scenario:
> 
>             CPU0                    CPU1
>             ----                    ----
>        lock(fs_reclaim);
>                                     lock(icc_bw_lock);
>                                     lock(fs_reclaim);
>        lock(icc_bw_lock);
> 
>       *** DEADLOCK ***
> 
>      1 lock held by swapper/0/1:
>       #0: ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108
> 
>      stack backtrace:
>      CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.15.0-next-20250529 #0 NONE
>      Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C7 (DT)
>      Call trace:
>       show_stack+0x20/0x38 (C)
>       dump_stack_lvl+0x90/0xd0
>       dump_stack+0x18/0x28
>       print_circular_bug+0x334/0x448
>       check_noncircular+0x12c/0x140
>       __lock_acquire+0x1348/0x2090
>       lock_acquire+0x108/0x2d8
>       icc_init+0x50/0x108
>       do_one_initcall+0x88/0x340
>       kernel_init_freeable+0x2ac/0x4f8
>       kernel_init+0x28/0x1e8
>       ret_from_fork+0x10/0x20
> 
> Move the memory allocation part of the code outside of the protected
> path to eliminate the warning. Also add a note about why it is moved
> to there,
> 
> Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
> ---
>   drivers/interconnect/core.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
>   		return;
> 
>   	mutex_lock(&icc_lock);
> +
> +	if (node->id >= ICC_DYN_ID_START) {
> +		/*
> +		 * Memory allocation must be done outside of codepaths
> +		 * protected by icc_bw_lock.
> +		 */
> +		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
> +					    node->name, dev_name(provider->dev));
> +	}
> +
>   	mutex_lock(&icc_bw_lock);
> 
>   	node->provider = provider;
> @@ -1038,10 +1048,6 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
>   	node->avg_bw = node->init_avg;
>   	node->peak_bw = node->init_peak;
> 
> -	if (node->id >= ICC_DYN_ID_START)
> -		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
> -					    node->name, dev_name(provider->dev));
> -
>   	if (node->avg_bw || node->peak_bw) {
>   		if (provider->pre_aggregate)
>   			provider->pre_aggregate(node);
> 
> ---
> base-commit: 5fed7fe33c2cd7104fc87b7bc699a7be892befa2
> change-id: 20250529-icc-bw-lockdep-ed030d892a19
> 
> Best regards,
> --
> Gabor Juhos <j4g8y7@gmail.com>
> 
> 

The locking in this code is a mess.

Which data-structures does icc_lock protect node* pointers I think and 
which data-structures does icc_bw_lock protect - "bw" data structures ?

Hmm.

Looking at this code I'm not sure at all what icc_lock was introduced to do.

Can we not just drop it entirely ?

---
bod

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
  2025-05-30  9:16   ` Bryan O'Donoghue
@ 2025-06-03  9:15     ` Gabor Juhos
  2025-06-03 10:01       ` Bryan O'Donoghue
  0 siblings, 1 reply; 8+ messages in thread
From: Gabor Juhos @ 2025-06-03  9:15 UTC (permalink / raw)
  To: Bryan O'Donoghue, Georgi Djakov, Raviteja Laggyshetty
  Cc: linux-pm, linux-arm-msm, linux-kernel

Hello Bryan,

Sorry for the late reply, I missed your mail.

2025. 05. 30. 11:16 keltezéssel, Bryan O'Donoghue írta:
> On 29/05/2025 15:46, Gabor Juhos wrote:
>> The 'icc_bw_lock' mutex is introduced in commit af42269c3523
>> ("interconnect: Fix locking for runpm vs reclaim") in order
>> to decouple serialization of bw aggregation from codepaths
>> that require memory allocation.
>>
>> However commit d30f83d278a9 ("interconnect: core: Add dynamic
>> id allocation support") added a devm_kasprintf() call into a
>> path protected by the 'icc_bw_lock' which causes this lockdep
>> warning (at least on the IPQ9574 platform):
> 
> Missing a Fixes tag.

Erm, it is before my s-o-b tag.

...

>> Move the memory allocation part of the code outside of the protected
>> path to eliminate the warning. Also add a note about why it is moved
>> to there,
>>
>> Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
>> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
>> ---
>>   drivers/interconnect/core.c | 14 ++++++++++----
>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
>> index
>> 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
>> --- a/drivers/interconnect/core.c
>> +++ b/drivers/interconnect/core.c
>> @@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct
>> icc_provider *provider)
>>           return;
>>
>>       mutex_lock(&icc_lock);
>> +
>> +    if (node->id >= ICC_DYN_ID_START) {
>> +        /*
>> +         * Memory allocation must be done outside of codepaths
>> +         * protected by icc_bw_lock.
>> +         */
>> +        node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
>> +                        node->name, dev_name(provider->dev));
>> +    }
>> +
>>       mutex_lock(&icc_bw_lock);
>>
>>       node->provider = provider;
>> @@ -1038,10 +1048,6 @@ void icc_node_add(struct icc_node *node, struct
>> icc_provider *provider)
>>       node->avg_bw = node->init_avg;
>>       node->peak_bw = node->init_peak;
>>
>> -    if (node->id >= ICC_DYN_ID_START)
>> -        node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
>> -                        node->name, dev_name(provider->dev));
>> -
>>       if (node->avg_bw || node->peak_bw) {
>>           if (provider->pre_aggregate)
>>               provider->pre_aggregate(node);
>>
>> ---
>> base-commit: 5fed7fe33c2cd7104fc87b7bc699a7be892befa2
>> change-id: 20250529-icc-bw-lockdep-ed030d892a19
>>
>> Best regards,
>> -- 
>> Gabor Juhos <j4g8y7@gmail.com>
>>
>>
> 
> The locking in this code is a mess.
> 
> Which data-structures does icc_lock protect node* pointers I think and which
> data-structures does icc_bw_lock protect - "bw" data structures ?
> 
> Hmm.
> 
> Looking at this code I'm not sure at all what icc_lock was introduced to do.

Initially, only the 'icc_lock' mutex was here, and that protected 'everything'.
The 'icc_bw_lock' has been introduced later by commit af42269c3523
("interconnect: Fix locking for runpm vs reclaim") as part of the
"drm/msm+PM+icc: Make job_run() reclaim-safe" series [1].

Here is the reason copied from the original commit message:

    "For cases where icc_bw_set() can be called in callbaths that could
    deadlock against shrinker/reclaim, such as runpm resume, we need to
    decouple the icc locking.  Introduce a new icc_bw_lock for cases where
    we need to serialize bw aggregation and update to decouple that from
    paths that require memory allocation such as node/link creation/
    destruction."

> Can we not just drop it entirely ?

I'm not an expert in locking, but I doubt that we can easily drop any of the two
mutexes without reintroducing the problem fixed by the change mentioned above.

[1] https://lore.kernel.org/all/20230807171148.210181-1-robdclark@gmail.com/

Regards,
Gabor


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
  2025-06-03  9:15     ` Gabor Juhos
@ 2025-06-03 10:01       ` Bryan O'Donoghue
  2025-06-18 12:50         ` Johan Hovold
  0 siblings, 1 reply; 8+ messages in thread
From: Bryan O'Donoghue @ 2025-06-03 10:01 UTC (permalink / raw)
  To: Gabor Juhos, Bryan O'Donoghue, Georgi Djakov,
	Raviteja Laggyshetty
  Cc: linux-pm, linux-arm-msm, linux-kernel

On 03/06/2025 10:15, Gabor Juhos wrote:
> Hello Bryan,
> 
> Sorry for the late reply, I missed your mail.
> 
> 2025. 05. 30. 11:16 keltezéssel, Bryan O'Donoghue írta:
>> On 29/05/2025 15:46, Gabor Juhos wrote:
>>> The 'icc_bw_lock' mutex is introduced in commit af42269c3523
>>> ("interconnect: Fix locking for runpm vs reclaim") in order
>>> to decouple serialization of bw aggregation from codepaths
>>> that require memory allocation.
>>>
>>> However commit d30f83d278a9 ("interconnect: core: Add dynamic
>>> id allocation support") added a devm_kasprintf() call into a
>>> path protected by the 'icc_bw_lock' which causes this lockdep
>>> warning (at least on the IPQ9574 platform):
>>
>> Missing a Fixes tag.
> 
> Erm, it is before my s-o-b tag.

Great thank you I see that.

> ...
> 
>>> Move the memory allocation part of the code outside of the protected
>>> path to eliminate the warning. Also add a note about why it is moved
>>> to there,
>>>
>>> Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
>>> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
>>> ---
>>>    drivers/interconnect/core.c | 14 ++++++++++----
>>>    1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
>>> index
>>> 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
>>> --- a/drivers/interconnect/core.c
>>> +++ b/drivers/interconnect/core.c
>>> @@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct
>>> icc_provider *provider)
>>>            return;
>>>
>>>        mutex_lock(&icc_lock);
>>> +
>>> +    if (node->id >= ICC_DYN_ID_START) {
>>> +        /*
>>> +         * Memory allocation must be done outside of codepaths
>>> +         * protected by icc_bw_lock.
>>> +         */
>>> +        node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
>>> +                        node->name, dev_name(provider->dev));
>>> +    }
>>> +
>>>        mutex_lock(&icc_bw_lock);
>>>
>>>        node->provider = provider;
>>> @@ -1038,10 +1048,6 @@ void icc_node_add(struct icc_node *node, struct
>>> icc_provider *provider)
>>>        node->avg_bw = node->init_avg;
>>>        node->peak_bw = node->init_peak;
>>>
>>> -    if (node->id >= ICC_DYN_ID_START)
>>> -        node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
>>> -                        node->name, dev_name(provider->dev));
>>> -
>>>        if (node->avg_bw || node->peak_bw) {
>>>            if (provider->pre_aggregate)
>>>                provider->pre_aggregate(node);
>>>
>>> ---
>>> base-commit: 5fed7fe33c2cd7104fc87b7bc699a7be892befa2
>>> change-id: 20250529-icc-bw-lockdep-ed030d892a19
>>>
>>> Best regards,
>>> --
>>> Gabor Juhos <j4g8y7@gmail.com>
>>>
>>>
>>
>> The locking in this code is a mess.
>>
>> Which data-structures does icc_lock protect node* pointers I think and which
>> data-structures does icc_bw_lock protect - "bw" data structures ?
>>
>> Hmm.
>>
>> Looking at this code I'm not sure at all what icc_lock was introduced to do.
> 
> Initially, only the 'icc_lock' mutex was here, and that protected 'everything'.
> The 'icc_bw_lock' has been introduced later by commit af42269c3523
> ("interconnect: Fix locking for runpm vs reclaim") as part of the
> "drm/msm+PM+icc: Make job_run() reclaim-safe" series [1].
> 
> Here is the reason copied from the original commit message:
> 
>      "For cases where icc_bw_set() can be called in callbaths that could
>      deadlock against shrinker/reclaim, such as runpm resume, we need to
>      decouple the icc locking.  Introduce a new icc_bw_lock for cases where
>      we need to serialize bw aggregation and update to decouple that from
>      paths that require memory allocation such as node/link creation/
>      destruction."

Right but reading this code.

icc_set_bw();
icc_lock_bw - protects struct icc_node *

icc_put();
icc_lock - locks
icc_lock_bw -locks directly after protects struct icc_node *

icc_node_add current:
icc_lock - locks
icc_lock_bw - locks
     node->name = devm_kasprintf();

After your change

icc_node_add current:
icc_lock - locks
     node->name = devm_kasprintf();
icc_lock_bw - locks
     owns node->provider - or whatever

And this is what is prompting my question. Which locks own which data here ?

I think we should sort that out, either by removing one of the locks or 
by at the very least documenting beside the mutex declarations which 
locks protect what.

---
bod



>> Can we not just drop it entirely ?
> 
> I'm not an expert in locking, but I doubt that we can easily drop any of the two
> mutexes without reintroducing the problem fixed by the change mentioned above.
> 
> [1] https://lore.kernel.org/all/20230807171148.210181-1-robdclark@gmail.com/
> 
> Regards,
> Gabor
> 
> 

Right - if this were a struct we would declare what these individual 
mutexes lock.

That's my question here, as I review this code, which mutex protects what ?

I don't think that is particularly clear.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
  2025-05-29 14:46 ` [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held Gabor Juhos
  2025-05-30  9:16   ` Bryan O'Donoghue
@ 2025-06-18 12:43   ` Johan Hovold
  2025-06-18 19:38     ` Gabor Juhos
  1 sibling, 1 reply; 8+ messages in thread
From: Johan Hovold @ 2025-06-18 12:43 UTC (permalink / raw)
  To: Gabor Juhos, Georgi Djakov
  Cc: Raviteja Laggyshetty, Bryan O'Donoghue, linux-pm,
	linux-arm-msm, linux-kernel

On Thu, May 29, 2025 at 04:46:22PM +0200, Gabor Juhos wrote:
> The 'icc_bw_lock' mutex is introduced in commit af42269c3523
> ("interconnect: Fix locking for runpm vs reclaim") in order
> to decouple serialization of bw aggregation from codepaths
> that require memory allocation.
> 
> However commit d30f83d278a9 ("interconnect: core: Add dynamic
> id allocation support") added a devm_kasprintf() call into a
> path protected by the 'icc_bw_lock' which causes this lockdep
> warning (at least on the IPQ9574 platform):
> 
>     ======================================================
>     WARNING: possible circular locking dependency detected
>     6.15.0-next-20250529 #0 Not tainted
>     ------------------------------------------------------
>     swapper/0/1 is trying to acquire lock:
>     ffffffc081df57d8 (icc_bw_lock){+.+.}-{4:4}, at: icc_init+0x8/0x108
> 
>     but task is already holding lock:
>     ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108
> 
>     which lock already depends on the new lock.

Thanks for fixing this. I get a similar splat with sc8280xp and the
icc_ism_l3 driver since 6.16-rc1.

Georgi, this is a regression that prevents lockdep from being used on a
bunch of Qualcomm platforms and should be fixed in mainline ASAP (e.g.
to avoid further locking issues from being introduced).

> Move the memory allocation part of the code outside of the protected
> path to eliminate the warning. Also add a note about why it is moved
> to there,
> 
> Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
> ---
>  drivers/interconnect/core.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
>  		return;
>  
>  	mutex_lock(&icc_lock);
> +
> +	if (node->id >= ICC_DYN_ID_START) {
> +		/*
> +		 * Memory allocation must be done outside of codepaths
> +		 * protected by icc_bw_lock.
> +		 */
> +		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
> +					    node->name, dev_name(provider->dev));
> +	}

The node name has already been set by the caller and the node has not
been added yet, so I think you should move this before taking the
icc_lock.

> +
>  	mutex_lock(&icc_bw_lock);
>  
>  	node->provider = provider;

With that addressed, feel free to add my:

Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
  2025-06-03 10:01       ` Bryan O'Donoghue
@ 2025-06-18 12:50         ` Johan Hovold
  2025-06-18 13:28           ` Bryan O'Donoghue
  0 siblings, 1 reply; 8+ messages in thread
From: Johan Hovold @ 2025-06-18 12:50 UTC (permalink / raw)
  To: Bryan O'Donoghue, Rob Clark
  Cc: Gabor Juhos, Bryan O'Donoghue, Georgi Djakov,
	Raviteja Laggyshetty, linux-pm, linux-arm-msm, linux-kernel

[ +CC: Rob ]

On Tue, Jun 03, 2025 at 10:01:31AM +0000, Bryan O'Donoghue wrote:
> On 03/06/2025 10:15, Gabor Juhos wrote:

> > 2025. 05. 30. 11:16 keltezéssel, Bryan O'Donoghue írta:
> >> On 29/05/2025 15:46, Gabor Juhos wrote:
> >>> The 'icc_bw_lock' mutex is introduced in commit af42269c3523
> >>> ("interconnect: Fix locking for runpm vs reclaim") in order
> >>> to decouple serialization of bw aggregation from codepaths
> >>> that require memory allocation.
> >>>
> >>> However commit d30f83d278a9 ("interconnect: core: Add dynamic
> >>> id allocation support") added a devm_kasprintf() call into a
> >>> path protected by the 'icc_bw_lock' which causes this lockdep
> >>> warning (at least on the IPQ9574 platform):

> >>> Move the memory allocation part of the code outside of the protected
> >>> path to eliminate the warning. Also add a note about why it is moved
> >>> to there,

> >>> @@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct
> >>> icc_provider *provider)
> >>>            return;
> >>>
> >>>        mutex_lock(&icc_lock);
> >>> +
> >>> +    if (node->id >= ICC_DYN_ID_START) {
> >>> +        /*
> >>> +         * Memory allocation must be done outside of codepaths
> >>> +         * protected by icc_bw_lock.
> >>> +         */
> >>> +        node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
> >>> +                        node->name, dev_name(provider->dev));
> >>> +    }
> >>> +
> >>>        mutex_lock(&icc_bw_lock);
> >>>
> >>>        node->provider = provider;
> >>> @@ -1038,10 +1048,6 @@ void icc_node_add(struct icc_node *node, struct
> >>> icc_provider *provider)
> >>>        node->avg_bw = node->init_avg;
> >>>        node->peak_bw = node->init_peak;
> >>>
> >>> -    if (node->id >= ICC_DYN_ID_START)
> >>> -        node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
> >>> -                        node->name, dev_name(provider->dev));
> >>> -
> >>>        if (node->avg_bw || node->peak_bw) {
> >>>            if (provider->pre_aggregate)
> >>>                provider->pre_aggregate(node);

> >> The locking in this code is a mess.
> >>
> >> Which data-structures does icc_lock protect node* pointers I think and which
> >> data-structures does icc_bw_lock protect - "bw" data structures ?
> >>
> >> Hmm.
> >>
> >> Looking at this code I'm not sure at all what icc_lock was introduced to do.
> > 
> > Initially, only the 'icc_lock' mutex was here, and that protected 'everything'.
> > The 'icc_bw_lock' has been introduced later by commit af42269c3523
> > ("interconnect: Fix locking for runpm vs reclaim") as part of the
> > "drm/msm+PM+icc: Make job_run() reclaim-safe" series [1].
> > 
> > Here is the reason copied from the original commit message:
> > 
> >      "For cases where icc_bw_set() can be called in callbaths that could
> >      deadlock against shrinker/reclaim, such as runpm resume, we need to
> >      decouple the icc locking.  Introduce a new icc_bw_lock for cases where
> >      we need to serialize bw aggregation and update to decouple that from
> >      paths that require memory allocation such as node/link creation/
> >      destruction."
> 
> Right but reading this code.
> 
> icc_set_bw();
> icc_lock_bw - protects struct icc_node *
> 
> icc_put();
> icc_lock - locks
> icc_lock_bw -locks directly after protects struct icc_node *
> 
> icc_node_add current:
> icc_lock - locks
> icc_lock_bw - locks
>      node->name = devm_kasprintf();
> 
> After your change
> 
> icc_node_add current:
> icc_lock - locks
>      node->name = devm_kasprintf();
> icc_lock_bw - locks
>      owns node->provider - or whatever
> 
> And this is what is prompting my question. Which locks own which data here ?
> 
> I think we should sort that out, either by removing one of the locks or 
> by at the very least documenting beside the mutex declarations which 
> locks protect what.

Feel free to discuss that with Rob who added the icc_lock_bw, but it's
unrelated to the regression at hand (and should not block fixing it).

Allocations cannot be done while holding the icc_lock_bw, and this fix
is correct in moving the allocation (also note that the node has not
been added yet).

Johan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
  2025-06-18 12:50         ` Johan Hovold
@ 2025-06-18 13:28           ` Bryan O'Donoghue
  0 siblings, 0 replies; 8+ messages in thread
From: Bryan O'Donoghue @ 2025-06-18 13:28 UTC (permalink / raw)
  To: Johan Hovold, Bryan O'Donoghue, Rob Clark
  Cc: Gabor Juhos, Georgi Djakov, Raviteja Laggyshetty, linux-pm,
	linux-arm-msm, linux-kernel

On 18/06/2025 13:50, Johan Hovold wrote:
>> I think we should sort that out, either by removing one of the locks or
>> by at the very least documenting beside the mutex declarations which
>> locks protect what.
> Feel free to discuss that with Rob who added the icc_lock_bw, but it's
> unrelated to the regression at hand (and should not block fixing it).

True.

Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held
  2025-06-18 12:43   ` Johan Hovold
@ 2025-06-18 19:38     ` Gabor Juhos
  0 siblings, 0 replies; 8+ messages in thread
From: Gabor Juhos @ 2025-06-18 19:38 UTC (permalink / raw)
  To: Johan Hovold, Georgi Djakov
  Cc: Raviteja Laggyshetty, Bryan O'Donoghue, linux-pm,
	linux-arm-msm, linux-kernel

Hi Johan,

Thank you for taking a look into this.

2025. 06. 18. 14:43 keltezéssel, Johan Hovold írta:
> On Thu, May 29, 2025 at 04:46:22PM +0200, Gabor Juhos wrote:
>> The 'icc_bw_lock' mutex is introduced in commit af42269c3523
>> ("interconnect: Fix locking for runpm vs reclaim") in order
>> to decouple serialization of bw aggregation from codepaths
>> that require memory allocation.
>>
>> However commit d30f83d278a9 ("interconnect: core: Add dynamic
>> id allocation support") added a devm_kasprintf() call into a
>> path protected by the 'icc_bw_lock' which causes this lockdep
>> warning (at least on the IPQ9574 platform):
>>
>>     ======================================================
>>     WARNING: possible circular locking dependency detected
>>     6.15.0-next-20250529 #0 Not tainted
>>     ------------------------------------------------------
>>     swapper/0/1 is trying to acquire lock:
>>     ffffffc081df57d8 (icc_bw_lock){+.+.}-{4:4}, at: icc_init+0x8/0x108
>>
>>     but task is already holding lock:
>>     ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108
>>
>>     which lock already depends on the new lock.
> 
> Thanks for fixing this. I get a similar splat with sc8280xp and the
> icc_ism_l3 driver since 6.16-rc1.
> 
> Georgi, this is a regression that prevents lockdep from being used on a
> bunch of Qualcomm platforms and should be fixed in mainline ASAP (e.g.
> to avoid further locking issues from being introduced).
> 
>> Move the memory allocation part of the code outside of the protected
>> path to eliminate the warning. Also add a note about why it is moved
>> to there,
>>
>> Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
>> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
>> ---
>>  drivers/interconnect/core.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
>> index 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
>> --- a/drivers/interconnect/core.c
>> +++ b/drivers/interconnect/core.c
>> @@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
>>  		return;
>>  
>>  	mutex_lock(&icc_lock);
>> +
>> +	if (node->id >= ICC_DYN_ID_START) {
>> +		/*
>> +		 * Memory allocation must be done outside of codepaths
>> +		 * protected by icc_bw_lock.
>> +		 */
>> +		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
>> +					    node->name, dev_name(provider->dev));
>> +	}
> 
> The node name has already been set by the caller and the node has not
> been added yet, so I think you should move this before taking the
> icc_lock.

It seems reasonable. I will send a modified version, which also contains
handling of memory allocation failures.

> 
>> +
>>  	mutex_lock(&icc_bw_lock);
>>  
>>  	node->provider = provider;
> 
> With that addressed, feel free to add my:
> 
> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> Tested-by: Johan Hovold <johan+linaro@kernel.org>


Regards,
Gabor

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-06-18 19:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <TIkPOGVjPeCjPzjVtlSb6V5CIcpaXf2-6WG6HjAyaOW59Hj01-9VK7Z8DKadakOKr6fJeQICi6h0Z8mft9DQyg==@protonmail.internalid>
2025-05-29 14:46 ` [PATCH] interconnect: avoid memory allocation when 'icc_bw_lock' is held Gabor Juhos
2025-05-30  9:16   ` Bryan O'Donoghue
2025-06-03  9:15     ` Gabor Juhos
2025-06-03 10:01       ` Bryan O'Donoghue
2025-06-18 12:50         ` Johan Hovold
2025-06-18 13:28           ` Bryan O'Donoghue
2025-06-18 12:43   ` Johan Hovold
2025-06-18 19:38     ` Gabor Juhos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).