* [PATCH 0/2] PM fixes in Metric Steamer code
@ 2025-03-25 11:43 Maciej Falkowski
2025-03-25 11:43 ` [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup() Maciej Falkowski
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Maciej Falkowski @ 2025-03-25 11:43 UTC (permalink / raw)
To: dri-devel
Cc: oded.gabbay, quic_jhugo, jacek.lawrynowicz, lizhi.hou,
Maciej Falkowski
This patches contains two fixes for Metric Stream:
- Fix deadlock that may occur when
executing runtime resume during cold boot where
ms_lock will be already held there,
- Fix warning to warn for suspend status only if
the runtime PM is enabled.
Jacek Lawrynowicz (2):
accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
accel/ivpu: Fix PM related deadlocks in MS IOCTLs
drivers/accel/ivpu/ivpu_debugfs.c | 4 ++--
drivers/accel/ivpu/ivpu_ms.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+), 2 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
2025-03-25 11:43 [PATCH 0/2] PM fixes in Metric Steamer code Maciej Falkowski
@ 2025-03-25 11:43 ` Maciej Falkowski
2025-03-25 20:50 ` Lizhi Hou
2025-03-25 11:43 ` [PATCH 2/2] accel/ivpu: Fix PM related deadlocks in MS IOCTLs Maciej Falkowski
2025-03-31 12:22 ` [PATCH 0/2] PM fixes in Metric Steamer code Jacek Lawrynowicz
2 siblings, 1 reply; 10+ messages in thread
From: Maciej Falkowski @ 2025-03-25 11:43 UTC (permalink / raw)
To: dri-devel
Cc: oded.gabbay, quic_jhugo, jacek.lawrynowicz, lizhi.hou, stable,
Maciej Falkowski
From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after
file_priv->ms_lock is acquired.
During a failure in runtime resume, a cold boot is executed, which
calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup()
that acquires file_priv->ms_lock and causes the deadlock.
Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
---
drivers/accel/ivpu/ivpu_ms.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
index ffe7b10f8a76..eb485cf15ad6 100644
--- a/drivers/accel/ivpu/ivpu_ms.c
+++ b/drivers/accel/ivpu/ivpu_ms.c
@@ -4,6 +4,7 @@
*/
#include <drm/drm_file.h>
+#include <linux/pm_runtime.h>
#include "ivpu_drv.h"
#include "ivpu_gem.h"
@@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *
void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
{
struct ivpu_ms_instance *ms, *tmp;
+ struct ivpu_device *vdev = file_priv->vdev;
+
+ pm_runtime_get_sync(vdev->drm.dev);
mutex_lock(&file_priv->ms_lock);
@@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
free_instance(file_priv, ms);
mutex_unlock(&file_priv->ms_lock);
+
+ pm_runtime_put_autosuspend(vdev->drm.dev);
}
void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] accel/ivpu: Fix PM related deadlocks in MS IOCTLs
2025-03-25 11:43 [PATCH 0/2] PM fixes in Metric Steamer code Maciej Falkowski
2025-03-25 11:43 ` [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup() Maciej Falkowski
@ 2025-03-25 11:43 ` Maciej Falkowski
2025-03-28 15:31 ` Lizhi Hou
2025-03-31 12:22 ` [PATCH 0/2] PM fixes in Metric Steamer code Jacek Lawrynowicz
2 siblings, 1 reply; 10+ messages in thread
From: Maciej Falkowski @ 2025-03-25 11:43 UTC (permalink / raw)
To: dri-devel
Cc: oded.gabbay, quic_jhugo, jacek.lawrynowicz, lizhi.hou, stable,
Maciej Falkowski
From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Prevent runtime resume/suspend while MS IOCTLs are in progress.
Failed suspend will call ivpu_ms_cleanup() that would try to acquire
file_priv->ms_lock, which is already held by the IOCTLs.
Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
---
drivers/accel/ivpu/ivpu_debugfs.c | 4 ++--
drivers/accel/ivpu/ivpu_ms.c | 18 ++++++++++++++++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_debugfs.c b/drivers/accel/ivpu/ivpu_debugfs.c
index 0825851656a2..f0dad0c9ce33 100644
--- a/drivers/accel/ivpu/ivpu_debugfs.c
+++ b/drivers/accel/ivpu/ivpu_debugfs.c
@@ -332,7 +332,7 @@ ivpu_force_recovery_fn(struct file *file, const char __user *user_buf, size_t si
return -EINVAL;
ret = ivpu_rpm_get(vdev);
- if (ret)
+ if (ret < 0)
return ret;
ivpu_pm_trigger_recovery(vdev, "debugfs");
@@ -383,7 +383,7 @@ static int dct_active_set(void *data, u64 active_percent)
return -EINVAL;
ret = ivpu_rpm_get(vdev);
- if (ret)
+ if (ret < 0)
return ret;
if (active_percent)
diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
index eb485cf15ad6..2a043baf10ca 100644
--- a/drivers/accel/ivpu/ivpu_ms.c
+++ b/drivers/accel/ivpu/ivpu_ms.c
@@ -45,6 +45,10 @@ int ivpu_ms_start_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
args->sampling_period_ns < MS_MIN_SAMPLE_PERIOD_NS)
return -EINVAL;
+ ret = ivpu_rpm_get(vdev);
+ if (ret < 0)
+ return ret;
+
mutex_lock(&file_priv->ms_lock);
if (get_instance_by_mask(file_priv, args->metric_group_mask)) {
@@ -97,6 +101,8 @@ int ivpu_ms_start_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
kfree(ms);
unlock:
mutex_unlock(&file_priv->ms_lock);
+
+ ivpu_rpm_put(vdev);
return ret;
}
@@ -161,6 +167,10 @@ int ivpu_ms_get_data_ioctl(struct drm_device *dev, void *data, struct drm_file *
if (!args->metric_group_mask)
return -EINVAL;
+ ret = ivpu_rpm_get(vdev);
+ if (ret < 0)
+ return ret;
+
mutex_lock(&file_priv->ms_lock);
ms = get_instance_by_mask(file_priv, args->metric_group_mask);
@@ -188,6 +198,7 @@ int ivpu_ms_get_data_ioctl(struct drm_device *dev, void *data, struct drm_file *
unlock:
mutex_unlock(&file_priv->ms_lock);
+ ivpu_rpm_put(vdev);
return ret;
}
@@ -205,11 +216,17 @@ int ivpu_ms_stop_ioctl(struct drm_device *dev, void *data, struct drm_file *file
{
struct ivpu_file_priv *file_priv = file->driver_priv;
struct drm_ivpu_metric_streamer_stop *args = data;
+ struct ivpu_device *vdev = file_priv->vdev;
struct ivpu_ms_instance *ms;
+ int ret;
if (!args->metric_group_mask)
return -EINVAL;
+ ret = ivpu_rpm_get(vdev);
+ if (ret < 0)
+ return ret;
+
mutex_lock(&file_priv->ms_lock);
ms = get_instance_by_mask(file_priv, args->metric_group_mask);
@@ -218,6 +235,7 @@ int ivpu_ms_stop_ioctl(struct drm_device *dev, void *data, struct drm_file *file
mutex_unlock(&file_priv->ms_lock);
+ ivpu_rpm_put(vdev);
return ms ? 0 : -EINVAL;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
2025-03-25 11:43 ` [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup() Maciej Falkowski
@ 2025-03-25 20:50 ` Lizhi Hou
2025-03-26 8:06 ` Jacek Lawrynowicz
0 siblings, 1 reply; 10+ messages in thread
From: Lizhi Hou @ 2025-03-25 20:50 UTC (permalink / raw)
To: Maciej Falkowski, dri-devel
Cc: oded.gabbay, quic_jhugo, jacek.lawrynowicz, stable
On 3/25/25 04:43, Maciej Falkowski wrote:
> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>
> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after
> file_priv->ms_lock is acquired.
>
> During a failure in runtime resume, a cold boot is executed, which
> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup()
> that acquires file_priv->ms_lock and causes the deadlock.
>
> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
> Cc: <stable@vger.kernel.org> # v6.11+
> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
> ---
> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
> index ffe7b10f8a76..eb485cf15ad6 100644
> --- a/drivers/accel/ivpu/ivpu_ms.c
> +++ b/drivers/accel/ivpu/ivpu_ms.c
> @@ -4,6 +4,7 @@
> */
>
> #include <drm/drm_file.h>
> +#include <linux/pm_runtime.h>
>
> #include "ivpu_drv.h"
> #include "ivpu_gem.h"
> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *
> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
> {
> struct ivpu_ms_instance *ms, *tmp;
> + struct ivpu_device *vdev = file_priv->vdev;
> +
> + pm_runtime_get_sync(vdev->drm.dev);
Could get_sync() be failed here? Maybe it is better to add warning for
failure?
Lizhi
>
> mutex_lock(&file_priv->ms_lock);
>
> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
> free_instance(file_priv, ms);
>
> mutex_unlock(&file_priv->ms_lock);
> +
> + pm_runtime_put_autosuspend(vdev->drm.dev);
> }
>
> void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
2025-03-25 20:50 ` Lizhi Hou
@ 2025-03-26 8:06 ` Jacek Lawrynowicz
2025-03-27 17:38 ` Lizhi Hou
0 siblings, 1 reply; 10+ messages in thread
From: Jacek Lawrynowicz @ 2025-03-26 8:06 UTC (permalink / raw)
To: Lizhi Hou, Maciej Falkowski, dri-devel; +Cc: oded.gabbay, quic_jhugo, stable
Hi,
On 3/25/2025 9:50 PM, Lizhi Hou wrote:
>
> On 3/25/25 04:43, Maciej Falkowski wrote:
>> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>
>> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after
>> file_priv->ms_lock is acquired.
>>
>> During a failure in runtime resume, a cold boot is executed, which
>> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup()
>> that acquires file_priv->ms_lock and causes the deadlock.
>>
>> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
>> Cc: <stable@vger.kernel.org> # v6.11+
>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
>> ---
>> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
>> index ffe7b10f8a76..eb485cf15ad6 100644
>> --- a/drivers/accel/ivpu/ivpu_ms.c
>> +++ b/drivers/accel/ivpu/ivpu_ms.c
>> @@ -4,6 +4,7 @@
>> */
>> #include <drm/drm_file.h>
>> +#include <linux/pm_runtime.h>
>> #include "ivpu_drv.h"
>> #include "ivpu_gem.h"
>> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *
>> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>> {
>> struct ivpu_ms_instance *ms, *tmp;
>> + struct ivpu_device *vdev = file_priv->vdev;
>> +
>> + pm_runtime_get_sync(vdev->drm.dev);
>
> Could get_sync() be failed here? Maybe it is better to add warning for failure?
Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()).
>
>> mutex_lock(&file_priv->ms_lock);
>> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>> free_instance(file_priv, ms);
>> mutex_unlock(&file_priv->ms_lock);
>> +
>> + pm_runtime_put_autosuspend(vdev->drm.dev);
>> }
>> void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
Regards,
Jacek
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
2025-03-26 8:06 ` Jacek Lawrynowicz
@ 2025-03-27 17:38 ` Lizhi Hou
2025-03-28 8:23 ` Jacek Lawrynowicz
0 siblings, 1 reply; 10+ messages in thread
From: Lizhi Hou @ 2025-03-27 17:38 UTC (permalink / raw)
To: Jacek Lawrynowicz, Maciej Falkowski, dri-devel
Cc: oded.gabbay, quic_jhugo, stable
On 3/26/25 01:06, Jacek Lawrynowicz wrote:
> Hi,
>
> On 3/25/2025 9:50 PM, Lizhi Hou wrote:
>> On 3/25/25 04:43, Maciej Falkowski wrote:
>>> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>>
>>> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after
>>> file_priv->ms_lock is acquired.
>>>
>>> During a failure in runtime resume, a cold boot is executed, which
>>> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup()
>>> that acquires file_priv->ms_lock and causes the deadlock.
>>>
>>> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
>>> Cc: <stable@vger.kernel.org> # v6.11+
>>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
>>> ---
>>> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
>>> index ffe7b10f8a76..eb485cf15ad6 100644
>>> --- a/drivers/accel/ivpu/ivpu_ms.c
>>> +++ b/drivers/accel/ivpu/ivpu_ms.c
>>> @@ -4,6 +4,7 @@
>>> */
>>> #include <drm/drm_file.h>
>>> +#include <linux/pm_runtime.h>
>>> #include "ivpu_drv.h"
>>> #include "ivpu_gem.h"
>>> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *
>>> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>>> {
>>> struct ivpu_ms_instance *ms, *tmp;
>>> + struct ivpu_device *vdev = file_priv->vdev;
>>> +
>>> + pm_runtime_get_sync(vdev->drm.dev);
>> Could get_sync() be failed here? Maybe it is better to add warning for failure?
> Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()).
Will the deadlock still happens if this function fails?
Lizhi
>>> mutex_lock(&file_priv->ms_lock);
>>> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>>> free_instance(file_priv, ms);
>>> mutex_unlock(&file_priv->ms_lock);
>>> +
>>> + pm_runtime_put_autosuspend(vdev->drm.dev);
>>> }
>>> void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
> Regards,
> Jacek
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
2025-03-27 17:38 ` Lizhi Hou
@ 2025-03-28 8:23 ` Jacek Lawrynowicz
2025-03-28 15:29 ` Lizhi Hou
0 siblings, 1 reply; 10+ messages in thread
From: Jacek Lawrynowicz @ 2025-03-28 8:23 UTC (permalink / raw)
To: Lizhi Hou, Maciej Falkowski, dri-devel; +Cc: oded.gabbay, quic_jhugo, stable
On 3/27/2025 6:38 PM, Lizhi Hou wrote:
>
> On 3/26/25 01:06, Jacek Lawrynowicz wrote:
>> Hi,
>>
>> On 3/25/2025 9:50 PM, Lizhi Hou wrote:
>>> On 3/25/25 04:43, Maciej Falkowski wrote:
>>>> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>>>
>>>> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after
>>>> file_priv->ms_lock is acquired.
>>>>
>>>> During a failure in runtime resume, a cold boot is executed, which
>>>> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup()
>>>> that acquires file_priv->ms_lock and causes the deadlock.
>>>>
>>>> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
>>>> Cc: <stable@vger.kernel.org> # v6.11+
>>>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>>> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
>>>> ---
>>>> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
>>>> index ffe7b10f8a76..eb485cf15ad6 100644
>>>> --- a/drivers/accel/ivpu/ivpu_ms.c
>>>> +++ b/drivers/accel/ivpu/ivpu_ms.c
>>>> @@ -4,6 +4,7 @@
>>>> */
>>>> #include <drm/drm_file.h>
>>>> +#include <linux/pm_runtime.h>
>>>> #include "ivpu_drv.h"
>>>> #include "ivpu_gem.h"
>>>> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *
>>>> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>>>> {
>>>> struct ivpu_ms_instance *ms, *tmp;
>>>> + struct ivpu_device *vdev = file_priv->vdev;
>>>> +
>>>> + pm_runtime_get_sync(vdev->drm.dev);
>>> Could get_sync() be failed here? Maybe it is better to add warning for failure?
>> Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()).
>
> Will the deadlock still happens if this function fails?
No. The deadlock was caused by runtime resume in free_instance().
pm_runtime_get_sync() will always bump PM usage counter, so there will be no resume regardless if it fails or not.
>>>> mutex_lock(&file_priv->ms_lock);
>>>> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>>>> free_instance(file_priv, ms);
>>>> mutex_unlock(&file_priv->ms_lock);
>>>> +
>>>> + pm_runtime_put_autosuspend(vdev->drm.dev);
>>>> }
>>>> void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
>> Regards,
>> Jacek
>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
2025-03-28 8:23 ` Jacek Lawrynowicz
@ 2025-03-28 15:29 ` Lizhi Hou
0 siblings, 0 replies; 10+ messages in thread
From: Lizhi Hou @ 2025-03-28 15:29 UTC (permalink / raw)
To: Jacek Lawrynowicz, Maciej Falkowski, dri-devel
Cc: oded.gabbay, quic_jhugo, stable
On 3/28/25 01:23, Jacek Lawrynowicz wrote:
>
> On 3/27/2025 6:38 PM, Lizhi Hou wrote:
>> On 3/26/25 01:06, Jacek Lawrynowicz wrote:
>>> Hi,
>>>
>>> On 3/25/2025 9:50 PM, Lizhi Hou wrote:
>>>> On 3/25/25 04:43, Maciej Falkowski wrote:
>>>>> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>>>>
>>>>> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after
>>>>> file_priv->ms_lock is acquired.
>>>>>
>>>>> During a failure in runtime resume, a cold boot is executed, which
>>>>> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup()
>>>>> that acquires file_priv->ms_lock and causes the deadlock.
>>>>>
>>>>> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
>>>>> Cc: <stable@vger.kernel.org> # v6.11+
>>>>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>>>>> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
>>>>> ---
>>>>> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++
>>>>> 1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
>>>>> index ffe7b10f8a76..eb485cf15ad6 100644
>>>>> --- a/drivers/accel/ivpu/ivpu_ms.c
>>>>> +++ b/drivers/accel/ivpu/ivpu_ms.c
>>>>> @@ -4,6 +4,7 @@
>>>>> */
>>>>> #include <drm/drm_file.h>
>>>>> +#include <linux/pm_runtime.h>
>>>>> #include "ivpu_drv.h"
>>>>> #include "ivpu_gem.h"
>>>>> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *
>>>>> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>>>>> {
>>>>> struct ivpu_ms_instance *ms, *tmp;
>>>>> + struct ivpu_device *vdev = file_priv->vdev;
>>>>> +
>>>>> + pm_runtime_get_sync(vdev->drm.dev);
>>>> Could get_sync() be failed here? Maybe it is better to add warning for failure?
>>> Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()).
>> Will the deadlock still happens if this function fails?
> No. The deadlock was caused by runtime resume in free_instance().
> pm_runtime_get_sync() will always bump PM usage counter, so there will be no resume regardless if it fails or not.
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
>>>>> mutex_lock(&file_priv->ms_lock);
>>>>> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
>>>>> free_instance(file_priv, ms);
>>>>> mutex_unlock(&file_priv->ms_lock);
>>>>> +
>>>>> + pm_runtime_put_autosuspend(vdev->drm.dev);
>>>>> }
>>>>> void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
>>> Regards,
>>> Jacek
>>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] accel/ivpu: Fix PM related deadlocks in MS IOCTLs
2025-03-25 11:43 ` [PATCH 2/2] accel/ivpu: Fix PM related deadlocks in MS IOCTLs Maciej Falkowski
@ 2025-03-28 15:31 ` Lizhi Hou
0 siblings, 0 replies; 10+ messages in thread
From: Lizhi Hou @ 2025-03-28 15:31 UTC (permalink / raw)
To: Maciej Falkowski, dri-devel
Cc: oded.gabbay, quic_jhugo, jacek.lawrynowicz, stable
On 3/25/25 04:43, Maciej Falkowski wrote:
> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
>
> Prevent runtime resume/suspend while MS IOCTLs are in progress.
> Failed suspend will call ivpu_ms_cleanup() that would try to acquire
> file_priv->ms_lock, which is already held by the IOCTLs.
>
> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
> Cc: <stable@vger.kernel.org> # v6.11+
> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
> ---
> drivers/accel/ivpu/ivpu_debugfs.c | 4 ++--
> drivers/accel/ivpu/ivpu_ms.c | 18 ++++++++++++++++++
> 2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/accel/ivpu/ivpu_debugfs.c b/drivers/accel/ivpu/ivpu_debugfs.c
> index 0825851656a2..f0dad0c9ce33 100644
> --- a/drivers/accel/ivpu/ivpu_debugfs.c
> +++ b/drivers/accel/ivpu/ivpu_debugfs.c
> @@ -332,7 +332,7 @@ ivpu_force_recovery_fn(struct file *file, const char __user *user_buf, size_t si
> return -EINVAL;
>
> ret = ivpu_rpm_get(vdev);
> - if (ret)
> + if (ret < 0)
> return ret;
>
> ivpu_pm_trigger_recovery(vdev, "debugfs");
> @@ -383,7 +383,7 @@ static int dct_active_set(void *data, u64 active_percent)
> return -EINVAL;
>
> ret = ivpu_rpm_get(vdev);
> - if (ret)
> + if (ret < 0)
> return ret;
>
> if (active_percent)
> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
> index eb485cf15ad6..2a043baf10ca 100644
> --- a/drivers/accel/ivpu/ivpu_ms.c
> +++ b/drivers/accel/ivpu/ivpu_ms.c
> @@ -45,6 +45,10 @@ int ivpu_ms_start_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> args->sampling_period_ns < MS_MIN_SAMPLE_PERIOD_NS)
> return -EINVAL;
>
> + ret = ivpu_rpm_get(vdev);
> + if (ret < 0)
> + return ret;
> +
> mutex_lock(&file_priv->ms_lock);
>
> if (get_instance_by_mask(file_priv, args->metric_group_mask)) {
> @@ -97,6 +101,8 @@ int ivpu_ms_start_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
> kfree(ms);
> unlock:
> mutex_unlock(&file_priv->ms_lock);
> +
> + ivpu_rpm_put(vdev);
> return ret;
> }
>
> @@ -161,6 +167,10 @@ int ivpu_ms_get_data_ioctl(struct drm_device *dev, void *data, struct drm_file *
> if (!args->metric_group_mask)
> return -EINVAL;
>
> + ret = ivpu_rpm_get(vdev);
> + if (ret < 0)
> + return ret;
> +
> mutex_lock(&file_priv->ms_lock);
>
> ms = get_instance_by_mask(file_priv, args->metric_group_mask);
> @@ -188,6 +198,7 @@ int ivpu_ms_get_data_ioctl(struct drm_device *dev, void *data, struct drm_file *
> unlock:
> mutex_unlock(&file_priv->ms_lock);
>
> + ivpu_rpm_put(vdev);
> return ret;
> }
>
> @@ -205,11 +216,17 @@ int ivpu_ms_stop_ioctl(struct drm_device *dev, void *data, struct drm_file *file
> {
> struct ivpu_file_priv *file_priv = file->driver_priv;
> struct drm_ivpu_metric_streamer_stop *args = data;
> + struct ivpu_device *vdev = file_priv->vdev;
> struct ivpu_ms_instance *ms;
> + int ret;
>
> if (!args->metric_group_mask)
> return -EINVAL;
>
> + ret = ivpu_rpm_get(vdev);
> + if (ret < 0)
> + return ret;
> +
> mutex_lock(&file_priv->ms_lock);
>
> ms = get_instance_by_mask(file_priv, args->metric_group_mask);
> @@ -218,6 +235,7 @@ int ivpu_ms_stop_ioctl(struct drm_device *dev, void *data, struct drm_file *file
>
> mutex_unlock(&file_priv->ms_lock);
>
> + ivpu_rpm_put(vdev);
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
> return ms ? 0 : -EINVAL;
> }
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] PM fixes in Metric Steamer code
2025-03-25 11:43 [PATCH 0/2] PM fixes in Metric Steamer code Maciej Falkowski
2025-03-25 11:43 ` [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup() Maciej Falkowski
2025-03-25 11:43 ` [PATCH 2/2] accel/ivpu: Fix PM related deadlocks in MS IOCTLs Maciej Falkowski
@ 2025-03-31 12:22 ` Jacek Lawrynowicz
2 siblings, 0 replies; 10+ messages in thread
From: Jacek Lawrynowicz @ 2025-03-31 12:22 UTC (permalink / raw)
To: Maciej Falkowski, dri-devel; +Cc: oded.gabbay, quic_jhugo, lizhi.hou
Applied to drm-misc-fixes
On 3/25/2025 12:43 PM, Maciej Falkowski wrote:
> This patches contains two fixes for Metric Stream:
> - Fix deadlock that may occur when
> executing runtime resume during cold boot where
> ms_lock will be already held there,
> - Fix warning to warn for suspend status only if
> the runtime PM is enabled.
>
> Jacek Lawrynowicz (2):
> accel/ivpu: Fix deadlock in ivpu_ms_cleanup()
> accel/ivpu: Fix PM related deadlocks in MS IOCTLs
>
> drivers/accel/ivpu/ivpu_debugfs.c | 4 ++--
> drivers/accel/ivpu/ivpu_ms.c | 24 ++++++++++++++++++++++++
> 2 files changed, 26 insertions(+), 2 deletions(-)
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-03-31 12:22 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-25 11:43 [PATCH 0/2] PM fixes in Metric Steamer code Maciej Falkowski
2025-03-25 11:43 ` [PATCH 1/2] accel/ivpu: Fix deadlock in ivpu_ms_cleanup() Maciej Falkowski
2025-03-25 20:50 ` Lizhi Hou
2025-03-26 8:06 ` Jacek Lawrynowicz
2025-03-27 17:38 ` Lizhi Hou
2025-03-28 8:23 ` Jacek Lawrynowicz
2025-03-28 15:29 ` Lizhi Hou
2025-03-25 11:43 ` [PATCH 2/2] accel/ivpu: Fix PM related deadlocks in MS IOCTLs Maciej Falkowski
2025-03-28 15:31 ` Lizhi Hou
2025-03-31 12:22 ` [PATCH 0/2] PM fixes in Metric Steamer code Jacek Lawrynowicz
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.