* [PATCH 0/3] accel/ivpu: Fixes for 6.14
From: Jacek Lawrynowicz @ 2025-01-29 12:40 UTC
To: dri-devel; +Cc: oded.gabbay, quic_jhugo, maciej.falkowski, Jacek Lawrynowicz
Stability fixes focused around power management and error handling.
Jacek Lawrynowicz (3):
accel/ivpu: Fix error handling in ivpu_boot()
accel/ivpu: Clear runtime_error after pm_runtime_resume_and_get()
fails
accel/ivpu: Fix error handling in recovery/reset
drivers/accel/ivpu/ivpu_drv.c | 8 +++-
drivers/accel/ivpu/ivpu_pm.c | 84 ++++++++++++++++++++---------------
2 files changed, 53 insertions(+), 39 deletions(-)
--
2.45.1
* [PATCH 1/3] accel/ivpu: Fix error handling in ivpu_boot()
From: Jacek Lawrynowicz @ 2025-01-29 12:40 UTC
To: dri-devel
Cc: oded.gabbay, quic_jhugo, maciej.falkowski, Jacek Lawrynowicz,
stable, Karol Wachowski
Ensure IRQs and IPC are properly disabled if HW sched or DCT
initialization fails.
Fixes: cc3c72c7e610 ("accel/ivpu: Refactor failure diagnostics during boot")
Cc: <stable@vger.kernel.org> # v6.13+
Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
---
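For readers less familiar with the pattern, the fix relies on stacked error
labels: a failure that happens after IPC and IRQs are live jumps to a label
that undoes them and then falls through to the existing diagnostics label.
The following is a minimal userspace sketch of that control flow only. It is
a toy model, not driver code: `struct toy_boot_dev`, `toy_stage()`,
`toy_boot()` and the `fail_stage` field are hypothetical names introduced
purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device state; every name here is hypothetical, not the ivpu driver's. */
struct toy_boot_dev {
	bool irq_enabled;
	bool ipc_enabled;
	bool diagnosed;
	int fail_stage;	/* 0 = no failure, 1 = fail before enable, 2 = fail after */
};

/* Pretend init step: fails only when it is the stage selected to fail. */
static int toy_stage(const struct toy_boot_dev *d, int stage)
{
	return d->fail_stage == stage ? -1 : 0;
}

int toy_boot(struct toy_boot_dev *d)
{
	int ret;

	assert(d);

	ret = toy_stage(d, 1);		/* early step: nothing enabled yet */
	if (ret)
		goto err_diagnose_failure;

	d->irq_enabled = true;
	d->ipc_enabled = true;

	ret = toy_stage(d, 2);		/* late step: IRQs and IPC are live */
	if (ret)
		goto err_disable_ipc;

	return 0;

err_disable_ipc:
	/* Undo what was enabled, then fall through to the shared diagnostics. */
	d->ipc_enabled = false;
	d->irq_enabled = false;
err_diagnose_failure:
	d->diagnosed = true;
	return ret;
}
```

In this sketch a late failure (`fail_stage = 2`) leaves both flags cleared and
still runs the diagnostics step, while an early failure (`fail_stage = 1`)
skips straight to diagnostics because there is nothing to undo.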
drivers/accel/ivpu/ivpu_drv.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c
index ca2bf47ce2484..0c4a82271c26d 100644
--- a/drivers/accel/ivpu/ivpu_drv.c
+++ b/drivers/accel/ivpu/ivpu_drv.c
@@ -397,15 +397,19 @@ int ivpu_boot(struct ivpu_device *vdev)
 	if (ivpu_fw_is_cold_boot(vdev)) {
 		ret = ivpu_pm_dct_init(vdev);
 		if (ret)
-			goto err_diagnose_failure;
+			goto err_disable_ipc;
 
 		ret = ivpu_hw_sched_init(vdev);
 		if (ret)
-			goto err_diagnose_failure;
+			goto err_disable_ipc;
 	}
 
 	return 0;
 
+err_disable_ipc:
+	ivpu_ipc_disable(vdev);
+	ivpu_hw_irq_disable(vdev);
+	disable_irq(vdev->irq);
 err_diagnose_failure:
 	ivpu_hw_diagnose_failure(vdev);
 	ivpu_mmu_evtq_dump(vdev);
--
2.45.1
* [PATCH 2/3] accel/ivpu: Clear runtime_error after pm_runtime_resume_and_get() fails
From: Jacek Lawrynowicz @ 2025-01-29 12:40 UTC
To: dri-devel
Cc: oded.gabbay, quic_jhugo, maciej.falkowski, Jacek Lawrynowicz,
stable
When it fails, pm_runtime_resume_and_get() sets dev->power.runtime_error,
which causes all subsequent pm_runtime_get_sync() calls to fail.
Clear the runtime_error using pm_runtime_set_suspended(), so the driver
doesn't have to be reloaded to recover when the NPU fails to boot during
runtime resume.
Fixes: 7d4b4c74432d ("accel/ivpu: Remove suspend_reschedule_counter")
Cc: <stable@vger.kernel.org> # v6.11+
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
---
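The sticky-error behaviour described in the commit message can be sketched
in userspace as follows. This is a deliberately simplified toy model and
does not reproduce the kernel's runtime PM implementation: `struct toy_rpm`,
`toy_resume_and_get()`, `toy_set_suspended()`, `toy_rpm_get()` and the
`hw_boots` / `clear_on_failure` knobs are all hypothetical, and the specific
error codes are assumptions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of a latched runtime PM error; NOT the kernel's data structure. */
struct toy_rpm {
	int runtime_error;	/* stays set until explicitly cleared */
	bool active;
	int usage_count;
	bool hw_boots;		/* does the simulated resume callback succeed? */
};

/* Simulated resume-and-get: refuses to retry while an error is latched. */
int toy_resume_and_get(struct toy_rpm *p)
{
	assert(p);

	if (p->runtime_error)
		return -EINVAL;		/* latched error blocks every later call */

	if (!p->active) {
		if (!p->hw_boots) {
			p->runtime_error = -EIO;	/* latch the failure */
			return -EIO;
		}
		p->active = true;
	}

	p->usage_count++;
	return 0;
}

/* Simulated "set suspended": forces the status and clears the latched error. */
void toy_set_suspended(struct toy_rpm *p)
{
	p->active = false;
	p->runtime_error = 0;
}

/* Same shape as the patched ivpu_rpm_get(); the flag selects old vs. new. */
int toy_rpm_get(struct toy_rpm *p, bool clear_on_failure)
{
	int ret = toy_resume_and_get(p);

	if (ret < 0 && clear_on_failure)
		toy_set_suspended(p);

	return ret;
}
```

With `clear_on_failure` set to false, a single failed boot leaves the model
stuck: later calls keep failing even once `hw_boots` becomes true. With it
set to true, the next call after the hardware recovers succeeds, which is the
outcome this patch aims for without a driver reload.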
drivers/accel/ivpu/ivpu_pm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c
index 949f4233946c6..c3774d2221326 100644
--- a/drivers/accel/ivpu/ivpu_pm.c
+++ b/drivers/accel/ivpu/ivpu_pm.c
@@ -309,7 +309,10 @@ int ivpu_rpm_get(struct ivpu_device *vdev)
 	int ret;
 
 	ret = pm_runtime_resume_and_get(vdev->drm.dev);
-	drm_WARN_ON(&vdev->drm, ret < 0);
+	if (ret < 0) {
+		ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
+		pm_runtime_set_suspended(vdev->drm.dev);
+	}
 
 	return ret;
 }
--
2.45.1
* [PATCH 3/3] accel/ivpu: Fix error handling in recovery/reset
From: Jacek Lawrynowicz @ 2025-01-29 12:40 UTC
To: dri-devel
Cc: oded.gabbay, quic_jhugo, maciej.falkowski, Jacek Lawrynowicz,
stable
Disable runtime PM for the duration of reset/recovery so it is possible
to set the correct runtime PM state depending on the outcome of
`ivpu_resume()`. Don’t suspend or reset the HW if the NPU is suspended
when the reset/recovery is requested. Also, move common reset/recovery
code to separate functions for better code readability.
Fixes: 27d19268cf39 ("accel/ivpu: Improve recovery and reset support")
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
---
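The begin/complete split described above can be illustrated with a small
userspace sketch. It models only the ordering: runtime PM is disabled first,
the hardware is suspended only if it was not already suspended, and the
final status is chosen from the resume result before runtime PM is
re-enabled. Everything here is hypothetical (`struct toy_reset_dev`,
`toy_reset_begin()`, `toy_reset_complete()`, `toy_recover()`, the
`resume_ok` knob); locking and job cleanup are intentionally left out.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_status { TOY_SUSPENDED, TOY_ACTIVE };

/* Toy device; a stand-in for illustration, not the ivpu structures. */
struct toy_reset_dev {
	enum toy_status status;
	int disable_depth;	/* > 0 means runtime PM is disabled */
	bool reset_pending;
	int reset_counter;
	int hw_suspend_calls;	/* how often the HW suspend path ran */
	bool resume_ok;		/* does the simulated resume succeed? */
};

void toy_reset_begin(struct toy_reset_dev *d)
{
	assert(d);
	d->disable_depth++;	/* block runtime PM transitions */
	d->reset_counter++;
	d->reset_pending = true;
}

void toy_reset_complete(struct toy_reset_dev *d)
{
	assert(d && d->disable_depth > 0);

	/* Status may be forced here because runtime PM is still disabled. */
	d->status = d->resume_ok ? TOY_ACTIVE : TOY_SUSPENDED;

	d->reset_pending = false;
	d->disable_depth--;	/* allow runtime PM transitions again */
}

void toy_recover(struct toy_reset_dev *d)
{
	toy_reset_begin(d);

	/* Skip the HW suspend path if the device was already suspended. */
	if (d->status != TOY_SUSPENDED)
		d->hw_suspend_calls++;

	toy_reset_complete(d);
}
```

In this model, recovering an already-suspended device never touches the
suspend path, and a failed resume leaves the status at suspended rather than
at a stale active value.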
drivers/accel/ivpu/ivpu_pm.c | 79 ++++++++++++++++++++----------------
1 file changed, 43 insertions(+), 36 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c
index c3774d2221326..8b2b050cc41a9 100644
--- a/drivers/accel/ivpu/ivpu_pm.c
+++ b/drivers/accel/ivpu/ivpu_pm.c
@@ -115,41 +115,57 @@ static int ivpu_resume(struct ivpu_device *vdev)
 	return ret;
 }
-static void ivpu_pm_recovery_work(struct work_struct *work)
+static void ivpu_pm_reset_begin(struct ivpu_device *vdev)
 {
-	struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, recovery_work);
-	struct ivpu_device *vdev = pm->vdev;
-	char *evt[2] = {"IVPU_PM_EVENT=IVPU_RECOVER", NULL};
-	int ret;
-
-	ivpu_err(vdev, "Recovering the NPU (reset #%d)\n", atomic_read(&vdev->pm->reset_counter));
-
-	ret = pm_runtime_resume_and_get(vdev->drm.dev);
-	if (ret)
-		ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
-
-	ivpu_jsm_state_dump(vdev);
-	ivpu_dev_coredump(vdev);
+	pm_runtime_disable(vdev->drm.dev);
 	atomic_inc(&vdev->pm->reset_counter);
 	atomic_set(&vdev->pm->reset_pending, 1);
 	down_write(&vdev->pm->reset_lock);
+}
+
+static void ivpu_pm_reset_complete(struct ivpu_device *vdev)
+{
+	int ret;
-	ivpu_suspend(vdev);
 	ivpu_pm_prepare_cold_boot(vdev);
 	ivpu_jobs_abort_all(vdev);
 	ivpu_ms_cleanup_all(vdev);
 	ret = ivpu_resume(vdev);
-	if (ret)
+	if (ret) {
 		ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
+		pm_runtime_set_suspended(vdev->drm.dev);
+	} else {
+		pm_runtime_set_active(vdev->drm.dev);
+	}
 	up_write(&vdev->pm->reset_lock);
 	atomic_set(&vdev->pm->reset_pending, 0);
-	kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt);
 	pm_runtime_mark_last_busy(vdev->drm.dev);
-	pm_runtime_put_autosuspend(vdev->drm.dev);
+	pm_runtime_enable(vdev->drm.dev);
+}
+
+static void ivpu_pm_recovery_work(struct work_struct *work)
+{
+	struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, recovery_work);
+	struct ivpu_device *vdev = pm->vdev;
+	char *evt[2] = {"IVPU_PM_EVENT=IVPU_RECOVER", NULL};
+
+	ivpu_err(vdev, "Recovering the NPU (reset #%d)\n", atomic_read(&vdev->pm->reset_counter));
+
+	ivpu_pm_reset_begin(vdev);
+
+	if (!pm_runtime_status_suspended(vdev->drm.dev)) {
+		ivpu_jsm_state_dump(vdev);
+		ivpu_dev_coredump(vdev);
+		ivpu_suspend(vdev);
+	}
+
+	ivpu_pm_reset_complete(vdev);
+
+	kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt);
 }
 void ivpu_pm_trigger_recovery(struct ivpu_device *vdev, const char *reason)
@@ -328,16 +344,13 @@ void ivpu_pm_reset_prepare_cb(struct pci_dev *pdev)
 	struct ivpu_device *vdev = pci_get_drvdata(pdev);
 	ivpu_dbg(vdev, PM, "Pre-reset..\n");
-	atomic_inc(&vdev->pm->reset_counter);
-	atomic_set(&vdev->pm->reset_pending, 1);
-	pm_runtime_get_sync(vdev->drm.dev);
-	down_write(&vdev->pm->reset_lock);
-	ivpu_prepare_for_reset(vdev);
-	ivpu_hw_reset(vdev);
-	ivpu_pm_prepare_cold_boot(vdev);
-	ivpu_jobs_abort_all(vdev);
-	ivpu_ms_cleanup_all(vdev);
+	ivpu_pm_reset_begin(vdev);
+
+	if (!pm_runtime_status_suspended(vdev->drm.dev)) {
+		ivpu_prepare_for_reset(vdev);
+		ivpu_hw_reset(vdev);
+	}
 	ivpu_dbg(vdev, PM, "Pre-reset done.\n");
 }
@@ -345,18 +358,12 @@ void ivpu_pm_reset_prepare_cb(struct pci_dev *pdev)
 void ivpu_pm_reset_done_cb(struct pci_dev *pdev)
 {
 	struct ivpu_device *vdev = pci_get_drvdata(pdev);
-	int ret;
 	ivpu_dbg(vdev, PM, "Post-reset..\n");
-	ret = ivpu_resume(vdev);
-	if (ret)
-		ivpu_err(vdev, "Failed to set RESUME state: %d\n", ret);
-	up_write(&vdev->pm->reset_lock);
-	atomic_set(&vdev->pm->reset_pending, 0);
-	ivpu_dbg(vdev, PM, "Post-reset done.\n");
-	pm_runtime_mark_last_busy(vdev->drm.dev);
-	pm_runtime_put_autosuspend(vdev->drm.dev);
+	ivpu_pm_reset_complete(vdev);
+
+	ivpu_dbg(vdev, PM, "Post-reset done.\n");
 }
 void ivpu_pm_init(struct ivpu_device *vdev)
--
2.45.1
* Re: [PATCH 1/3] accel/ivpu: Fix error handling in ivpu_boot()
From: Jeffrey Hugo @ 2025-01-31 18:26 UTC
To: Jacek Lawrynowicz, dri-devel
Cc: oded.gabbay, maciej.falkowski, stable, Karol Wachowski
On 1/29/2025 5:40 AM, Jacek Lawrynowicz wrote:
> Ensure IRQs and IPC are properly disabled if HW sched or DCT
> initialization fails.
>
> Fixes: cc3c72c7e610 ("accel/ivpu: Refactor failure diagnostics during boot")
> Cc: <stable@vger.kernel.org> # v6.13+
> Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
* Re: [PATCH 2/3] accel/ivpu: Clear runtime_error after pm_runtime_resume_and_get() fails
From: Jeffrey Hugo @ 2025-01-31 18:27 UTC
To: Jacek Lawrynowicz, dri-devel; +Cc: oded.gabbay, maciej.falkowski, stable
On 1/29/2025 5:40 AM, Jacek Lawrynowicz wrote:
> When it fails, pm_runtime_resume_and_get() sets dev->power.runtime_error,
> which causes all subsequent pm_runtime_get_sync() calls to fail.
> Clear the runtime_error using pm_runtime_set_suspended(), so the driver
> doesn't have to be reloaded to recover when the NPU fails to boot during
> runtime resume.
>
> Fixes: 7d4b4c74432d ("accel/ivpu: Remove suspend_reschedule_counter")
> Cc: <stable@vger.kernel.org> # v6.11+
> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
> ---
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
* Re: [PATCH 3/3] accel/ivpu: Fix error handling in recovery/reset
From: Jeffrey Hugo @ 2025-01-31 18:29 UTC
To: Jacek Lawrynowicz, dri-devel; +Cc: oded.gabbay, maciej.falkowski, stable
On 1/29/2025 5:40 AM, Jacek Lawrynowicz wrote:
> Disable runtime PM for the duration of reset/recovery so it is possible
> to set the correct runtime PM state depending on the outcome of
> `ivpu_resume()`. Don’t suspend or reset the HW if the NPU is suspended
> when the reset/recovery is requested. Also, move common reset/recovery
> code to separate functions for better code readability.
>
> Fixes: 27d19268cf39 ("accel/ivpu: Improve recovery and reset support")
> Cc: <stable@vger.kernel.org> # v6.8+
> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
* Re: [PATCH 0/3] accel/ivpu: Fixes for 6.14
From: Jacek Lawrynowicz @ 2025-02-03 9:37 UTC
To: dri-devel; +Cc: oded.gabbay, quic_jhugo, maciej.falkowski
Applied to drm-misc-fixes
On 1/29/2025 1:40 PM, Jacek Lawrynowicz wrote:
> Stability fixes focused around power management and error handling.
>
> Jacek Lawrynowicz (3):
> accel/ivpu: Fix error handling in ivpu_boot()
> accel/ivpu: Clear runtime_error after pm_runtime_resume_and_get()
> fails
> accel/ivpu: Fix error handling in recovery/reset
>
> drivers/accel/ivpu/ivpu_drv.c | 8 +++-
> drivers/accel/ivpu/ivpu_pm.c | 84 ++++++++++++++++++++---------------
> 2 files changed, 53 insertions(+), 39 deletions(-)
>
> --
> 2.45.1