[PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error

public inbox for linux-kernel@vger.kernel.org

* [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error
@ 2024-10-11 22:56 Adrián Larumbe
  2024-10-11 22:57 ` [PATCH 2/3] drm/panthor: Retry OPP transition to suspension state a few times Adrián Larumbe
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Adrián Larumbe @ 2024-10-11 22:56 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel

If the OPP transition to a suspension state fails during the runtime PM
suspend call, and the driver's subsystems are then successfully resumed,
we should return -EAGAIN so that the device's runtime PM status remains
'active'.

If the FW reload fails, we should instead fall through, so that the PM
core can flag the device as having suffered a runtime error.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 4082c8f2951d..cedd3cbcb47d 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -528,8 +528,13 @@ int panthor_device_suspend(struct device *dev)
 		    drm_dev_enter(&ptdev->base, &cookie)) {
 			panthor_gpu_resume(ptdev);
 			panthor_mmu_resume(ptdev);
-			drm_WARN_ON(&ptdev->base, panthor_fw_resume(ptdev));
-			panthor_sched_resume(ptdev);
+			ret = panthor_fw_resume(ptdev);
+			if (!ret) {
+				panthor_sched_resume(ptdev);
+				ret = -EAGAIN;
+			} else {
+				drm_err(&ptdev->base, "FW resume failed at runtime suspend: %d\n", ret);
+			}
 			drm_dev_exit(cookie);
 		}
 
-- 
2.46.2
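To make the return-value reasoning in this patch concrete, here is a small user-space model (not kernel code) of how the runtime PM core treats the value returned by a ->runtime_suspend() callback, as described in Documentation/power/runtime_pm.rst: -EBUSY and -EAGAIN leave the device 'active' with nothing recorded, while any other error also lands in power.runtime_error. All names below (struct model_rpm, rpm_handle_suspend_ret(), model_suspend_cb()) are hypothetical; devfreq_ret and fw_resume_ret merely stand in for the results of panthor_devfreq_suspend() and panthor_fw_resume(). The model ignores the case where the device is not initialized or drm_dev_enter() fails.

```c
#include <assert.h>
#include <errno.h>

/* Simplified model of the runtime PM bookkeeping; not the kernel's code. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_rpm {
	enum model_rpm_status status;
	int runtime_error;	/* non-zero once a fatal suspend error was seen */
};

/*
 * How the PM core reacts to the ->runtime_suspend() return value:
 *   0              -> status becomes 'suspended'
 *   -EBUSY/-EAGAIN -> status stays 'active', nothing recorded
 *   other errors   -> status stays 'active', error saved in runtime_error
 */
static void rpm_handle_suspend_ret(struct model_rpm *pm, int ret)
{
	if (ret == 0) {
		pm->status = MODEL_RPM_SUSPENDED;
		return;
	}
	pm->status = MODEL_RPM_ACTIVE;
	if (ret != -EBUSY && ret != -EAGAIN)
		pm->runtime_error = ret;
}

/*
 * Decision logic of the patched error path (hypothetical parameters):
 * devfreq_ret   - result of the OPP transition to the suspension state
 * fw_resume_ret - result of bringing the FW back after that failure
 */
static int model_suspend_cb(int devfreq_ret, int fw_resume_ret)
{
	if (!devfreq_ret)
		return 0;		/* normal suspend */
	if (!fw_resume_ret)
		return -EAGAIN;		/* subsystems resumed: stay active */
	return fw_resume_ret;		/* FW dead: let the core flag it */
}
```

Under this model, the pre-patch code (which, as the diff suggests, only fed the FW resume result to drm_WARN_ON() and returned the devfreq error unchanged) always ends up in runtime_error, even when the GPU was fully brought back up.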



* [PATCH 2/3] drm/panthor: Retry OPP transition to suspension state a few times
  2024-10-11 22:56 [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Adrián Larumbe
@ 2024-10-11 22:57 ` Adrián Larumbe
  2024-10-14  7:28   ` Boris Brezillon
  2024-10-11 22:57 ` [PATCH 3/3] drm/panthor: Reset device and load FW after failed PM suspend Adrián Larumbe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Adrián Larumbe @ 2024-10-11 22:57 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel

When the device's runtime PM suspend callback is invoked, the switch to
a suspension OPP might sometimes fail. Although this is beyond the
control of the Panthor driver, we can retry the transition a few times
as a defensive strategy.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index cedd3cbcb47d..5430557bd0b8 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -490,6 +490,7 @@ int panthor_device_resume(struct device *dev)
 int panthor_device_suspend(struct device *dev)
 {
 	struct panthor_device *ptdev = dev_get_drvdata(dev);
+	unsigned int susp_retries;
 	int ret, cookie;
 
 	if (atomic_read(&ptdev->pm.state) != PANTHOR_DEVICE_PM_STATE_ACTIVE)
@@ -522,7 +523,12 @@ int panthor_device_suspend(struct device *dev)
 		drm_dev_exit(cookie);
 	}
 
-	ret = panthor_devfreq_suspend(ptdev);
+	for (susp_retries = 0; susp_retries < 5; susp_retries++) {
+		ret = panthor_devfreq_suspend(ptdev);
+		if (!ret)
+			break;
+	}
+
 	if (ret) {
 		if (panthor_device_is_initialized(ptdev) &&
 		    drm_dev_enter(&ptdev->base, &cookie)) {
-- 
2.46.2
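The loop in this patch is a plain bounded retry: attempt the transition up to a fixed number of times and keep the last error. The sketch below factors that into a standalone helper, roughly in the spirit of keeping the retry next to the devfreq code rather than in panthor_device_suspend(). The names retry_transition(), transition_fn and flaky_transition() are hypothetical, and the flaky stub only simulates a transition that fails a few times before succeeding.

```c
#include <assert.h>
#include <errno.h>

/* One attempt at the OPP transition; returns 0 on success. */
typedef int (*transition_fn)(void *ctx);

#define SUSPEND_MAX_TRIES 5	/* same bound as the loop in the patch */

static int retry_transition(transition_fn fn, void *ctx, unsigned int max_tries)
{
	int ret = -EINVAL;	/* only returned if max_tries were 0 */
	unsigned int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		ret = fn(ctx);
		if (!ret)
			break;
	}
	return ret;	/* 0, or the error from the last attempt */
}

/* Test stub: fails with -EBUSY for the first fail_count calls. */
struct flaky {
	unsigned int fail_count;
	unsigned int calls;
};

static int flaky_transition(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	return f->calls <= f->fail_count ? -EBUSY : 0;
}
```

As posted, the patch retries back to back with no delay between attempts; whether a short pause would help depends on why the transition fails, which is still an open question in this thread.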



* [PATCH 3/3] drm/panthor: Reset device and load FW after failed PM suspend
  2024-10-11 22:56 [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Adrián Larumbe
  2024-10-11 22:57 ` [PATCH 2/3] drm/panthor: Retry OPP transition to suspension state a few times Adrián Larumbe
@ 2024-10-11 22:57 ` Adrián Larumbe
  2024-10-14  7:27   ` Boris Brezillon
  2024-10-16  9:14   ` kernel test robot
  2024-10-11 23:22 ` [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Liviu Dudau
  2024-10-14  7:12 ` Boris Brezillon
  3 siblings, 2 replies; 8+ messages in thread
From: Adrián Larumbe @ 2024-10-11 22:57 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel

On rk3588 SoCs, during a runtime PM suspend, the transition to the
lowest voltage/frequency pair sometimes fails for reasons not yet
understood. In that case, even a slow FW reset fails, leaving the
device's runtime PM status in an unusable error state.

When that happens, every subsequent attempt to resume the device when
running a job fails.

Fix it by forcing a synchronous device reset, which leads to a
successful FW reload, and by clearing the device's runtime PM error
status before resuming it.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.c | 10 ++++++++++
 drivers/gpu/drm/panthor/panthor_device.h |  2 ++
 drivers/gpu/drm/panthor/panthor_sched.c  |  7 +++++++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 5430557bd0b8..ec6fed5e996b 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -105,6 +105,16 @@ static void panthor_device_reset_cleanup(struct drm_device *ddev, void *data)
 	destroy_workqueue(ptdev->reset.wq);
 }
 
+int panthor_device_reset_sync(struct panthor_device *ptdev)
+{
+	panthor_fw_pre_reset(ptdev, false);
+	panthor_mmu_pre_reset(ptdev);
+	panthor_gpu_soft_reset(ptdev);
+	panthor_gpu_l2_power_on(ptdev);
+	panthor_mmu_post_reset(ptdev);
+	return panthor_fw_post_reset(ptdev);
+}
+
 static void panthor_device_reset_work(struct work_struct *work)
 {
 	struct panthor_device *ptdev = container_of(work, struct panthor_device, reset.work);
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 0e68f5a70d20..05a5a7233378 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -217,6 +217,8 @@ struct panthor_file {
 int panthor_device_init(struct panthor_device *ptdev);
 void panthor_device_unplug(struct panthor_device *ptdev);
 
+int panthor_device_reset_sync(struct panthor_device *ptdev);
+
 /**
  * panthor_device_schedule_reset() - Schedules a reset operation
  */
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index c7b350fc3eba..9a854c8c5718 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -3101,6 +3101,13 @@ queue_run_job(struct drm_sched_job *sched_job)
 		return dma_fence_get(job->done_fence);
 	}
 
+	if (ptdev->base.dev->power.runtime_error) {
+		ret = panthor_device_reset_sync(ptdev);
+		if (drm_WARN_ON(&ptdev->base, ret))
+			return ERR_PTR(ret);
+		drm_WARN_ON(&ptdev->base, pm_runtime_set_active(ptdev->base.dev));
+	}
+
 	ret = pm_runtime_resume_and_get(ptdev->base.dev);
 	if (drm_WARN_ON(&ptdev->base, ret))
 		return ERR_PTR(ret);
-- 
2.46.2
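Why the error status has to be cleared: according to Documentation/power/runtime_pm.rst, once a suspend callback fails with an error other than -EBUSY or -EAGAIN, the PM core refuses to run its runtime PM helpers for the device until the status is set directly with pm_runtime_set_active() or pm_runtime_set_suspended(). The user-space model below mimics that behaviour; model_resume() and model_set_active() are hypothetical stand-ins, and the -EINVAL returned while an error is pending mirrors what rpm_resume() appears to return in current kernels (treat the exact error code as an assumption).

```c
#include <assert.h>
#include <errno.h>

enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_rpm {
	enum model_rpm_status status;
	int runtime_error;	/* left behind by a failed suspend callback */
};

/* Resume request: refused outright while an error is pending. */
static int model_resume(struct model_rpm *pm, int (*resume_cb)(void))
{
	int ret;

	if (pm->runtime_error)
		return -EINVAL;
	if (pm->status == MODEL_RPM_ACTIVE)
		return 0;		/* already running */
	ret = resume_cb();
	if (ret)
		pm->runtime_error = ret;
	else
		pm->status = MODEL_RPM_ACTIVE;
	return ret;
}

/* Like pm_runtime_set_active(): force the status and clear the error. */
static void model_set_active(struct model_rpm *pm)
{
	pm->status = MODEL_RPM_ACTIVE;
	pm->runtime_error = 0;
}

static int ok_resume(void)
{
	return 0;
}
```

In this model every resume attempt keeps failing until model_set_active() runs, which matches the symptom described in the commit message.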



* Re: [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error
  2024-10-11 22:56 [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Adrián Larumbe
  2024-10-11 22:57 ` [PATCH 2/3] drm/panthor: Retry OPP transition to suspension state a few times Adrián Larumbe
  2024-10-11 22:57 ` [PATCH 3/3] drm/panthor: Reset device and load FW after failed PM suspend Adrián Larumbe
@ 2024-10-11 23:22 ` Liviu Dudau
  2024-10-14  7:12 ` Boris Brezillon
  3 siblings, 0 replies; 8+ messages in thread
From: Liviu Dudau @ 2024-10-11 23:22 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Boris Brezillon, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, kernel, dri-devel,
	linux-kernel

On Fri, Oct 11, 2024 at 11:56:59PM +0100, Adrián Larumbe wrote:
> In case an OPP transition to a suspension state fails during the runtime
> PM suspend call, if the driver's subsystems were successfully resumed,
> we should return -EAGAIN so that the device's runtime PM status remains
> 'active'.
> 
> If FW reload failed, then we should fall through, so that the PM core
> can flag the device as having suffered a runtime error.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>

Acked-by: Liviu Dudau <liviu.dudau@arm.com> for this patch. For the other two,
I would first like us to try to understand why the suspend does not happen
quickly enough (or at all).

Best regards,
Liviu

> ---
>  drivers/gpu/drm/panthor/panthor_device.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index 4082c8f2951d..cedd3cbcb47d 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -528,8 +528,13 @@ int panthor_device_suspend(struct device *dev)
>  		    drm_dev_enter(&ptdev->base, &cookie)) {
>  			panthor_gpu_resume(ptdev);
>  			panthor_mmu_resume(ptdev);
> -			drm_WARN_ON(&ptdev->base, panthor_fw_resume(ptdev));
> -			panthor_sched_resume(ptdev);
> +			ret = panthor_fw_resume(ptdev);
> +			if (!ret) {
> +				panthor_sched_resume(ptdev);
> +				ret = -EAGAIN;
> +			} else {
> +				drm_err(&ptdev->base, "FW resume failed at runtime suspend: %d\n", ret);
> +			}
>  			drm_dev_exit(cookie);
>  		}
>  
> -- 
> 2.46.2
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯


* Re: [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error
  2024-10-11 22:56 [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Adrián Larumbe
                   ` (2 preceding siblings ...)
  2024-10-11 23:22 ` [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Liviu Dudau
@ 2024-10-14  7:12 ` Boris Brezillon
  3 siblings, 0 replies; 8+ messages in thread
From: Boris Brezillon @ 2024-10-14  7:12 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, kernel, dri-devel,
	linux-kernel

On Fri, 11 Oct 2024 23:56:59 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> In case an OPP transition to a suspension state fails during the runtime
> PM suspend call, if the driver's subsystems were successfully resumed,
> we should return -EAGAIN so that the device's runtime PM status remains
> 'active'.
> 
> If FW reload failed, then we should fall through, so that the PM core
> can flag the device as having suffered a runtime error.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index 4082c8f2951d..cedd3cbcb47d 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -528,8 +528,13 @@ int panthor_device_suspend(struct device *dev)
>  		    drm_dev_enter(&ptdev->base, &cookie)) {
>  			panthor_gpu_resume(ptdev);
>  			panthor_mmu_resume(ptdev);
> -			drm_WARN_ON(&ptdev->base, panthor_fw_resume(ptdev));
> -			panthor_sched_resume(ptdev);
> +			ret = panthor_fw_resume(ptdev);
> +			if (!ret) {
> +				panthor_sched_resume(ptdev);
> +				ret = -EAGAIN;
> +			} else {
> +				drm_err(&ptdev->base, "FW resume failed at runtime suspend: %d\n", ret);
> +			}

Hm, I'm not convinced resuming when devfreq_suspend() fails was the
right thing to do anyway. Can't we just assume the suspend succeeded in
that case, and either force the devfreq OPP transition in the resume path
or ignore it?

>  			drm_dev_exit(cookie);
>  		}
>  



* Re: [PATCH 3/3] drm/panthor: Reset device and load FW after failed PM suspend
  2024-10-11 22:57 ` [PATCH 3/3] drm/panthor: Rreset device and load FW after failed PM suspend Adrián Larumbe
@ 2024-10-14  7:27   ` Boris Brezillon
  2024-10-16  9:14   ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: Boris Brezillon @ 2024-10-14  7:27 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, kernel, dri-devel,
	linux-kernel

On Fri, 11 Oct 2024 23:57:01 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> On rk3588 SoCs, during a runtime PM suspend, the transition to the
> lowest voltage/frequency pair might sometimes fail for reasons not yet
> understood. In that case, even a slow FW reset will fail, leaving the
> device's PM runtime status as unusable.
> 
> When that happens, successive attempts to resume the device upon running
> a job will always fail.
> 
> Fix it by forcing a synchronous device reset, which will lead to a
> successful FW reload, and also reset the device's PM runtime error
> status before resuming it.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.c | 10 ++++++++++
>  drivers/gpu/drm/panthor/panthor_device.h |  2 ++
>  drivers/gpu/drm/panthor/panthor_sched.c  |  7 +++++++
>  3 files changed, 19 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index 5430557bd0b8..ec6fed5e996b 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -105,6 +105,16 @@ static void panthor_device_reset_cleanup(struct drm_device *ddev, void *data)
>  	destroy_workqueue(ptdev->reset.wq);
>  }
>  
> +int panthor_device_reset_sync(struct panthor_device *ptdev)
> +{
> +	panthor_fw_pre_reset(ptdev, false);
> +	panthor_mmu_pre_reset(ptdev);
> +	panthor_gpu_soft_reset(ptdev);
> +	panthor_gpu_l2_power_on(ptdev);
> +	panthor_mmu_post_reset(ptdev);
> +	return panthor_fw_post_reset(ptdev);
> +}
> +
>  static void panthor_device_reset_work(struct work_struct *work)
>  {
>  	struct panthor_device *ptdev = container_of(work, struct panthor_device, reset.work);
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 0e68f5a70d20..05a5a7233378 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -217,6 +217,8 @@ struct panthor_file {
>  int panthor_device_init(struct panthor_device *ptdev);
>  void panthor_device_unplug(struct panthor_device *ptdev);
>  
> +int panthor_device_reset_sync(struct panthor_device *ptdev);
> +
>  /**
>   * panthor_device_schedule_reset() - Schedules a reset operation
>   */
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index c7b350fc3eba..9a854c8c5718 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -3101,6 +3101,13 @@ queue_run_job(struct drm_sched_job *sched_job)
>  		return dma_fence_get(job->done_fence);
>  	}
>  
> +	if (ptdev->base.dev->power.runtime_error) {
> +		ret = panthor_device_reset_sync(ptdev);
> +		if (drm_WARN_ON(&ptdev->base, ret))
> +			return ERR_PTR(ret);
> +		drm_WARN_ON(&ptdev->base, pm_runtime_set_active(ptdev->base.dev));
> +	}

I'd rather pretend the suspend/resume worked (even if it didn't) and
deal with the consequences (force a slow reset on the next resume), than
spread the 'if-PM-op-failed-force-sync-reset' thing everywhere we do a
pm_runtime_resume_and_get(). Also not sure how resetting the GPU will
help fixing the OPP transition failure.

> +
>  	ret = pm_runtime_resume_and_get(ptdev->base.dev);
>  	if (drm_WARN_ON(&ptdev->base, ret))
>  		return ERR_PTR(ret);
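A minimal sketch of the alternative suggested in this reply: report success from the suspend callback even when the OPP transition fails, remember the failure, and perform the expensive recovery once, in the resume callback, instead of checking for it at every pm_runtime_resume_and_get() call site. The needs_slow_reset flag and the functions below are hypothetical; the two counters only stand in for a slow FW reset versus a normal fast resume.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	bool needs_slow_reset;		/* set when a PM transition failed */
	unsigned int slow_resets;	/* stand-in for a full FW reload */
	unsigned int fast_resumes;	/* stand-in for the normal resume path */
};

/* Suspend: never fail because of the OPP transition; just remember it. */
static int model_suspend(struct model_dev *d, int devfreq_ret)
{
	if (devfreq_ret)
		d->needs_slow_reset = true;
	return 0;	/* the PM core sees a successful suspend */
}

/* Resume: the only place that has to know about the earlier failure. */
static int model_resume(struct model_dev *d)
{
	if (d->needs_slow_reset) {
		d->needs_slow_reset = false;
		d->slow_resets++;
	} else {
		d->fast_resumes++;
	}
	return 0;
}
```

The trade-off is that the PM core believes the device is suspended even though the OPP transition may not have happened; in exchange, runtime_error is never set and callers of pm_runtime_resume_and_get() need no special handling.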



* Re: [PATCH 2/3] drm/panthor: Retry OPP transition to suspension state a few times
  2024-10-11 22:57 ` [PATCH 2/3] drm/panthor: Retry OPP transition to suspension state a few times Adrián Larumbe
@ 2024-10-14  7:28   ` Boris Brezillon
  0 siblings, 0 replies; 8+ messages in thread
From: Boris Brezillon @ 2024-10-14  7:28 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, kernel, dri-devel,
	linux-kernel

On Fri, 11 Oct 2024 23:57:00 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> When the device's runtime PM suspend callback is invoked, the switch to
> a suspension OPP might sometimes fail. Although this is beyond the
> control of the Panthor driver, we can attempt suspending it more than
> once as a defensive strategy.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index cedd3cbcb47d..5430557bd0b8 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -490,6 +490,7 @@ int panthor_device_resume(struct device *dev)
>  int panthor_device_suspend(struct device *dev)
>  {
>  	struct panthor_device *ptdev = dev_get_drvdata(dev);
> +	unsigned int susp_retries;
>  	int ret, cookie;
>  
>  	if (atomic_read(&ptdev->pm.state) != PANTHOR_DEVICE_PM_STATE_ACTIVE)
> @@ -522,7 +523,12 @@ int panthor_device_suspend(struct device *dev)
>  		drm_dev_exit(cookie);
>  	}
>  
> -	ret = panthor_devfreq_suspend(ptdev);
> +	for (susp_retries = 0; susp_retries < 5; susp_retries++) {
> +		ret = panthor_devfreq_suspend(ptdev);
> +		if (!ret)
> +			break;
> +	}

This retry logic should probably be moved to panthor_devfreq_suspend(),
but as Liviu said, I think we need to better understand why it takes
several attempts for an OPP transition to succeed.

> +
>  	if (ret) {
>  		if (panthor_device_is_initialized(ptdev) &&
>  		    drm_dev_enter(&ptdev->base, &cookie)) {



* Re: [PATCH 3/3] drm/panthor: Rreset device and load FW after failed PM suspend
  2024-10-11 22:57 ` [PATCH 3/3] drm/panthor: Reset device and load FW after failed PM suspend Adrián Larumbe
  2024-10-14  7:27   ` Boris Brezillon
@ 2024-10-16  9:14   ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-10-16  9:14 UTC (permalink / raw)
  To: Adrián Larumbe, Boris Brezillon, Steven Price, Liviu Dudau,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter
  Cc: llvm, oe-kbuild-all, kernel, Adrián Larumbe, dri-devel,
	linux-kernel

Hi Adrián,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on linus/master v6.12-rc3 next-20241016]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-Retry-OPP-transition-to-suspension-state-a-few-times/20241012-070112
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20241011225906.3789965-3-adrian.larumbe%40collabora.com
patch subject: [PATCH 3/3] drm/panthor: Reset device and load FW after failed PM suspend
config: i386-buildonly-randconfig-001-20241016 (https://download.01.org/0day-ci/archive/20241016/202410161634.8YjhTQM2-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241016/202410161634.8YjhTQM2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410161634.8YjhTQM2-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/panthor/panthor_sched.c:3104:29: error: no member named 'runtime_error' in 'struct dev_pm_info'
    3104 |         if (ptdev->base.dev->power.runtime_error) {
         |             ~~~~~~~~~~~~~~~~~~~~~~ ^
   include/linux/compiler.h:55:47: note: expanded from macro 'if'
      55 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                               ^~~~
   include/linux/compiler.h:57:52: note: expanded from macro '__trace_if_var'
      57 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                    ^~~~
>> drivers/gpu/drm/panthor/panthor_sched.c:3104:29: error: no member named 'runtime_error' in 'struct dev_pm_info'
    3104 |         if (ptdev->base.dev->power.runtime_error) {
         |             ~~~~~~~~~~~~~~~~~~~~~~ ^
   include/linux/compiler.h:55:47: note: expanded from macro 'if'
      55 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                               ^~~~
   include/linux/compiler.h:57:61: note: expanded from macro '__trace_if_var'
      57 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                             ^~~~
>> drivers/gpu/drm/panthor/panthor_sched.c:3104:29: error: no member named 'runtime_error' in 'struct dev_pm_info'
    3104 |         if (ptdev->base.dev->power.runtime_error) {
         |             ~~~~~~~~~~~~~~~~~~~~~~ ^
   include/linux/compiler.h:55:47: note: expanded from macro 'if'
      55 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                               ^~~~
   include/linux/compiler.h:57:86: note: expanded from macro '__trace_if_var'
      57 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                                                      ^~~~
   include/linux/compiler.h:68:3: note: expanded from macro '__trace_if_value'
      68 |         (cond) ?                                        \
         |          ^~~~
   3 errors generated.


vim +3104 drivers/gpu/drm/panthor/panthor_sched.c

  3081	
  3082	static struct dma_fence *
  3083	queue_run_job(struct drm_sched_job *sched_job)
  3084	{
  3085		struct panthor_job *job = container_of(sched_job, struct panthor_job, base);
  3086		struct panthor_group *group = job->group;
  3087		struct panthor_queue *queue = group->queues[job->queue_idx];
  3088		struct panthor_device *ptdev = group->ptdev;
  3089		struct panthor_scheduler *sched = ptdev->scheduler;
  3090		struct panthor_job_ringbuf_instrs instrs;
  3091		struct panthor_job_cs_params cs_params;
  3092		struct dma_fence *done_fence;
  3093		int ret;
  3094	
  3095		/* Stream size is zero, nothing to do except making sure all previously
  3096		 * submitted jobs are done before we signal the
  3097		 * drm_sched_job::s_fence::finished fence.
  3098		 */
  3099		if (!job->call_info.size) {
  3100			job->done_fence = dma_fence_get(queue->fence_ctx.last_fence);
  3101			return dma_fence_get(job->done_fence);
  3102		}
  3103	
> 3104		if (ptdev->base.dev->power.runtime_error) {
  3105			ret = panthor_device_reset_sync(ptdev);
  3106			if (drm_WARN_ON(&ptdev->base, ret))
  3107				return ERR_PTR(ret);
  3108			drm_WARN_ON(&ptdev->base, pm_runtime_set_active(ptdev->base.dev));
  3109		}
  3110	
  3111		ret = pm_runtime_resume_and_get(ptdev->base.dev);
  3112		if (drm_WARN_ON(&ptdev->base, ret))
  3113			return ERR_PTR(ret);
  3114	
  3115		mutex_lock(&sched->lock);
  3116		if (!group_can_run(group)) {
  3117			done_fence = ERR_PTR(-ECANCELED);
  3118			goto out_unlock;
  3119		}
  3120	
  3121		dma_fence_init(job->done_fence,
  3122			       &panthor_queue_fence_ops,
  3123			       &queue->fence_ctx.lock,
  3124			       queue->fence_ctx.id,
  3125			       atomic64_inc_return(&queue->fence_ctx.seqno));
  3126	
  3127		job->profiling.slot = queue->profiling.seqno++;
  3128		if (queue->profiling.seqno == queue->profiling.slot_count)
  3129			queue->profiling.seqno = 0;
  3130	
  3131		job->ringbuf.start = queue->iface.input->insert;
  3132	
  3133		get_job_cs_params(job, &cs_params);
  3134		prepare_job_instrs(&cs_params, &instrs);
  3135		copy_instrs_to_ringbuf(queue, job, &instrs);
  3136	
  3137		job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64));
  3138	
  3139		panthor_job_get(&job->base);
  3140		spin_lock(&queue->fence_ctx.lock);
  3141		list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
  3142		spin_unlock(&queue->fence_ctx.lock);
  3143	
  3144		/* Make sure the ring buffer is updated before the INSERT
  3145		 * register.
  3146		 */
  3147		wmb();
  3148	
  3149		queue->iface.input->extract = queue->iface.output->extract;
  3150		queue->iface.input->insert = job->ringbuf.end;
  3151	
  3152		if (group->csg_id < 0) {
  3153			/* If the queue is blocked, we want to keep the timeout running, so we
  3154			 * can detect unbounded waits and kill the group when that happens.
  3155			 * Otherwise, we suspend the timeout so the time we spend waiting for
  3156			 * a CSG slot is not counted.
  3157			 */
  3158			if (!(group->blocked_queues & BIT(job->queue_idx)) &&
  3159			    !queue->timeout_suspended) {
  3160				queue->remaining_time = drm_sched_suspend_timeout(&queue->scheduler);
  3161				queue->timeout_suspended = true;
  3162			}
  3163	
  3164			group_schedule_locked(group, BIT(job->queue_idx));
  3165		} else {
  3166			gpu_write(ptdev, CSF_DOORBELL(queue->doorbell_id), 1);
  3167			if (!sched->pm.has_ref &&
  3168			    !(group->blocked_queues & BIT(job->queue_idx))) {
  3169				pm_runtime_get(ptdev->base.dev);
  3170				sched->pm.has_ref = true;
  3171			}
  3172			panthor_devfreq_record_busy(sched->ptdev);
  3173		}
  3174	
  3175		/* Update the last fence. */
  3176		dma_fence_put(queue->fence_ctx.last_fence);
  3177		queue->fence_ctx.last_fence = dma_fence_get(job->done_fence);
  3178	
  3179		done_fence = dma_fence_get(job->done_fence);
  3180	
  3181	out_unlock:
  3182		mutex_unlock(&sched->lock);
  3183		pm_runtime_mark_last_busy(ptdev->base.dev);
  3184		pm_runtime_put_autosuspend(ptdev->base.dev);
  3185	
  3186		return done_fence;
  3187	}
  3188	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
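The build failure points at a configuration without runtime PM support: struct dev_pm_info only carries runtime_error when CONFIG_PM is enabled, which is presumably not the case in this i386 randconfig. Because the member does not exist at all in that configuration, IS_ENABLED() alone cannot guard the access; the preprocessor has to drop it. A common kernel idiom is a static inline accessor with a stub for the compiled-out case. The user-space sketch below shows that pattern; struct model_pm_info and model_runtime_error() are hypothetical, and CONFIG_PM is just a macro you can define to mimic either configuration.

```c
#include <assert.h>

/* Define this to mimic a kernel built with CONFIG_PM=y. */
/* #define CONFIG_PM 1 */

/* Cut-down stand-in for struct dev_pm_info. */
struct model_pm_info {
	unsigned int can_wakeup;
#ifdef CONFIG_PM
	int runtime_error;	/* only present with runtime PM support */
#endif
};

/* Hypothetical accessor that compiles in both configurations. */
static inline int model_runtime_error(const struct model_pm_info *pm)
{
#ifdef CONFIG_PM
	return pm->runtime_error;
#else
	(void)pm;
	return 0;	/* no runtime PM, so no runtime PM error either */
#endif
}
```

Callers such as queue_run_job() would then test the accessor's result instead of dereferencing power.runtime_error directly, so the code builds whether or not CONFIG_PM is set.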

