* [PATCH] drm/panthor: Reset queue slots if termination fails
@ 2025-05-15 10:33 Ashley Smith
  2025-05-15 10:42 ` Boris Brezillon
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Ashley Smith @ 2025-05-15 10:33 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: kernel, Ashley Smith, dri-devel, linux-kernel

This fixes a bug where, if we time out after a suspend and the
termination fails (for example, because we are waiting on a fence that
will never be signalled), we do not resume the group correctly. The fix
forces a reset for groups that are not terminated correctly.

Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_sched.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 43ee57728de5..1f4a5a103975 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -2727,8 +2727,17 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 			 * automatically terminate all active groups, so let's
 			 * force the state to halted here.
 			 */
-			if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED)
+			if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED) {
 				csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
+
+				/* Reset the queue slots manually if the termination
+				 * request failed.
+				 */
+				for (i = 0; i queue_count; i++) {
+					if (group->queues[i])
+						cs_slot_reset_locked(ptdev, csg_id, i);
+				}
+			}
 			slot_mask &= ~BIT(csg_id);
 		}
 	}

base-commit: 9934ab18051118385c7ea44d8e14175edbe6dc9c
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread
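
As posted, the condition in the new loop is missing its comparison
operator, so the hunk does not build; the replies below flag this. A
minimal sketch of the presumably intended loop, assuming the dropped
token is "<":

	for (i = 0; i < group->queue_count; i++) {
		if (group->queues[i])
			cs_slot_reset_locked(ptdev, csg_id, i);
	}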

* Re: [PATCH] drm/panthor: Reset queue slots if termination fails
  2025-05-15 10:33 [PATCH] drm/panthor: Reset queue slots if termination fails Ashley Smith
@ 2025-05-15 10:42 ` Boris Brezillon
  2025-05-15 15:38 ` Steven Price
  2025-05-16  6:34 ` kernel test robot
  2 siblings, 0 replies; 4+ messages in thread
From: Boris Brezillon @ 2025-05-15 10:42 UTC (permalink / raw)
  To: Ashley Smith
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, kernel, dri-devel,
	linux-kernel

On Thu, 15 May 2025 11:33:05 +0100
Ashley Smith <ashley.smith@collabora.com> wrote:

> This fixes a bug where, if we time out after a suspend and the
> termination fails (for example, because we are waiting on a fence that
> will never be signalled), we do not resume the group correctly. The fix
> forces a reset for groups that are not terminated correctly.
> 
> Signed-off-by: Ashley Smith <ashley.smith@collabora.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 43ee57728de5..1f4a5a103975 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -2727,8 +2727,17 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>  			 * automatically terminate all active groups, so let's
>  			 * force the state to halted here.
>  			 */
> -			if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED)
> +			if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED) {
>  				csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
> +
> +				/* Reset the queue slots manually if the termination
> +				 * request failed.
> +				 */
> +				for (i = 0; i queue_count; i++) {
> +					if (group->queues[i])
> +						cs_slot_reset_locked(ptdev, csg_id, i);
> +				}
> +			}
>  			slot_mask &= ~BIT(csg_id);
>  		}
>  	}
> 
> base-commit: 9934ab18051118385c7ea44d8e14175edbe6dc9c


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/panthor: Reset queue slots if termination fails
  2025-05-15 10:33 [PATCH] drm/panthor: Reset queue slots if termination fails Ashley Smith
  2025-05-15 10:42 ` Boris Brezillon
@ 2025-05-15 15:38 ` Steven Price
  2025-05-16  6:34 ` kernel test robot
  2 siblings, 0 replies; 4+ messages in thread
From: Steven Price @ 2025-05-15 15:38 UTC (permalink / raw)
  To: Ashley Smith, Boris Brezillon, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: kernel, dri-devel, linux-kernel

On 15/05/2025 11:33, Ashley Smith wrote:
> This fixes a bug where, if we time out after a suspend and the
> termination fails (for example, because we are waiting on a fence that
> will never be signalled), we do not resume the group correctly. The fix
> forces a reset for groups that are not terminated correctly.
> 
> Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 43ee57728de5..1f4a5a103975 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -2727,8 +2727,17 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>  			 * automatically terminate all active groups, so let's
>  			 * force the state to halted here.
>  			 */
> -			if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED)
> +			if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED) {
>  				csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
> +
> +				/* Reset the queue slots manually if the termination
> +				 * request failed.
> +				 */
> +				for (i = 0; i queue_count; i++) {

Missing "<"?

Steve

> +					if (group->queues[i])
> +						cs_slot_reset_locked(ptdev, csg_id, i);
> +				}
> +			}
>  			slot_mask &= ~BIT(csg_id);
>  		}
>  	}
> 
> base-commit: 9934ab18051118385c7ea44d8e14175edbe6dc9c


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/panthor: Reset queue slots if termination fails
  2025-05-15 10:33 [PATCH] drm/panthor: Reset queue slots if termination fails Ashley Smith
  2025-05-15 10:42 ` Boris Brezillon
  2025-05-15 15:38 ` Steven Price
@ 2025-05-16  6:34 ` kernel test robot
  2 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2025-05-16  6:34 UTC (permalink / raw)
  To: Ashley Smith, Boris Brezillon, Steven Price, Liviu Dudau,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter
  Cc: oe-kbuild-all, kernel, Ashley Smith, dri-devel, linux-kernel

Hi Ashley,

kernel test robot noticed the following build errors:

[auto build test ERROR on 9934ab18051118385c7ea44d8e14175edbe6dc9c]

url:    https://github.com/intel-lab-lkp/linux/commits/Ashley-Smith/drm-panthor-Reset-queue-slots-if-termination-fails/20250515-183502
base:   9934ab18051118385c7ea44d8e14175edbe6dc9c
patch link:    https://lore.kernel.org/r/20250515103314.1682471-1-ashley.smith%40collabora.com
patch subject: [PATCH] drm/panthor: Reset queue slots if termination fails
config: sparc-randconfig-002-20250516 (https://download.01.org/0day-ci/archive/20250516/202505161417.tAUp1jmc-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250516/202505161417.tAUp1jmc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505161417.tAUp1jmc-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/gpu/drm/panthor/panthor_sched.c: In function 'panthor_sched_suspend':
>> drivers/gpu/drm/panthor/panthor_sched.c:2736:18: error: expected ';' before 'queue_count'
        for (i = 0; i queue_count; i++) {
                     ^~~~~~~~~~~~
                     ;


vim +2736 drivers/gpu/drm/panthor/panthor_sched.c

  2666	
  2667	void panthor_sched_suspend(struct panthor_device *ptdev)
  2668	{
  2669		struct panthor_scheduler *sched = ptdev->scheduler;
  2670		struct panthor_csg_slots_upd_ctx upd_ctx;
  2671		struct panthor_group *group;
  2672		u32 suspended_slots;
  2673		u32 i;
  2674	
  2675		mutex_lock(&sched->lock);
  2676		csgs_upd_ctx_init(&upd_ctx);
  2677		for (i = 0; i < sched->csg_slot_count; i++) {
  2678			struct panthor_csg_slot *csg_slot = &sched->csg_slots[i];
  2679	
  2680			if (csg_slot->group) {
  2681				csgs_upd_ctx_queue_reqs(ptdev, &upd_ctx, i,
  2682							group_can_run(csg_slot->group) ?
  2683							CSG_STATE_SUSPEND : CSG_STATE_TERMINATE,
  2684							CSG_STATE_MASK);
  2685			}
  2686		}
  2687	
  2688		suspended_slots = upd_ctx.update_mask;
  2689	
  2690		csgs_upd_ctx_apply_locked(ptdev, &upd_ctx);
  2691		suspended_slots &= ~upd_ctx.timedout_mask;
  2692	
  2693		if (upd_ctx.timedout_mask) {
  2694			u32 slot_mask = upd_ctx.timedout_mask;
  2695	
  2696			drm_err(&ptdev->base, "CSG suspend failed, escalating to termination");
  2697			csgs_upd_ctx_init(&upd_ctx);
  2698			while (slot_mask) {
  2699				u32 csg_id = ffs(slot_mask) - 1;
  2700				struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
  2701	
  2702				/* If the group was still usable before that point, we consider
  2703				 * it innocent.
  2704				 */
  2705				if (group_can_run(csg_slot->group))
  2706					csg_slot->group->innocent = true;
  2707	
  2708				/* We consider group suspension failures as fatal and flag the
  2709				 * group as unusable by setting timedout=true.
  2710				 */
  2711				csg_slot->group->timedout = true;
  2712	
  2713				csgs_upd_ctx_queue_reqs(ptdev, &upd_ctx, csg_id,
  2714							CSG_STATE_TERMINATE,
  2715							CSG_STATE_MASK);
  2716				slot_mask &= ~BIT(csg_id);
  2717			}
  2718	
  2719			csgs_upd_ctx_apply_locked(ptdev, &upd_ctx);
  2720	
  2721			slot_mask = upd_ctx.timedout_mask;
  2722			while (slot_mask) {
  2723				u32 csg_id = ffs(slot_mask) - 1;
  2724				struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
  2725	
  2726				/* Terminate command timedout, but the soft-reset will
  2727				 * automatically terminate all active groups, so let's
  2728				 * force the state to halted here.
  2729				 */
  2730				if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED) {
  2731					csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
  2732	
  2733					/* Reset the queue slots manually if the termination
  2734					 * request failed.
  2735					 */
> 2736					for (i = 0; i queue_count; i++) {
  2737						if (group->queues[i])
  2738							cs_slot_reset_locked(ptdev, csg_id, i);
  2739					}
  2740				}
  2741				slot_mask &= ~BIT(csg_id);
  2742			}
  2743		}
  2744	
  2745		/* Flush L2 and LSC caches to make sure suspend state is up-to-date.
  2746		 * If the flush fails, flag all queues for termination.
  2747		 */
  2748		if (suspended_slots) {
  2749			bool flush_caches_failed = false;
  2750			u32 slot_mask = suspended_slots;
  2751	
  2752			if (panthor_gpu_flush_caches(ptdev, CACHE_CLEAN, CACHE_CLEAN, 0))
  2753				flush_caches_failed = true;
  2754	
  2755			while (slot_mask) {
  2756				u32 csg_id = ffs(slot_mask) - 1;
  2757				struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
  2758	
  2759				if (flush_caches_failed)
  2760					csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
  2761				else
  2762					csg_slot_sync_update_locked(ptdev, csg_id);
  2763	
  2764				slot_mask &= ~BIT(csg_id);
  2765			}
  2766		}
  2767	
  2768		for (i = 0; i < sched->csg_slot_count; i++) {
  2769			struct panthor_csg_slot *csg_slot = &sched->csg_slots[i];
  2770	
  2771			group = csg_slot->group;
  2772			if (!group)
  2773				continue;
  2774	
  2775			group_get(group);
  2776	
  2777			if (group->csg_id >= 0)
  2778				sched_process_csg_irq_locked(ptdev, group->csg_id);
  2779	
  2780			group_unbind_locked(group);
  2781	
  2782			drm_WARN_ON(&group->ptdev->base, !list_empty(&group->run_node));
  2783	
  2784			if (group_can_run(group)) {
  2785				list_add(&group->run_node,
  2786					 &sched->groups.idle[group->priority]);
  2787			} else {
  2788				/* We don't bother stopping the scheduler if the group is
  2789				 * faulty, the group termination work will finish the job.
  2790				 */
  2791				list_del_init(&group->wait_node);
  2792				group_queue_work(group, term);
  2793			}
  2794			group_put(group);
  2795		}
  2796		mutex_unlock(&sched->lock);
  2797	}
  2798	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 4+ messages in thread
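
Beyond the missing "<", note that the local "group" pointer does not
appear to have been assigned yet at this point in
panthor_sched_suspend() (it is only set in the later cleanup loop), so
the hunk presumably means "csg_slot->group". A minimal sketch of the
corrected block under those two assumptions:

	if (csg_slot->group->state != PANTHOR_CS_GROUP_TERMINATED) {
		csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;

		/* Reset the queue slots manually if the termination
		 * request failed (assumes csg_slot->group; the posted
		 * hunk's bare "group" is uninitialized here).
		 */
		for (i = 0; i < csg_slot->group->queue_count; i++) {
			if (csg_slot->group->queues[i])
				cs_slot_reset_locked(ptdev, csg_id, i);
		}
	}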
