* [PATCH] drm/panthor: always set fence errors on CS_FAULT
From: Chia-I Wu @ 2025-06-18 14:55 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
It is unclear why fence errors were set only for CS_INHERIT_FAULT.
Downstream driver also does not treat CS_INHERIT_FAULT specially.
Remove the check.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index a2248f692a030..1a3b1c49f7d7b 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
fault = cs_iface->output->fault;
info = cs_iface->output->fault_info;
- if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
+ if (queue) {
u64 cs_extract = queue->iface.output->extract;
struct panthor_job *job;
--
2.50.0.rc2.696.g1fc2a0284f-goog
* Re: [PATCH] drm/panthor: always set fence errors on CS_FAULT
From: Boris Brezillon @ 2025-06-23 6:32 UTC (permalink / raw)
To: Chia-I Wu
Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
linux-kernel
On Wed, 18 Jun 2025 07:55:49 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:
> It is unclear why fence errors were set only for CS_INHERIT_FAULT.
> Downstream driver also does not treat CS_INHERIT_FAULT specially.
> Remove the check.
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> ---
> drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index a2248f692a030..1a3b1c49f7d7b 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
> fault = cs_iface->output->fault;
> info = cs_iface->output->fault_info;
>
> - if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
> + if (queue) {
> u64 cs_extract = queue->iface.output->extract;
> struct panthor_job *job;
>
Now that I look at the code, I think we should record the error when
the ERROR_BARRIER is executed instead of flagging all in-flight jobs as
faulty. One option would be to re-use the profiling buffer by adding an
error field to panthor_job_profiling_data, but we're going to lose 4
bytes per slot because of the 64-bit alignment we want for timestamps,
so maybe just create a separate buffer with N entries of:

struct panthor_job_status {
	u32 error;
};
* Re: [PATCH] drm/panthor: always set fence errors on CS_FAULT
From: Chia-I Wu @ 2025-07-08 21:40 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
linux-kernel
On Sun, Jun 22, 2025 at 11:32 PM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Wed, 18 Jun 2025 07:55:49 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > It is unclear why fence errors were set only for CS_INHERIT_FAULT.
> > Downstream driver also does not treat CS_INHERIT_FAULT specially.
> > Remove the check.
> >
> > Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > ---
> > drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index a2248f692a030..1a3b1c49f7d7b 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
> > fault = cs_iface->output->fault;
> > info = cs_iface->output->fault_info;
> >
> > - if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
> > + if (queue) {
> > u64 cs_extract = queue->iface.output->extract;
> > struct panthor_job *job;
> >
>
> Now that I look at the code, I think we should record the error when
> the ERROR_BARRIER is executed instead of flagging all in-flight jobs as
> faulty. One option would be to re-use the profiling buffer by adding an
> error field to panthor_job_profiling_data, but we're going to lose 4
> bytes per slot because of the 64-bit alignment we want for timestamps,
> so maybe just create a separate buffer with N entries of:
>
> struct panthor_job_status {
> 	u32 error;
> };
The current error path uses cs_extract to mark exactly the offending
job faulty. Innocent in-flight jobs do not seem to be affected.
I looked into emitting LOAD/STORE after SYNC_ADD64 to copy the error
to panthor_job_status. Other than the extra instructions and storage,
group_sync_upd_work can be called before the LOAD/STORE completes, so
it would need to check both panthor_job_status and panthor_syncobj_64b.
That would be a bit ugly as well.
* Re: [PATCH] drm/panthor: always set fence errors on CS_FAULT
From: Boris Brezillon @ 2025-08-20 8:43 UTC (permalink / raw)
To: Chia-I Wu
Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
linux-kernel
On Tue, 8 Jul 2025 14:40:06 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:
> On Sun, Jun 22, 2025 at 11:32 PM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Wed, 18 Jun 2025 07:55:49 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > > It is unclear why fence errors were set only for CS_INHERIT_FAULT.
> > > Downstream driver also does not treat CS_INHERIT_FAULT specially.
> > > Remove the check.
> > >
> > > Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > > ---
> > > drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > > index a2248f692a030..1a3b1c49f7d7b 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > > @@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
> > > fault = cs_iface->output->fault;
> > > info = cs_iface->output->fault_info;
> > >
> > > - if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
> > > + if (queue) {
> > > u64 cs_extract = queue->iface.output->extract;
> > > struct panthor_job *job;
> > >
> >
> > Now that I look at the code, I think we should record the error when
> > the ERROR_BARRIER is executed instead of flagging all in-flight jobs as
> > faulty. One option would be to re-use the profiling buffer by adding an
> > error field to panthor_job_profiling_data, but we're going to lose 4
> > bytes per slot because of the 64-bit alignment we want for timestamps,
> > so maybe just create a separate buffer with N entries of:
> >
> > struct panthor_job_status {
> > 	u32 error;
> > };
> The current error path uses cs_extract to mark exactly the offending
> job faulty. Innocent in-flight jobs do not seem to be affected.
My bad, I thought the faulty CS was automatically entering the recovery
substate (fetching all instructions and ignoring RUN_xxx ones), but it
turns out CS instruction fetching is stalled until the fault is
acknowledged, so we're good.
>
> I looked into emitting LOAD/STORE after SYNC_ADD64 to copy the error
> to panthor_job_status. Other than the extra instructions and storage,
> group_sync_upd_work can be called before the LOAD/STORE completes, so
> it would need to check both panthor_job_status and panthor_syncobj_64b.
> That would be a bit ugly as well.
Nah, I think you're right, I just had a wrong recollection of how
recovery mode works. The patch is
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>