* [PATCH] accel/ivpu: Return correct job error status
@ 2025-10-07 8:35 Karol Wachowski
2025-10-07 14:54 ` Jeff Hugo
0 siblings, 1 reply; 3+ messages in thread
From: Karol Wachowski @ 2025-10-07 8:35 UTC (permalink / raw)
To: dri-devel
Cc: oded.gabbay, jeff.hugo, maciej.falkowski, lizhi.hou,
Andrzej Kacprowski, Karol Wachowski
From: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
The driver was returning ABORTED for all error that trigger engine reset.
Refactor ivpu_job_signal_and_destroy() by extracting engine error
handling logic into a new function ivpu_job_handle_engine_error().
This simplifies engine error handling logic by removing necessity of
calling ivpu_job_singal_and_destroy() multiple times by a single job
changing it's behavior based on job status.
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
---
drivers/accel/ivpu/ivpu_job.c | 49 ++++++++++++++++++++++++-----------
drivers/accel/ivpu/ivpu_job.h | 3 +++
2 files changed, 37 insertions(+), 15 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index fd548028f1d6..ba4535a75aa7 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -564,25 +564,26 @@ static struct ivpu_job *ivpu_job_remove_from_submitted_jobs(struct ivpu_device *
return job;
}
-static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32 job_status)
+bool ivpu_job_handle_engine_error(struct ivpu_device *vdev, u32 job_id, u32 job_status)
{
- struct ivpu_job *job;
-
lockdep_assert_held(&vdev->submitted_jobs_lock);
- job = xa_load(&vdev->submitted_jobs_xa, job_id);
- if (!job)
- return -ENOENT;
-
switch (job_status) {
case VPU_JSM_STATUS_PROCESSING_ERR:
case VPU_JSM_STATUS_ENGINE_RESET_REQUIRED_MIN ... VPU_JSM_STATUS_ENGINE_RESET_REQUIRED_MAX:
{
+ struct ivpu_job *job = xa_load(&vdev->submitted_jobs_xa, job_id);
+
+ if (!job)
+ return false;
+
/* Trigger an engine reset */
guard(mutex)(&job->file_priv->lock);
+ job->job_status = job_status;
+
if (job->file_priv->has_mmu_faults)
- return 0;
+ return false;
/*
* Mark context as faulty and defer destruction of the job to jobs abort thread
@@ -591,26 +592,42 @@ static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32
*/
job->file_priv->has_mmu_faults = true;
queue_work(system_wq, &vdev->context_abort_work);
- return 0;
+ return true;
}
default:
/* Complete job with error status, engine reset not required */
break;
}
- job = ivpu_job_remove_from_submitted_jobs(vdev, job_id);
+ return false;
+}
+
+static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32 job_status)
+{
+ struct ivpu_job *job;
+
+ lockdep_assert_held(&vdev->submitted_jobs_lock);
+
+ job = xa_load(&vdev->submitted_jobs_xa, job_id);
if (!job)
return -ENOENT;
- if (job->file_priv->has_mmu_faults)
- job_status = DRM_IVPU_JOB_STATUS_ABORTED;
+ ivpu_job_remove_from_submitted_jobs(vdev, job_id);
+
+ if (job->job_status == VPU_JSM_STATUS_SUCCESS) {
+ if (job->file_priv->has_mmu_faults)
+ job->job_status = DRM_IVPU_JOB_STATUS_ABORTED;
+ else
+ job->job_status = job_status;
+ }
- job->bos[CMD_BUF_IDX]->job_status = job_status;
+ job->bos[CMD_BUF_IDX]->job_status = job->job_status;
dma_fence_signal(job->done_fence);
trace_job("done", job);
ivpu_dbg(vdev, JOB, "Job complete: id %3u ctx %2d cmdq_id %u engine %d status 0x%x\n",
- job->job_id, job->file_priv->ctx.id, job->cmdq_id, job->engine_idx, job_status);
+ job->job_id, job->file_priv->ctx.id, job->cmdq_id, job->engine_idx,
+ job->job_status);
ivpu_job_destroy(job);
ivpu_stop_job_timeout_detection(vdev);
@@ -1030,7 +1047,9 @@ ivpu_job_done_callback(struct ivpu_device *vdev, struct ivpu_ipc_hdr *ipc_hdr,
payload = (struct vpu_ipc_msg_payload_job_done *)&jsm_msg->payload;
mutex_lock(&vdev->submitted_jobs_lock);
- ivpu_job_signal_and_destroy(vdev, payload->job_id, payload->job_status);
+ if (!ivpu_job_handle_engine_error(vdev, payload->job_id, payload->job_status))
+ /* No engine error, complete the job normally */
+ ivpu_job_signal_and_destroy(vdev, payload->job_id, payload->job_status);
mutex_unlock(&vdev->submitted_jobs_lock);
}
diff --git a/drivers/accel/ivpu/ivpu_job.h b/drivers/accel/ivpu/ivpu_job.h
index d2fc4c151614..3ab61e6a5616 100644
--- a/drivers/accel/ivpu/ivpu_job.h
+++ b/drivers/accel/ivpu/ivpu_job.h
@@ -51,6 +51,7 @@ struct ivpu_cmdq {
* @cmdq_id: Command queue ID used for submission
* @job_id: Unique job ID for tracking and status reporting
* @engine_idx: Engine index for job execution
+ * @job_status: Status reported by firmware for this job
* @primary_preempt_buf: Primary preemption buffer for job
* @secondary_preempt_buf: Secondary preemption buffer for job (optional)
* @bo_count: Number of buffer objects associated with this job
@@ -64,6 +65,7 @@ struct ivpu_job {
u32 cmdq_id;
u32 job_id;
u32 engine_idx;
+ u32 job_status;
struct ivpu_bo *primary_preempt_buf;
struct ivpu_bo *secondary_preempt_buf;
size_t bo_count;
@@ -83,6 +85,7 @@ void ivpu_cmdq_abort_all_jobs(struct ivpu_device *vdev, u32 ctx_id, u32 cmdq_id)
void ivpu_job_done_consumer_init(struct ivpu_device *vdev);
void ivpu_job_done_consumer_fini(struct ivpu_device *vdev);
+bool ivpu_job_handle_engine_error(struct ivpu_device *vdev, u32 job_id, u32 job_status);
void ivpu_context_abort_work_fn(struct work_struct *work);
void ivpu_jobs_abort_all(struct ivpu_device *vdev);
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] accel/ivpu: Return correct job error status
2025-10-07 8:35 [PATCH] accel/ivpu: Return correct job error status Karol Wachowski
@ 2025-10-07 14:54 ` Jeff Hugo
2025-10-08 6:12 ` Karol Wachowski
0 siblings, 1 reply; 3+ messages in thread
From: Jeff Hugo @ 2025-10-07 14:54 UTC (permalink / raw)
To: Karol Wachowski, dri-devel
Cc: oded.gabbay, maciej.falkowski, lizhi.hou, Andrzej Kacprowski
On 10/7/2025 2:35 AM, Karol Wachowski wrote:
> From: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
>
> The driver was returning ABORTED for all error that trigger engine reset.
"is returning" right? The driver currently does this (without this patch)?
Is this a bad thing? Should the driver do something different? I feel
like there should be more explanation here.
> Refactor ivpu_job_signal_and_destroy() by extracting engine error
> handling logic into a new function ivpu_job_handle_engine_error().
> This simplifies engine error handling logic by removing necessity of
> calling ivpu_job_singal_and_destroy() multiple times by a single job
> changing it's behavior based on job status.
>
> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] accel/ivpu: Return correct job error status
2025-10-07 14:54 ` Jeff Hugo
@ 2025-10-08 6:12 ` Karol Wachowski
0 siblings, 0 replies; 3+ messages in thread
From: Karol Wachowski @ 2025-10-08 6:12 UTC (permalink / raw)
To: Jeff Hugo, dri-devel
Cc: oded.gabbay, maciej.falkowski, lizhi.hou, Andrzej Kacprowski
Thanks for pointing this out. Returning ABORTED was not generally a
problem, but it limited user space ability to distinguish between
different failure modes.
Changing this improves debugability and allows applications to take
actions based on separate return codes accordingly.
I have improved clarification of the change in commit message in PATCHv2.
Karol
On 10/7/2025 4:54 PM, Jeff Hugo wrote:
> On 10/7/2025 2:35 AM, Karol Wachowski wrote:
>> From: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
>>
>> The driver was returning ABORTED for all error that trigger engine
>> reset.
>
> "is returning" right? The driver currently does this (without this
> patch)?
>
> Is this a bad thing? Should the driver do something different? I feel
> like there should be more explanation here.
>
>> Refactor ivpu_job_signal_and_destroy() by extracting engine error
>> handling logic into a new function ivpu_job_handle_engine_error().
>> This simplifies engine error handling logic by removing necessity of
>> calling ivpu_job_singal_and_destroy() multiple times by a single job
>> changing it's behavior based on job status.
>>
>> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
>> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-10-08 6:12 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-07 8:35 [PATCH] accel/ivpu: Return correct job error status Karol Wachowski
2025-10-07 14:54 ` Jeff Hugo
2025-10-08 6:12 ` Karol Wachowski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).