* [PATCH 0/3] drm/rocket: Stop submitting hardware work from the IRQ handler
@ 2026-06-05 16:06 Maíra Canal
From: Maíra Canal @ 2026-06-05 16:06 UTC
  To: Tomeu Vizoso, Oded Gabbay, Christian König,
	Christian König, Rob Herring, Matthew Brost,
	Danilo Krummrich, Philipp Stanner, Sumit Semwal
  Cc: kernel-dev, dri-devel, Maíra Canal

After Rob mentioned that Rocket could have a redundant job_lock similar to
the one in Ethos [1], I decided to take a look at the driver to see if we
could remove this lock. However, while reading the code, I realized that
the job_lock is only a symptom of a deeper issue: the job submission
procedure in Rocket breaks the DRM scheduler's design in a fundamental
way.

Currently, a job spawns further hardware work from outside the scheduler.
The function rocket_job_run() submits only the first task of an inference;
every subsequent task is submitted by the threaded IRQ handler, which calls
rocket_job_hw_submit() directly.
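
The flow described above can be sketched as a small userspace model. This
is not driver code: every name below (model_job, model_run_job(),
model_irq_thread(), and so on) is a hypothetical stand-in, and the "IRQ" is
just a function call. It only illustrates how the submissions split between
the scheduler's callback and the IRQ path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the real Rocket or DRM structures. */
struct model_job {
	size_t task_count;    /* tasks in one inference */
	size_t next_task;     /* index of the next task to hand to hardware */
	size_t sched_submits; /* submissions made from the run_job() model */
	size_t irq_submits;   /* submissions made from the IRQ handler model */
};

/* Stand-in for the hardware submission helper: consumes one task. */
static void model_hw_submit(struct model_job *job)
{
	job->next_task++;
}

/* Stand-in for run_job(): scheduler context, submits only the first task. */
static void model_run_job(struct model_job *job)
{
	assert(job->task_count > 0);
	model_hw_submit(job);
	job->sched_submits++;
}

/*
 * Stand-in for the threaded IRQ handler: when a task completes, it submits
 * the next one directly. Returns true once the whole job has finished.
 */
static bool model_irq_thread(struct model_job *job)
{
	if (job->next_task < job->task_count) {
		model_hw_submit(job);
		job->irq_submits++;
		return false;
	}
	return true;
}

/* Drive one job to completion, one simulated completion IRQ at a time. */
static void model_run_inference(struct model_job *job)
{
	model_run_job(job);
	while (!model_irq_thread(job))
		;
}
```

In this model, a job with four tasks ends with one submission counted for
the run_job() stand-in and three for the IRQ handler stand-in, so most of
the hardware work never passes through the scheduler's callback.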

The scheduler expects all of a job's hardware submission to happen in
run_job(). Work submitted from the IRQ handler is completely invisible to
the scheduler, which causes problems. For example, drm_sched_stop() only
synchronizes the scheduler's workqueue, not the IRQ, so the reset path
races with these IRQ-driven submissions. This creates the need for the
job_lock mutex and the reset.pending flag, which exist only as a
workaround for that self-inflicted race.

Considering the current state of the driver, solving this issue is quite
simple: instead of treating the whole submission as a DRM sched job, treat
each task as a DRM sched job. With that, the driver can comply with the
DRM scheduler's expectations and get rid of some locks, flags and indexes.
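
As a rough illustration of the per-task approach, here is a userspace model
in which each task is its own job. Again, this is not driver code: all
names (model_task_job, model_sched, model_run_job(), model_irq_thread())
are hypothetical, and the model assumes the simplest possible ordering,
where the next job runs only after the previous one has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one task wrapped as one scheduler job. */
struct model_task_job {
	bool submitted;      /* handed to hardware by the run_job() model */
	bool fence_signaled; /* completion reported by the IRQ model */
};

/* Hypothetical model of a scheduler with a simple in-order queue. */
struct model_sched {
	struct model_task_job *queue;
	size_t count;         /* number of queued per-task jobs */
	size_t head;          /* index of the job currently on the hardware */
	size_t sched_submits; /* submissions made from the run_job() model */
	size_t irq_submits;   /* submissions made from the IRQ model */
};

/* Stand-in for run_job(): the only place that submits hardware work. */
static void model_run_job(struct model_sched *s)
{
	assert(s->head < s->count);
	s->queue[s->head].submitted = true;
	s->sched_submits++;
}

/*
 * Stand-in for the IRQ handler: it only reports completion of the current
 * job and never submits anything, so irq_submits is left untouched.
 */
static void model_irq_thread(struct model_sched *s)
{
	s->queue[s->head].fence_signaled = true;
	s->head++;
}

/* Run every queued per-task job: submit, wait for completion, repeat. */
static void model_run_inference(struct model_sched *s)
{
	while (s->head < s->count) {
		model_run_job(s);
		model_irq_thread(s);
	}
}
```

With four queued tasks, all four submissions are counted for the run_job()
stand-in and none for the IRQ stand-in, which is the property that lets
the reset path rely on the scheduler alone to stop new hardware work.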

Having said that, this series is only compile-tested, as I don't have the
hardware. I was just driven by Rob's comment to take a look at Rocket's
code, and the design looked unusual compared to what I would expect from a
DRM scheduler-based driver. I'm also CCing some scheduler maintainers to
check if they agree that the IRQ handler shouldn't spawn further HW work.

Apart from that, this series also has some clean-up patches.

[1] https://lore.kernel.org/dri-devel/20260516144623.2582427-2-mcanal@igalia.com/T/

Best regards,
- Maíra

---
Maíra Canal (3):
      drm/rocket: Remove unused reset worker
      drm/rocket: Submit one drm_sched_job per task
      drm/rocket: Drop the dedicated reset workqueue

 drivers/accel/rocket/rocket_core.h |  10 +-
 drivers/accel/rocket/rocket_job.c  | 282 ++++++++++++++++++-------------------
 drivers/accel/rocket/rocket_job.h  |  26 +++-
 3 files changed, 159 insertions(+), 159 deletions(-)
---
base-commit: 640c57d6ca1346a1c2363a3f473b405af979e046
change-id: 20260605-rocket-per-task-jobs-b797f7e2b1e9

