* [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock
@ 2025-06-02 13:22 Maíra Canal
2025-06-02 15:33 ` Philipp Stanner
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Maíra Canal @ 2025-06-02 13:22 UTC (permalink / raw)
To: Lucas Stach, Philipp Zabel, Russell King, Christian Gmeiner,
Philipp Stanner
Cc: dri-devel, etnaviv, kernel-dev, Maíra Canal, stable
Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still
active") ensured that active jobs are returned to the pending list when
extending the timeout. However, it didn't use the pending list's lock to
manipulate the list, which causes a race condition as the scheduler's
workqueues are running.
Hold the lock while manipulating the scheduler's pending list to prevent
a race.
Cc: stable@vger.kernel.org
Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
Hi,
I'm proposing this workaround patch to address the race-condition caused
by manipulating the pending list without using its lock. Although I
understand this isn't a complete solution (see [1]), it's not reasonable
to backport the new DRM stat series [2] to the stable branches.
Therefore, I believe the best solution is backporting this fix to the
stable branches, which will fix the race and will keep adding the job
back to the pending list (which will avoid most memory leaks).
[1] https://lore.kernel.org/dri-devel/bcc0ed477f8a6f3bb06665b1756bcb98fb7af871.camel@mailbox.org/
[2] https://lore.kernel.org/dri-devel/20250530-sched-skip-reset-v2-0-c40a8d2d8daa@igalia.com/
Best Regards,
- Maíra
---
drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 76a3a3e517d8..71e2e6b9d713 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -35,6 +35,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
*sched_job)
{
struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
+ struct drm_gpu_scheduler *sched = sched_job->sched;
struct etnaviv_gpu *gpu = submit->gpu;
u32 dma_addr, primid = 0;
int change;
@@ -89,7 +90,9 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
return DRM_GPU_SCHED_STAT_NOMINAL;
out_no_timeout:
- list_add(&sched_job->list, &sched_job->sched->pending_list);
+ spin_lock(&sched->job_list_lock);
+ list_add(&sched_job->list, &sched->pending_list);
+ spin_unlock(&sched->job_list_lock);
return DRM_GPU_SCHED_STAT_NOMINAL;
}
--
2.49.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock
2025-06-02 13:22 [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock Maíra Canal
@ 2025-06-02 15:33 ` Philipp Stanner
2025-06-16 14:06 ` Lucas Stach
2025-06-16 23:41 ` Maíra Canal
2 siblings, 0 replies; 4+ messages in thread
From: Philipp Stanner @ 2025-06-02 15:33 UTC (permalink / raw)
To: Maíra Canal, Lucas Stach, Philipp Zabel, Russell King,
Christian Gmeiner, Philipp Stanner
Cc: dri-devel, etnaviv, kernel-dev, stable
On Mon, 2025-06-02 at 10:22 -0300, Maíra Canal wrote:
> Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is
> still
> active") ensured that active jobs are returned to the pending list
> when
> extending the timeout. However, it didn't use the pending list's lock
> to
> manipulate the list, which causes a race condition as the scheduler's
> workqueues are running.
>
> Hold the lock while manipulating the scheduler's pending list to
> prevent
> a race.
>
> Cc: stable@vger.kernel.org
> Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is
> still active")
Could also contain a "Closes: " with the link to the appropriate
message from thread [1] from below.
You might also include "Reported-by: Philipp" since I technically first
described the problem. But no hard feelings on that
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
> ---
> Hi,
>
> I'm proposing this workaround patch to address the race-condition
> caused
> by manipulating the pending list without using its lock. Although I
> understand this isn't a complete solution (see [1]), it's not
> reasonable
> to backport the new DRM stat series [2] to the stable branches.
>
> Therefore, I believe the best solution is backporting this fix to the
> stable branches, which will fix the race and will keep adding the job
> back to the pending list (which will avoid most memory leaks).
>
> [1]
> https://lore.kernel.org/dri-devel/bcc0ed477f8a6f3bb06665b1756bcb98fb7af871.camel@mailbox.org/
> [2]
> https://lore.kernel.org/dri-devel/20250530-sched-skip-reset-v2-0-c40a8d2d8daa@igalia.com/
>
> Best Regards,
> - Maíra
> ---
> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 76a3a3e517d8..71e2e6b9d713 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -35,6 +35,7 @@ static enum drm_gpu_sched_stat
> etnaviv_sched_timedout_job(struct drm_sched_job
>
> *sched_job)
> {
> struct etnaviv_gem_submit *submit =
> to_etnaviv_submit(sched_job);
> + struct drm_gpu_scheduler *sched = sched_job->sched;
> struct etnaviv_gpu *gpu = submit->gpu;
> u32 dma_addr, primid = 0;
> int change;
> @@ -89,7 +90,9 @@ static enum drm_gpu_sched_stat
> etnaviv_sched_timedout_job(struct drm_sched_job
> return DRM_GPU_SCHED_STAT_NOMINAL;
>
> out_no_timeout:
> - list_add(&sched_job->list, &sched_job->sched->pending_list);
> + spin_lock(&sched->job_list_lock);
> + list_add(&sched_job->list, &sched->pending_list);
> + spin_unlock(&sched->job_list_lock);
> return DRM_GPU_SCHED_STAT_NOMINAL;
> }
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock
2025-06-02 13:22 [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock Maíra Canal
2025-06-02 15:33 ` Philipp Stanner
@ 2025-06-16 14:06 ` Lucas Stach
2025-06-16 23:41 ` Maíra Canal
2 siblings, 0 replies; 4+ messages in thread
From: Lucas Stach @ 2025-06-16 14:06 UTC (permalink / raw)
To: Maíra Canal, Philipp Zabel, Russell King, Christian Gmeiner,
Philipp Stanner
Cc: dri-devel, etnaviv, kernel-dev, stable
Hi Maíra,
Am Montag, dem 02.06.2025 um 10:22 -0300 schrieb Maíra Canal:
> Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still
> active") ensured that active jobs are returned to the pending list when
> extending the timeout. However, it didn't use the pending list's lock to
> manipulate the list, which causes a race condition as the scheduler's
> workqueues are running.
>
> Hold the lock while manipulating the scheduler's pending list to prevent
> a race.
>
> Cc: stable@vger.kernel.org
> Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active")
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Feel free to take this through drm-misc.
This code is really ugly and I'm really happy to see your work on a
better solution for extending the timeouts. Thanks for working on that!
Regards,
Lucas
> ---
> Hi,
>
> I'm proposing this workaround patch to address the race-condition caused
> by manipulating the pending list without using its lock. Although I
> understand this isn't a complete solution (see [1]), it's not reasonable
> to backport the new DRM stat series [2] to the stable branches.
>
> Therefore, I believe the best solution is backporting this fix to the
> stable branches, which will fix the race and will keep adding the job
> back to the pending list (which will avoid most memory leaks).
>
> [1] https://lore.kernel.org/dri-devel/bcc0ed477f8a6f3bb06665b1756bcb98fb7af871.camel@mailbox.org/
> [2] https://lore.kernel.org/dri-devel/20250530-sched-skip-reset-v2-0-c40a8d2d8daa@igalia.com/
>
> Best Regards,
> - Maíra
> ---
> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 76a3a3e517d8..71e2e6b9d713 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -35,6 +35,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
> *sched_job)
> {
> struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> + struct drm_gpu_scheduler *sched = sched_job->sched;
> struct etnaviv_gpu *gpu = submit->gpu;
> u32 dma_addr, primid = 0;
> int change;
> @@ -89,7 +90,9 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
> return DRM_GPU_SCHED_STAT_NOMINAL;
>
> out_no_timeout:
> - list_add(&sched_job->list, &sched_job->sched->pending_list);
> + spin_lock(&sched->job_list_lock);
> + list_add(&sched_job->list, &sched->pending_list);
> + spin_unlock(&sched->job_list_lock);
> return DRM_GPU_SCHED_STAT_NOMINAL;
> }
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock
2025-06-02 13:22 [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock Maíra Canal
2025-06-02 15:33 ` Philipp Stanner
2025-06-16 14:06 ` Lucas Stach
@ 2025-06-16 23:41 ` Maíra Canal
2 siblings, 0 replies; 4+ messages in thread
From: Maíra Canal @ 2025-06-16 23:41 UTC (permalink / raw)
To: Lucas Stach, Philipp Zabel, Russell King, Christian Gmeiner,
Philipp Stanner
Cc: dri-devel, etnaviv, kernel-dev, stable
On 02/06/25 10:22, Maíra Canal wrote:
> Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still
> active") ensured that active jobs are returned to the pending list when
> extending the timeout. However, it didn't use the pending list's lock to
> manipulate the list, which causes a race condition as the scheduler's
> workqueues are running.
>
> Hold the lock while manipulating the scheduler's pending list to prevent
> a race.
>
> Cc: stable@vger.kernel.org
> Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active")
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
The patch was applied to misc/kernel.git (drm-misc-fixes) with Philipp's
suggestions.
Thanks for the review!
Best Regards,
- Maíra
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-06-16 23:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-02 13:22 [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock Maíra Canal
2025-06-02 15:33 ` Philipp Stanner
2025-06-16 14:06 ` Lucas Stach
2025-06-16 23:41 ` Maíra Canal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).