* [PATCH] drm/sched: Fix dynamic job-flow control race
@ 2024-09-13 16:53 Rob Clark
2024-09-13 17:03 ` Michel Dänzer
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Rob Clark @ 2024-09-13 16:53 UTC (permalink / raw)
To: dri-devel
Cc: Rob Clark, Asahi Lina, Luben Tuikov, Matthew Brost,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Daniel Vetter, Danilo Krummrich, open list
From: Rob Clark <robdclark@chromium.org>
Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
The whole premise of lockless access to a single-producer-single-
consumer queue is that there is just a single producer and single
consumer. That means we can't call drm_sched_can_queue() (which is
about queueing more work to the hw, not to the spsc queue) from
anywhere other than the consumer (wq).
This call in the producer is just an optimization to avoid scheduling
the consuming worker if it cannot yet queue more work to the hw. It
is safe to drop this optimization to avoid the race condition.
Suggested-by: Asahi Lina <lina@asahilina.net>
Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ab53ab486fe6..1af1dbe757d5 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
struct drm_sched_entity *entity)
{
- if (drm_sched_can_queue(sched, entity))
- drm_sched_run_job_queue(sched);
+ drm_sched_run_job_queue(sched);
}
/**
--
2.46.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/sched: Fix dynamic job-flow control race
2024-09-13 16:53 [PATCH] drm/sched: Fix dynamic job-flow control race Rob Clark
@ 2024-09-13 17:03 ` Michel Dänzer
2024-09-13 17:28 ` Rob Clark
2024-09-13 18:18 ` Danilo Krummrich
2024-09-16 8:21 ` Philipp Stanner
2 siblings, 1 reply; 7+ messages in thread
From: Michel Dänzer @ 2024-09-13 17:03 UTC (permalink / raw)
To: Rob Clark, dri-devel
Cc: Rob Clark, Asahi Lina, Luben Tuikov, Matthew Brost,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Daniel Vetter, Danilo Krummrich, open list
On 2024-09-13 18:53, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
>
> The whole premise of lockless access to a single-producer-single-
> consumer queue is that there is just a single producer and single
> consumer. That means we can't call drm_sched_can_queue() (which is
> about queueing more work to the hw, not to the spsc queue) from
> anywhere other than the consumer (wq).
>
> This call in the producer is just an optimization to avoid scheduling
> the consuming worker if it cannot yet queue more work to the hw. It
> is safe to drop this optimization to avoid the race condition.
>
> Suggested-by: Asahi Lina <lina@asahilina.net>
> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index ab53ab486fe6..1af1dbe757d5 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
> void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
> struct drm_sched_entity *entity)
> {
> - if (drm_sched_can_queue(sched, entity))
> - drm_sched_run_job_queue(sched);
> + drm_sched_run_job_queue(sched);
> }
>
> /**
The entity parameter is unused now.
--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer
https://redhat.com \ Libre software enthusiast
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/sched: Fix dynamic job-flow control race
2024-09-13 17:03 ` Michel Dänzer
@ 2024-09-13 17:28 ` Rob Clark
2024-09-14 14:44 ` Michel Dänzer
0 siblings, 1 reply; 7+ messages in thread
From: Rob Clark @ 2024-09-13 17:28 UTC (permalink / raw)
To: Michel Dänzer
Cc: dri-devel, Rob Clark, Asahi Lina, Luben Tuikov, Matthew Brost,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Daniel Vetter, Danilo Krummrich, open list
On Fri, Sep 13, 2024 at 10:03 AM Michel Dänzer
<michel.daenzer@mailbox.org> wrote:
>
> On 2024-09-13 18:53, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
> >
> > The whole premise of lockless access to a single-producer-single-
> > consumer queue is that there is just a single producer and single
> > consumer. That means we can't call drm_sched_can_queue() (which is
> > about queueing more work to the hw, not to the spsc queue) from
> > anywhere other than the consumer (wq).
> >
> > This call in the producer is just an optimization to avoid scheduling
> > the consuming worker if it cannot yet queue more work to the hw. It
> > is safe to drop this optimization to avoid the race condition.
> >
> > Suggested-by: Asahi Lina <lina@asahilina.net>
> > Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index ab53ab486fe6..1af1dbe757d5 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
> > void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
> > struct drm_sched_entity *entity)
> > {
> > - if (drm_sched_can_queue(sched, entity))
> > - drm_sched_run_job_queue(sched);
> > + drm_sched_run_job_queue(sched);
> > }
> >
> > /**
>
> The entity parameter is unused now.
Right.. and we probably should collapse drm_sched_wakeup() and
drm_sched_run_job_queue()..
But this fix needs to be cherry picked back to a bunch of release
branches, so I intentionally avoided refactoring as part of the fix.
BR,
-R
>
>
> --
> Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer
> https://redhat.com \ Libre software enthusiast
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/sched: Fix dynamic job-flow control race
2024-09-13 16:53 [PATCH] drm/sched: Fix dynamic job-flow control race Rob Clark
2024-09-13 17:03 ` Michel Dänzer
@ 2024-09-13 18:18 ` Danilo Krummrich
2024-09-16 8:21 ` Philipp Stanner
2 siblings, 0 replies; 7+ messages in thread
From: Danilo Krummrich @ 2024-09-13 18:18 UTC (permalink / raw)
To: Rob Clark
Cc: dri-devel, Rob Clark, Asahi Lina, Luben Tuikov, Matthew Brost,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Daniel Vetter, Danilo Krummrich, open list
On Fri, Sep 13, 2024 at 09:53:25AM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
Good catch! Please add a 'Closes' tag with this link.
>
> The whole premise of lockless access to a single-producer-single-
> consumer queue is that there is just a single producer and single
> consumer. That means we can't call drm_sched_can_queue() (which is
> about queueing more work to the hw, not to the spsc queue) from
> anywhere other than the consumer (wq).
>
> This call in the producer is just an optimization to avoid scheduling
> the consuming worker if it cannot yet queue more work to the hw. It
> is safe to drop this optimization to avoid the race condition.
>
> Suggested-by: Asahi Lina <lina@asahilina.net>
> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
You may want to explicitly CC stable.
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index ab53ab486fe6..1af1dbe757d5 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
> void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
> struct drm_sched_entity *entity)
Please also remove the entity parameter. For the other refactoring, I agree it
should be in a different patch.
With that,
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Thanks for fixing this.
- Danilo
> {
> - if (drm_sched_can_queue(sched, entity))
> - drm_sched_run_job_queue(sched);
> + drm_sched_run_job_queue(sched);
> }
>
> /**
> --
> 2.46.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/sched: Fix dynamic job-flow control race
2024-09-13 17:28 ` Rob Clark
@ 2024-09-14 14:44 ` Michel Dänzer
0 siblings, 0 replies; 7+ messages in thread
From: Michel Dänzer @ 2024-09-14 14:44 UTC (permalink / raw)
To: Rob Clark
Cc: dri-devel, Rob Clark, Asahi Lina, Luben Tuikov, Matthew Brost,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Daniel Vetter, Danilo Krummrich, open list
On 2024-09-13 19:28, Rob Clark wrote:
> On Fri, Sep 13, 2024 at 10:03 AM Michel Dänzer
> <michel.daenzer@mailbox.org> wrote:
>>
>> On 2024-09-13 18:53, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
>>>
>>> The whole premise of lockless access to a single-producer-single-
>>> consumer queue is that there is just a single producer and single
>>> consumer. That means we can't call drm_sched_can_queue() (which is
>>> about queueing more work to the hw, not to the spsc queue) from
>>> anywhere other than the consumer (wq).
>>>
>>> This call in the producer is just an optimization to avoid scheduling
>>> the consuming worker if it cannot yet queue more work to the hw. It
>>> is safe to drop this optimization to avoid the race condition.
>>>
>>> Suggested-by: Asahi Lina <lina@asahilina.net>
>>> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_main.c | 3 +--
>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index ab53ab486fe6..1af1dbe757d5 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
>>> void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
>>> struct drm_sched_entity *entity)
>>> {
>>> - if (drm_sched_can_queue(sched, entity))
>>> - drm_sched_run_job_queue(sched);
>>> + drm_sched_run_job_queue(sched);
>>> }
>>>
>>> /**
>>
>> The entity parameter is unused now.
>
> Right.. and we probably should collapse drm_sched_wakeup() and
> drm_sched_run_job_queue()..
I looked into that as well, seems fine to leave them separate for now though.
> But this fix needs to be cherry picked back to a bunch of release
> branches, so I intentionally avoided refactoring as part of the fix.
Fixing up the drm_sched_wakeup caller(s) when backporting doesn't seem like a big deal.
--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer
https://redhat.com \ Libre software enthusiast
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/sched: Fix dynamic job-flow control race
2024-09-13 16:53 [PATCH] drm/sched: Fix dynamic job-flow control race Rob Clark
2024-09-13 17:03 ` Michel Dänzer
2024-09-13 18:18 ` Danilo Krummrich
@ 2024-09-16 8:21 ` Philipp Stanner
2024-09-16 9:28 ` Danilo Krummrich
2 siblings, 1 reply; 7+ messages in thread
From: Philipp Stanner @ 2024-09-16 8:21 UTC (permalink / raw)
To: Rob Clark, dri-devel
Cc: Rob Clark, Asahi Lina, Luben Tuikov, Matthew Brost,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Daniel Vetter, Danilo Krummrich, open list
On Fri, 2024-09-13 at 09:53 -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> Fixes a race condition reported here:
> https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
As Danilo suggested before, I'd put this in a Fixes: section at the
bottom and instead have a sentence here detailing what the race
consists of, i.e., who is racing with whom.
P.
>
> The whole premise of lockless access to a single-producer-single-
> consumer queue is that there is just a single producer and single
> consumer. That means we can't call drm_sched_can_queue() (which is
> about queueing more work to the hw, not to the spsc queue) from
> anywhere other than the consumer (wq).
>
> This call in the producer is just an optimization to avoid scheduling
> the consuming worker if it cannot yet queue more work to the hw. It
> is safe to drop this optimization to avoid the race condition.
>
> Suggested-by: Asahi Lina <lina@asahilina.net>
> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index ab53ab486fe6..1af1dbe757d5 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
> void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
> struct drm_sched_entity *entity)
> {
> - if (drm_sched_can_queue(sched, entity))
> - drm_sched_run_job_queue(sched);
> + drm_sched_run_job_queue(sched);
> }
>
> /**
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/sched: Fix dynamic job-flow control race
2024-09-16 8:21 ` Philipp Stanner
@ 2024-09-16 9:28 ` Danilo Krummrich
0 siblings, 0 replies; 7+ messages in thread
From: Danilo Krummrich @ 2024-09-16 9:28 UTC (permalink / raw)
To: Philipp Stanner
Cc: Rob Clark, dri-devel, Rob Clark, Asahi Lina, Luben Tuikov,
Matthew Brost, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Daniel Vetter, open list
On Mon, Sep 16, 2024 at 10:21:21AM +0200, Philipp Stanner wrote:
> On Fri, 2024-09-13 at 09:53 -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Fixes a race condition reported here:
> > https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
>
> As Danilo suggested before, I'd put this in a Fixes: section at the
I think you mean 'Closes'.
Please note that there's a v2 [1] already, which does that.
[1] https://lore.kernel.org/dri-devel/20240913202301.16772-1-robdclark@gmail.com/
> bottom and instead have a sentence here detailing what the race
> consists of, i.e., who is racing with whom.
That's written right below, isn't it?
>
> P.
>
> >
> > The whole premise of lockless access to a single-producer-single-
> > consumer queue is that there is just a single producer and single
> > consumer. That means we can't call drm_sched_can_queue() (which is
> > about queueing more work to the hw, not to the spsc queue) from
> > anywhere other than the consumer (wq).
> >
> > This call in the producer is just an optimization to avoid scheduling
> > the consuming worker if it cannot yet queue more work to the hw. It
> > is safe to drop this optimization to avoid the race condition.
> >
> > Suggested-by: Asahi Lina <lina@asahilina.net>
> > Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index ab53ab486fe6..1af1dbe757d5 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
> > void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
> > struct drm_sched_entity *entity)
> > {
> > - if (drm_sched_can_queue(sched, entity))
> > - drm_sched_run_job_queue(sched);
> > + drm_sched_run_job_queue(sched);
> > }
> >
> > /**
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-09-16 9:28 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-13 16:53 [PATCH] drm/sched: Fix dynamic job-flow control race Rob Clark
2024-09-13 17:03 ` Michel Dänzer
2024-09-13 17:28 ` Rob Clark
2024-09-14 14:44 ` Michel Dänzer
2024-09-13 18:18 ` Danilo Krummrich
2024-09-16 8:21 ` Philipp Stanner
2024-09-16 9:28 ` Danilo Krummrich
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox