public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/sched: drm_sched_job_cleanup(): correct false doc
@ 2025-03-03 14:32 Philipp Stanner
  2025-03-03 17:13 ` Tvrtko Ursulin
  0 siblings, 1 reply; 3+ messages in thread
From: Philipp Stanner @ 2025-03-03 14:32 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Tvrtko Ursulin
  Cc: dri-devel, linux-kernel

drm_sched_job_cleanup()'s documentation claims that calling
drm_sched_job_arm() is a "point of no return", implying that afterwards
a job cannot be cancelled anymore.

This is not correct, as proven by the function's code itself, which
takes a previous call to drm_sched_job_arm() into account. In truth, the
decisive factors are whether fences have been shared (e.g., with other
processes) and if the job has been submitted to an entity already.

Correct the wrong docstring.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c634993f1346..db9e08e6e455 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1013,11 +1013,13 @@ EXPORT_SYMBOL(drm_sched_job_has_dependency);
  * Cleans up the resources allocated with drm_sched_job_init().
  *
  * Drivers should call this from their error unwind code if @job is aborted
- * before drm_sched_job_arm() is called.
+ * before it was submitted to an entity with drm_sched_entity_push_job().
  *
- * After that point of no return @job is committed to be executed by the
- * scheduler, and this function should be called from the
- * &drm_sched_backend_ops.free_job callback.
+ * Since calling drm_sched_job_arm() causes the job's fences to be initialized,
+ * it is up to the driver to ensure that fences that were exposed to external
+ * parties get signaled. drm_sched_job_cleanup() does not ensure this.
+ *
+ * This function must also be called in &struct drm_sched_backend_ops.free_job
  */
 void drm_sched_job_cleanup(struct drm_sched_job *job)
 {
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] drm/sched: drm_sched_job_cleanup(): correct false doc
  2025-03-03 14:32 [PATCH] drm/sched: drm_sched_job_cleanup(): correct false doc Philipp Stanner
@ 2025-03-03 17:13 ` Tvrtko Ursulin
  2025-03-04  7:27   ` Philipp Stanner
  0 siblings, 1 reply; 3+ messages in thread
From: Tvrtko Ursulin @ 2025-03-03 17:13 UTC (permalink / raw)
  To: Philipp Stanner, Matthew Brost, Danilo Krummrich,
	Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: dri-devel, linux-kernel


On 03/03/2025 14:32, Philipp Stanner wrote:
> drm_sched_job_cleanup()'s documentation claims that calling
> drm_sched_job_arm() is a "point of no return", implying that afterwards
> a job cannot be cancelled anymore.
> 
> This is not correct, as proven by the function's code itself, which
> takes a previous call to drm_sched_job_arm() into account. In truth, the
> decisive factors are whether fences have been shared (e.g., with other
> processes) and if the job has been submitted to an entity already.
> 
> Correct the wrong docstring.
> 
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c634993f1346..db9e08e6e455 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1013,11 +1013,13 @@ EXPORT_SYMBOL(drm_sched_job_has_dependency);
>    * Cleans up the resources allocated with drm_sched_job_init().
>    *
>    * Drivers should call this from their error unwind code if @job is aborted
> - * before drm_sched_job_arm() is called.
> + * before it was submitted to an entity with drm_sched_entity_push_job().
>    *
> - * After that point of no return @job is committed to be executed by the
> - * scheduler, and this function should be called from the
> - * &drm_sched_backend_ops.free_job callback.
> + * Since calling drm_sched_job_arm() causes the job's fences to be initialized,
> + * it is up to the driver to ensure that fences that were exposed to external
> + * parties get signaled. drm_sched_job_cleanup() does not ensure this.
> + *
> + * This function must also be called in &struct drm_sched_backend_ops.free_job
>    */
>   void drm_sched_job_cleanup(struct drm_sched_job *job)
>   {

I also do not see anything so special in the flows which would disallow 
"aborting" jobs after arming. And it definitely can happen in the 
practice. It was probably just unclear kerneldoc.

What I would suggest to add to the patch is to change the comment in 
drm_sched_job_cleanup() implementation:

This one:

		/* aborted job before committing to run it */

Probably just change to:

		/* aborted before arming */

With that you can add my r-b.

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] drm/sched: drm_sched_job_cleanup(): correct false doc
  2025-03-03 17:13 ` Tvrtko Ursulin
@ 2025-03-04  7:27   ` Philipp Stanner
  0 siblings, 0 replies; 3+ messages in thread
From: Philipp Stanner @ 2025-03-04  7:27 UTC (permalink / raw)
  To: Tvrtko Ursulin, Philipp Stanner, Matthew Brost, Danilo Krummrich,
	Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: dri-devel, linux-kernel

On Mon, 2025-03-03 at 17:13 +0000, Tvrtko Ursulin wrote:
> 
> On 03/03/2025 14:32, Philipp Stanner wrote:
> > drm_sched_job_cleanup()'s documentation claims that calling
> > drm_sched_job_arm() is a "point of no return", implying that
> > afterwards
> > a job cannot be cancelled anymore.
> > 
> > This is not correct, as proven by the function's code itself, which
> > takes a previous call to drm_sched_job_arm() into account. In
> > truth, the
> > decisive factors are whether fences have been shared (e.g., with
> > other
> > processes) and if the job has been submitted to an entity already.
> > 
> > Correct the wrong docstring.
> > 
> > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++----
> >   1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index c634993f1346..db9e08e6e455 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1013,11 +1013,13 @@
> > EXPORT_SYMBOL(drm_sched_job_has_dependency);
> >    * Cleans up the resources allocated with drm_sched_job_init().
> >    *
> >    * Drivers should call this from their error unwind code if @job
> > is aborted
> > - * before drm_sched_job_arm() is called.
> > + * before it was submitted to an entity with
> > drm_sched_entity_push_job().
> >    *
> > - * After that point of no return @job is committed to be executed
> > by the
> > - * scheduler, and this function should be called from the
> > - * &drm_sched_backend_ops.free_job callback.
> > + * Since calling drm_sched_job_arm() causes the job's fences to be
> > initialized,
> > + * it is up to the driver to ensure that fences that were exposed
> > to external
> > + * parties get signaled. drm_sched_job_cleanup() does not ensure
> > this.
> > + *
> > + * This function must also be called in &struct
> > drm_sched_backend_ops.free_job
> >    */
> >   void drm_sched_job_cleanup(struct drm_sched_job *job)
> >   {
> 
> I also do not see anything so special in the flows which would
> disallow 
> "aborting" jobs after arming. And it definitely can happen in the 
> practice. It was probably just unclear kerneldoc.

I thought about why they wrote that for a few months and the only
reason I can imagine was that they were worried about the initialized
fences being shared.

> 
> What I would suggest to add to the patch is to change the comment in 
> drm_sched_job_cleanup() implementation:
> 
> This one:
> 
> 		/* aborted job before committing to run it */
> 
> Probably just change to:
> 
> 		/* aborted before arming */

Yes, good catch. Will do.

P.

> 
> With that you can add my r-b.
> 
> Regards,
> 
> Tvrtko
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-03-04  7:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-03 14:32 [PATCH] drm/sched: drm_sched_job_cleanup(): correct false doc Philipp Stanner
2025-03-03 17:13 ` Tvrtko Ursulin
2025-03-04  7:27   ` Philipp Stanner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox