public inbox for linux-kernel@vger.kernel.org
* [PATCH 2/2] drm/sched: warn about drm_sched_job_init()'s partial init
  2024-08-06 14:38 Philipp Stanner
@ 2024-08-06 14:38 ` Philipp Stanner
  0 siblings, 0 replies; 5+ messages in thread
From: Philipp Stanner @ 2024-08-06 14:38 UTC
  To: Luben Tuikov, Matthew Brost, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Danilo Krummrich
  Cc: dri-devel, linux-kernel, Philipp Stanner

drm_sched_job_init()'s name suggests that after the function succeeds,
parameter "job" will be fully initialized. This is not the case; some
members are set only later, notably "job->sched" by drm_sched_job_arm().

Document that drm_sched_job_init() does not set all struct members.

Document that job->sched in particular is uninitialized before
drm_sched_job_arm().

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
 include/drm/gpu_scheduler.h            | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1498ee3cbf39..2adb13745500 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -911,6 +911,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * Drivers must make sure drm_sched_job_cleanup() if this function returns
  * successfully, even when @job is aborted before drm_sched_job_arm() is called.
  *
+ * Note that this function does not assign a valid value to each struct member
+ * of struct drm_sched_job. Take a look at that struct's documentation to see
+ * who sets which struct member with what lifetime.
+ *
  * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
  * has died, which can mean that there's no valid runqueue for a @entity.
  * This function returns -ENOENT in this case (which probably should be -EIO as
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index ce15c50d8a10..7df81a07f1f9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
 struct drm_sched_job {
 	struct spsc_node		queue_node;
 	struct list_head		list;
+
+	/*
+	 * The scheduler this job is or will be scheduled on.
+	 *
+	 * Gets set by drm_sched_job_arm(). Valid until the scheduler's
+	 * backend_ops callback "free_job()" is called.
+	 */
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
 
-- 
2.45.2



* [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init()
@ 2024-10-21 10:50 Philipp Stanner
  2024-10-21 10:50 ` [PATCH 2/2] drm/sched: warn about drm_sched_job_init()'s partial init Philipp Stanner
  2024-10-21 13:05 ` [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init() Christian König
  0 siblings, 2 replies; 5+ messages in thread
From: Philipp Stanner @ 2024-10-21 10:50 UTC
  To: Luben Tuikov, Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Christian König, Tvrtko Ursulin
  Cc: dri-devel, linux-kernel

drm_sched_job_init() has no control over how users allocate struct
drm_sched_job. Unfortunately, the function can also not set some struct
members such as job->sched.

This could theoretically lead to UB by users dereferencing the struct's
pointer members too early.

It is easier to debug such issues if these pointers are initialized to
NULL, so dereferencing them causes a NULL pointer exception.
Accordingly, drm_sched_entity_init() does precisely that and initializes
its struct with memset().

Initialize parameter "job" to 0 in drm_sched_job_init().

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
No changes in v2.

+CC Christian and Tvrtko in this thread.
Would be cool if someone can do a review.
---
 drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index dab8cca79eb7..2e0e5a9577d1 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -796,6 +796,14 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		return -EINVAL;
 	}
 
+	/*
+	 * We don't know for sure how the user has allocated. Thus, zero the
+	 * struct so that unallowed (i.e., too early) usage of pointers that
+	 * this function does not set is guaranteed to lead to a NULL pointer
+	 * exception instead of UB.
+	 */
+	memset(job, 0, sizeof(*job));
+
 	job->entity = entity;
 	job->credits = credits;
 	job->s_fence = drm_sched_fence_alloc(entity, owner);
-- 
2.47.0
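
A minimal userspace sketch of the zero-then-assign pattern that the patch
above adds; it is not the kernel code itself. struct demo_job, struct
demo_sched, demo_job_init() and demo_sched_is_null_after_init() are
hypothetical stand-ins, and the 0xAA fill simulates a caller that did not
zero its allocation. It assumes a platform where an all-zero bit pattern is
a NULL pointer, as is the case on Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the real DRM scheduler types. */
struct demo_sched { int id; };

struct demo_job {
	struct demo_sched *sched;	/* not set by demo_job_init() */
	void *entity;			/* set by demo_job_init() */
	unsigned int credits;		/* set by demo_job_init() */
};

/* Returns 0 on success, -1 on invalid arguments (illustrative error code). */
static int demo_job_init(struct demo_job *job, void *entity,
			 unsigned int credits)
{
	if (!entity)
		return -1;

	/* The caller's memory may hold stale data, so clear all of it first. */
	memset(job, 0, sizeof(*job));

	job->entity = entity;
	job->credits = credits;
	return 0;
}

/* Hands non-zeroed memory to init; returns 1 if 'sched' is NULL afterwards. */
static int demo_sched_is_null_after_init(void)
{
	struct demo_job job;
	int entity = 0;

	memset(&job, 0xAA, sizeof(job));
	if (demo_job_init(&job, &entity, 1) != 0)
		return 0;

	assert(job.entity == &entity);
	return job.sched == NULL && job.credits == 1;
}
```

With the memset() in place, a premature job.sched dereference in this sketch
would hit address zero instead of whatever bytes the caller left behind.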



* [PATCH 2/2] drm/sched: warn about drm_sched_job_init()'s partial init
  2024-10-21 10:50 [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init() Philipp Stanner
@ 2024-10-21 10:50 ` Philipp Stanner
  2024-10-21 13:05 ` [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init() Christian König
  1 sibling, 0 replies; 5+ messages in thread
From: Philipp Stanner @ 2024-10-21 10:50 UTC
  To: Luben Tuikov, Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Christian König, Tvrtko Ursulin
  Cc: dri-devel, linux-kernel

drm_sched_job_init()'s name suggests that after the function succeeds,
parameter "job" will be fully initialized. This is not the case; some
members are set only later, notably "job->sched" by drm_sched_job_arm().

Document that drm_sched_job_init() does not set all struct members.

Document that job->sched in particular is uninitialized before
drm_sched_job_arm().

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
Changes in v2:
  - Change grammar in the new comments a bit.
---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
 include/drm/gpu_scheduler.h            | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 2e0e5a9577d1..2f1b514ff4cf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -771,6 +771,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * Drivers must make sure drm_sched_job_cleanup() if this function returns
  * successfully, even when @job is aborted before drm_sched_job_arm() is called.
  *
+ * Note that this function does not assign a valid value to each struct member
+ * of struct drm_sched_job. Take a look at that struct's documentation to see
+ * who sets which struct member with what lifetime.
+ *
  * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
  * has died, which can mean that there's no valid runqueue for a @entity.
  * This function returns -ENOENT in this case (which probably should be -EIO as
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index ab161289d1bf..f7d9bdd0fb6b 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -340,6 +340,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
 struct drm_sched_job {
 	struct spsc_node		queue_node;
 	struct list_head		list;
+
+	/*
+	 * The scheduler this job is or will be scheduled on.
+	 *
+	 * Gets set by drm_sched_job_arm(). Valid until the scheduler's
+	 * backend_ops callback "free_job()" has been called.
+	 */
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
 
-- 
2.47.0
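
The init-then-arm lifetime that the new comments describe can be sketched in
userspace C as follows. This is an illustration under assumptions, not the
scheduler's implementation: all demo_* names are hypothetical, and the
init step zeroes the struct (as patch 1/2 of this series does) so that the
"not yet armed" state is observable as NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the real DRM scheduler types. */
struct demo_sched { const char *name; };
struct demo_entity { struct demo_sched *sched; };

struct demo_job {
	struct demo_entity *entity;	/* valid after demo_job_init() */
	struct demo_sched *sched;	/* valid only after demo_job_arm() */
};

/* Phase 1: bind the job to an entity; job->sched stays NULL. */
static void demo_job_init(struct demo_job *job, struct demo_entity *entity)
{
	memset(job, 0, sizeof(*job));
	job->entity = entity;
}

/* Phase 2: in this sketch the scheduler is resolved from the entity here. */
static void demo_job_arm(struct demo_job *job)
{
	assert(job->entity != NULL);
	job->sched = job->entity->sched;
}

/* Defensive accessor: yields NULL rather than a stale pointer before arming. */
static const char *demo_job_sched_name(const struct demo_job *job)
{
	return job->sched ? job->sched->name : NULL;
}

/* Walks both phases; returns 1 if sched is unset before arm and set after. */
static int demo_lifecycle_ok(void)
{
	struct demo_sched sched = { .name = "ring0" };
	struct demo_entity entity = { .sched = &sched };
	struct demo_job job;

	demo_job_init(&job, &entity);
	if (demo_job_sched_name(&job) != NULL)
		return 0;

	demo_job_arm(&job);
	return job.sched == &sched &&
	       strcmp(demo_job_sched_name(&job), "ring0") == 0;
}
```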



* Re: [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init()
  2024-10-21 10:50 [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init() Philipp Stanner
  2024-10-21 10:50 ` [PATCH 2/2] drm/sched: warn about drm_sched_job_init()'s partial init Philipp Stanner
@ 2024-10-21 13:05 ` Christian König
  2024-10-22 14:17   ` Philipp Stanner
  1 sibling, 1 reply; 5+ messages in thread
From: Christian König @ 2024-10-21 13:05 UTC
  To: Philipp Stanner, Luben Tuikov, Matthew Brost, Danilo Krummrich,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Tvrtko Ursulin
  Cc: dri-devel, linux-kernel

On 21.10.24 at 12:50, Philipp Stanner wrote:
> drm_sched_job_init() has no control over how users allocate struct
> drm_sched_job. Unfortunately, the function can also not set some struct
> members such as job->sched.
>
> This could theoretically lead to UB by users dereferencing the struct's
> pointer members too early.
>
> It is easier to debug such issues if these pointers are initialized to
> NULL, so dereferencing them causes a NULL pointer exception.
> Accordingly, drm_sched_entity_init() does precisely that and initializes
> its struct with memset().
>
> Initialize parameter "job" to 0 in drm_sched_job_init().
>
> Signed-off-by: Philipp Stanner <pstanner@redhat.com>
> ---
> No changes in v2.
>
> +CC Christian and Tvrtko in this thread.
> Would be cool if someone can do a review.
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index dab8cca79eb7..2e0e5a9577d1 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -796,6 +796,14 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		return -EINVAL;
>   	}
>   
> +	/*
> +	 * We don't know for sure how the user has allocated. Thus, zero the
> +	 * struct so that unallowed (i.e., too early) usage of pointers that
> +	 * this function does not set is guaranteed to lead to a NULL pointer
> +	 * exception instead of UB.
> +	 */
> +	memset(job, 0, sizeof(*job));
> +

Maybe just explicitly set the sched pointer to NULL here?

On the other hand, compilers these days are really good at optimizing
that away anyway, so feel free to add Reviewed-by: Christian König
<christian.koenig@amd.com> to the series as is.

Regards,
Christian.

>   	job->entity = entity;
>   	job->credits = credits;
>   	job->s_fence = drm_sched_fence_alloc(entity, owner);
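
A rough userspace sketch of the trade-off raised in this reply: clearing only
the sched pointer versus zeroing the whole struct. All demo_* names are
hypothetical, and the extra "flags" member is an assumption added purely to
stand in for any other member that init does not set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in; not the real struct drm_sched_job. */
struct demo_job {
	void *sched;		/* set later, by a separate arm step */
	unsigned int flags;	/* also not set by either init variant */
	unsigned int credits;	/* set by both init variants */
};

/* Variant A: clear only the one pointer known to be set later. */
static void demo_init_explicit(struct demo_job *job, unsigned int credits)
{
	job->sched = NULL;
	job->credits = credits;
}

/* Variant B: clear the whole struct, then assign the known members. */
static void demo_init_memset(struct demo_job *job, unsigned int credits)
{
	memset(job, 0, sizeof(*job));
	job->credits = credits;
}

/* Returns 1 if 'flags' still holds the caller's stale 0xAA bytes after init. */
static int demo_flags_stale_after(void (*init)(struct demo_job *, unsigned int))
{
	struct demo_job job;
	unsigned int stale;

	memset(&stale, 0xAA, sizeof(stale));
	memset(&job, 0xAA, sizeof(job));	/* simulate non-zeroed caller memory */
	init(&job, 1);

	assert(job.sched == NULL && job.credits == 1);
	return job.flags == stale;
}
```

Both variants leave sched as NULL; only the memset() variant also covers
members that nobody remembered to clear individually.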



* Re: [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init()
  2024-10-21 13:05 ` [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init() Christian König
@ 2024-10-22 14:17   ` Philipp Stanner
  0 siblings, 0 replies; 5+ messages in thread
From: Philipp Stanner @ 2024-10-22 14:17 UTC
  To: Christian König, Luben Tuikov, Matthew Brost,
	Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Tvrtko Ursulin
  Cc: dri-devel, linux-kernel

On Mon, 2024-10-21 at 15:05 +0200, Christian König wrote:
> On 21.10.24 at 12:50, Philipp Stanner wrote:
> > drm_sched_job_init() has no control over how users allocate struct
> > drm_sched_job. Unfortunately, the function can also not set some struct
> > members such as job->sched.
> > 
> > This could theoretically lead to UB by users dereferencing the struct's
> > pointer members too early.
> > 
> > It is easier to debug such issues if these pointers are initialized to
> > NULL, so dereferencing them causes a NULL pointer exception.
> > Accordingly, drm_sched_entity_init() does precisely that and initializes
> > its struct with memset().
> > 
> > Initialize parameter "job" to 0 in drm_sched_job_init().
> > 
> > Signed-off-by: Philipp Stanner <pstanner@redhat.com>
> > ---
> > No changes in v2.
> > 
> > +CC Christian and Tvrtko in this thread.
> > Would be cool if someone can do a review.
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index dab8cca79eb7..2e0e5a9577d1 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -796,6 +796,14 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >   		return -EINVAL;
> >   	}
> >   
> > +	/*
> > +	 * We don't know for sure how the user has allocated. Thus, zero the
> > +	 * struct so that unallowed (i.e., too early) usage of pointers that
> > +	 * this function does not set is guaranteed to lead to a NULL pointer
> > +	 * exception instead of UB.
> > +	 */
> > +	memset(job, 0, sizeof(*job));
> > +
> 
> Maybe just explicitly set the sched pointer to NULL here?
> 
> On the other hand, compilers these days are really good at optimizing
> that away anyway, so feel free to add Reviewed-by: Christian König
> <christian.koenig@amd.com> to the series as is.

(I performance-tested it with several million jobs and couldn't detect a
measurable regression.)

Applied #1 to drm-misc-next, thanks.

Regarding patch #2, I just noticed that it violates the docstring
style. I therefore hereby reject my own patch and will resubmit it in a
cleaner form ^^'

P.

> 
> Regards,
> Christian.
> 
> >   	job->entity = entity;
> >   	job->credits = credits;
> >   	job->s_fence = drm_sched_fence_alloc(entity, owner);
> 


