Linux kernel -stable discussions
* [PATCH] drm/xe/guc_submit: fix UAF in run_job()
@ 2024-09-20 12:38 Matthew Auld
  2024-09-20 19:07 ` Matthew Brost
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Auld @ 2024-09-20 12:38 UTC (permalink / raw)
  To: intel-xe; +Cc: Matthew Brost, stable

The initial kref from dma_fence_init() is matched by whatever signals
the fence. However, here we submit the job to the hardware first, only
grab the extra reference afterwards, and even touch some fence state
before that. This can be too late if the fence is signalled before we
manage to grab the extra reference. Instead, always grab the reference
early, before the submission step.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2811
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index fbbe6a487bbb..b33f3d23a068 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -766,12 +766,15 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
 	struct xe_guc *guc = exec_queue_to_guc(q);
 	struct xe_device *xe = guc_to_xe(guc);
 	bool lr = xe_exec_queue_is_lr(q);
+	struct dma_fence *fence;
 
 	xe_assert(xe, !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) ||
 		  exec_queue_banned(q) || exec_queue_suspended(q));
 
 	trace_xe_sched_job_run(job);
 
+	dma_fence_get(job->fence);
+
 	if (!exec_queue_killed_or_banned_or_wedged(q) && !xe_sched_job_is_error(job)) {
 		if (!exec_queue_registered(q))
 			register_exec_queue(q);
@@ -782,12 +785,16 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
 
 	if (lr) {
 		xe_sched_job_set_error(job, -EOPNOTSUPP);
-		return NULL;
+		fence = NULL;
 	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
-		return job->fence;
+		fence = job->fence;
 	} else {
-		return dma_fence_get(job->fence);
+		fence = dma_fence_get(job->fence);
 	}
+
+	dma_fence_put(job->fence);
+
+	return fence;
 }
 
 static void guc_exec_queue_free_job(struct drm_sched_job *drm_job)
-- 
2.46.0



* Re: [PATCH] drm/xe/guc_submit: fix UAF in run_job()
  2024-09-20 12:38 [PATCH] drm/xe/guc_submit: fix UAF in run_job() Matthew Auld
@ 2024-09-20 19:07 ` Matthew Brost
  2024-09-21  1:58   ` Matthew Brost
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Brost @ 2024-09-20 19:07 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe, stable

On Fri, Sep 20, 2024 at 01:38:07PM +0100, Matthew Auld wrote:
> The initial kref from dma_fence_init() is matched by whatever signals
> the fence. However, here we submit the job to the hardware first, only
> grab the extra reference afterwards, and even touch some fence state
> before that. This can be too late if the fence is signalled before we
> manage to grab the extra reference. Instead, always grab the reference
> early, before the submission step.
> 

I think I see the race. Let me make sure I understand.

Current flow:

1. guc_exec_queue_run_job() enters
2. guc_exec_queue_run_job() submits the job to the hardware
3. the job finishes on the hardware
4. the irq handler for job completion fires, signals job->fence, and
   does the last put on job->fence, freeing the memory
5. guc_exec_queue_run_job() takes a ref on job->fence and BOOM, UAF

The extra ref taken between steps 1 and 2 and dropped after step 5
prevents this. Is that right?

Assuming my understanding is correct:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>



* Re: [PATCH] drm/xe/guc_submit: fix UAF in run_job()
  2024-09-20 19:07 ` Matthew Brost
@ 2024-09-21  1:58   ` Matthew Brost
  0 siblings, 0 replies; 3+ messages in thread
From: Matthew Brost @ 2024-09-21  1:58 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe, stable

On Fri, Sep 20, 2024 at 07:07:36PM +0000, Matthew Brost wrote:
> On Fri, Sep 20, 2024 at 01:38:07PM +0100, Matthew Auld wrote:
> > The initial kref from dma_fence_init() is matched by whatever signals
> > the fence. However, here we submit the job to the hardware first, only
> > grab the extra reference afterwards, and even touch some fence state
> > before that. This can be too late if the fence is signalled before we
> > manage to grab the extra reference. Instead, always grab the reference
> > early, before the submission step.
> > 
> 
> I think I see the race. Let me make sure I understand.
> 
> Current flow:
> 
> 1. guc_exec_queue_run_job() enters
> 2. guc_exec_queue_run_job() submits the job to the hardware
> 3. the job finishes on the hardware
> 4. the irq handler for job completion fires, signals job->fence, and
>    does the last put on job->fence, freeing the memory
> 5. guc_exec_queue_run_job() takes a ref on job->fence and BOOM, UAF
> 
> The extra ref taken between steps 1 and 2 and dropped after step 5
> prevents this. Is that right?
> 
> Assuming my understanding is correct:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 

Revoking this RB. I should've looked a bit deeper here before the
initial reply, but I think I have this fixed here [1]. Let me know if
that patch makes sense or if you have any questions, concerns, or
comments.

Matt

[1] https://patchwork.freedesktop.org/patch/615304/?series=138939&rev=1


