[PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry

public inbox for stable@vger.kernel.org

* [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
@ 2026-04-02  9:15 Thomas Hellström
  2026-04-02  9:27 ` Matthew Auld
  2026-04-03  2:42 ` Matthew Brost
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Hellström @ 2026-04-02  9:15 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, Matthew Auld, stable

xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
each invocation to reset per-attempt state, but current_op was not
included in that reset. When vm_bind_ioctl_ops_execute() retries due to
ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
xe_pt_update_ops_prepare() again. The second call walks the same op list
and fills ops[] starting from current_op, which still holds the value
from the first attempt. This indexes past the end of the ops array
allocated by xe_vma_ops_alloc(), whose size was computed for a single
pass.

KASAN reported:
  BUG: KASAN: slab-out-of-bounds in bind_op_prepare+0x89c/0xae0 [xe]
  Write of size 8 at addr ffff88812e72bae8 by task xe_evict/2848
  [...]
  bind_op_prepare+0x89c/0xae0 [xe]
  xe_pt_update_ops_prepare+0xbd0/0x1570 [xe]
  ops_execute+0x3ae/0x2030 [xe]
  vm_bind_ioctl_ops_execute+0x4d5/0xed0 [xe]

The write lands at ops[1].vma (offset 360 into the second element of a
one-element 384-byte allocation) because entries[] is exactly 360 bytes
and current_op was 1 at the start of the retried prepare pass.

Fix by resetting current_op to 0 in xe_pt_update_ops_init().

Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8e5f4f0dea3f..3607cd57fc4c 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -2291,6 +2291,7 @@ xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
 	init_llist_head(&pt_update_ops->deferred);
 	pt_update_ops->start = ~0x0ull;
 	pt_update_ops->last = 0x0ull;
+	pt_update_ops->current_op = 0;
 	xe_page_reclaim_list_init(&pt_update_ops->prl);
 }

-- 
2.53.0
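
The failure mode described in the commit message can be modelled in userspace. The following is a minimal sketch, not the xe code: `toy_op`, `toy_update_ops`, `toy_init()` and `toy_prepare()` are hypothetical names, and the padding field is an assumption chosen so that the element size (384 bytes) and the offset of `vma` (360 bytes) match the numbers quoted from the KASAN report. Instead of writing out of bounds, the sketch reports the byte offset at which the stray write would land.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one ops[] element. Sizes come from the commit
 * message; field names and padding are assumptions for illustration. */
struct toy_op {
	unsigned char entries[360];
	void *vma;
	unsigned char pad[384 - 360 - sizeof(void *)];
};
static_assert(sizeof(struct toy_op) == 384, "element size from the report");
static_assert(offsetof(struct toy_op, vma) == 360, "vma follows entries[]");

struct toy_update_ops {
	struct toy_op *ops;
	uint32_t num_ops;    /* capacity, sized for a single pass */
	uint32_t current_op; /* cursor into ops[] */
};

/* Per-attempt reset; reset_cursor toggles the one-line fix. */
static void toy_init(struct toy_update_ops *u, int reset_cursor)
{
	if (reset_cursor)
		u->current_op = 0;
}

/* One prepare pass over n_list_ops list entries. Returns -1 if every write
 * stayed in bounds, otherwise the byte offset (from the start of ops[]) of
 * the first write that would have gone out of bounds. */
static long toy_prepare(struct toy_update_ops *u, uint32_t n_list_ops,
			int reset_cursor)
{
	toy_init(u, reset_cursor);
	for (uint32_t i = 0; i < n_list_ops; i++) {
		uint32_t idx = u->current_op++;

		if (idx >= u->num_ops)
			return (long)(idx * sizeof(struct toy_op) +
				      offsetof(struct toy_op, vma));
		u->ops[idx].vma = NULL;
	}
	return -1;
}
```

With a one-element array, a second `toy_prepare()` call without the reset starts at index 1, so the modelled write lands 384 + 360 = 744 bytes from the start of a 384-byte allocation. With the reset, both passes stay in bounds.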


* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
  2026-04-02  9:15 [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry Thomas Hellström
@ 2026-04-02  9:27 ` Matthew Auld
  2026-04-03  2:42 ` Matthew Brost
  1 sibling, 0 replies; 5+ messages in thread
From: Matthew Auld @ 2026-04-02  9:27 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe; +Cc: Matthew Brost, stable

On 02/04/2026 10:15, Thomas Hellström wrote:
> xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
> each invocation to reset per-attempt state, but current_op was not
> included in that reset. When vm_bind_ioctl_ops_execute() retries due to
> ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
> xe_pt_update_ops_prepare() again. The second call walks the same op list
> and fills ops[] starting from current_op, which still holds the value
> from the first attempt. This indexes past the end of the ops array
> allocated by xe_vma_ops_alloc(), whose size was computed for a single
> pass.
> 
> KASAN reported:
>    BUG: KASAN: slab-out-of-bounds in bind_op_prepare+0x89c/0xae0 [xe]
>    Write of size 8 at addr ffff88812e72bae8 by task xe_evict/2848
>    [...]
>    bind_op_prepare+0x89c/0xae0 [xe]
>    xe_pt_update_ops_prepare+0xbd0/0x1570 [xe]
>    ops_execute+0x3ae/0x2030 [xe]
>    vm_bind_ioctl_ops_execute+0x4d5/0xed0 [xe]
> 
> The write lands at ops[1].vma (offset 360 into the second element of a
> one-element 384-byte allocation) because entries[] is exactly 360 bytes
> and current_op was 1 at the start of the retried prepare pass.
> 
> Fix by resetting current_op to 0 in xe_pt_update_ops_init().
> 
> Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: <stable@vger.kernel.org> # v6.12+
> Assisted-by: GitHub Copilot:claude-sonnet-4.6

Out of curiosity, was it able to suggest the fix given the KASAN splat?

> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> ---
>   drivers/gpu/drm/xe/xe_pt.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 8e5f4f0dea3f..3607cd57fc4c 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -2291,6 +2291,7 @@ xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
>   	init_llist_head(&pt_update_ops->deferred);
>   	pt_update_ops->start = ~0x0ull;
>   	pt_update_ops->last = 0x0ull;
> +	pt_update_ops->current_op = 0;
>   	xe_page_reclaim_list_init(&pt_update_ops->prl);
>   }
>   



* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
  2026-04-02  9:15 [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry Thomas Hellström
  2026-04-02  9:27 ` Matthew Auld
@ 2026-04-03  2:42 ` Matthew Brost
  2026-04-03  2:43   ` Matthew Brost
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew Brost @ 2026-04-03  2:42 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, Matthew Auld, stable

On Thu, Apr 02, 2026 at 11:15:39AM +0200, Thomas Hellström wrote:
> xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
> each invocation to reset per-attempt state, but current_op was not
> included in that reset. When vm_bind_ioctl_ops_execute() retries due to
> ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls

I'm failing to see the retry path around vm_bind_ioctl_ops_execute related
to drm_exec_retry_on_contention... Also, by the time we get to
vm_bind_ioctl_ops_execute we hold all the dma-resv locks, right?

I believe the KASAN report but I just can't spot the bug - can you point
out the retry path to me?

Matt



* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
  2026-04-03  2:42 ` Matthew Brost
@ 2026-04-03  2:43   ` Matthew Brost
  2026-04-03 10:00     ` Thomas Hellström
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Brost @ 2026-04-03  2:43 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe, Matthew Auld, stable

On Thu, Apr 02, 2026 at 07:42:06PM -0700, Matthew Brost wrote:
> On Thu, Apr 02, 2026 at 11:15:39AM +0200, Thomas Hellström wrote:
> > xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
> > each invocation to reset per-attempt state, but current_op was not
> > included in that reset. When vm_bind_ioctl_ops_execute() retries due to
> > ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
> 
> I'm failing to see the retry path around vm_bind_ioctl_ops_execute related
> to drm_exec_retry_on_contention... Also by the time we get to
> vm_bind_ioctl_ops_execute we have all dma-resv, right?

s/vm_bind_ioctl_ops_execute/ops_execute here...

Matt



* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
  2026-04-03  2:43   ` Matthew Brost
@ 2026-04-03 10:00     ` Thomas Hellström
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellström @ 2026-04-03 10:00 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, Matthew Auld, stable

On Thu, 2026-04-02 at 19:43 -0700, Matthew Brost wrote:
> On Thu, Apr 02, 2026 at 07:42:06PM -0700, Matthew Brost wrote:
> > On Thu, Apr 02, 2026 at 11:15:39AM +0200, Thomas Hellström wrote:
> > > xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the
> > > start of
> > > each invocation to reset per-attempt state, but current_op was
> > > not
> > > included in that reset. When vm_bind_ioctl_ops_execute() retries
> > > due to
> > > ww-mutex contention (drm_exec_retry_on_contention), ops_execute()
> > > calls
> > 
> > I'm failing to see the retry path around vm_bind_ioctl_ops_execute
> > related
> > to drm_exec_retry_on_contention... Also by the time we get to
> > vm_bind_ioctl_ops_execute we have all dma-resv, right?
> 
> s/vm_bind_ioctl_ops_execute/ops_execute here...
> 
> Matt

So indeed the commit message erroneously states that the retry happens
earlier, but the KASAN message indicates that ops_execute() had already
been started with the same vops. The patch does fix the KASAN splat.

We might be looking at a bigger issue here, since when we call
xe_vm_set_validation_exec() we need to be prepared to handle -EDEADLK
(and -ENOMEM, for that matter).

I guess in this situation those would primarily come from allocating
and validating page-table bos, and if there is contention arising
from *any* ww lock (like, in the future, eviction) in ops_execute(), that
contention affects the __until_all_locked() and causes an implicit
rerun.

So I need to dig into what's actually causing the rerun in this
case, and we need to make sure we properly handle -EDEADLK and -ENOMEM
after the regions enclosed by xe_vm_set_validation_exec().

/Thomas.
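
For readers following along, the concern above can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not the drm_exec or xe API: `toy_vops`, `toy_execute()` and `toy_locked_loop()` are hypothetical names, and the contention is simulated by an attempt counter. It shows a body that may fail with -EDEADLK after it has already advanced per-attempt state, and a caller that reruns the body on -EDEADLK while propagating every other result.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-submission state that survives across attempts. */
struct toy_vops {
	uint32_t num_ops;
	uint32_t current_op;
};

/* Models the body of the locked region: it advances the cursor and then
 * reports simulated ww-mutex contention on the first fail_attempts tries. */
static int toy_execute(struct toy_vops *v, int attempt, int fail_attempts)
{
	v->current_op = 0; /* per-attempt reset, as in the patch */
	for (uint32_t i = 0; i < v->num_ops; i++)
		v->current_op++;
	return attempt < fail_attempts ? -EDEADLK : 0;
}

/* Models the implicit rerun: retry only on -EDEADLK, give up after
 * max_attempts, and pass any other error (or success) straight through. */
static int toy_locked_loop(struct toy_vops *v, int fail_attempts,
			   int max_attempts)
{
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		int err = toy_execute(v, attempt, fail_attempts);

		if (err != -EDEADLK)
			return err;
		/* Real code would drop the held locks and back off here. */
	}
	return -EDEADLK;
}
```

Because the body resets its cursor on every attempt, the cursor equals num_ops after a successful rerun rather than accumulating across attempts.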



