* [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
@ 2026-04-02 9:15 Thomas Hellström
2026-04-02 9:27 ` Matthew Auld
2026-04-03 2:42 ` Matthew Brost
0 siblings, 2 replies; 5+ messages in thread
From: Thomas Hellström @ 2026-04-02 9:15 UTC (permalink / raw)
To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, Matthew Auld, stable
xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
each invocation to reset per-attempt state, but current_op was not
included in that reset. When vm_bind_ioctl_ops_execute() retries due to
ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
xe_pt_update_ops_prepare() again. The second call walks the same op list
and fills ops[] starting from current_op, which still holds the value
from the first attempt. This indexes past the end of the ops array
allocated by xe_vma_ops_alloc(), whose size was computed for a single
pass.
KASAN reported:
BUG: KASAN: slab-out-of-bounds in bind_op_prepare+0x89c/0xae0 [xe]
Write of size 8 at addr ffff88812e72bae8 by task xe_evict/2848
[...]
bind_op_prepare+0x89c/0xae0 [xe]
xe_pt_update_ops_prepare+0xbd0/0x1570 [xe]
ops_execute+0x3ae/0x2030 [xe]
vm_bind_ioctl_ops_execute+0x4d5/0xed0 [xe]
The write lands at ops[1].vma (offset 360 into the second element of a
one-element 384-byte allocation) because entries[] is exactly 360 bytes
and current_op was 1 at the start of the retried prepare pass.
Fix by resetting current_op to 0 in xe_pt_update_ops_init().
Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_pt.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8e5f4f0dea3f..3607cd57fc4c 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -2291,6 +2291,7 @@ xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
init_llist_head(&pt_update_ops->deferred);
pt_update_ops->start = ~0x0ull;
pt_update_ops->last = 0x0ull;
+ pt_update_ops->current_op = 0;
xe_page_reclaim_list_init(&pt_update_ops->prl);
}
--
2.53.0
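The failure mode described in the commit message above can be modelled in user space. The following is only a minimal sketch under stated assumptions: `struct update_ops_model`, `model_init()`, `model_prepare()` and `retry_overflow()` are hypothetical stand-ins, not the xe data structures or functions, and the out-of-bounds store is counted rather than performed.

```c
#include <assert.h>

#define NUM_OPS 1 /* ops[] is sized for a single pass over the op list */

/* Hypothetical stand-in for the update state; not the real xe struct. */
struct update_ops_model {
	unsigned int current_op; /* next slot in ops[] to be filled */
	unsigned int num_ops;    /* capacity of ops[] */
};

/* Models the per-attempt reset. reset_cursor selects the behaviour
 * without (0) or with (1) the one-line fix. */
static void model_init(struct update_ops_model *u, int reset_cursor)
{
	if (reset_cursor)
		u->current_op = 0;
}

/*
 * Models one prepare pass over n_list_ops operations. Returns how many
 * slots would have been written past the end of ops[]; no such write
 * is actually performed here.
 */
static unsigned int model_prepare(struct update_ops_model *u,
				  unsigned int n_list_ops, int reset_cursor)
{
	unsigned int i, overflow = 0;

	assert(u->num_ops > 0);
	model_init(u, reset_cursor);
	for (i = 0; i < n_list_ops; i++) {
		if (u->current_op >= u->num_ops)
			overflow++;
		u->current_op++;
	}
	return overflow;
}

/* Runs a first attempt plus one retry and returns the retry's overflow. */
static unsigned int retry_overflow(int reset_cursor)
{
	struct update_ops_model u = { .current_op = 0, .num_ops = NUM_OPS };

	model_prepare(&u, NUM_OPS, reset_cursor);        /* first attempt */
	return model_prepare(&u, NUM_OPS, reset_cursor); /* retried attempt */
}
```

In this model, `retry_overflow(0)` returns 1 (the retried pass starts at index 1 of a one-element array), while `retry_overflow(1)` returns 0.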
* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
2026-04-02 9:15 [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry Thomas Hellström
@ 2026-04-02 9:27 ` Matthew Auld
2026-04-03 2:42 ` Matthew Brost
1 sibling, 0 replies; 5+ messages in thread
From: Matthew Auld @ 2026-04-02 9:27 UTC (permalink / raw)
To: Thomas Hellström, intel-xe; +Cc: Matthew Brost, stable
On 02/04/2026 10:15, Thomas Hellström wrote:
> xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
> each invocation to reset per-attempt state, but current_op was not
> included in that reset. When vm_bind_ioctl_ops_execute() retries due to
> ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
> xe_pt_update_ops_prepare() again. The second call walks the same op list
> and fills ops[] starting from current_op, which still holds the value
> from the first attempt. This indexes past the end of the ops array
> allocated by xe_vma_ops_alloc(), whose size was computed for a single
> pass.
>
> KASAN reported:
> BUG: KASAN: slab-out-of-bounds in bind_op_prepare+0x89c/0xae0 [xe]
> Write of size 8 at addr ffff88812e72bae8 by task xe_evict/2848
> [...]
> bind_op_prepare+0x89c/0xae0 [xe]
> xe_pt_update_ops_prepare+0xbd0/0x1570 [xe]
> ops_execute+0x3ae/0x2030 [xe]
> vm_bind_ioctl_ops_execute+0x4d5/0xed0 [xe]
>
> The write lands at ops[1].vma (offset 360 into the second element of a
> one-element 384-byte allocation) because entries[] is exactly 360 bytes
> and current_op was 1 at the start of the retried prepare pass.
>
> Fix by resetting current_op to 0 in xe_pt_update_ops_init().
>
> Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: <stable@vger.kernel.org> # v6.12+
> Assisted-by: GitHub Copilot:claude-sonnet-4.6
Out of curiosity, was it able to suggest the fix given the KASAN splat?
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 8e5f4f0dea3f..3607cd57fc4c 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -2291,6 +2291,7 @@ xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
> init_llist_head(&pt_update_ops->deferred);
> pt_update_ops->start = ~0x0ull;
> pt_update_ops->last = 0x0ull;
> + pt_update_ops->current_op = 0;
> xe_page_reclaim_list_init(&pt_update_ops->prl);
> }
>
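The offset arithmetic in the quoted commit message can be checked with a small layout model. This is a sketch only: `struct op_model` is hypothetical and reproduces just the two sizes quoted above (360 bytes of entries[], 384 bytes per element); the `rest` filler and the helper names are assumptions, and the real xe structure has other members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout mirroring only the sizes quoted in the commit message. */
struct op_model {
	unsigned char entries[360]; /* "entries[] is exactly 360 bytes" */
	uint64_t vma;               /* 8-byte slot standing in for the vma pointer */
	unsigned char rest[16];     /* assumed filler so that sizeof is 384 */
};

/* Byte offset of ops[idx].vma from the start of the ops allocation. */
static size_t vma_write_offset(size_t idx)
{
	return idx * sizeof(struct op_model) + offsetof(struct op_model, vma);
}

/* Returns 1 if an 8-byte store to ops[idx].vma fits in num_ops elements. */
static int vma_write_in_bounds(size_t idx, size_t num_ops)
{
	assert(num_ops > 0);
	return vma_write_offset(idx) + sizeof(uint64_t) <=
	       num_ops * sizeof(struct op_model);
}
```

With a one-element allocation, `vma_write_offset(1)` is 384 + 360 = 744, which lies beyond the 384 bytes available, so `vma_write_in_bounds(1, 1)` is 0 while `vma_write_in_bounds(0, 1)` is 1.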
* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
2026-04-02 9:15 [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry Thomas Hellström
2026-04-02 9:27 ` Matthew Auld
@ 2026-04-03 2:42 ` Matthew Brost
2026-04-03 2:43 ` Matthew Brost
1 sibling, 1 reply; 5+ messages in thread
From: Matthew Brost @ 2026-04-03 2:42 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe, Matthew Auld, stable
On Thu, Apr 02, 2026 at 11:15:39AM +0200, Thomas Hellström wrote:
> xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
> each invocation to reset per-attempt state, but current_op was not
> included in that reset. When vm_bind_ioctl_ops_execute() retries due to
> ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
I'm failing to see a retry path around vm_bind_ioctl_ops_execute related
to drm_exec_retry_on_contention... Also, by the time we get to
vm_bind_ioctl_ops_execute we hold all the dma-resv locks, right?
I believe the KASAN report, but I just can't spot the bug. Can you point
out the retry path to me?
Matt
> xe_pt_update_ops_prepare() again. The second call walks the same op list
> and fills ops[] starting from current_op, which still holds the value
> from the first attempt. This indexes past the end of the ops array
> allocated by xe_vma_ops_alloc(), whose size was computed for a single
> pass.
>
> KASAN reported:
> BUG: KASAN: slab-out-of-bounds in bind_op_prepare+0x89c/0xae0 [xe]
> Write of size 8 at addr ffff88812e72bae8 by task xe_evict/2848
> [...]
> bind_op_prepare+0x89c/0xae0 [xe]
> xe_pt_update_ops_prepare+0xbd0/0x1570 [xe]
> ops_execute+0x3ae/0x2030 [xe]
> vm_bind_ioctl_ops_execute+0x4d5/0xed0 [xe]
>
> The write lands at ops[1].vma (offset 360 into the second element of a
> one-element 384-byte allocation) because entries[] is exactly 360 bytes
> and current_op was 1 at the start of the retried prepare pass.
>
> Fix by resetting current_op to 0 in xe_pt_update_ops_init().
>
> Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: <stable@vger.kernel.org> # v6.12+
> Assisted-by: GitHub Copilot:claude-sonnet-4.6
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 8e5f4f0dea3f..3607cd57fc4c 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -2291,6 +2291,7 @@ xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
> init_llist_head(&pt_update_ops->deferred);
> pt_update_ops->start = ~0x0ull;
> pt_update_ops->last = 0x0ull;
> + pt_update_ops->current_op = 0;
> xe_page_reclaim_list_init(&pt_update_ops->prl);
> }
>
> --
> 2.53.0
>
* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
2026-04-03 2:42 ` Matthew Brost
@ 2026-04-03 2:43 ` Matthew Brost
2026-04-03 10:00 ` Thomas Hellström
0 siblings, 1 reply; 5+ messages in thread
From: Matthew Brost @ 2026-04-03 2:43 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe, Matthew Auld, stable
On Thu, Apr 02, 2026 at 07:42:06PM -0700, Matthew Brost wrote:
> On Thu, Apr 02, 2026 at 11:15:39AM +0200, Thomas Hellström wrote:
> > xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
> > each invocation to reset per-attempt state, but current_op was not
> > included in that reset. When vm_bind_ioctl_ops_execute() retries due to
> > ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
>
> I'm failing to see a retry path around vm_bind_ioctl_ops_execute related
> to drm_exec_retry_on_contention... Also, by the time we get to
> vm_bind_ioctl_ops_execute we hold all the dma-resv locks, right?
s/vm_bind_ioctl_ops_execute/ops_execute here...
Matt
* Re: [PATCH] drm/xe: Fix slab-out-of-bounds on PT update ops retry
2026-04-03 2:43 ` Matthew Brost
@ 2026-04-03 10:00 ` Thomas Hellström
0 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellström @ 2026-04-03 10:00 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, Matthew Auld, stable
On Thu, 2026-04-02 at 19:43 -0700, Matthew Brost wrote:
> On Thu, Apr 02, 2026 at 07:42:06PM -0700, Matthew Brost wrote:
> > On Thu, Apr 02, 2026 at 11:15:39AM +0200, Thomas Hellström wrote:
> > > xe_pt_update_ops_prepare() calls xe_pt_update_ops_init() at the start of
> > > each invocation to reset per-attempt state, but current_op was not
> > > included in that reset. When vm_bind_ioctl_ops_execute() retries due to
> > > ww-mutex contention (drm_exec_retry_on_contention), ops_execute() calls
> >
> > I'm failing to see a retry path around vm_bind_ioctl_ops_execute related
> > to drm_exec_retry_on_contention... Also, by the time we get to
> > vm_bind_ioctl_ops_execute we hold all the dma-resv locks, right?
>
> s/vm_bind_ioctl_ops_execute/ops_execute here...
>
> Matt
So indeed, the commit message is wrong in stating that the retry happens
earlier, but the KASAN report indicates that ops_execute() had already
been started with the same vops. The patch does fix the KASAN splat.
We might be looking at a bigger issue here, since once we call
xe_vm_set_validation_exec() we need to be prepared to handle -EDEADLK
(and -ENOMEM, for that matter).
I guess in this situation those would primarily come from allocating
and validating page-table BOs, and if contention arises on *any* ww
lock in ops_execute() (like eviction in the future), that contention
affects the __until_all_locked() loop and causes an implicit rerun.
So I need to dig into what's actually causing the rerun in this case,
and we need to make sure we properly handle -EDEADLK and -ENOMEM after
the regions enclosed by xe_vm_set_validation_exec().
/Thomas.
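As a rough illustration of the concern above, here is a user-space sketch of a body that is rerun when a step inside it reports contention. Everything in it is hypothetical: `struct attempt_state`, `step_may_contend()`, `run_body()` and `run_transaction()` are invented for illustration and do not model the actual drm_exec macros or the xe validation code. The sketch only shows that -EDEADLK has to reach the rerun loop and that the rerun body must reset its own per-attempt state. How -ENOMEM should be handled is left open in this thread; the sketch simply returns it to the caller.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-attempt state; none of these names exist in xe. */
struct attempt_state {
	unsigned int cursor;   /* per-attempt progress, must restart at 0 */
	unsigned int attempts; /* how many times the body has run */
};

/* Models a step inside the locked region that reports contention on
 * the first attempt only. */
static int step_may_contend(const struct attempt_state *s)
{
	return s->attempts == 1 ? -EDEADLK : 0;
}

/* One run of the body; it resets its per-attempt state up front. */
static int run_body(struct attempt_state *s, unsigned int n_ops)
{
	unsigned int i;

	s->attempts++;
	s->cursor = 0; /* reset before doing any work */

	for (i = 0; i < n_ops; i++)
		s->cursor++;

	/* The error is propagated, not swallowed, so the loop can rerun. */
	return step_may_contend(s);
}

/* Reruns the body on -EDEADLK; any other result, including -ENOMEM,
 * is returned to the caller unchanged. */
static int run_transaction(struct attempt_state *s, unsigned int n_ops)
{
	int err;

	do {
		err = run_body(s, n_ops);
	} while (err == -EDEADLK);

	return err;
}
```

Starting from a zeroed `struct attempt_state`, `run_transaction(&s, 1)` returns 0 after two attempts and leaves `s.cursor` at 1 rather than 2, because the cursor is reset at the start of each attempt.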