Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Auld <matthew.auld@intel.com>
To: "Tapani Pälli" <tapani.palli@intel.com>, intel-xe@lists.freedesktop.org
Cc: matthew.brost@intel.com
Subject: Re: [PATCH] drm/xe: Fix NULL pointer dereference in xe_exec_ioctl
Date: Wed, 17 Dec 2025 11:51:55 +0000	[thread overview]
Message-ID: <7c5458b3-8f14-4495-9e0f-f2feb99f411e@intel.com> (raw)
In-Reply-To: <dfaff8d8-9158-411c-8feb-b2dfddf85e29@intel.com>

On 17/12/2025 10:51, Tapani Pälli wrote:
> 
> On 12/17/25 12:31, Matthew Auld wrote:
>> On 17/12/2025 10:27, Matthew Auld wrote:
>>> On 17/12/2025 06:17, Tapani Pälli wrote:
>>>> Helper function xe_sync_needs_wait expects sync->fence when accessing
>>>> flags, patch makes sure we call only when sync->fence exists.
>>>>
>>>> Fixes NULL pointer dereference seen with Vulkan workloads:
>>>>
>>>> [  118.410401] RIP: 0010:xe_sync_needs_wait+0x27/0x50 [xe]
>>>>
>>>> Fixes: 4ac9048d0501 ("drm/xe: Wait on in-syncs when swicthing to 
>>>> dma- fence mode")
>>>> Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/xe/xe_exec.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/ 
>>>> xe_exec.c
>>>> index 730a5c9c2637..ea368f02cb9f 100644
>>>> --- a/drivers/gpu/drm/xe/xe_exec.c
>>>> +++ b/drivers/gpu/drm/xe/xe_exec.c
>>>> @@ -184,7 +184,8 @@ int xe_exec_ioctl(struct drm_device *dev, void 
>>>> *data, struct drm_file *file)
>>>>           if (xe_sync_is_ufence(&syncs[num_syncs]))
>>>>               num_ufence++;
>>>> -        if (!num_in_sync && xe_sync_needs_wait(&syncs[num_syncs]))
>>>> +        if (!num_in_sync && syncs[num_syncs].fence &&
>>>> +            xe_sync_needs_wait(&syncs[num_syncs]))
>>>
>>> In xe_sync_entry_parse() it looks like it will always populate the 
>>> fence for the !signal case, otherwise throwing an error if that is 
>>> not possible. And xe_sync_needs_wait() will only touch the fence if 
>>> it's the signal case? So it seems like this should not be possible?
>>
>> Sorry meant to type:
>>
>> s/touch the fence if it's the signal/touch the fence if it's the !signal/
>>
> I'm not sure of the complete flow but it is quite easy to reproduce.

Oh, I think I see it now.

I assume the below also fixes it? There is also the case in 
xe_sync_entry_wait() which has the same bug, I think, so would also need 
to handle that.

@@ -178,6 +179,9 @@ int xe_sync_entry_parse(struct xe_device *xe, struct 
xe_file *xef,
  
sync_in.timeline_value);
                         if (err)
                                 return err;
+
+                       if (!sync->fence)
+                               sync->fence = dma_fence_get_stub();

So it looks like dma_fence_chain_find_seqno() can return NULL if the 
seqno has already signalled. Also means we are missing some IGT coverage 
for this.

Can you resend your patch but perhaps we move the check into 
needs_wait() and then add a similar check in xe_sync_entry_wait() also? 
Or perhaps might be cleaner to have xe_sync_entry_wait() first check 
needs_wait(), which will now also check for a NULL fence?

> 
> Here is the stacktrace snippet:
> 
> [  118.410401] RIP: 0010:xe_sync_needs_wait+0x27/0x50 [xe]
> [  118.410940] Code: 90 90 90 0f 1f 44 00 00 55 48 89 f1 48 89 e5 48 83 
> ec 08 48 83 7e 08 00 0f 84 79 be 1c 00 31 c0 f6 41 4c 01 75 11 48 8b 41 
> 08 <48> 8b 40 30 48 d1 e8 83 e0 01 83 f0 01 c9 31 d2 31 c9 31 f6 31 ff
> [  118.410949] RSP: 0018:ffffccec8ea7bb18 EFLAGS: 00010246
> [  118.410957] RAX: 0000000000000000 RBX: 00007ff23812f460 RCX: 
> ffff8c1790db4800
> [  118.410964] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
> 0000000000000000
> [  118.410968] RBP: ffffccec8ea7bb20 R08: 0000000000000000 R09: 
> 0000000000000000
> [  118.410973] R10: 0000000000000000 R11: 0000000000000000 R12: 
> ffff8c1790db4800
> [  118.410978] R13: 0000000000000000 R14: ffff8c178b67e000 R15: 
> 0000000000000000
> [  118.410984] FS:  0000000101aff6c0(0000) GS:ffff8c27350d1000(0000) 
> knlGS:000000007fe20000
> [  118.410992] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  118.410998] CR2: 0000000000000030 CR3: 00000001037ef005 CR4: 
> 0000000000772ef0
> [  118.411004] PKRU: 55555554
> [  118.411008] Call Trace:
> [  118.411013]  <TASK>
> [  118.411022]  xe_exec_ioctl+0x375/0xea0 [xe]
> [  118.411377]  ? __x64_sys_ioctl+0xbd/0x100
> [  118.411392]  ? do_syscall_64+0xa7/0x580
> [  118.411403]  ? dma_fence_free+0x1a/0x30
> [  118.411419]  ? __pfx_xe_exec_fn+0x10/0x10 [xe]
> [  118.411752]  ? drm_syncobj_array_free+0x56/0x80 [drm]
> [  118.411915]  ? drm_syncobj_query_ioctl+0x20f/0x460 [drm]
> [  118.412035]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> [  118.412372]  drm_ioctl_kernel+0xae/0x110 [drm]
> [  118.412524]  drm_ioctl+0x2ee/0x5d0 [drm]
> [  118.412649]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> [  118.412989]  ? __pm_runtime_resume+0x5f/0x90
> [  118.413002]  xe_drm_ioctl+0x61/0xb0 [xe]
> [  118.413329]  __x64_sys_ioctl+0xa3/0x100
> [  118.413338]  x64_sys_call+0x1060/0x2360
> [  118.413347]  do_syscall_64+0x74/0x580
> [  118.413355]  ? x64_sys_call+0x1060/0x2360
> [  118.413360]  ? do_syscall_64+0xa7/0x580
> [  118.413368]  ? __do_sys_getpid+0x1d/0x30
> [  118.413377]  ? x64_sys_call+0xf9d/0x2360
> [  118.413383]  ? do_syscall_64+0xa7/0x580
> [  118.413389]  ? x64_sys_call+0x1060/0x2360
> [  118.413395]  ? do_syscall_64+0xa7/0x580
> [  118.413401]  ? sysvec_apic_timer_interrupt+0x54/0xd0
> [  118.413410]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  118.413418] RIP: 0033:0x7ff29c93287d
> 
> 
> 
>>>
>>>>               num_in_sync++;
>>>>       }
>>>
>>


  parent reply	other threads:[~2025-12-17 11:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-17  6:17 [PATCH] drm/xe: Fix NULL pointer dereference in xe_exec_ioctl Tapani Pälli
2025-12-17  8:00 ` ✓ CI.KUnit: success for " Patchwork
2025-12-17  9:01 ` ✓ Xe.CI.BAT: " Patchwork
2025-12-17 10:27 ` [PATCH] " Matthew Auld
2025-12-17 10:31   ` Matthew Auld
2025-12-17 10:51     ` Tapani Pälli
2025-12-17 11:42       ` Tapani Pälli
2025-12-17 11:51       ` Matthew Auld [this message]
2025-12-17 12:22         ` Tapani Pälli
2025-12-17 12:32           ` [PATCH v2] " Tapani Pälli
2025-12-17 12:57             ` Matthew Auld
2025-12-17 13:21               ` Tapani Pälli
2025-12-17 13:24                 ` [PATCH v3] " Tapani Pälli
2025-12-17 14:56                   ` Matthew Auld
2025-12-17 21:04                     ` Matthew Brost
2025-12-17 21:16           ` [PATCH] " Matthew Brost
2025-12-17 13:10 ` ✓ CI.KUnit: success for drm/xe: Fix NULL pointer dereference in xe_exec_ioctl (rev2) Patchwork
2025-12-17 13:48 ` ✓ Xe.CI.BAT: " Patchwork
2025-12-17 15:43 ` ✓ CI.KUnit: success for drm/xe: Fix NULL pointer dereference in xe_exec_ioctl (rev3) Patchwork
2025-12-17 16:23 ` ✓ Xe.CI.BAT: " Patchwork
2025-12-18 14:16 ` ✗ Xe.CI.Full: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c5458b3-8f14-4495-9e0f-f2feb99f411e@intel.com \
    --to=matthew.auld@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=tapani.palli@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox