All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] drm/i915: Use rcu_dereference() in hwsp_offset()
@ 2025-11-04 11:33 Sebastian Brzezinka
  2025-11-04 11:42 ` Sebastian Brzezinka
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Sebastian Brzezinka @ 2025-11-04 11:33 UTC (permalink / raw)
  To: intel-gfx, chris.p.wilson, andi.shyti, krzysztof.karas,
	krzysztof.niemiec, sebastian.brzezinka
  Cc: chris.p.wilson, andi.shyti, krzysztof.karas, krzysztof.niemiec,
	Sebastian Brzezinka

Replace rcu_dereference_protected() with rcu_dereference() in
hwsp_offset() since the function is called within an RCU read-side
critical section. Using rcu_dereference() avoids unnecessary
protection checks and aligns with correct RCU usage patterns.

Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
---
I noticed that the current implementation of hwsp_offset() uses rcu_dereference_protected()
when accessing rq->timeline. This seems to be a slight misuse of the API.
rcu_dereference_protected() is intended for updater-side code, where we are not holding
rcu_read_lock() but instead rely on another lock that guarantees safety. The condition argument
in this function acts more like an assertion that the caller holds the required lock.
In our case, hwsp_offset() is called inside an RCU read-side critical section, which means
the correct primitive is rcu_dereference(). The original intent of the condition argument
seems to have been to guard against use-after-free scenarios for timeline(?). However,
rcu_dereference_protected() does not enforce that, it simply returns the pointer regardless
of i915_request_signaled(), and in rare cases this pattern has led to issues such as:
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15181
'''
...
<4> [281.246503] drivers/gpu/drm/i915/gt/gen8_engine_cs.c:427 suspicious rcu_dereference_protected() usage!
<4> [281.246506]
other info that might help us debug this:
<4> [281.246507]
rcu_scheduler_active = 2, debug_locks = 1
<4> [281.246509] 5 locks held by gem_exec_whispe/2308:
<4> [281.246511]  #0: ffffc90002ae77c8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer+0xd2c/0x3710 [i915]
<4> [281.246852]  #1: ffffc90002ae77f0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_do_execbuffer+0xd2c/0x3710 [i915]
<4> [281.247073]  #2: ffff8881e8a4a878 (&timeline->mutex){+.+.}-{3:3}, at: i915_request_create+0x61/0x200 [i915]
<4> [281.247403]  #3: ffffffff83595560 (rcu_read_lock){....}-{1:2}, at: execlists_submission_tasklet+0x44/0x27b0 [i915]
<4> [281.247592]  #4: ffff88812f0c2020 (&sched_engine->lock){-.-.}-{2:2}, at: execlists_submission_tasklet+0x20d/0x27b0 [i915]
<4> [281.247787]
stack backtrace:
<4> [281.247789] CPU: 9 UID: 0 PID: 2308 Comm: gem_exec_whispe Tainted: G     U              6.17.0-CI_DRM_17306-gb3f121acbde4+ #1 PREEMPT(voluntary)
<4> [281.247792] Tainted: [U]=USER
<4> [281.247792] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.6062.A00.2502050210 02/05/2025
<4> [281.247793] Call Trace:
<4> [281.247794]  <IRQ>
<4> [281.247796]  dump_stack_lvl+0x91/0xf0
<4> [281.247802]  dump_stack+0x10/0x20
<4> [281.247804]  lockdep_rcu_suspicious+0x151/0x1e0
<4> [281.247811]  ? __i915_request_submit+0xb0/0x430 [i915]
<4> [281.248010]  hwsp_offset+0x90/0xa0 [i915]
<4> [281.248199]  gen12_emit_fini_breadcrumb_rcs+0xdf/0x480 [i915]
<4> [281.248388]  ? __i915_request_submit+0xb0/0x430 [i915]
<4> [281.248584]  __i915_request_submit+0x15b/0x430 [i915]
<4> [281.248781]  execlists_submission_tasklet+0xdfa/0x27b0 [i915]
<4> [281.248974]  ? mark_held_locks+0x46/0x90
<4> [281.248982]  tasklet_action_common+0x166/0x410
<4> [281.248988]  tasklet_hi_action+0x29/0x40
<4> [281.248990]  handle_softirqs+0xd7/0x4d0
<4> [281.248994]  ? __i915_request_queue+0x3f/0x80 [i915]
<4> [281.249194]  __do_softirq+0x10/0x18
<4> [281.249197]  do_softirq.part.0+0x47/0xd0
...
'''

This issue reproduces very rarely, and I haven’t been able to reproduce it myself, so
I’m not entirely sure why this scenario occurs why we attempt to emit a breadcrumb even
when the request’s fence is already signaled. However, the correct approach seems to be:
 - Drop the use of rcu_dereference_protected() in this context, since it’s not providing
   real safety here.
 - Avoid emitting a breadcrumb at all when the request is already signaled, as
   doing so appears unnecessary.

My concern is that breadcrumbs seem to be emitted in __i915_request_submit, which appears
to be well-guarded against processing retried requests. This leaves me puzzled about what’s
actually happening here.
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 3 +--
 drivers/gpu/drm/i915/i915_request.c      | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index e9f65f27b53f..b799d423d306 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -424,8 +424,7 @@ static u32 hwsp_offset(const struct i915_request *rq)
 	const struct intel_timeline *tl;
 
 	/* Before the request is executed, the timeline is fixed */
-	tl = rcu_dereference_protected(rq->timeline,
-				       !i915_request_signaled(rq));
+	tl = rcu_dereference(rq->timeline);
 
 	/* See the comment in i915_request_active_seqno(). */
 	return page_mask_bits(tl->hwsp_offset) + offset_in_page(rq->hwsp_seqno);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index b9a2b2194c8f..25a9e574149e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -657,7 +657,7 @@ bool __i915_request_submit(struct i915_request *request)
 	if (request->sched.semaphores &&
 	    i915_sw_fence_signaled(&request->semaphore))
 		engine->saturated |= request->sched.semaphores;
-
+	/*It seems that breadcrumbs are being emitted here.*/
 	engine->emit_fini_breadcrumb(request,
 				     request->ring->vaddr + request->postfix);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-11-19 15:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-04 11:33 [RFC PATCH] drm/i915: Use rcu_dereference() in hwsp_offset() Sebastian Brzezinka
2025-11-04 11:42 ` Sebastian Brzezinka
2025-11-05  1:59 ` ✗ i915.CI.BAT: failure for " Patchwork
2025-11-19 10:29 ` ✗ i915.CI.BAT: failure for drm/i915: Use rcu_dereference() in hwsp_offset() (rev2) Patchwork
2025-11-19 15:58 ` [RFC PATCH] drm/i915: Use rcu_dereference() in hwsp_offset() Janusz Krzysztofik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.