From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Maarten Lankhorst <dev@lankhorst.se>
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
linux-rt-devel@lists.linux.dev,
Mario Kleiner <mario.kleiner.de@gmail.com>,
Mike Galbraith <umgwanakikbuti@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Clark Williams <clrkwllms@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH v2 0/7] drm/i915/display: Handle vblank evasion with CONFIG_PREEMPT_RT
Date: Wed, 5 Nov 2025 14:47:40 +0100 [thread overview]
Message-ID: <20251105134740.NseZnpeZ@linutronix.de> (raw)
In-Reply-To: <20251104083634.670753-1-dev@lankhorst.se>
Thank you for looking into this. I applied this on top of
drm-intel-next-2025-11-04 for testing.
On 2025-11-04 09:36:24 [+0100], Maarten Lankhorst wrote:
> There is a critical section between intel_pipe_update_start() and
> intel_pipe_update_end() where we only program hardware registers,
> should not take any lock and complete as fast as possible.
>
> The previous approach used to remove the local_irq_enable/disable()
> in this critical, but that increases the probability that the time
> sensitive section does not complete in 100 µs, potentially causing
> the hardware to hang.
>
> I went through all the lockdep splats that occurred in CI, and fixed
> them 1 by 1 until there were none left. This additionally improves
> latency by not removing any locks in the fastpath.
>
> In intel_de.h the implicit DMC wakelock was added, ideally we can get rid
> of it, but for now we can simply use the same _fw variants as are
> needed on i915.
>
> I believe this series is enough to make xe and perhaps good enough to make
> i915's display RT safe.
I've been playing with it:
- DRM_XE_DEBUG_GUC ended in a segfault:
| xe 0000:00:02.0: enabling device (0006 -> 0007)
| Console: switching to colour dummy device 80x25
| xe 0000:00:02.0: vgaarb: deactivate vga console
| xe 0000:00:02.0: [drm] Running in SR-IOV PF mode
| xe 0000:00:02.0: [drm] Found tigerlake/uy (device ID 9a49) integrated display version 12.00 stepping C0
| xe 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=mem
| xe 0000:00:02.0: [drm] Finished loading DMC firmware i915/tgl_dmc_ver2_12.bin (v2.12)
| xe 0000:00:02.0: [drm] Tile0: GT0: Using GuC firmware from i915/tgl_guc_70.bin version 70.49.4
| xe 0000:00:02.0: [drm] Tile0: GT0: Using HuC firmware from i915/tgl_huc.bin version 7.9.3
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs1 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs3 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs4 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs5 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs6 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs7 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vecs1 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vecs2 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vecs3 fused off
| BUG: kernel NULL pointer dereference, address: 0000000000000000
| #PF: supervisor read access in kernel mode
| #PF: error_code(0x0000) - not-present page
| PGD 0 P4D 0
| Oops: Oops: 0000 [#1] SMP NOPTI
| CPU: 1 UID: 0 PID: 631 Comm: kworker/u32:9 Tainted: G U E 6.18.0-rc1+ #31 PREEMPT_{RT,(lazy)}
| Tainted: [U]=USER, [E]=UNSIGNED_MODULE
| Hardware name: LENOVO 20TD00GLGE/20TD00GLGE, BIOS R1EET64W(1.64 ) 03/18/2025
| Workqueue: drm_sched_run_job_work [gpu_sched]
| RIP: 0010:stack_depot_save_flags+0x168/0xb00
| Code: c2 44 31 d0 41 c1 ca 08 44 29 d0 41 89 c0 45 89 c2 44 23 15 82 68 a1 03 49 c1 e2 04 4c 03 15 7f 68 a1 03 65 ff 05 f8 b9 f6 01 <4d> 8b 0a 4d 39 ca 75 1c e9 ae 00 00 00 66 66 2e 0f 1f 84 00 00 00
| RSP: 0018:ffffaa58c1a9f6a8 EFLAGS: 00010282
| RAX: 0000000006425a49 RBX: 0000000000000001 RCX: 000000000000000e
| RDX: 000000000000000e RSI: 00000000be7f6f1a RDI: 0000000000000001
| RBP: ffffaa58c1a9f700 R08: 0000000006425a49 R09: 000000000739c857
| R10: 0000000000000000 R11: 00000000000025b3 R12: 0000000000004502
| R13: ffff9c3a8f2b18b8 R14: 0000000000002800 R15: 000000000000000d
| FS: 0000000000000000(0000) GS:ffff9c3e55884000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 0000000000000000 CR3: 000000003e854003 CR4: 0000000000f72ef0
| PKRU: 55555554
| Call Trace:
| Call Trace:
| <TASK>
| fast_req_track+0x58/0xb0 [xe]
| h2g_write+0x39f/0x720 [xe]
| __guc_ct_send_locked+0x1e4/0x10f0 [xe]
| guc_ct_send_locked+0xa4/0x690 [xe]
| guc_ct_send+0x74/0x250 [xe]
| xe_guc_ct_send+0x19/0x50 [xe]
| __register_exec_queue.isra.0+0x7e/0xa0 [xe]
| register_exec_queue+0x2f4/0x750 [xe]
| guc_exec_queue_run_job+0x4f6/0x8f0 [xe]
| drm_sched_run_job_work+0x1ef/0x450 [gpu_sched]
This happens also without the series and without PREEMPT_RT enabled. I
just to a while to figure this one out on the hardware in question since
it all just froze…
Other than that, XE seems fine.
i915.
- drm/i915/display: Move vblank put until after critical section
- intel_vblank_evade() is invoked with irq-off
- within its callchain intel_vblank_section_enter() does spin_lock()
in I915. XE does nothing so it is fine.
- intel_crtc_scanlines_since_frame_timestamp() does a while loop
What is the expected/ possible worst case here and when does it happen?
While at it, I noticed:
local_irq_disable();
if (new_plane_state->uapi.visible) {
intel_plane_update_noarm(NULL, plane, crtc_state, new_plane_state);
if (plane->fbc)
intel_fbc_dirty_rect_update_noarm(dsb, plane);
if (!HAS_FBC_DIRTY_RECT(display))
return;
mutex_lock(&fbc->lock); <----
Haven't checked the callbacks but it feels like a lot of code with
disabled interrupts.
- The GEM_BUG_ON(!irqs_disabled) in __i915_request_submit()/
__i915_request_unsubmit() case bugs/ warnings:
| WARNING: CPU: 3 PID: 2115 at drivers/gpu/drm/i915/i915_request.c:611 __i915_request_submit+0x1db/0x1f0 [i915]
| CPU: 3 UID: 0 PID: 2115 Comm: modprobe Not tainted 6.18.0-rc1+ #17 PREEMPT_{RT,(lazy)}
| Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3-M, BIOS P2.30 06/29/2012
| RIP: 0010:__i915_request_submit+0x1db/0x1f0 [i915]
| Call Trace:
| <TASK>
| i915_request_submit+0x29/0x40 [i915]
| i9xx_submit_request+0xe/0x70 [i915]
| submit_notify+0xc1/0x230 [i915]
| __i915_sw_fence_complete+0x88/0x290 [i915]
| __engine_park+0x2d2/0x410 [i915]
| ____intel_wakeref_put_last+0x25/0x90 [i915]
| intel_gt_resume.part.0+0x2ec/0x380 [i915]
| intel_gt_init+0x14d/0x3d0 [i915]
| i915_gem_init+0x14b/0x290 [i915]
| i915_driver_probe+0x74a/0xc10 [i915]
| i915_pci_probe+0xd7/0x190 [i915]
| local_pci_probe+0x41/0x80
- The change of irq-enable is not enough:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 2115, name: modprobe
| preempt_count: 0, expected: 0
| RCU nest depth: 0, expected: 0
| 4 locks held by modprobe/2115:
| #0: ffff99b9425161a0 (&dev->mutex){....}-{4:4}, at: __driver_attach+0xaf/0x1c0
| #1: ffffaa224810f6c0 (crtc_ww_class_acquire){+.+.}-{0:0}, at: intel_initial_commit+0x4c/0x200 [i915]
| #2: ffffaa224810f6e8 (crtc_ww_class_mutex){+.+.}-{4:4}, at: intel_initial_commit+0x4c/0x200 [i915]
| #3: ffff99b94a6c9030 (&uncore->lock){+.+.}-{3:3}, at: gen6_write32+0x50/0x290 [i915]
| irq event stamp: 513344
| hardirqs last enabled at (513343): [<ffffffff8ba8d84c>] _raw_spin_unlock_irqrestore+0x4c/0x60
| hardirqs last disabled at (513344): [<ffffffffc1543646>] intel_pipe_update_start+0x216/0x2c0 [i915]
| softirqs last enabled at (512766): [<ffffffff8af045cf>] __local_bh_enable_ip+0x10f/0x170
| softirqs last disabled at (512712): [<ffffffffc14dfb6a>] __i915_request_queue+0x3a/0x70 [i915]
| CPU: 3 UID: 0 PID: 2115 Comm: modprobe Tainted: G W 6.18.0-rc1+ #17 PREEMPT_{RT,(lazy)}
| Tainted: [W]=WARN
| Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3-M, BIOS P2.30 06/29/2012
| Call Trace:
| <TASK>
| dump_stack_lvl+0x68/0x90
| __might_resched.cold+0xf0/0x12b
| rt_spin_lock+0x5f/0x200
| gen6_write32+0x50/0x290 [i915]
| ilk_set_pipeconf+0x12d/0x230 [i915]
| ilk_color_commit_arm+0x2d/0x70 [i915]
| intel_update_crtc+0x15b/0x690 [i915]
| intel_commit_modeset_enables+0xa6/0xd0 [i915]
| intel_atomic_commit_tail+0xd55/0x19a0 [i915]
| intel_atomic_commit+0x25d/0x2a0 [i915]
| drm_atomic_commit+0xad/0xe0 [drm]
| intel_initial_commit+0x16c/0x200 [i915]
| intel_display_driver_probe+0x2e/0x80 [i915]
| i915_driver_probe+0x791/0xc10 [i915]
| i915_pci_probe+0xd7/0x190 [i915]
This is the intel_pipe_update_start()/ intel_pipe_update_end() part in intel_update_crtc().
Sebastian
next prev parent reply other threads:[~2025-11-05 13:47 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-04 8:36 [PATCH v2 0/7] drm/i915/display: Handle vblank evasion with CONFIG_PREEMPT_RT Maarten Lankhorst
2025-11-04 8:36 ` [PATCH v2 1/7] drm/i915/display: Make get_vblank_counter use intel_de_read_fw() Maarten Lankhorst
2025-11-04 8:36 ` [PATCH v2 2/7] drm/i915/display: Use intel_de_write_fw in intel_pipe_fastset Maarten Lankhorst
2025-11-26 19:19 ` Shankar, Uma
2025-11-26 19:42 ` Ville Syrjälä
2025-11-26 19:56 ` Shankar, Uma
2025-12-01 17:43 ` Maarten Lankhorst
2025-12-11 14:35 ` Maarten Lankhorst
2025-11-04 8:36 ` [PATCH v2 3/7] drm/i915/display: Move vblank put until after critical section Maarten Lankhorst
2025-11-26 19:25 ` Shankar, Uma
2025-11-04 8:36 ` [PATCH v2 4/7] drm/i915/display: Remove locking from intel_vblank_evade " Maarten Lankhorst
2025-11-26 20:03 ` Shankar, Uma
2025-12-01 14:13 ` Maarten Lankhorst
2025-11-26 21:17 ` Ville Syrjälä
2025-12-11 9:59 ` Maarten Lankhorst
2025-11-04 8:36 ` [PATCH v2 5/7] drm/i915/display: Make icl_dsi_frame_update use _fw too Maarten Lankhorst
2025-11-04 8:36 ` [PATCH v2 6/7] drm/i915/display: Enable interrupts earlier on PREEMPT_RT Maarten Lankhorst
2025-11-26 20:45 ` Shankar, Uma
2025-11-26 22:57 ` Ville Syrjälä
2025-12-01 17:39 ` Maarten Lankhorst
2025-11-04 8:36 ` [PATCH v2 7/7] drm/i915: Use preempt_disable/enable_rt() where recommended Maarten Lankhorst
2025-11-05 13:47 ` Sebastian Andrzej Siewior [this message]
2025-11-05 20:42 ` [PATCH v2 0/7] drm/i915/display: Handle vblank evasion with CONFIG_PREEMPT_RT Maarten Lankhorst
2025-11-10 16:09 ` Sebastian Andrzej Siewior
2025-11-10 18:17 ` Maarten Lankhorst
2025-11-11 9:47 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251105134740.NseZnpeZ@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=clrkwllms@kernel.org \
--cc=dev@lankhorst.se \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=linux-rt-devel@lists.linux.dev \
--cc=mario.kleiner.de@gmail.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=umgwanakikbuti@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).