linux-rt-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Maarten Lankhorst <dev@lankhorst.se>
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	linux-rt-devel@lists.linux.dev,
	Mario Kleiner <mario.kleiner.de@gmail.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Clark Williams <clrkwllms@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH v2 0/7] drm/i915/display: Handle vblank evasion with CONFIG_PREEMPT_RT
Date: Wed, 5 Nov 2025 14:47:40 +0100	[thread overview]
Message-ID: <20251105134740.NseZnpeZ@linutronix.de> (raw)
In-Reply-To: <20251104083634.670753-1-dev@lankhorst.se>

Thank you for looking into this. I applied this on top of
drm-intel-next-2025-11-04 for testing.

On 2025-11-04 09:36:24 [+0100], Maarten Lankhorst wrote:
> There is a critical section between intel_pipe_update_start() and
> intel_pipe_update_end() where we only program hardware registers,
> should not take any lock and complete as fast as possible.
> 
> The previous approach used to remove the local_irq_enable/disable()
> in this critical, but that increases the probability that the time
> sensitive section does not complete in 100 µs, potentially causing
> the hardware to hang.
> 
> I went through all the lockdep splats that occurred in CI, and fixed
> them 1 by 1 until there were none left. This additionally improves
> latency by not removing any locks in the fastpath.
> 
> In intel_de.h the implicit DMC wakelock was added, ideally we can get rid
> of it, but for now we can simply use the same _fw variants as are
> needed on i915.
> 
> I believe this series is enough to make xe and perhaps good enough to make
> i915's display RT safe.

I've been playing with it:
- DRM_XE_DEBUG_GUC ended in a segfault:
| xe 0000:00:02.0: enabling device (0006 -> 0007)
| Console: switching to colour dummy device 80x25
| xe 0000:00:02.0: vgaarb: deactivate vga console
| xe 0000:00:02.0: [drm] Running in SR-IOV PF mode
| xe 0000:00:02.0: [drm] Found tigerlake/uy (device ID 9a49) integrated display version 12.00 stepping C0
| xe 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=mem
| xe 0000:00:02.0: [drm] Finished loading DMC firmware i915/tgl_dmc_ver2_12.bin (v2.12)
| xe 0000:00:02.0: [drm] Tile0: GT0: Using GuC firmware from i915/tgl_guc_70.bin version 70.49.4
| xe 0000:00:02.0: [drm] Tile0: GT0: Using HuC firmware from i915/tgl_huc.bin version 7.9.3
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs1 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs3 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs4 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs5 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs6 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vcs7 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vecs1 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vecs2 fused off
| xe 0000:00:02.0: [drm] Tile0: GT0: vecs3 fused off
| BUG: kernel NULL pointer dereference, address: 0000000000000000
| #PF: supervisor read access in kernel mode
| #PF: error_code(0x0000) - not-present page
| PGD 0 P4D 0
| Oops: Oops: 0000 [#1] SMP NOPTI
| CPU: 1 UID: 0 PID: 631 Comm: kworker/u32:9 Tainted: G     U      E       6.18.0-rc1+ #31 PREEMPT_{RT,(lazy)}
| Tainted: [U]=USER, [E]=UNSIGNED_MODULE
| Hardware name: LENOVO 20TD00GLGE/20TD00GLGE, BIOS R1EET64W(1.64 ) 03/18/2025
| Workqueue:  drm_sched_run_job_work [gpu_sched]
| RIP: 0010:stack_depot_save_flags+0x168/0xb00
| Code: c2 44 31 d0 41 c1 ca 08 44 29 d0 41 89 c0 45 89 c2 44 23 15 82 68 a1 03 49 c1 e2 04 4c 03 15 7f 68 a1 03 65 ff 05 f8 b9 f6 01 <4d> 8b 0a 4d 39 ca 75 1c e9 ae  00 00 00 66 66 2e 0f 1f 84 00 00 00
| RSP: 0018:ffffaa58c1a9f6a8 EFLAGS: 00010282
| RAX: 0000000006425a49 RBX: 0000000000000001 RCX: 000000000000000e
| RDX: 000000000000000e RSI: 00000000be7f6f1a RDI: 0000000000000001
| RBP: ffffaa58c1a9f700 R08: 0000000006425a49 R09: 000000000739c857
| R10: 0000000000000000 R11: 00000000000025b3 R12: 0000000000004502
| R13: ffff9c3a8f2b18b8 R14: 0000000000002800 R15: 000000000000000d
| FS:  0000000000000000(0000) GS:ffff9c3e55884000(0000) knlGS:0000000000000000
| CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 0000000000000000 CR3: 000000003e854003 CR4: 0000000000f72ef0
| PKRU: 55555554
| Call Trace:
| Call Trace:
|  <TASK>
|  fast_req_track+0x58/0xb0 [xe]
|  h2g_write+0x39f/0x720 [xe]
|  __guc_ct_send_locked+0x1e4/0x10f0 [xe]
|  guc_ct_send_locked+0xa4/0x690 [xe]
|  guc_ct_send+0x74/0x250 [xe]
|  xe_guc_ct_send+0x19/0x50 [xe]
|  __register_exec_queue.isra.0+0x7e/0xa0 [xe]
|  register_exec_queue+0x2f4/0x750 [xe]
|  guc_exec_queue_run_job+0x4f6/0x8f0 [xe]
|  drm_sched_run_job_work+0x1ef/0x450 [gpu_sched]

This happens also without the series and without PREEMPT_RT enabled. I
just to a while to figure this one out on the hardware in question since
it all just froze…

Other than that, XE seems fine.

i915.
- drm/i915/display: Move vblank put until after critical section
 - intel_vblank_evade() is invoked with irq-off
   - within its callchain intel_vblank_section_enter() does spin_lock()
     in I915. XE does nothing so it is fine.
   - intel_crtc_scanlines_since_frame_timestamp() does a while loop
     What is the expected/ possible worst case here and when does it happen?
   While at it, I noticed:
   local_irq_disable();
   if (new_plane_state->uapi.visible) {
           intel_plane_update_noarm(NULL, plane, crtc_state, new_plane_state);
           if (plane->fbc)
                  intel_fbc_dirty_rect_update_noarm(dsb, plane);
                     if (!HAS_FBC_DIRTY_RECT(display))
                         return;
                     mutex_lock(&fbc->lock); <----

   Haven't checked the callbacks but it feels like a lot of code with
   disabled interrupts.

- The GEM_BUG_ON(!irqs_disabled) in __i915_request_submit()/
  __i915_request_unsubmit() case bugs/ warnings:
| WARNING: CPU: 3 PID: 2115 at drivers/gpu/drm/i915/i915_request.c:611 __i915_request_submit+0x1db/0x1f0 [i915]
| CPU: 3 UID: 0 PID: 2115 Comm: modprobe Not tainted 6.18.0-rc1+ #17 PREEMPT_{RT,(lazy)}
| Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3-M, BIOS P2.30 06/29/2012
| RIP: 0010:__i915_request_submit+0x1db/0x1f0 [i915]
| Call Trace:
|  <TASK>
|  i915_request_submit+0x29/0x40 [i915]
|  i9xx_submit_request+0xe/0x70 [i915]
|  submit_notify+0xc1/0x230 [i915]
|  __i915_sw_fence_complete+0x88/0x290 [i915]
|  __engine_park+0x2d2/0x410 [i915]
|  ____intel_wakeref_put_last+0x25/0x90 [i915]
|  intel_gt_resume.part.0+0x2ec/0x380 [i915]
|  intel_gt_init+0x14d/0x3d0 [i915]
|  i915_gem_init+0x14b/0x290 [i915]
|  i915_driver_probe+0x74a/0xc10 [i915]
|  i915_pci_probe+0xd7/0x190 [i915]
|  local_pci_probe+0x41/0x80

- The change of irq-enable is not enough:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 2115, name: modprobe
| preempt_count: 0, expected: 0
| RCU nest depth: 0, expected: 0
| 4 locks held by modprobe/2115:
|  #0: ffff99b9425161a0 (&dev->mutex){....}-{4:4}, at: __driver_attach+0xaf/0x1c0
|  #1: ffffaa224810f6c0 (crtc_ww_class_acquire){+.+.}-{0:0}, at: intel_initial_commit+0x4c/0x200 [i915]
|  #2: ffffaa224810f6e8 (crtc_ww_class_mutex){+.+.}-{4:4}, at: intel_initial_commit+0x4c/0x200 [i915]
|  #3: ffff99b94a6c9030 (&uncore->lock){+.+.}-{3:3}, at: gen6_write32+0x50/0x290 [i915]
| irq event stamp: 513344
| hardirqs last  enabled at (513343): [<ffffffff8ba8d84c>] _raw_spin_unlock_irqrestore+0x4c/0x60
| hardirqs last disabled at (513344): [<ffffffffc1543646>] intel_pipe_update_start+0x216/0x2c0 [i915]
| softirqs last  enabled at (512766): [<ffffffff8af045cf>] __local_bh_enable_ip+0x10f/0x170
| softirqs last disabled at (512712): [<ffffffffc14dfb6a>] __i915_request_queue+0x3a/0x70 [i915]
| CPU: 3 UID: 0 PID: 2115 Comm: modprobe Tainted: G        W           6.18.0-rc1+ #17 PREEMPT_{RT,(lazy)}
| Tainted: [W]=WARN
| Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3-M, BIOS P2.30 06/29/2012
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0x68/0x90
|  __might_resched.cold+0xf0/0x12b
|  rt_spin_lock+0x5f/0x200
|  gen6_write32+0x50/0x290 [i915]
|  ilk_set_pipeconf+0x12d/0x230 [i915]
|  ilk_color_commit_arm+0x2d/0x70 [i915]
|  intel_update_crtc+0x15b/0x690 [i915]
|  intel_commit_modeset_enables+0xa6/0xd0 [i915]
|  intel_atomic_commit_tail+0xd55/0x19a0 [i915]
|  intel_atomic_commit+0x25d/0x2a0 [i915]
|  drm_atomic_commit+0xad/0xe0 [drm]
|  intel_initial_commit+0x16c/0x200 [i915]
|  intel_display_driver_probe+0x2e/0x80 [i915]
|  i915_driver_probe+0x791/0xc10 [i915]
|  i915_pci_probe+0xd7/0x190 [i915]

This is the intel_pipe_update_start()/ intel_pipe_update_end() part in intel_update_crtc().

Sebastian

  parent reply	other threads:[~2025-11-05 13:47 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-04  8:36 [PATCH v2 0/7] drm/i915/display: Handle vblank evasion with CONFIG_PREEMPT_RT Maarten Lankhorst
2025-11-04  8:36 ` [PATCH v2 1/7] drm/i915/display: Make get_vblank_counter use intel_de_read_fw() Maarten Lankhorst
2025-11-04  8:36 ` [PATCH v2 2/7] drm/i915/display: Use intel_de_write_fw in intel_pipe_fastset Maarten Lankhorst
2025-11-26 19:19   ` Shankar, Uma
2025-11-26 19:42     ` Ville Syrjälä
2025-11-26 19:56       ` Shankar, Uma
2025-12-01 17:43         ` Maarten Lankhorst
2025-12-11 14:35       ` Maarten Lankhorst
2025-11-04  8:36 ` [PATCH v2 3/7] drm/i915/display: Move vblank put until after critical section Maarten Lankhorst
2025-11-26 19:25   ` Shankar, Uma
2025-11-04  8:36 ` [PATCH v2 4/7] drm/i915/display: Remove locking from intel_vblank_evade " Maarten Lankhorst
2025-11-26 20:03   ` Shankar, Uma
2025-12-01 14:13     ` Maarten Lankhorst
2025-11-26 21:17   ` Ville Syrjälä
2025-12-11  9:59     ` Maarten Lankhorst
2025-11-04  8:36 ` [PATCH v2 5/7] drm/i915/display: Make icl_dsi_frame_update use _fw too Maarten Lankhorst
2025-11-04  8:36 ` [PATCH v2 6/7] drm/i915/display: Enable interrupts earlier on PREEMPT_RT Maarten Lankhorst
2025-11-26 20:45   ` Shankar, Uma
2025-11-26 22:57     ` Ville Syrjälä
2025-12-01 17:39       ` Maarten Lankhorst
2025-11-04  8:36 ` [PATCH v2 7/7] drm/i915: Use preempt_disable/enable_rt() where recommended Maarten Lankhorst
2025-11-05 13:47 ` Sebastian Andrzej Siewior [this message]
2025-11-05 20:42   ` [PATCH v2 0/7] drm/i915/display: Handle vblank evasion with CONFIG_PREEMPT_RT Maarten Lankhorst
2025-11-10 16:09     ` Sebastian Andrzej Siewior
2025-11-10 18:17       ` Maarten Lankhorst
2025-11-11  9:47         ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251105134740.NseZnpeZ@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=clrkwllms@kernel.org \
    --cc=dev@lankhorst.se \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=mario.kleiner.de@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).