linux-rt-users.vger.kernel.org archive mirror
* PREEMPT_RT vs i915
@ 2025-07-08 19:35 Ben Hutchings
  2025-07-09 17:30 ` Ville Syrjälä
  0 siblings, 1 reply; 20+ messages in thread
From: Ben Hutchings @ 2025-07-08 19:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-rt-users, intel-gfx
  Cc: Debian kernel maintainers


Hi all,

Debian currently provides non-default kernel packages with the
PREEMPT_RT patchset applied and enabled.  However, for the Debian 14
"forky" release the plan is to use only the upstream RT support.

One result of this is that the i915 driver will not be included in our
RT kernel package on amd64 because the upstream version lacks the
patches to make it compatible with PREEMPT_RT.  This was not a surprise
to us, but may disappoint some of our users (for example see
<https://bugs.debian.org/1108324>).

I see that Sebastian submitted the i915 fixes upstream in October 2024.
If I understand the explanation in
<https://lore.kernel.org/intel-gfx/Zv-n2h0gsquKOvXu@intel.com/> rightly,
much of these changes are unsafe because i915 has its own hard timing
requirement for reprogramming multiple display controller registers
within a single frame.  Is that still the sticking point?

It seems like the critical uncore lock is currently held in a lot of
places and potentially for a long time.  Would it be practical to split
this lock into:

1. raw spinlock protecting only state needed for the atomic (within-one-
frame) update
2. regular spinlock protecting everything in uncore

or is almost all the uncore state potentially used during an atomic
update?
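For illustration, such a split might look like this kernel-style sketch
(the struct layout and field names are hypothetical, not actual i915
code):

```c
/* Hypothetical sketch of the proposed lock split -- not actual i915 code. */
struct intel_uncore {
	/*
	 * 1. A raw spinlock stays a true spinning lock even on
	 *    PREEMPT_RT, so it could protect only the minimal state
	 *    touched during the atomic (within-one-frame) update.
	 */
	raw_spinlock_t atomic_update_lock;

	/*
	 * 2. A regular spinlock becomes a sleeping lock on PREEMPT_RT
	 *    and would protect the rest of the uncore state.
	 */
	spinlock_t lock;
};
```

The open question is whether the two sets of state can actually be
disentangled.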

Would it help to offload the atomic updates to a kthread that runs with
RT priority but still with hard interrupts enabled?
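As a rough sketch of that idea (the worker function and thread name are
made up; kthread_run() and sched_set_fifo() are the stock kernel APIs):

```c
/* Hypothetical sketch: offload the atomic update to an RT kthread.
 * The thread runs at FIFO priority but hard interrupts stay enabled. */
static int atomic_update_worker(void *data)
{
	while (!kthread_should_stop()) {
		/* wait for, then perform, the within-one-frame update */
	}
	return 0;
}

static struct task_struct *start_update_thread(void)
{
	struct task_struct *task;

	task = kthread_run(atomic_update_worker, NULL, "i915/atomic-update");
	if (!IS_ERR(task))
		sched_set_fifo(task);	/* boost to RT priority */
	return task;
}
```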

Would it make things easier if setting CONFIG_PREEMPT_RT=y limited i915
to not run on some older hardware?

Ben.

-- 
Ben Hutchings
All the simple programs have been written, and all the good names taken

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: PREEMPT_RT vs i915
  2025-07-08 19:35 PREEMPT_RT vs i915 Ben Hutchings
@ 2025-07-09 17:30 ` Ville Syrjälä
  2025-07-09 19:44   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 20+ messages in thread
From: Ville Syrjälä @ 2025-07-09 17:30 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Sebastian Andrzej Siewior, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On Tue, Jul 08, 2025 at 09:35:21PM +0200, Ben Hutchings wrote:
> Hi all,
> 
> Debian currently provides non-default kernel packages with the
> PREEMPT_RT patchset applied and enabled.  However, for the Debian 14
> "forky" release the plan is to use only the upstream RT support.
> 
> One result of this is that the i915 driver will not be included in our
> RT kernel package on amd64 because the upstream version lacks the
> patches to make it compatible with PREEMPT_RT.  This was not a surprise
> to us, but may disappoint some of our users (for example see
> <https://bugs.debian.org/1108324>).
> 
> I see that Sebastian submitted the i915 fixes upstream in October 2024.
> If I understand the explanation in
> <https://lore.kernel.org/intel-gfx/Zv-n2h0gsquKOvXu@intel.com/> rightly,
> much of these changes are unsafe because i915 has its own hard timing
> requirement for reprogramming multiple display controller registers
> within a single frame.  Is that still the sticking point?
> 
> It seems like the critical uncore lock is currently held in a lot of
> places and potentially for a long time.

It shouldn't be held for that long. I think it should just be
a raw spinlock.

The only edge case I know is the weird retry hack in
__intel_get_crtc_scanline() which I suspect is just due to PSR
and could potentially be handled in a nicer way by actually
checking the PSR state.
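The retry hack in question boils down to re-reading the scanline until
consecutive reads agree; very roughly (a simplified sketch, not the
actual __intel_get_crtc_scanline() code, and read_scanline_reg() is a
placeholder):

```c
/* Simplified read-until-stable sketch; the real code bounds the
 * number of retries. */
int scanline, prev;

scanline = read_scanline_reg();
do {
	prev = scanline;
	scanline = read_scanline_reg();
} while (scanline != prev);
```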

> Would it be practical to split
> this lock into:
> 
> 1. raw spinlock protecting only state needed for the atomic (within-one-
> frame) update

Spinlocks aren't involved in that. It is achieved by racing against
the beam, with interrupts disabled to make it more likely the CPU
wins the race.
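In outline, "racing against the beam" looks roughly like this (heavily
simplified sketch; the helper names are placeholders for the real
vblank-evasion code):

```c
/* Heavily simplified vblank-evasion sketch -- not the real i915 code. */
local_irq_disable();

/* spin until the scanout position enters the safe update window */
while (!scanline_in_update_window(crtc))
	cpu_relax();

/* reprogram all the registers before the window closes */
write_display_registers(crtc);

local_irq_enable();
```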

> 2. regular spinlock protecting everything in uncore
> 
> or is almost all the uncore state potentially used during an atomic
> update?
> 
> Would it help to offload the atomic updates to a kthread that runs with
> RT priority but still with hard interrupts enabled?

Not sure what another thread would specifically get us, as opposed
to eg. just boosting the priority of the existing thread? But whatever
thread does the work needs to not be interrupted for any significant
amount of time.

The interrupt disabling part I suppose is rather hardware/workload
specific, so hard to say anything general about it.

> 
> Would it make things easier if setting CONFIG_PREEMPT_RT=y limited i915
> to not run on some older hardware?

No. All hardware needs this.

Anyways, all of this is rather academic at this point. Someone
needs to try and see what works, and hammer it hard while doing so
to make sure it doesn't fall over easily.

-- 
Ville Syrjälä
Intel


* Re: PREEMPT_RT vs i915
  2025-07-09 17:30 ` Ville Syrjälä
@ 2025-07-09 19:44   ` Sebastian Andrzej Siewior
  2025-07-09 20:09     ` Ville Syrjälä
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-07-09 19:44 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > 
> > It seems like the critical uncore lock is currently held in a lot of
> > places and potentially for a long time.
> 
> It shouldn't be held for that long. I think it should just be
> a raw spinlock.

What about I resubmit the series and we look again? I don't think the
lock should be made raw just to be done with it.

Sebastian


* Re: PREEMPT_RT vs i915
  2025-07-09 19:44   ` Sebastian Andrzej Siewior
@ 2025-07-09 20:09     ` Ville Syrjälä
  2025-07-09 22:04       ` Matthew Brost
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Ville Syrjälä @ 2025-07-09 20:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > 
> > > It seems like the critical uncore lock is currently held in a lot of
> > > places and potentially for a long time.
> > 
> > It shouldn't be held for that long. I think it should just be
> > a raw spinlock.
> 
> What about I resubmit the series and we look again? I don't think the
> lock should be made raw just to be done with it.

Until someone actually does the work to confirm the thing is working
reliably there's no point in posting anything.

And IIRC the other remaining problem with RT was the spinlocks used
inside tracepoints (which is uncore lock, and probably some vblank
locks). So that too needs some kind of solution because it's going to
be very hard to debug the timing-sensitive parts without the tracepoints.

-- 
Ville Syrjälä
Intel


* Re: PREEMPT_RT vs i915
  2025-07-09 20:09     ` Ville Syrjälä
@ 2025-07-09 22:04       ` Matthew Brost
  2025-07-10  6:30         ` Sebastian Andrzej Siewior
  2025-07-10 15:21         ` Ville Syrjälä
  2025-07-10  4:52       ` Mike Galbraith
  2025-07-10  6:41       ` PREEMPT_RT vs i915 Sebastian Andrzej Siewior
  2 siblings, 2 replies; 20+ messages in thread
From: Matthew Brost @ 2025-07-09 22:04 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Sebastian Andrzej Siewior, Ben Hutchings, linux-rt-users,
	intel-gfx, Debian kernel maintainers

On Wed, Jul 09, 2025 at 11:09:22PM +0300, Ville Syrjälä wrote:
> On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > 
> > > > It seems like the critical uncore lock is currently held in a lot of
> > > > places and potentially for a long time.
> > > 
> > > It shouldn't be held for that long. I think it should just be
> > > a raw spinlock.
> > 
> > What about I resubmit the series and we look again? I don't think the
> > lock should be made raw just to be done with it.
> 
> Until someone actually does the work to confirm the thing is working
> reliably there's no point in posting anything.
> 
> And IIRC the other remaining problem with RT was the spinlocks used
> inside tracepoints (which is uncore lock, and probably some vblank
> locks). So that too needs some kind of solution because it's going to
> be very hard to debug the timing-sensitive parts without the tracepoints.

A bit of a drive-by comment, but taking locks inside tracepoints seems
like a pretty horrible idea in general. We've managed to write an entire
driver (Xe) from scratch and bring it up without doing this. I'd be very
surprised if this is truly necessary in i915.

Matt

> 
> -- 
> Ville Syrjälä
> Intel


* Re: PREEMPT_RT vs i915
  2025-07-09 20:09     ` Ville Syrjälä
  2025-07-09 22:04       ` Matthew Brost
@ 2025-07-10  4:52       ` Mike Galbraith
  2025-07-10 15:50         ` Ville Syrjälä
  2025-07-10  6:41       ` PREEMPT_RT vs i915 Sebastian Andrzej Siewior
  2 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2025-07-10  4:52 UTC (permalink / raw)
  To: Ville Syrjälä, Sebastian Andrzej Siewior
  Cc: Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On Wed, 2025-07-09 at 23:09 +0300, Ville Syrjälä wrote:
> On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > 
> > > > It seems like the critical uncore lock is currently held in a lot of
> > > > places and potentially for a long time.
> > > 
> > > It shouldn't be held for that long. I think it should just be
> > > a raw spinlock.
> > 
> > What about I resubmit the series and we look again? I don't think the
> > lock should be made raw just to be done with it.
> 
> Until someone actually does the work to confirm the thing is working
> reliably there's no point in posting anything.

What does that entail?

My little i915 equipped lappy seems perfectly happy with the latest RT
kernel+DEBUG_ATOMIC_SLEEP.  The thing is an awful RT platform, but as
far as display handling goes, it does fine at browsing as long as
you're not picky about hi-res frame drops (not RT related), nor does
glmark2's not-very-impressive-looking rendering trigger splats.

WRT reliability, its primary mission is to be a console; it can and
does idle for days on end without dying of boredom, waking for an
occasional ssh session or OS update.. so.. check? :)

	-Mike


* Re: PREEMPT_RT vs i915
  2025-07-09 22:04       ` Matthew Brost
@ 2025-07-10  6:30         ` Sebastian Andrzej Siewior
  2025-07-10 15:21         ` Ville Syrjälä
  1 sibling, 0 replies; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-07-10  6:30 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Ville Syrjälä, Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On 2025-07-09 15:04:27 [-0700], Matthew Brost wrote:
> > And IIRC the other remaining problem with RT was the spinlocks used
> > inside tracepoints (which is uncore lock, and probably some vblank
> > locks). So that too needs some kind of solution because it's going to
> > be very hard to debug the timing-sensitive parts without the tracepoints.
> 
> A bit of a drive-by comment, but taking locks inside tracepoints seems
> like a pretty horrible idea in general. We've managed to write an entire
> driver (Xe) from scratch and bring it up without doing this. I'd be very
> surprised if this is truly necessary in i915.

Steven made suggestions on how to get around it and make it work, but
my impression was that this was not appreciated by the i915 side.
The general unwritten rule is not to take any locks but simply assign
variables.
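That "assign only" convention looks like this in the TRACE_EVENT()
macros (the event and field names are invented for illustration):

```c
/* Sketch of an assign-only tracepoint: TP_fast_assign() just copies
 * values that are already available at the call site -- no locks, no
 * MMIO. */
TRACE_EVENT(i915_example_event,
	TP_PROTO(u32 pipe, u32 scanline),
	TP_ARGS(pipe, scanline),

	TP_STRUCT__entry(
		__field(u32, pipe)
		__field(u32, scanline)
	),

	TP_fast_assign(
		__entry->pipe = pipe;		/* plain assignments... */
		__entry->scanline = scanline;	/* ...nothing else */
	),

	TP_printk("pipe=%u scanline=%u", __entry->pipe, __entry->scanline)
);
```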

> Matt

Sebastian


* Re: PREEMPT_RT vs i915
  2025-07-09 20:09     ` Ville Syrjälä
  2025-07-09 22:04       ` Matthew Brost
  2025-07-10  4:52       ` Mike Galbraith
@ 2025-07-10  6:41       ` Sebastian Andrzej Siewior
  2025-07-10 15:04         ` Ville Syrjälä
  2 siblings, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-07-10  6:41 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On 2025-07-09 23:09:22 [+0300], Ville Syrjälä wrote:
> On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > 
> > > > It seems like the critical uncore lock is currently held in a lot of
> > > > places and potentially for a long time.
> > > 
> > > It shouldn't be held for that long. I think it should just be
> > > a raw spinlock.
> > 
> > What about I resubmit the series and we look again? I don't think the
> > lock should be made raw just to be done with it.
> 
> Until someone actually does the work to confirm the thing is working
> reliably there's no point in posting anything.

Well, it works on my machine, but this machine does not pass through
the code paths that I patch.

Every patch was made because someone reported an error/warning and
confirmed afterwards that the patch fixes it for them, that they can
use the machine, and that they don't observe any further problems.

> And IIRC the other remaining problem with RT was the spinlocks used
> inside tracepoints (which is uncore lock, and probably some vblank
> locks). So that too needs some kind of solution because it's going to
> be very hard to debug the timing-sensitive parts without the tracepoints.

No, not just that. Making the lock raw led to latency spikes even in
simple cases, so I just disabled the tracepoints. It could be worked
around by taking the lock only if the tracepoint is enabled, then
invoking the tracepoint unconditionally without taking the lock again.
Steven made a suggestion a while ago for how to put this in a macro,
as far as I remember.
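That workaround could look roughly like this (the names are
illustrative; trace_<event>_enabled() is the static-key helper every
tracepoint generates):

```c
/* Sample under the lock only while the tracepoint is enabled, then
 * fire the tracepoint without holding the lock. */
u32 val = 0;
unsigned long flags;

if (trace_i915_example_event_enabled()) {
	spin_lock_irqsave(&uncore->lock, flags);
	val = sample_hw_state(uncore);	/* placeholder for the locked read */
	spin_unlock_irqrestore(&uncore->lock, flags);
}

trace_i915_example_event(pipe, val);	/* invoked with no lock held */
```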

Looking at the series, there are things like execlists_dequeue_irq()
which does local_irq_disable() followed by spin_lock(), which is a
no-no, as explained in Documentation/locking/locktypes.rst. There is
also intel_guc_send_busy_loop() not considering RCU read-locks in its
atomic-context detection. I have 8 patches in total and one in the
pipe.
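The forbidden pattern versus its RT-compatible form, per locktypes.rst:

```c
/* Broken on PREEMPT_RT: spinlock_t is a sleeping lock there, so it
 * must not be taken after disabling interrupts by hand. */
local_irq_disable();
spin_lock(&lock);
/* ... */
spin_unlock(&lock);
local_irq_enable();

/* RT-compatible: let the lock primitive manage the interrupt state. */
spin_lock_irqsave(&lock, flags);
/* ... */
spin_unlock_irqrestore(&lock, flags);
```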

I have patches with Acks from Tvrtko Ursulin, but I remain carrying
them. I can repost them, but usually the bot complains and has to be
retriggered. The first patch in the series touches vblank detection,
and I've been told this is old code that scares people. So that might
be why the remaining part is ignored.

Sebastian


* Re: PREEMPT_RT vs i915
  2025-07-10  6:41       ` PREEMPT_RT vs i915 Sebastian Andrzej Siewior
@ 2025-07-10 15:04         ` Ville Syrjälä
  2025-07-10 15:20           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 20+ messages in thread
From: Ville Syrjälä @ 2025-07-10 15:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On Thu, Jul 10, 2025 at 08:41:36AM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-07-09 23:09:22 [+0300], Ville Syrjälä wrote:
> > On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > > 
> > > > > It seems like the critical uncore lock is currently held in a lot of
> > > > > places and potentially for a long time.
> > > > 
> > > > It shouldn't be held for that long. I think it should just be
> > > > a raw spinlock.
> > > 
> > > What about I resubmit the series and we look again? I don't think the
> > > lock should be made raw just to be done with it.
> > 
> > Until someone actually does the work to confirm the thing is working
> > reliably there's no point in posting anything.
> 
> Well it works on my machine and this machine dose not pass the code
> paths that I patch.
> 
> Every patch made was done because someone reported an error/ warning and
> confirmed afterwards that the patch fixes it for them and they can use
> the machine and don't observe anything.
> 
> > And IIRC the other remaining problem with RT was the spinlocks used
> > inside tracepoints (which is uncore lock, and probably some vblank
> > locks). So that too needs some kind of solution because it's going to
> > very hard to debug the timing sensitive parts without the tracepoints.
> 
> no, not just that. Making the lock raw led to latency spikes in simple
> spikes and I just disabled trace points. It could be worked around by
> taking the lock if the tracepoint is enabled and then invoking the
> tracepoint unconditionally and not taking the lock anymore. Steven made
> a suggestion a while ago how to put this in macro as far as I remember.

When this was last discussed I suggested that there should be a
version of the tracepoint macros that does the sampling outside
the lock, but that wasn't deemed acceptable for whatever reason.
I don't even know why the current macros are doing the
sampling while holding the lock...

-- 
Ville Syrjälä
Intel


* Re: PREEMPT_RT vs i915
  2025-07-10 15:04         ` Ville Syrjälä
@ 2025-07-10 15:20           ` Sebastian Andrzej Siewior
  2025-07-11 12:35             ` Maarten Lankhorst
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-07-10 15:20 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

On 2025-07-10 18:04:42 [+0300], Ville Syrjälä wrote:
> 
> When this was last discussed I suggested that there should be a
> version of the tracepoint macros that does the sampling outside
> the lock, but that wasn't deemed acceptable for whatever reason.
> I don't even know why the current macros are doing the
> sampling while holding the lock...

Any objections to me sending the batch, and we figure out later how to
get the tracepoints for i915 enabled again on RT?
It would be an improvement, because you could take a vanilla kernel and
enable PREEMPT_RT and you would only miss the tracepoints, while right
now you can't enable i915 at all, and Xe either doesn't compile or
spills warnings at runtime due to the code it shares with i915.

Sebastian


* Re: PREEMPT_RT vs i915
  2025-07-09 22:04       ` Matthew Brost
  2025-07-10  6:30         ` Sebastian Andrzej Siewior
@ 2025-07-10 15:21         ` Ville Syrjälä
  2025-07-10 18:04           ` Matthew Brost
  1 sibling, 1 reply; 20+ messages in thread
From: Ville Syrjälä @ 2025-07-10 15:21 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Sebastian Andrzej Siewior, Ben Hutchings, linux-rt-users,
	intel-gfx, Debian kernel maintainers

On Wed, Jul 09, 2025 at 03:04:27PM -0700, Matthew Brost wrote:
> On Wed, Jul 09, 2025 at 11:09:22PM +0300, Ville Syrjälä wrote:
> > On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > > 
> > > > > It seems like the critical uncore lock is currently held in a lot of
> > > > > places and potentially for a long time.
> > > > 
> > > > It shouldn't be held for that long. I think it should just be
> > > > a raw spinlock.
> > > 
> > > What about I resubmit the series and we look again? I don't think the
> > > lock should be made raw just to be done with it.
> > 
> > Until someone actually does the work to confirm the thing is working
> > reliably there's no point in posting anything.
> > 
> > And IIRC the other remaining problem with RT was the spinlocks used
> > inside tracepoints (which is uncore lock, and probably some vblank
> > locks). So that too needs some kind of solution because it's going to
> > be very hard to debug the timing-sensitive parts without the tracepoints.
> 
> A bit of a drive-by comment, but taking locks inside tracepoints seems
> like a pretty horrible idea in general. We've managed to write an entire
> driver (Xe) from scratch and bring it up without doing this.

For xe gt stuff specifically the one reason for needing a lock
could be forcewake. Ie. if you read a register that needs
forcewake from a tracepoint you might need some kind of protection
against concurrent access. But xe lacks any kind of forcewake
sanity checker, so no one would likely even know if you get that
wrong. Unless they notice a bogus register value in the
trace that is. But maybe xe doesn't use such registers in its
tracepoints atm, who knows.

And speaking for hardware in general, indexed registers aren't
exactly an uncommon thing. So the tracing stuff really should
have a sane standard way to deal with them...
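The classic indexed-register pattern shows why a lock is unavoidable
there: the index write and the data access form one critical section
(register names below are generic placeholders):

```c
/* Two threads interleaving their index/data pairs would read the
 * wrong register, hence the lock. */
spin_lock_irqsave(&indexed_reg_lock, flags);
writel(reg_index, mmio_base + INDEX_REG);	/* select register */
val = readl(mmio_base + DATA_REG);		/* then read its value */
spin_unlock_irqrestore(&indexed_reg_lock, flags);
```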

> I'd be very
> surprised if this is truly necessary in i915.

The most fundamental reason is the hardware issue present on
some platforms which can hang the machine if you access the
same cacheline from multiple threads simultaneously.

The other reason is that some machines lack the hardware
frame counter so we have to cook one up based on cpu side
timestamps, and that involves the vblank machinery locks.
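The cooked-up counter is essentially elapsed CPU time divided by the
frame duration. A minimal userspace illustration of the arithmetic
(the real DRM vblank code is considerably more careful about rounding,
wraparound and clock selection):

```c
#include <stdint.h>

/* Illustrative only: derive a frame counter from CPU timestamps when
 * the hardware lacks one. t0_ns is the timestamp of frame 0 and
 * frame_ns the (assumed constant) frame duration in nanoseconds. */
uint64_t sw_frame_counter(uint64_t now_ns, uint64_t t0_ns, uint64_t frame_ns)
{
	return (now_ns - t0_ns) / frame_ns;
}
```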

-- 
Ville Syrjälä
Intel


* Re: PREEMPT_RT vs i915
  2025-07-10  4:52       ` Mike Galbraith
@ 2025-07-10 15:50         ` Ville Syrjälä
  2025-07-11  2:36           ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Ville Syrjälä @ 2025-07-10 15:50 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Ben Hutchings, linux-rt-users,
	intel-gfx, Debian kernel maintainers

On Thu, Jul 10, 2025 at 06:52:58AM +0200, Mike Galbraith wrote:
> On Wed, 2025-07-09 at 23:09 +0300, Ville Syrjälä wrote:
> > On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > > 
> > > > > It seems like the critical uncore lock is currently held in a lot of
> > > > > places and potentially for a long time.
> > > > 
> > > > It shouldn't be held for that long. I think it should just be
> > > > a raw spinlock.
> > > 
> > > What about I resubmit the series and we look again? I don't think the
> > > lock should be made raw just to be done with it.
> > 
> > Until someone actually does the work to confirm the thing is working
> > reliably there's no point in posting anything.
> 
> What does that entail?

Basic testing would be something like this:
- enable CONFIG_DRM_I915_DEBUG_VBLANK_EVADE
- set i915.enable_dsb=0 to make sure everything takes the
  mmio path
- stress the heck out of it and make sure the histogram
  doesn't look significantly worse than on !RT
  (kms_atomic_transition --extended might take care of the display
  side here, but it should probably be accompanied with some
  horrendous system loads which is a less well defined part)
- ideally do that on a potato
  (some VLV/CHV (Atom) thing would probably be a good candidate)
- repeat with lockdep enabled to make everything even harder
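In command form, the first steps might look like this (assuming a
kernel built with CONFIG_DRM_I915_DEBUG_VBLANK_EVADE=y;
kms_atomic_transition comes from igt-gpu-tools, and stress-ng is just
one way to generate the background load):

```sh
# force everything down the mmio path instead of the DSB
modprobe i915 enable_dsb=0        # or i915.enable_dsb=0 on the cmdline

# display-side stress from igt-gpu-tools
kms_atomic_transition --extended

# ...combined with some horrendous background load, e.g.:
stress-ng --cpu 0 --io 4 --timeout 10m
```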

-- 
Ville Syrjälä
Intel


* Re: PREEMPT_RT vs i915
  2025-07-10 15:21         ` Ville Syrjälä
@ 2025-07-10 18:04           ` Matthew Brost
  2025-07-10 18:15             ` Ville Syrjälä
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Brost @ 2025-07-10 18:04 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Sebastian Andrzej Siewior, Ben Hutchings, linux-rt-users,
	intel-gfx, Debian kernel maintainers

On Thu, Jul 10, 2025 at 06:21:09PM +0300, Ville Syrjälä wrote:
> On Wed, Jul 09, 2025 at 03:04:27PM -0700, Matthew Brost wrote:
> > On Wed, Jul 09, 2025 at 11:09:22PM +0300, Ville Syrjälä wrote:
> > > On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > > > 
> > > > > > It seems like the critical uncore lock is currently held in a lot of
> > > > > > places and potentially for a long time.
> > > > > 
> > > > > It shouldn't be held for that long. I think it should just be
> > > > > a raw spinlock.
> > > > 
> > > > What about I resubmit the series and we look again? I don't think the
> > > > lock should be made raw just to be done with it.
> > > 
> > > Until someone actually does the work to confirm the thing is working
> > > reliably there's no point in posting anything.
> > > 
> > > And IIRC the other remaining problem with RT was the spinlocks used
> > > inside tracepoints (which is uncore lock, and probably some vblank
> > > locks). So that too needs some kind of solution because it's going to
> > > be very hard to debug the timing-sensitive parts without the tracepoints.
> > 
> > A bit of a drive-by comment, but taking locks inside tracepoints seems
> > like a pretty horrible idea in general. We've managed to write an entire
> > driver (Xe) from scratch and bring it up without doing this.
> 
> For xe gt stuff specifically the one reason for needing a lock
> could be forcewake. Ie. if you read a register that needs

Yes, we don't forcewake on demand in Xe; instead, we expect the upper
layers to hold these.

> forcewake from a tracepoint you might need some kind of protection
> against concurrent access. But xe lacks any kind of forcewake
> sanity checker, so no one would likely even know if you get that

We do have some asserts to catch missing forcewake—see
xe_force_wake_assert_held—but this is incomplete compared to the i915
checker.

> wrong. Unless they notice a bogus register value in the
> trace that is. But maybe xe doesn't use such registers in its
> tracepoints atm, who knows.
> 

I just audited the Xe tracepoints. We mostly just assign values from
software state, or in some cases, read memory from the VRAM BAR. The
latter again doesn't require any locks—just asserts that the device is
awake.

> And speaking for hardware in general, indexed registers aren't
> exactly an uncommon thing. So the tracing stuff really should
> have a sane standard way to deal with them...
> 
> > I'd be very
> > surprised if this is truly necessary in i915.
> 
> The most fundamental reason is the hardware issue present on
> some platforms which can hang the machine if you access the
> same cacheline from multiple threads simultaneously.
> 

Ooo, yeah—yikes.

One novel solution here: move the MMIO reads out of the tracepoint code
and into the caller, passing the values as arguments to the tracepoint.
Then, just assign the values inside the tracepoint. I think that would
work and be functionally equivalent.

Matt

> The other reason is that some machines lack the hardware
> frame counter so we have to cook one up based on cpu side
> timestamps, and that involves the vblank machinery locks.
> 
> -- 
> Ville Syrjälä
> Intel


* Re: PREEMPT_RT vs i915
  2025-07-10 18:04           ` Matthew Brost
@ 2025-07-10 18:15             ` Ville Syrjälä
  0 siblings, 0 replies; 20+ messages in thread
From: Ville Syrjälä @ 2025-07-10 18:15 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Sebastian Andrzej Siewior, Ben Hutchings, linux-rt-users,
	intel-gfx, Debian kernel maintainers

On Thu, Jul 10, 2025 at 11:04:22AM -0700, Matthew Brost wrote:
> On Thu, Jul 10, 2025 at 06:21:09PM +0300, Ville Syrjälä wrote:
> > On Wed, Jul 09, 2025 at 03:04:27PM -0700, Matthew Brost wrote:
> > > On Wed, Jul 09, 2025 at 11:09:22PM +0300, Ville Syrjälä wrote:
> > > > On Wed, Jul 09, 2025 at 09:44:43PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > On 2025-07-09 20:30:26 [+0300], Ville Syrjälä wrote:
> > > > > > > 
> > > > > > > It seems like the critical uncore lock is currently held in a lot of
> > > > > > > places and potentially for a long time.
> > > > > > 
> > > > > > It shouldn't be held for that long. I think it should just be
> > > > > > a raw spinlock.
> > > > > 
> > > > > What about I resubmit the series and we look again? I don't think the
> > > > > lock should be made raw just to be done with it.
> > > > 
> > > > Until someone actually does the work to confirm the thing is working
> > > > reliably there's no point in posting anything.
> > > > 
> > > > And IIRC the other remaining problem with RT was the spinlocks used
> > > > inside tracepoints (which is uncore lock, and probably some vblank
> > > > locks). So that too needs some kind of solution because it's going to
> > > > be very hard to debug the timing-sensitive parts without the tracepoints.
> > > 
> > > A bit of a drive-by comment, but taking locks inside tracepoints seems
> > > like a pretty horrible idea in general. We've managed to write an entire
> > > driver (Xe) from scratch and bring it up without doing this.
> > 
> > For xe gt stuff specifically the one reason for needing a lock
> > could be forcewake. Ie. if you read a register that needs
> 
> Yes, we don't forcewake on demand in Xe; instead, we expect the upper
> layers to hold these.
> 
> > forcewake from a tracepoint you might need some kind of protection
> > against concurrent access. But xe lacks any kind of forcewake
> > sanity checker, so no one would likely even know if you get that
> 
> We do have some asserts to catch missing forcewake—see
> xe_force_wake_assert_held—but this is incomplete compared to the i915
> checker.
> 
> > wrong. Unless they notice a bogus register value in the
> > trace that is. But maybe xe doesn't use such registers in its
> > tracepoints atm, who knows.
> > 
> 
> I just audited the Xe tracepoints. We mostly just assign values from
> software state, or in some cases, read memory from the VRAM BAR. The
> latter again doesn't require any locks—just asserts that the device is
> awake.
> 
> > And speaking for hardware in general, indexed registers aren't
> > exactly an uncommon thing. So the tracing stuff really should
> > have a sane standard way to deal with them...
> > 
> > > I'd be very
> > > surprised if this is truly necessary in i915.
> > 
> > The most fundamental reason is the hardware issue present on
> > some platforms which can hang the machine if you access the
> > same cacheline from multiple threads simultaneously.
> > 
> 
> Ooo, yeah—yikes.
> 
> One novel solution here: move the MMIO reads out of the tracepoint code
> and into the caller, passing the values as arguments to the tracepoint.
> Then, just assign the values inside the tracepoint. I think that would
> work and be functionally equivalent.

I don't want a silly open-coded solution like that. The tracepoint
should just do the right thing on its own. But I haven't actually
looked at how hard it would be to add that into the existing macro
jungle that is tracepoints...

-- 
Ville Syrjälä
Intel


* Re: PREEMPT_RT vs i915
  2025-07-10 15:50         ` Ville Syrjälä
@ 2025-07-11  2:36           ` Mike Galbraith
  2025-07-11  3:33             ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2025-07-11  2:36 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Sebastian Andrzej Siewior, Ben Hutchings, linux-rt-users,
	intel-gfx, Debian kernel maintainers

On Thu, 2025-07-10 at 18:50 +0300, Ville Syrjälä wrote:
> On Thu, Jul 10, 2025 at 06:52:58AM +0200, Mike Galbraith wrote:
> > > 
> > > Until someone actually does the work to confirm the thing is working
> > > reliably there's no point in posting anything.
> > 
> > What does that entail?
> 
> Basic testing would be something like this:
> - enable CONFIG_DRM_I915_DEBUG_VBLANK_EVADE
> - set i915.enable_dsb=0 to make sure everything takes the
>   mmio path

I can give that a go while tossing rocks to check for moaning..

> - stress the heck out of it and make sure the histogram
>   doesn't look significantly worse than on !RT
>   (kms_atomic_transition --extended might take care of the display
>   side here, but it should probably be accompanied with some
>   horrendous system loads which is a less well defined part)
> - ideally do that on a potato
>   (some VLV/CHV (Atom) thing would probably be a good candidate)
> - repeat with lockdep enabled to make everything even harder

..forwarding any performance goop emitted, but lockdep runs out of lock
counting fingers and turns itself off in short order with RT builds.

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: PREEMPT_RT vs i915
  2025-07-11  2:36           ` Mike Galbraith
@ 2025-07-11  3:33             ` Mike Galbraith
  2025-07-11  8:05               ` block: lockdep splat with RT config v6.15+ Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2025-07-11  3:33 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Sebastian Andrzej Siewior, Ben Hutchings, linux-rt-users,
	intel-gfx, Debian kernel maintainers

On Fri, 2025-07-11 at 04:36 +0200, Mike Galbraith wrote:
> ..forwarding any performance goop emitted, but lockdep runs out of lock
> counting fingers and turns itself off in short order with RT builds.

Hohum, seems block-land called dibs on lockdep anyway.

[    6.761473] ======================================================
[    6.761474] WARNING: possible circular locking dependency detected
[    6.761475] 6.16.0.bc9ff192a-master-rt #46 Tainted: G          I        
[    6.761476] ------------------------------------------------------
[    6.761476] kworker/u16:0/12 is trying to acquire lock:
[    6.761478] ffffffff825aabd8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x508/0x790
[    6.761486] 
               but task is already holding lock:
[    6.761486] ffff8881238f05c0 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x5f/0x140
[    6.761491] 
               which lock already depends on the new lock.

[    6.761492] 
               the existing dependency chain (in reverse order) is:
[    6.761492] 
               -> #3 (&q->elevator_lock){+.+.}-{4:4}:
[    6.761494]        __lock_acquire+0x540/0xbb0
[    6.761496]        lock_acquire.part.0+0x94/0x1e0
[    6.761497]        mutex_lock_nested+0x4c/0xa0
[    6.761500]        elevator_change+0x5f/0x140
[    6.761502]        elv_iosched_store+0xe6/0x110
[    6.761505]        kernfs_fop_write_iter+0x14d/0x220
[    6.761508]        vfs_write+0x223/0x580
[    6.761510]        ksys_write+0x5f/0xe0
[    6.761512]        do_syscall_64+0x76/0x3d0
[    6.761514]        entry_SYSCALL_64_after_hwframe+0x4b/0x53
[    6.761516] 
               -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
[    6.761517]        __lock_acquire+0x540/0xbb0
[    6.761518]        lock_acquire.part.0+0x94/0x1e0
[    6.761519]        blk_alloc_queue+0x34c/0x390
[    6.761521]        blk_mq_alloc_queue+0x52/0xb0
[    6.761524]        __blk_mq_alloc_disk+0x18/0x60
[    6.761526]        loop_add+0x1d5/0x3d0 [loop]
[    6.761530]        stop_this_handle+0xc3/0x140 [jbd2]
[    6.761539]        do_one_initcall+0x4a/0x270
[    6.761542]        do_init_module+0x60/0x220
[    6.761544]        init_module_from_file+0x75/0xa0
[    6.761546]        idempotent_init_module+0xf3/0x2d0
[    6.761547]        __x64_sys_finit_module+0x6d/0xd0
[    6.761549]        do_syscall_64+0x76/0x3d0
[    6.761550]        entry_SYSCALL_64_after_hwframe+0x4b/0x53
[    6.761551] 
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[    6.761553]        __lock_acquire+0x540/0xbb0
[    6.761554]        lock_acquire.part.0+0x94/0x1e0
[    6.761555]        fs_reclaim_acquire+0x95/0xd0
[    6.761558]        __kmalloc_noprof+0x87/0x300
[    6.761561]        pcpu_create_chunk+0x1a/0x1b0
[    6.761563]        pcpu_alloc_noprof+0x742/0x790
[    6.761564]        bts_init+0x61/0x100
[    6.761566]        do_one_initcall+0x4a/0x270
[    6.761568]        kernel_init_freeable+0x235/0x280
[    6.761571]        kernel_init+0x1a/0x120
[    6.761572]        ret_from_fork+0x213/0x270
[    6.761574]        ret_from_fork_asm+0x11/0x20
[    6.761576] 
               -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
[    6.761578]        check_prev_add+0xe8/0xca0
[    6.761581]        validate_chain+0x468/0x500
[    6.761582]        __lock_acquire+0x540/0xbb0
[    6.761583]        lock_acquire.part.0+0x94/0x1e0
[    6.761584]        _mutex_lock_killable+0x55/0xc0
[    6.761586]        pcpu_alloc_noprof+0x508/0x790
[    6.761587]        sbitmap_init_node+0xee/0x200
[    6.761590]        sbitmap_queue_init_node+0x28/0x140
[    6.761592]        blk_mq_init_tags+0xa6/0x120
[    6.761594]        blk_mq_alloc_map_and_rqs+0x40/0x110
[    6.761596]        blk_mq_init_sched+0xea/0x1d0
[    6.761599]        elevator_switch+0xb5/0x300
[    6.761601]        elevator_change+0xe2/0x140
[    6.761603]        elevator_set_default+0xb0/0xd0
[    6.761606]        blk_register_queue+0xda/0x1c0
[    6.761608]        __add_disk+0x222/0x380
[    6.761609]        add_disk_fwnode+0x79/0x160
[    6.761610]        sd_probe+0x2f8/0x490 [sd_mod]
[    6.761613]        really_probe+0xd5/0x330
[    6.761615]        __driver_probe_device+0x78/0x110
[    6.761616]        driver_probe_device+0x1f/0xa0
[    6.761618]        __device_attach_driver+0x7a/0x100
[    6.761619]        bus_for_each_drv+0x75/0xb0
[    6.761622]        __device_attach_async_helper+0x83/0xc0
[    6.761624]        async_run_entry_fn+0x2c/0x110
[    6.761626]        process_one_work+0x1e6/0x550
[    6.761629]        worker_thread+0x1ce/0x3c0
[    6.761632]        kthread+0x10c/0x200
[    6.761634]        ret_from_fork+0x213/0x270
[    6.761636]        ret_from_fork_asm+0x11/0x20
[    6.761638] 
               other info that might help us debug this:

[    6.761638] Chain exists of:
                 pcpu_alloc_mutex --> &q->q_usage_counter(io) --> &q->elevator_lock

[    6.761640]  Possible unsafe locking scenario:

[    6.761641]        CPU0                    CPU1
[    6.761641]        ----                    ----
[    6.761641]   lock(&q->elevator_lock);
[    6.761642]                                lock(&q->q_usage_counter(io));
[    6.761643]                                lock(&q->elevator_lock);
[    6.761644]   lock(pcpu_alloc_mutex);
[    6.761645] 
                *** DEADLOCK ***

[    6.761645] 7 locks held by kworker/u16:0/12:
[    6.761646]  #0: ffff888100c64938 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x41a/0x550
[    6.761651]  #1: ffff888100afbe48 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1aa/0x550
[    6.761655]  #2: ffff8881019e53a0 (&dev->mutex){....}-{4:4}, at: __device_attach_async_helper+0x30/0xc0
[    6.761658]  #3: ffff888122acd3d0 (&set->update_nr_hwq_lock){.+.+}-{4:4}, at: add_disk_fwnode+0x68/0x160
[    6.761661]  #4: ffff8881238f0640 (&q->sysfs_lock){+.+.}-{4:4}, at: blk_register_queue+0x92/0x1c0
[    6.761665]  #5: ffff8881238f0120 (&q->q_usage_counter(queue)#65){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
[    6.761669]  #6: ffff8881238f05c0 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x5f/0x140
[    6.761673] 
               stack backtrace:
[    6.761675] CPU: 1 UID: 0 PID: 12 Comm: kworker/u16:0 Tainted: G          I         6.16.0.bc9ff192a-master-rt #46 PREEMPT_{RT,(lazy)} 
[    6.761678] Tainted: [I]=FIRMWARE_WORKAROUND
[    6.761678] Hardware name: HP HP Spectre x360 Convertible/804F, BIOS F.47 11/22/2017
[    6.761680] Workqueue: async async_run_entry_fn
[    6.761683] Call Trace:
[    6.761684]  <TASK>
[    6.761686]  dump_stack_lvl+0x5b/0x80
[    6.761690]  print_circular_bug.cold+0x38/0x45
[    6.761694]  check_noncircular+0x109/0x120
[    6.761699]  check_prev_add+0xe8/0xca0
[    6.761703]  validate_chain+0x468/0x500
[    6.761706]  __lock_acquire+0x540/0xbb0
[    6.761707]  ? find_held_lock+0x2b/0x80
[    6.761711]  lock_acquire.part.0+0x94/0x1e0
[    6.761712]  ? pcpu_alloc_noprof+0x508/0x790
[    6.761715]  ? rcu_is_watching+0x11/0x40
[    6.761717]  ? lock_acquire+0xee/0x130
[    6.761718]  ? pcpu_alloc_noprof+0x508/0x790
[    6.761720]  ? pcpu_alloc_noprof+0x508/0x790
[    6.761721]  _mutex_lock_killable+0x55/0xc0
[    6.761724]  ? pcpu_alloc_noprof+0x508/0x790
[    6.761726]  pcpu_alloc_noprof+0x508/0x790
[    6.761728]  ? lock_is_held_type+0xca/0x120
[    6.761732]  sbitmap_init_node+0xee/0x200
[    6.761736]  sbitmap_queue_init_node+0x28/0x140
[    6.761739]  blk_mq_init_tags+0xa6/0x120
[    6.761743]  blk_mq_alloc_map_and_rqs+0x40/0x110
[    6.761746]  blk_mq_init_sched+0xea/0x1d0
[    6.761749]  elevator_switch+0xb5/0x300
[    6.761753]  elevator_change+0xe2/0x140
[    6.761756]  elevator_set_default+0xb0/0xd0
[    6.761759]  blk_register_queue+0xda/0x1c0
[    6.761763]  __add_disk+0x222/0x380
[    6.761765]  add_disk_fwnode+0x79/0x160
[    6.761768]  sd_probe+0x2f8/0x490 [sd_mod]
[    6.761771]  ? driver_probe_device+0xa0/0xa0
[    6.761773]  really_probe+0xd5/0x330
[    6.761774]  ? pm_runtime_barrier+0x54/0x90
[    6.761777]  __driver_probe_device+0x78/0x110
[    6.761779]  driver_probe_device+0x1f/0xa0
[    6.761781]  __device_attach_driver+0x7a/0x100
[    6.761783]  bus_for_each_drv+0x75/0xb0
[    6.761787]  __device_attach_async_helper+0x83/0xc0
[    6.761789]  async_run_entry_fn+0x2c/0x110
[    6.761792]  process_one_work+0x1e6/0x550
[    6.761797]  worker_thread+0x1ce/0x3c0
[    6.761800]  ? bh_worker+0x250/0x250
[    6.761803]  kthread+0x10c/0x200
[    6.761811]  ? kthreads_online_cpu+0xe0/0xe0
[    6.761814]  ret_from_fork+0x213/0x270
[    6.761817]  ? kthreads_online_cpu+0xe0/0xe0
[    6.761819]  ret_from_fork_asm+0x11/0x20
[    6.761825]  </TASK>
[    6.799951]  sda: sda1 sda2 sda3 sda4 sda5 sda6
[    6.801399] sd 1:0:0:0: [sda] Attached SCSI disk


^ permalink raw reply	[flat|nested] 20+ messages in thread

* block: lockdep splat with RT config v6.15+
  2025-07-11  3:33             ` Mike Galbraith
@ 2025-07-11  8:05               ` Mike Galbraith
  2025-07-11  8:59                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2025-07-11  8:05 UTC (permalink / raw)
  To: linux-block, Jens Axboe; +Cc: Sebastian Andrzej Siewior, RT

(was PREEMPT_RT vs i915)

Ok, the splat below is not new to the 6.16 cycle, seems it landed in
the 6.15 cycle, as that also gripes but 6.14 boots clean.

On Fri, 2025-07-11 at 05:33 +0200, Mike Galbraith wrote:
...
> Hohum, seems block-land called dibs on lockdep anyway.
> 
> [    6.761473] ======================================================
> [    6.761474] WARNING: possible circular locking dependency detected
> [    6.761475] 6.16.0.bc9ff192a-master-rt #46 Tainted: G          I        
> [    6.761476] ------------------------------------------------------
> [    6.761476] kworker/u16:0/12 is trying to acquire lock:
> [    6.761478] ffffffff825aabd8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x508/0x790
> [    6.761486] 
>                but task is already holding lock:
> [    6.761486] ffff8881238f05c0 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x5f/0x140
> [    6.761491] 
>                which lock already depends on the new lock.
> 
> [    6.761492] 
>                the existing dependency chain (in reverse order) is:
> [    6.761492] 
>                -> #3 (&q->elevator_lock){+.+.}-{4:4}:
> [    6.761494]        __lock_acquire+0x540/0xbb0
> [    6.761496]        lock_acquire.part.0+0x94/0x1e0
> [    6.761497]        mutex_lock_nested+0x4c/0xa0
> [    6.761500]        elevator_change+0x5f/0x140
> [    6.761502]        elv_iosched_store+0xe6/0x110
> [    6.761505]        kernfs_fop_write_iter+0x14d/0x220
> [    6.761508]        vfs_write+0x223/0x580
> [    6.761510]        ksys_write+0x5f/0xe0
> [    6.761512]        do_syscall_64+0x76/0x3d0
> [    6.761514]        entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    6.761516] 
>                -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
> [    6.761517]        __lock_acquire+0x540/0xbb0
> [    6.761518]        lock_acquire.part.0+0x94/0x1e0
> [    6.761519]        blk_alloc_queue+0x34c/0x390
> [    6.761521]        blk_mq_alloc_queue+0x52/0xb0
> [    6.761524]        __blk_mq_alloc_disk+0x18/0x60
> [    6.761526]        loop_add+0x1d5/0x3d0 [loop]
> [    6.761530]        stop_this_handle+0xc3/0x140 [jbd2]
> [    6.761539]        do_one_initcall+0x4a/0x270
> [    6.761542]        do_init_module+0x60/0x220
> [    6.761544]        init_module_from_file+0x75/0xa0
> [    6.761546]        idempotent_init_module+0xf3/0x2d0
> [    6.761547]        __x64_sys_finit_module+0x6d/0xd0
> [    6.761549]        do_syscall_64+0x76/0x3d0
> [    6.761550]        entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    6.761551] 
>                -> #1 (fs_reclaim){+.+.}-{0:0}:
> [    6.761553]        __lock_acquire+0x540/0xbb0
> [    6.761554]        lock_acquire.part.0+0x94/0x1e0
> [    6.761555]        fs_reclaim_acquire+0x95/0xd0
> [    6.761558]        __kmalloc_noprof+0x87/0x300
> [    6.761561]        pcpu_create_chunk+0x1a/0x1b0
> [    6.761563]        pcpu_alloc_noprof+0x742/0x790
> [    6.761564]        bts_init+0x61/0x100
> [    6.761566]        do_one_initcall+0x4a/0x270
> [    6.761568]        kernel_init_freeable+0x235/0x280
> [    6.761571]        kernel_init+0x1a/0x120
> [    6.761572]        ret_from_fork+0x213/0x270
> [    6.761574]        ret_from_fork_asm+0x11/0x20
> [    6.761576] 
>                -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
> [    6.761578]        check_prev_add+0xe8/0xca0
> [    6.761581]        validate_chain+0x468/0x500
> [    6.761582]        __lock_acquire+0x540/0xbb0
> [    6.761583]        lock_acquire.part.0+0x94/0x1e0
> [    6.761584]        _mutex_lock_killable+0x55/0xc0
> [    6.761586]        pcpu_alloc_noprof+0x508/0x790
> [    6.761587]        sbitmap_init_node+0xee/0x200
> [    6.761590]        sbitmap_queue_init_node+0x28/0x140
> [    6.761592]        blk_mq_init_tags+0xa6/0x120
> [    6.761594]        blk_mq_alloc_map_and_rqs+0x40/0x110
> [    6.761596]        blk_mq_init_sched+0xea/0x1d0
> [    6.761599]        elevator_switch+0xb5/0x300
> [    6.761601]        elevator_change+0xe2/0x140
> [    6.761603]        elevator_set_default+0xb0/0xd0
> [    6.761606]        blk_register_queue+0xda/0x1c0
> [    6.761608]        __add_disk+0x222/0x380
> [    6.761609]        add_disk_fwnode+0x79/0x160
> [    6.761610]        sd_probe+0x2f8/0x490 [sd_mod]
> [    6.761613]        really_probe+0xd5/0x330
> [    6.761615]        __driver_probe_device+0x78/0x110
> [    6.761616]        driver_probe_device+0x1f/0xa0
> [    6.761618]        __device_attach_driver+0x7a/0x100
> [    6.761619]        bus_for_each_drv+0x75/0xb0
> [    6.761622]        __device_attach_async_helper+0x83/0xc0
> [    6.761624]        async_run_entry_fn+0x2c/0x110
> [    6.761626]        process_one_work+0x1e6/0x550
> [    6.761629]        worker_thread+0x1ce/0x3c0
> [    6.761632]        kthread+0x10c/0x200
> [    6.761634]        ret_from_fork+0x213/0x270
> [    6.761636]        ret_from_fork_asm+0x11/0x20
> [    6.761638] 
>                other info that might help us debug this:
> 
> [    6.761638] Chain exists of:
>                  pcpu_alloc_mutex --> &q->q_usage_counter(io) --> &q->elevator_lock
> 
> [    6.761640]  Possible unsafe locking scenario:
> 
> [    6.761641]        CPU0                    CPU1
> [    6.761641]        ----                    ----
> [    6.761641]   lock(&q->elevator_lock);
> [    6.761642]                                lock(&q->q_usage_counter(io));
> [    6.761643]                                lock(&q->elevator_lock);
> [    6.761644]   lock(pcpu_alloc_mutex);
> [    6.761645] 
>                 *** DEADLOCK ***
> 
> [    6.761645] 7 locks held by kworker/u16:0/12:
> [    6.761646]  #0: ffff888100c64938 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x41a/0x550
> [    6.761651]  #1: ffff888100afbe48 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1aa/0x550
> [    6.761655]  #2: ffff8881019e53a0 (&dev->mutex){....}-{4:4}, at: __device_attach_async_helper+0x30/0xc0
> [    6.761658]  #3: ffff888122acd3d0 (&set->update_nr_hwq_lock){.+.+}-{4:4}, at: add_disk_fwnode+0x68/0x160
> [    6.761661]  #4: ffff8881238f0640 (&q->sysfs_lock){+.+.}-{4:4}, at: blk_register_queue+0x92/0x1c0
> [    6.761665]  #5: ffff8881238f0120 (&q->q_usage_counter(queue)#65){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
> [    6.761669]  #6: ffff8881238f05c0 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x5f/0x140
> [    6.761673] 
>                stack backtrace:
> [    6.761675] CPU: 1 UID: 0 PID: 12 Comm: kworker/u16:0 Tainted: G          I         6.16.0.bc9ff192a-master-rt #46 PREEMPT_{RT,(lazy)} 
> [    6.761678] Tainted: [I]=FIRMWARE_WORKAROUND
> [    6.761678] Hardware name: HP HP Spectre x360 Convertible/804F, BIOS F.47 11/22/2017
> [    6.761680] Workqueue: async async_run_entry_fn
> [    6.761683] Call Trace:
> [    6.761684]  <TASK>
> [    6.761686]  dump_stack_lvl+0x5b/0x80
> [    6.761690]  print_circular_bug.cold+0x38/0x45
> [    6.761694]  check_noncircular+0x109/0x120
> [    6.761699]  check_prev_add+0xe8/0xca0
> [    6.761703]  validate_chain+0x468/0x500
> [    6.761706]  __lock_acquire+0x540/0xbb0
> [    6.761707]  ? find_held_lock+0x2b/0x80
> [    6.761711]  lock_acquire.part.0+0x94/0x1e0
> [    6.761712]  ? pcpu_alloc_noprof+0x508/0x790
> [    6.761715]  ? rcu_is_watching+0x11/0x40
> [    6.761717]  ? lock_acquire+0xee/0x130
> [    6.761718]  ? pcpu_alloc_noprof+0x508/0x790
> [    6.761720]  ? pcpu_alloc_noprof+0x508/0x790
> [    6.761721]  _mutex_lock_killable+0x55/0xc0
> [    6.761724]  ? pcpu_alloc_noprof+0x508/0x790
> [    6.761726]  pcpu_alloc_noprof+0x508/0x790
> [    6.761728]  ? lock_is_held_type+0xca/0x120
> [    6.761732]  sbitmap_init_node+0xee/0x200
> [    6.761736]  sbitmap_queue_init_node+0x28/0x140
> [    6.761739]  blk_mq_init_tags+0xa6/0x120
> [    6.761743]  blk_mq_alloc_map_and_rqs+0x40/0x110
> [    6.761746]  blk_mq_init_sched+0xea/0x1d0
> [    6.761749]  elevator_switch+0xb5/0x300
> [    6.761753]  elevator_change+0xe2/0x140
> [    6.761756]  elevator_set_default+0xb0/0xd0
> [    6.761759]  blk_register_queue+0xda/0x1c0
> [    6.761763]  __add_disk+0x222/0x380
> [    6.761765]  add_disk_fwnode+0x79/0x160
> [    6.761768]  sd_probe+0x2f8/0x490 [sd_mod]
> [    6.761771]  ? driver_probe_device+0xa0/0xa0
> [    6.761773]  really_probe+0xd5/0x330
> [    6.761774]  ? pm_runtime_barrier+0x54/0x90
> [    6.761777]  __driver_probe_device+0x78/0x110
> [    6.761779]  driver_probe_device+0x1f/0xa0
> [    6.761781]  __device_attach_driver+0x7a/0x100
> [    6.761783]  bus_for_each_drv+0x75/0xb0
> [    6.761787]  __device_attach_async_helper+0x83/0xc0
> [    6.761789]  async_run_entry_fn+0x2c/0x110
> [    6.761792]  process_one_work+0x1e6/0x550
> [    6.761797]  worker_thread+0x1ce/0x3c0
> [    6.761800]  ? bh_worker+0x250/0x250
> [    6.761803]  kthread+0x10c/0x200
> [    6.761811]  ? kthreads_online_cpu+0xe0/0xe0
> [    6.761814]  ret_from_fork+0x213/0x270
> [    6.761817]  ? kthreads_online_cpu+0xe0/0xe0
> [    6.761819]  ret_from_fork_asm+0x11/0x20
> [    6.761825]  </TASK>
> [    6.799951]  sda: sda1 sda2 sda3 sda4 sda5 sda6
> [    6.801399] sd 1:0:0:0: [sda] Attached SCSI disk
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: lockdep splat with RT config v6.15+
  2025-07-11  8:05               ` block: lockdep splat with RT config v6.15+ Mike Galbraith
@ 2025-07-11  8:59                 ` Sebastian Andrzej Siewior
  2025-07-11  9:03                   ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-07-11  8:59 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-block, Jens Axboe, RT

On 2025-07-11 10:05:18 [+0200], Mike Galbraith wrote:
> (was PREEMPT_RT vs i915)
> 
> Ok, the splat below is not new to the 6.16 cycle, seems it landed in
> the 6.15 cycle, as that also gripes but 6.14 boots clean.

I don't remember how you triggered this. Do you happen to know if this
is RT only or also triggers without the RT bits?

Sebastian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: lockdep splat with RT config v6.15+
  2025-07-11  8:59                 ` Sebastian Andrzej Siewior
@ 2025-07-11  9:03                   ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2025-07-11  9:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-block, Jens Axboe, RT

On Fri, 2025-07-11 at 10:59 +0200, Sebastian Andrzej Siewior wrote:
> On 2025-07-11 10:05:18 [+0200], Mike Galbraith wrote:
> > (was PREEMPT_RT vs i915)
> > 
> > Ok, the splat below is not new to the 6.16 cycle, seems it landed in
> > the 6.15 cycle, as that also gripes but 6.14 boots clean.
> 
> I don't remember how you triggered this. Do you happen to know if this
> is RT only or also triggers without the RT bits?

It's CONFIG_PREEMPT_RT dependent. The trigger is just boot, on both my
lappy and my desktop box.

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: PREEMPT_RT vs i915
  2025-07-10 15:20           ` Sebastian Andrzej Siewior
@ 2025-07-11 12:35             ` Maarten Lankhorst
  0 siblings, 0 replies; 20+ messages in thread
From: Maarten Lankhorst @ 2025-07-11 12:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Ville Syrjälä
  Cc: Ben Hutchings, linux-rt-users, intel-gfx,
	Debian kernel maintainers

Hey,

On 2025-07-10 17:20, Sebastian Andrzej Siewior wrote:
> On 2025-07-10 18:04:42 [+0300], Ville Syrjälä wrote:
>>
>> When this was last discussed I suggested that there should be a
>> versions of the tracepoint macros that do the sampling outside
>> the lock, but that wasn't deemed acceptable for whatever reason.
>> I don't even know why the current macros are doing the
>> sampling while holding the lock...
> 
> Any objections to me sending the batch, and we figure out later how to get
> the tracepoints for i915 enabled again on RT?
> It would be an improvement because you could take a vanilla kernel and
> enable PREEMPT_RT and you would only miss the tracepoints while now you
> can't enable i915 at all and XE either doesn't compile or spills
> warnings at runtime due to the code it shares with i915.
> 
> Sebastian

FWIW, I did some quick testing. There should be no need for disabling tracepoints on xe.
The uncore lock is for a very specific reason (intel_de.h):
 * Certain architectures will die if the same cacheline is concurrently accessed
 * by different clients (e.g. on Ivybridge). Access to registers should
 * therefore generally be serialised, by either the dev_priv->uncore.lock or
 * a more localised lock guarding all access to that bank of registers.

Since we only support modern platforms on xe there is no need for the uncore lock and the specific error will not trigger.
On xe, it only exists to be used in vblank evasion for compatibility with i915.

From the xe point of view, I would say applying patch 1-3 is sufficient.
Patch 3 because i915_utils.h is used by display still, unfortunately..

I would recommend adjusting the patch to keep the display tracepoints enabled on xe;
the non-vblank i915-specific patches should be harmless to apply.

After that, the rest can be applied too.

While I understand the theoretical need for more testing, I think we should be practical and apply
patches 1-2 too. Even on normal kernels there is absolutely no guarantee of fast completion,
and on a deterministic -rt kernel I'd say it's less likely that vblank evasion is a problem.

Kind regards,
~Maarten


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-07-11 12:35 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-08 19:35 PREEMPT_RT vs i915 Ben Hutchings
2025-07-09 17:30 ` Ville Syrjälä
2025-07-09 19:44   ` Sebastian Andrzej Siewior
2025-07-09 20:09     ` Ville Syrjälä
2025-07-09 22:04       ` Matthew Brost
2025-07-10  6:30         ` Sebastian Andrzej Siewior
2025-07-10 15:21         ` Ville Syrjälä
2025-07-10 18:04           ` Matthew Brost
2025-07-10 18:15             ` Ville Syrjälä
2025-07-10  4:52       ` Mike Galbraith
2025-07-10 15:50         ` Ville Syrjälä
2025-07-11  2:36           ` Mike Galbraith
2025-07-11  3:33             ` Mike Galbraith
2025-07-11  8:05               ` block: lockdep splat with RT config v6.15+ Mike Galbraith
2025-07-11  8:59                 ` Sebastian Andrzej Siewior
2025-07-11  9:03                   ` Mike Galbraith
2025-07-10  6:41       ` PREEMPT_RT vs i915 Sebastian Andrzej Siewior
2025-07-10 15:04         ` Ville Syrjälä
2025-07-10 15:20           ` Sebastian Andrzej Siewior
2025-07-11 12:35             ` Maarten Lankhorst

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).