Dovetail/Xenomai 3: Timer tick locking problem

* Dovetail/Xenomai 3: Timer tick locking problem
From: Florian Bezdeka @ 2024-06-06  7:47 UTC
  To: xenomai, Philippe Gerum; +Cc: Jan Kiszka

Hi all,

I'm searching for the root cause of the following WARNING - followed by
a complete system hang:

[Xenomai] lock 00000000e04e7d2d already unlocked on CPU #3
          last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #-2)
CPU: 3 PID: 31 Comm: ksoftirqd/3 Not tainted 6.1.34-xenomai-1 #1
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006

So far I have only been able to reproduce this with VirtualBox when the
PV spinlock infrastructure is enabled. It takes ~5 min to stall the
system by running stress-ng with the --iomix stressor. Adding "nopvspin"
to the kernel cmdline "solves" this problem for now.

I'm able to stall the same image with the --iomix stressor when running
on kvm/qemu as well. Obviously there is no warning triggered. I'm using
"pci=nomsi" on the kernel cmdline to get the same IRQ routing (via
IOAPIC) as VBox.

While reading all the related code I had some questions that I would
like to have answered. 

First question:
Broken locking when Xenomai timer tick interrupts an OOB task? 

The call stack into the xnintr_core_clock_handler():

#0  xnintr_host_tick (sched=0xffff88803e8ab060) at kernel/xenomai/pipeline/intr.c:14
#1  xnintr_core_clock_handler () at kernel/xenomai/pipeline/intr.c:40
#2  0xffffffff81061fdc in clockevents_handle_event (ced=0xffff88803e89c280) at ./include/linux/clockchips.h:281
#3  lapic_oob_handler (irq=<optimized out>, dev_id=<optimized out>) at arch/x86/kernel/apic/apic.c:503
#4  0xffffffff81112ecc in do_oob_irq (desc=desc@entry=0xffff888003929400) at kernel/irq/pipeline.c:933
#5  0xffffffff8111315a in handle_oob_irq (desc=0xffff888003929400) at kernel/irq/pipeline.c:1036
#6  handle_oob_irq (desc=0xffff888003929400) at kernel/irq/pipeline.c:991
#7  0xffffffff811133ff in generic_handle_irq_desc (desc=0xffff888003929400) at ./include/linux/irqdesc.h:161
#8  generic_pipeline_irq_desc (desc=desc@entry=0xffff888003929400) at kernel/irq/pipeline.c:1141
#9  0xffffffff81070e78 in arch_handle_irq (regs=regs@entry=0xffffc9000009be38, vector=vector@entry=236 '\354', irq_movable=irq_movable@entry=false) at arch/x86/kernel/irq_pipeline.c:243
#10 0xffffffff81f48dcb in arch_pipeline_entry (regs=0xffffc9000009be38, vector=236 '\354') at arch/x86/kernel/irq_pipeline.c:291
#11 0xffffffff8200148a in asm_sysvec_apic_timer_interrupt () at ./arch/x86/include/asm/idtentry.h:760
#12 0x0000000000000000 in ?? ()

To my understanding this is an OOB IRQ, interrupting any task, OOB
tasks included. The code in xnintr_core_clock_handler():

	xnlock_get(&nklock);
	xnclock_tick(&nkclock);
	xnlock_put(&nklock);

When running over an OOB task that is currently owning nklock, we will
release the lock unconditionally, leaving the task "unprotected" /
unsynchronized. Right?

There is no OOB task running on my system yet, so it's likely not my
original problem.

Second question:
Back to the original problem. If I interpret the warning correctly
something like

    CPU #2                      CPU #3
                                xnlock_get(&nklock)
    xnlock_get(&nklock)
    xnlock_put(&nklock)
                                xnlock_put(&nklock) (triggers WARN)

must have happened. I think we agree that the infrastructure behind
xnlock_get() should take care that this scenario can never happen. Any
ideas what is going on here? Might that be a bug in the VBox PV
implementation?

The vCPUs should be "parked"/"halted" instead of spinning. On x86 we
run into kvm_wait() doing a "sti;hlt;" combination.

Once kicked by the current lock holder we should wake up and continue
after "hlt;". Even if this goes somehow wrong, some checks about
spurious wakeups are in place.

Might that be a memory "visibility problem", maybe due to a missing
barrier?

Ideas welcome...

Best regards,
Florian

-- 
Siemens AG, Technology
Linux Expert Center

* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Philippe Gerum @ 2024-06-06  8:18 UTC
  To: Florian Bezdeka; +Cc: xenomai, Jan Kiszka


Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> Hi all,
>
> I'm searching for the root cause of the following WARNING - followed by
> a complete system hang:
>
>
> [Xenomai] lock 00000000e04e7d2d already unlocked on CPU #3
>           last owner = kernel/xenomai/pipeline/intr.c:26
> (xnintr_core_clock_handler(), CPU #-2)

Mm, -2 looks pretty bad. This should be either a valid CPU#, or -1 if free (~0).

> CPU: 3 PID: 31 Comm: ksoftirqd/3 Not tainted 6.1.34-xenomai-1 #1
> Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>
>
> So far I have only been able to reproduce this with VirtualBox when the
> PV spinlock infrastructure is enabled. It takes ~5 min to stall the
> system by running stress-ng with the --iomix stressor. Adding "nopvspin"
> to the kernel cmdline "solves" this problem for now.
>
> I'm able to stall the same image with the --iomix stressor when running
> on kvm/qemu as well. Obviously there is no warning triggered. I'm using
> "pci=nomsi" on the kernel cmdline to get the same IRQ routing (via
> IOAPIC) as VBox.
>
> While reading all the related code I had some questions that I would
> like to have answered. 
>
>
> First question:
> Broken locking when Xenomai timer tick interrupts an OOB task? 
>
> The call stack into the xnintr_core_clock_handler():
>
> #0  xnintr_host_tick (sched=0xffff88803e8ab060) at kernel/xenomai/pipeline/intr.c:14
> #1  xnintr_core_clock_handler () at kernel/xenomai/pipeline/intr.c:40
> #2  0xffffffff81061fdc in clockevents_handle_event (ced=0xffff88803e89c280) at ./include/linux/clockchips.h:281
> #3  lapic_oob_handler (irq=<optimized out>, dev_id=<optimized out>) at arch/x86/kernel/apic/apic.c:503
> #4  0xffffffff81112ecc in do_oob_irq (desc=desc@entry=0xffff888003929400) at kernel/irq/pipeline.c:933
> #5  0xffffffff8111315a in handle_oob_irq (desc=0xffff888003929400) at kernel/irq/pipeline.c:1036
> #6  handle_oob_irq (desc=0xffff888003929400) at kernel/irq/pipeline.c:991
> #7  0xffffffff811133ff in generic_handle_irq_desc (desc=0xffff888003929400) at ./include/linux/irqdesc.h:161
> #8  generic_pipeline_irq_desc (desc=desc@entry=0xffff888003929400) at kernel/irq/pipeline.c:1141
> #9  0xffffffff81070e78 in arch_handle_irq (regs=regs@entry=0xffffc9000009be38, vector=vector@entry=236 '\354', irq_movable=irq_movable@entry=false) at arch/x86/kernel/irq_pipeline.c:243
> #10 0xffffffff81f48dcb in arch_pipeline_entry (regs=0xffffc9000009be38, vector=236 '\354') at arch/x86/kernel/irq_pipeline.c:291
> #11 0xffffffff8200148a in asm_sysvec_apic_timer_interrupt () at ./arch/x86/include/asm/idtentry.h:760
> #12 0x0000000000000000 in ?? ()
>
> To my understanding this is an OOB IRQ, interrupting any task, OOB
> tasks included. The code in xnintr_core_clock_handler():
>
> 	xnlock_get(&nklock);
> 	xnclock_tick(&nkclock);
> 	xnlock_put(&nklock);
>
> When running over an OOB task that is currently owning nklock, we will
> release the lock unconditionally, leaving the task "unprotected" /
> unsynchronized. Right?

No, the only way for a task to hold the ugly lock safely is to disable
IRQs if it has to compete with an IRQ handler. So this scenario is by
definition a usage bug on the application/driver side, not on the
infrastructure's. Meanwhile, the _irqsave() variant prevents spurious
lock release in recursion using a special marker in the saved interrupt
flags.
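
To make the recursion handling concrete, here is a minimal user-space
model of the idea, not the actual Xenomai code. Every name (model_lock,
MODEL_RECURSED, ...) and the choice of bit 1 as the marker are
assumptions made for illustration. It only shows how a marker carried in
the saved flags lets the matching put skip the release when the get was
a recursion on the same CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NO_OWNER  (-1)   /* lock is free */
#define MODEL_IRQS_OFF  0x1UL  /* bit 0: IRQs were already off on entry */
#define MODEL_RECURSED  0x2UL  /* bit 1: hypothetical recursion marker */

struct model_lock {
	int owner;             /* CPU holding the lock, or MODEL_NO_OWNER */
};

struct model_cpu {
	int id;
	bool irqs_off;         /* stand-in for the CPU's interrupt state */
};

/* Plain get: single-threaded model, a real lock would spin here. */
static inline void model_lock_get(struct model_lock *l, struct model_cpu *c)
{
	assert(l->owner == MODEL_NO_OWNER);
	l->owner = c->id;
}

/* Plain put: releases unconditionally, the caller must be the owner. */
static inline void model_lock_put(struct model_lock *l, struct model_cpu *c)
{
	assert(l->owner == c->id);
	l->owner = MODEL_NO_OWNER;
}

/* Disable IRQs, then take the lock unless this CPU already owns it. */
static inline unsigned long model_lock_get_irqsave(struct model_lock *l,
						   struct model_cpu *c)
{
	unsigned long flags = c->irqs_off ? MODEL_IRQS_OFF : 0;

	c->irqs_off = true;
	if (l->owner == c->id)
		flags |= MODEL_RECURSED; /* already ours: do not release later */
	else
		model_lock_get(l, c);

	return flags;
}

/* Release only if the matching get really acquired the lock. */
static inline void model_lock_put_irqrestore(struct model_lock *l,
					     struct model_cpu *c,
					     unsigned long flags)
{
	if (!(flags & MODEL_RECURSED))
		model_lock_put(l, c);
	c->irqs_off = (flags & MODEL_IRQS_OFF) != 0;
}
```

In this model, a nested get_irqsave/put_irqrestore pair leaves both the
ownership and the interrupt state untouched. Only the outermost put
releases the lock and restores the original interrupt state.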

>
> There is no OOB task running on my system yet, so it's likely not my
> original problem.
>
>
> Second question:
> Back to the original problem. If I interpret the warning correctly
> something like
>
>     CPU #2                      CPU #3
>                                 xnlock_get(&nklock)
>     xnlock_get(&nklock)
>     xnlock_put(&nklock)
>                                 xnlock_put(&nklock) (triggers WARN)
>
> must have happened. I think we agree that the infrastructure behind
> xnlock_get() should take care that this scenario can never happen. Any
> ideas what is going on here? Might that be a bug in the VBox PV
> implementation?
>
> The vCPUs should be "parked"/"halted" instead of spinning. On x86 we
> run into kvm_wait() doing a "sti;hlt;" combination.
>
> Once kicked by the current lock holder we should wake up and continue
> after "hlt;". Even if this goes somehow wrong, some checks about
> spurious wakeups are in place.
>
> Might that be a memory "visibility problem", maybe due to a missing
> barrier?

I would address the "-2" weirdness before considering anything else.

-- 
Philippe.

* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Florian Bezdeka @ 2024-06-06  8:37 UTC
  To: Philippe Gerum; +Cc: xenomai, Jan Kiszka

On Thu, 2024-06-06 at 10:18 +0200, Philippe Gerum wrote:
> > [Xenomai] lock 00000000e04e7d2d already unlocked on CPU #3
> >            last owner = kernel/xenomai/pipeline/intr.c:26
> > (xnintr_core_clock_handler(), CPU #-2)
> 
> Mm, -2 looks pretty bad. This should be either a valid CPU#, or -1 if free (~0).

-2 is coming from the "last owner" member, the (current) owner is -1 at
that time. The lock has been released on CPU 2, but somehow CPU 3 is no
longer the owner. Right?

That means CPU 3 was the owner at some point (assuming there is no
double release), but by the time it releases the lock it is no longer
the owner...

* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Philippe Gerum @ 2024-06-06  9:04 UTC
  To: Florian Bezdeka; +Cc: xenomai, Jan Kiszka


Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> On Thu, 2024-06-06 at 10:18 +0200, Philippe Gerum wrote:
>> > [Xenomai] lock 00000000e04e7d2d already unlocked on CPU #3
>> >            last owner = kernel/xenomai/pipeline/intr.c:26
>> > (xnintr_core_clock_handler(), CPU #-2)
>> 
>> Mm, -2 looks pretty bad. This should be either a valid CPU#, or -1 if free (~0).
>
> -2 is coming from the "last owner" member, the (current) owner is -1 at
> that time. The lock has been released on CPU 2, but somehow CPU 3 is no
> longer the owner. Right?

Technically, -2 comes from the "owner" member of the lock, sign-inverted
to mark it as released. So CPU #2 was the last owner, which I agree, is
clearly wrong since CPU#3 managed to grab it earlier without releasing
it just yet.
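
As a rough illustration of how such a "-2" can show up, here is a small
user-space model. The struct layout, the field names and the order of
updates are assumptions, not the real lock debugging code. It replays
the interleaving from the report on a lock whose mutual exclusion is
deliberately missing: CPU 3 takes it, CPU 2 takes and releases it, then
CPU 3 tries to release it.

```c
#include <stdio.h>

#define DBG_FREE (-1)  /* "~0": nobody holds the lock */

struct dbg_lock {
	int owner;     /* CPU holding the lock, or DBG_FREE */
	int last_cpu;  /* debug record: last acquiring CPU, negated on release */
};

/* Record an acquisition by @cpu. No mutual exclusion here, on purpose. */
static inline void dbg_lock_acquire(struct dbg_lock *l, int cpu)
{
	l->owner = cpu;
	l->last_cpu = cpu;
}

/* Returns 0 on a clean release, 1 if @cpu is not the current owner. */
static inline int dbg_lock_release(struct dbg_lock *l, int cpu)
{
	if (l->owner != cpu) {
		printf("lock already unlocked on CPU #%d (last owner CPU #%d)\n",
		       cpu, l->last_cpu);
		return 1;
	}

	l->last_cpu = -l->last_cpu;  /* mark the record as "released" */
	l->owner = DBG_FREE;
	return 0;
}

/* Replays the reported interleaving and returns the final debug record. */
static inline int dbg_replay_report(void)
{
	struct dbg_lock l = { DBG_FREE, DBG_FREE };

	dbg_lock_acquire(&l, 3);
	dbg_lock_acquire(&l, 2);  /* must never succeed on a working lock */
	dbg_lock_release(&l, 2);  /* owner becomes free, record becomes -2 */
	dbg_lock_release(&l, 3);  /* mismatch: prints the warning */

	return l.last_cpu;
}
```

Calling dbg_replay_report() prints "lock already unlocked on CPU #3
(last owner CPU #-2)" and returns -2, which has the same shape as the
warning above. The model says nothing about why CPU 2 could get in, only
that the message is consistent with a second acquisition while CPU 3
still believed it held the lock.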

>
> Means CPU 3 was the owner at some point (assuming there is no double
> release), but when it comes to releasing the lock the CPU is no longer
> the owner...

Which indeed makes no sense, unless some membar issue is hiding in the
dark.

-- 
Philippe.

* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Florian Bezdeka @ 2024-06-06 10:47 UTC
  To: Philippe Gerum; +Cc: xenomai, Jan Kiszka

On Thu, 2024-06-06 at 10:18 +0200, Philippe Gerum wrote:
> > When running over an OOB task that is currently owning nklock, we will
> > release the lock unconditionally, leaving the task "unprotected" /
> > unsynchronized. Right?
> 
> No, the only way for a task to hold the ugly lock safely is to disable
> IRQs if it has to compete with an IRQ handler. So this scenario is by
> definition a usage bug on the application/driver side, not on the
> infrastructure's. Meanwhile, the _irqsave() variant prevents spurious
> lock release in recursion using a special marker in the saved interrupt
> flags.

I just checked all nklock usages. Most of them are indeed using the IRQ
safe variants. But:

In xnthread_relax() we have:

    splmax();
    xnlock_get(&nklock);
    xnthread_suspend(...);
        xnlock_get_irqsave(&nklock, s);
        xnlock_put_irqrestore(&nklock, s);
    splnone();

As xnthread_suspend() is using the _irqsave() variants, it's basically
a noop, the recursion will be detected / handled correctly.

What happens if the Xenomai timer tick is handled right after
splnone()?

* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Philippe Gerum @ 2024-06-06 12:42 UTC
  To: Florian Bezdeka; +Cc: xenomai, Jan Kiszka


Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> On Thu, 2024-06-06 at 10:18 +0200, Philippe Gerum wrote:
>> > When running over an OOB task that is currently owning nklock, we will
>> > release the lock unconditionally, leaving the task "unprotected" /
>> > unsynchronized. Right?
>> 
>> No, the only way for a task to hold the ugly lock safely is to disable
>> IRQs if it has to compete with an IRQ handler. So this scenario is by
>> definition a usage bug on the application/driver side, not on the
>> infrastructure's. Meanwhile, the _irqsave() variant prevents spurious
>> lock release in recursion using a special marker in the saved interrupt
>> flags.
>
> I just checked all nklock usages. Most of them are indeed using the IRQ
> safe variants. But:
>
> In xnthread_relax() we have:
>
>     splmax();
>     xnlock_get(&nklock);
>     xnthread_suspend(...);
>         xnlock_get_irqsave(&nklock, s);
>         xnlock_put_irqrestore(&nklock, s);
>     splnone();
>
> As xnthread_suspend() is using the _irqsave() variants, it's basically
> a noop, the recursion will be detected / handled correctly.
>
> What happens, if right after splnone() the Xenomai timer tick is
> handled?

At this point, the current task is relaxed and does not hold the nklock
anymore. It's been released by ___xnsched_run() <- xnthread_suspend(),
on the exit path.

-- 
Philippe.

* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Florian Bezdeka @ 2024-06-07  7:37 UTC
  To: Philippe Gerum; +Cc: xenomai, Jan Kiszka

On Thu, 2024-06-06 at 14:42 +0200, Philippe Gerum wrote:
> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
> 
> > On Thu, 2024-06-06 at 10:18 +0200, Philippe Gerum wrote:
> > > > When running over an OOB task that is currently owning nklock, we will
> > > > release the lock unconditionally, leaving the task "unprotected" /
> > > > unsynchronized. Right?
> > > 
> > > No, the only way for a task to hold the ugly lock safely is to disable
> > > IRQs if it has to compete with an IRQ handler. So this scenario is by
> > > definition a usage bug on the application/driver side, not on the
> > > infrastructure's. Meanwhile, the _irqsave() variant prevents spurious
> > > lock release in recursion using a special marker in the saved interrupt
> > > flags.
> > 
> > I just checked all nklock usages. Most of them are indeed using the IRQ
> > safe variants. But:
> > 
> > In xnthread_relax() we have:
> > 
> >     splmax();
> >     xnlock_get(&nklock);
> >     xnthread_suspend(...);
> >         xnlock_get_irqsave(&nklock, s);
> >         xnlock_put_irqrestore(&nklock, s);
> >     splnone();
> > 
> > As xnthread_suspend() is using the _irqsave() variants, it's basically
> > a noop, the recursion will be detected / handled correctly.
> > 
> > What happens, if right after splnone() the Xenomai timer tick is
> > handled?
> 
> At this point, the current task is relaxed and does not hold the nklock
> anymore. It's been released by ___xnsched_run() <- xnthread_suspend(),
> on the exit path.
> 

Exactly. Thanks Philippe.

I think I narrowed it down to kvm_wait(). It's probably not a good idea
to enable the hw IRQs unconditionally after coming back from a vCPU
kick. Testing is ongoing; maybe you can have a look as well.

* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Jan Kiszka @ 2024-06-07  9:17 UTC
  To: Florian Bezdeka, Philippe Gerum; +Cc: xenomai

On 07.06.24 09:37, Florian Bezdeka wrote:
> On Thu, 2024-06-06 at 14:42 +0200, Philippe Gerum wrote:
>> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
>>
>>> On Thu, 2024-06-06 at 10:18 +0200, Philippe Gerum wrote:
>>>>> When running over an OOB task that is currently owning nklock, we will
>>>>> release the lock unconditionally, leaving the task "unprotected" /
>>>>> unsynchronized. Right?
>>>>
>>>> No, the only way for a task to hold the ugly lock safely is to disable
>>>> IRQs if it has to compete with an IRQ handler. So this scenario is by
>>>> definition a usage bug on the application/driver side, not on the
>>>> infrastructure's. Meanwhile, the _irqsave() variant prevents spurious
>>>> lock release in recursion using a special marker in the saved interrupt
>>>> flags.
>>>
>>> I just checked all nklock usages. Most of them are indeed using the IRQ
>>> safe variants. But:
>>>
>>> In xnthread_relax() we have:
>>>
>>>     splmax();
>>>     xnlock_get(&nklock);
>>>     xnthread_suspend(...);
>>>         xnlock_get_irqsave(&nklock, s);
>>>         xnlock_put_irqrestore(&nklock, s);
>>>     splnone();
>>>
>>> As xnthread_suspend() is using the _irqsave() variants, it's basically
>>> a noop, the recursion will be detected / handled correctly.
>>>
>>> What happens, if right after splnone() the Xenomai timer tick is
>>> handled?
>>
>> At this point, the current task is relaxed and does not hold the nklock
>> anymore. It's been released by ___xnsched_run() <- xnthread_suspend(),
>> on the exit path.
>>
> 
> Exactly. Thanks Philippe.
> 
> I think I narrowed it down to kvm_wait(). It's probably not a good idea
> to enable the hw IRQs unconditionally after coming back from a vCPU
> kick. Testing is ongoing; maybe you can have a look as well.
> 

If we ever enter kvm_wait() with hard-IRQs off, they must not be turned
on before, while or after waiting. Maybe this condition is simply
missing from the implementation because it was assumed to only be
entered with hard-IRQs on?

Jan

-- 
Siemens AG, Technology
Linux Expert Center


* Re: Dovetail/Xenomai 3: Timer tick locking problem
From: Florian Bezdeka @ 2024-06-07 13:15 UTC
  To: Jan Kiszka, Philippe Gerum; +Cc: xenomai

On Fri, 2024-06-07 at 11:17 +0200, Jan Kiszka wrote:
> On 07.06.24 09:37, Florian Bezdeka wrote:
> > On Thu, 2024-06-06 at 14:42 +0200, Philippe Gerum wrote:
> > > Florian Bezdeka <florian.bezdeka@siemens.com> writes:
> > > 
> > > > On Thu, 2024-06-06 at 10:18 +0200, Philippe Gerum wrote:
> > > > > > When running over an OOB task that is currently owning nklock, we will
> > > > > > release the lock unconditionally, leaving the task "unprotected" /
> > > > > > unsynchronized. Right?
> > > > > 
> > > > > No, the only way for a task to hold the ugly lock safely is to disable
> > > > > IRQs if it has to compete with an IRQ handler. So this scenario is by
> > > > > definition a usage bug on the application/driver side, not on the
> > > > > infrastructure's. Meanwhile, the _irqsave() variant prevents spurious
> > > > > lock release in recursion using a special marker in the saved interrupt
> > > > > flags.
> > > > 
> > > > I just checked all nklock usages. Most of them are indeed using the IRQ
> > > > safe variants. But:
> > > > 
> > > > In xnthread_relax() we have:
> > > > 
> > > >     splmax();
> > > >     xnlock_get(&nklock);
> > > >     xnthread_suspend(...);
> > > >         xnlock_get_irqsave(&nklock, s);
> > > >         xnlock_put_irqrestore(&nklock, s);
> > > >     splnone();
> > > > 
> > > > As xnthread_suspend() is using the _irqsave() variants, it's basically
> > > > a noop, the recursion will be detected / handled correctly.
> > > > 
> > > > What happens, if right after splnone() the Xenomai timer tick is
> > > > handled?
> > > 
> > > At this point, the current task is relaxed and does not hold the nklock
> > > anymore. It's been released by ___xnsched_run() <- xnthread_suspend(),
> > > on the exit path.
> > > 
> > 
> > Exactly. Thanks Philippe.
> > 
> > I think I narrowed it down to kvm_wait(). It's probably not a good idea
> > to enable the hw IRQs unconditionally after coming back from a vCPU
> > kick. Testing is ongoing; maybe you can have a look as well.
> > 
> 
> If we ever enter kvm_wait() with hard-IRQs off, they must not be turned
> on before, while or after waiting. Maybe this condition is simply
> missing from the implementation because it was assumed to only be
> entered with hard-IRQs on?

Hm... I think HW IRQ enablement is still allowed, or even necessary to
allow the processing of those events. Otherwise non-RT spinlocks would
stall RT tasks. In my opinion the handling of PV spinlocks should be
the same as of "normal" ones, basically running the slow path
(spinning) in HW IRQ enabled state.

I'm currently playing around with 

+#ifdef CONFIG_IRQ_PIPELINE
+       if (hard_irqs_disabled() && READ_ONCE(*ptr) == val) {
+               safe_halt(); /* enables HW IRQs */
+               hard_local_irq_disable();
+               return;
+       }
+#endif
+

in kvm_wait(). That makes LOCKDEP happy and seems to survive my stress
test (so far).
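
To spell out what the added branch is meant to guarantee, here is a tiny
user-space model of it. The struct and the model_* names are
hypothetical stand-ins, and the real halt blocks until the vCPU is
kicked, which this sketch does not simulate. It only tracks the hard
interrupt state across the call.

```c
#include <stdbool.h>

struct model_vcpu {
	bool hard_irqs_off;  /* stand-in for the hardware interrupt flag */
	unsigned int halts;  /* how many times the vCPU parked itself */
};

/* Stand-in for safe_halt(): "sti; hlt" enables IRQs, then parks the vCPU. */
static inline void model_safe_halt(struct model_vcpu *v)
{
	v->hard_irqs_off = false;
	v->halts++;
}

/*
 * Models only the branch added by the patch. Returns true if the branch
 * handled the wait, false if the unmodified code path would run instead.
 */
static inline bool model_kvm_wait_oob(struct model_vcpu *v,
				      const volatile unsigned char *ptr,
				      unsigned char val)
{
	if (v->hard_irqs_off && *ptr == val) {
		model_safe_halt(v);       /* IRQs are on while halted */
		v->hard_irqs_off = true;  /* hard_local_irq_disable() */
		return true;
	}

	return false;
}
```

The point of the model: a caller that entered with hard IRQs off gets
them back off on return, and the halt is skipped when the lock value has
already changed.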

Btw: I'm not sure if the current kvm_wait() implementation is correct
for the !CONFIG_IRQ_PIPELINE case. Will have to check...

A different approach would be the disablement of PV spinlocks if IRQ
pipelining is active. WDYT?

Florian

> 
> Jan
> 

