* [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: Mauricio Faria de Oliveira @ 2026-05-07 2:36 UTC
To: David Woodhouse, Paul Durrant, Sean Christopherson, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
Cc: kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90, Mauricio Faria de Oliveira
kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
Check for that case, and bail out early.
Note: there is previous work and discussion on this [1] (~2 years ago),
which involved continuing to execute the function with changes, but it
was not merged. That was a different, more complex approach.
[1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
This is hit quickly when booting a Xen guest on a KVM Xen host.
With this patch, it boots quietly and runs timer stress without issues
(e.g., stress-ng --quiet --timer 1 --timer-freq 19000 --timer-slack 0).
Tested with/without CONFIG_PREEMPT_RT.
Test case:
=========
Configure a host kernel with CONFIG_KVM_XEN, e.g.:
$ make x86_64_defconfig
$ ./scripts/config \
-e EXPERT -e PREEMPT_RT -e DEBUG_ATOMIC_SLEEP \
-e KVM -e KVM_INTEL -e KVM_AMD -e KVM_XEN
$ make olddefconfig
and boot a Xen guest kernel (CONFIG_XEN) with:
# qemu-system-x86_64 \
-accel kvm,xen-version=0x40011,kernel-irqchip=split \
-cpu host,+xen-vapic -smp 1 -m 1024 \
-nodefaults -nographic -serial stdio \
-kernel arch/x86/boot/bzImage -append 'console=ttyS0'
The host dmesg then shows:
[ 27.643129] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:231
[ 27.643134] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 284, name: qemu-system-x86
[ 27.643137] preempt_count: 10000, expected: 0
[ 27.643138] RCU nest depth: 0, expected: 0
[ 27.643146] CPU: 1 UID: 0 PID: 284 Comm: qemu-system-x86 Not tainted 7.1.0-rc2 #5 PREEMPT_{RT,(lazy)}
[ 27.643150] Hardware name: QEMU Ubuntu 25.10 PC v2 (i440FX + PIIX, + 10.1 machine, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 27.643152] Call Trace:
[ 27.643155] <TASK>
[ 27.643157] dump_stack_lvl+0x64/0x80
[ 27.643165] __might_resched+0x131/0x180
[ 27.643171] rt_read_lock+0x47/0x210
[ 27.643176] kvm_xen_set_evtchn_fast+0xa5/0x3f0
[ 27.643184] xen_timer_callback+0x88/0xc0
[ 27.643188] __hrtimer_run_queues+0x10b/0x280
[ 27.643193] hrtimer_interrupt+0xf6/0x1b0
[ 27.643196] __sysvec_apic_timer_interrupt+0x55/0x130
[ 27.643200] sysvec_apic_timer_interrupt+0x39/0x80
[ 27.643204] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 27.643208] RIP: 0033:0x7f069721a8db
...
[ 27.643226] </TASK>
Reported-by: syzbot+208f7f3e5f59c11aeb90@syzkaller.appspotmail.com
Closes: https://syzbot.org/bug?extid=208f7f3e5f59c11aeb90
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
---
arch/x86/kvm/xen.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 91fd3673c09a2ef3dc154050e01df608182e59e5..76782191043b56c581f89c3861979236662cdbd7 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1814,6 +1814,10 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
rc = -EWOULDBLOCK;
+ /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
+ goto out;
+
idx = srcu_read_lock(&kvm->srcu);
read_lock_irqsave(&gpc->lock, flags);
@@ -1892,6 +1896,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
kvm_vcpu_kick(vcpu);
}
+ out:
return rc;
}
---
base-commit: 7fd2df204f342fc17d1a0bfcd474b24232fb0f32
change-id: 20260506-xen-rt-sleep-e71b92097f19
Best regards,
--
Mauricio Faria de Oliveira <mfo@igalia.com>
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: David Woodhouse @ 2026-05-07 6:58 UTC
To: Mauricio Faria de Oliveira, Paul Durrant, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt
Cc: kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
> kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
> on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
> by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
>
> Check for that case, and bail out early.
>
> Note: there is previous work and discussion on this [1] (~2 years ago),
> which involved continuing to execute the function with changes, but it
> was not merged. That was a different, more complex approach.
>
> [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
...
> + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
> + goto out;
The approach in Paul's earlier patch was better; we absolutely *want*
to deliver the interrupt to the guest immediately whenever we can, and
only fall back to the workqueue in the rare case that the shared info
page has been invalidated.
We should switch to plain read_trylock(), *without* the
local_irq_save(). And since this was the *only* case where the GPC lock
was ever taken under IRQ¹, all the GPC locking can drop the _irq part.
Sean's concern was:
>> I am not comfortable applying this patch. As shown by the need for the next patch
>> to optimize unrelated invalidations, switching to read_trylock() is more subtle
>> than it seems at first glance. Specifically, there are no fairness guarantees.
I'm OK with that in this case. Because kvm_xen_set_evtchn_fast(), as
with *everything* called from kvm_arch_set_irq_inatomic(), is
explicitly designed to be a 'best effort' and allowed to return
-EWOULDBLOCK when it's too hard.
And the write lock being held here should be a *rare* case, as the GPC for
the shared_info and vcpu_info pages should basically *never* get
invalidated while the guest is running.
I've taken the same read_trylock() approach in
https://lore.kernel.org/all/1d6712ed413ea66ef376d1410811997c3b416e99.camel@infradead.org/
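The "best effort" argument above can be illustrated with a small userspace sketch. It is not KVM code: it assumes a toy C11-atomics reader/writer lock in place of the GPC's rwlock_t, and every name in it (model_rwlock, set_evtchn_fast_sketch(), raise_event_sketch(), and the `deferred` counter standing in for the workqueue fallback) is hypothetical. The fast path never waits; if a writer (an invalidation) holds the lock, it returns -EWOULDBLOCK and the caller defers the work.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: top bit marks a writer, low bits count readers. */
#define MODEL_WRITER (1u << 31)

struct model_rwlock {
	atomic_uint state;
};

/* Never waits and gives no fairness guarantee: fails while a writer holds it. */
static bool model_read_trylock(struct model_rwlock *l)
{
	unsigned int old = atomic_load(&l->state);

	while (!(old & MODEL_WRITER)) {
		/* On CAS failure 'old' is refreshed and the writer bit re-checked. */
		if (atomic_compare_exchange_weak(&l->state, &old, old + 1))
			return true;
	}
	return false;
}

static void model_read_unlock(struct model_rwlock *l)
{
	unsigned int old = atomic_fetch_sub(&l->state, 1);

	assert((old & ~MODEL_WRITER) > 0); /* caller must hold a read lock */
}

/* Succeeds only when there are no readers and no writer. */
static bool model_write_trylock(struct model_rwlock *l)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, MODEL_WRITER);
}

static void model_write_unlock(struct model_rwlock *l)
{
	atomic_store(&l->state, 0);
}

static int deferred; /* stands in for the slow-path workqueue fallback */

/* Best effort: deliver now if uncontended, otherwise report -EWOULDBLOCK. */
static int set_evtchn_fast_sketch(struct model_rwlock *gpc_lock, int *delivered)
{
	if (!model_read_trylock(gpc_lock))
		return -EWOULDBLOCK;
	(*delivered)++; /* e.g. set the pending bit in shared_info */
	model_read_unlock(gpc_lock);
	return 0;
}

static void raise_event_sketch(struct model_rwlock *gpc_lock, int *delivered)
{
	if (set_evtchn_fast_sketch(gpc_lock, delivered) == -EWOULDBLOCK)
		deferred++; /* retry later from a context that may sleep */
}
```

Because the trylock never spins or sleeps, it is usable from a context that must not block. The price, as in Sean's concern, is the lack of fairness, which is tolerable only because the caller already accepts -EWOULDBLOCK.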
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: Sebastian Andrzej Siewior @ 2026-05-07 7:12 UTC
To: David Woodhouse
Cc: Mauricio Faria de Oliveira, Paul Durrant, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On 2026-05-07 07:58:00 [+0100], David Woodhouse wrote:
> >> I am not comfortable applying this patch. As shown by the need for the next patch
> >> to optimize unrelated invalidations, switching to read_trylock() is more subtle
> >> than it seems at first glance. Specifically, there are no fairness guarantees.
>
> I'm OK with that in this case. Because kvm_xen_set_evtchn_fast(), as
> with *everything* called from kvm_arch_set_irq_inatomic(), is
> explicitly designed to be a 'best effort' and allowed to return
> -EWOULDBLOCK when it's too hard.
>
> And the write lock being held here should be a *rare* case, as the GPC for
> the shared_info and vcpu_info pages should basically *never* get
> invalidated while the guest is running.
>
> I've taken the same read_trylock() approach in
> https://lore.kernel.org/all/1d6712ed413ea66ef376d1410811997c3b416e99.camel@infradead.org/
So the cited patch does not look bad. That read-trylock should be fine
on RT (as in I don't see anything wrong with it). What did happen to it?
Sebastian
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: David Woodhouse @ 2026-05-07 7:30 UTC
To: Sebastian Andrzej Siewior
Cc: Mauricio Faria de Oliveira, Paul Durrant, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Thu, 2026-05-07 at 09:12 +0200, Sebastian Andrzej Siewior wrote:
> On 2026-05-07 07:58:00 [+0100], David Woodhouse wrote:
> > > > I am not comfortable applying this patch. As shown by the need for the next patch
> > > > to optimize unrelated invalidations, switching to read_trylock() is more subtle
> > > > than it seems at first glance. Specifically, there are no fairness guarantees.
> >
> > I'm OK with that in this case. Because kvm_xen_set_evtchn_fast(), as
> > with *everything* called from kvm_arch_set_irq_inatomic(), is
> > explicitly designed to be a 'best effort' and allowed to return
> > -EWOULDBLOCK when it's too hard.
> >
> > And the write lock being held here should be a *rare* case, as the GPC for
> > the shared_info and vcpu_info pages should basically *never* get
> > invalidated while the guest is running.
> >
> > I've taken the same read_trylock() approach in
> > https://lore.kernel.org/all/1d6712ed413ea66ef376d1410811997c3b416e99.camel@infradead.org/
>
> So the cited patch does not look bad. That read-trylock should be fine
> on RT (as in I don't see anything wrong with it). What did happen to it?
Nothing, but it's addressing a different problem, not the
kvm_xen_set_evtchn_fast() problem.
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: David Woodhouse @ 2026-05-07 13:00 UTC
To: Sebastian Andrzej Siewior
Cc: Mauricio Faria de Oliveira, Paul Durrant, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Thu, 2026-05-07 at 09:12 +0200, Sebastian Andrzej Siewior wrote:
>
> So the cited patch does not look bad. That read-trylock should be fine
> on RT (as in I don't see anything wrong with it). What did happen to it?
The read_trylock() may be fine... but does a read_unlock() try to
"re-enable" hardirqs?
[ 112.266981] ------------[ cut here ]------------
[ 112.266987] DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context())
[ 112.266990] WARNING: kernel/locking/lockdep.c:4404 at lockdep_hardirqs_on_prepare.part.0+0xca/0x140, CPU#139: swapper/139/0
[ 112.267010] Modules linked in: binfmt_misc cfg80211 rfkill 8021q garp mrp stp llc vfat fat spi_nor pmt_telemetry mtd iTCO_wdt pmt_discovery intel_pmc_bxt pmt_class intel_sdsi intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac skx_edac_common nfit libnvdimm ena x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate snd_pcm snd_timer snd intel_uncore soundcore pcspkr dax_hmem isst_if_mbox_pci isst_if_mmio isst_if_common intel_vsec spi_intel_pci i2c_i801 spi_intel i2c_smbus i2c_ismt nfnetlink lz4hc_compress lz4_compress iaa_crypto nvme nvme_core nvme_keyring nvme_auth pinctrl_emmitsburg wmi qat_4xxx intel_qat idxd crc8 idxd_bus fuse
[ 112.267088] CPU: 139 UID: 0 PID: 0 Comm: swapper/139 Not tainted 7.1.0-rc2+ #13 PREEMPT_{RT,LAZY}
[ 112.267096] Hardware name: Amazon EC2 c7i.metal-48xl/Not Specified, BIOS 1.0 10/16/2017
[ 112.267099] RIP: 0010:lockdep_hardirqs_on_prepare.part.0+0xd1/0x140
[ 112.267107] Code: 00 00 00 5b c3 cc cc cc cc e8 eb e5 8b 00 85 c0 74 cf 8b 15 11 b5 62 02 85 d2 75 c5 48 8d 3d 66 54 64 02 48 c7 c6 40 4f 7e b4 <67> 48 0f b9 3a 5b c3 cc cc cc cc e8 bf e5 8b 00 85 c0 74 a3 44 8b
[ 112.267112] RSP: 0018:ff60c0a7cf974df8 EFLAGS: 00010046
[ 112.267118] RAX: 0000000000000001 RBX: ff251118e4165cf0 RCX: 0000000000000001
[ 112.267122] RDX: 0000000000000000 RSI: ffffffffb47e4f40 RDI: ffffffffb4fcd6a0
[ 112.267125] RBP: ff60c0a7cf974e70 R08: 0000000000000001 R09: 0000000000000000
[ 112.267127] R10: 0000000000000000 R11: 0000000000000001 R12: ff251118e4166530
[ 112.267130] R13: ff251118e4166548 R14: ff251118e5d82bc0 R15: 0000000000000000
[ 112.267134] FS: 0000000000000000(0000) GS:ff2511184a123000(0000) knlGS:0000000000000000
[ 112.267138] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 112.267142] CR2: 0000000000000000 CR3: 000000535ee7c004 CR4: 0000000000f73ef0
[ 112.267146] PKRU: 55555554
[ 112.267148] Call Trace:
[ 112.267151] <IRQ>
[ 112.267156] trace_hardirqs_on+0x18/0x100
[ 112.267172] ? lock_release.part.0+0x1c/0x50
[ 112.267179] _raw_spin_unlock_irq+0x28/0x50
[ 112.267193] rt_read_unlock+0x1a2/0x290
[ 112.267202] kvm_xen_set_evtchn_fast+0x347/0x480
[ 112.267217] ? __lock_release.isra.0+0x59/0x170
[ 112.267224] ? __pfx_xen_timer_callback+0x10/0x10
[ 112.267231] xen_timer_callback+0x8c/0xd0
[ 112.267239] __hrtimer_run_queues+0x86/0x3b0
[ 112.267253] hrtimer_interrupt+0x115/0x240
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: Sebastian Andrzej Siewior @ 2026-05-07 13:21 UTC
To: David Woodhouse
Cc: Mauricio Faria de Oliveira, Paul Durrant, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On 2026-05-07 14:00:49 [+0100], David Woodhouse wrote:
> On Thu, 2026-05-07 at 09:12 +0200, Sebastian Andrzej Siewior wrote:
> >
> > So the cited patch does not look bad. That read-trylock should be fine
> > on RT (as in I don't see anything wrong with it). What did happen to it?
>
> The read_trylock() may be fine... but does a read_unlock() try to
> "re-enable" hardirqs?
Yes, I missed that when I looked. This could become an _irqsave().
We already do this for spinlock_t (rt_spin_lock()/rt_spin_unlock())
because it is used during early boot, when interrupts are disabled and
locks are acquired, so it has to preserve the IRQ state. Since rwlock_t
is not (yet) used during early boot, this was neither done nor observed
there.
A trylock on the read side just increments a counter to acquire the
lock; that is it. A spinlock_t, on the other hand, records the current
context as the owner, which can lead to a mess. Therefore a trylock on a
spinlock_t from hardirq context is wrong, but it should be doable for
rwlock_t. I don't see anything wrong with it (except for this one thing,
which can be corrected).
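To make the counter-versus-owner distinction concrete, here is a simplified userspace sketch. It assumes a reader count biased by a negative READER_BIAS, roughly in the spirit of the PREEMPT_RT rwbase code, but the types and functions (model_rwbase, model_owner_lock, current_task) are hypothetical, and the writer side is reduced to removing the bias without waiting for readers to drain.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model, not kernel code: counter < 0 means readers may enter. */
#define MODEL_READER_BIAS INT_MIN

struct model_rwbase {
	atomic_int readers; /* MODEL_READER_BIAS + number of active readers */
};

/* Hypothetical owner-tracking lock, standing in for an RT spinlock_t. */
struct model_owner_lock {
	int owner; /* task id recorded as owner, 0 when free */
};

/* Whatever task the hardirq happened to interrupt. */
static int current_task = 42;

static bool model_rwbase_read_trylock(struct model_rwbase *rwb)
{
	int r = atomic_load(&rwb->readers);

	/* Only the counter changes; no owner is recorded anywhere. */
	while (r < 0) {
		if (atomic_compare_exchange_weak(&rwb->readers, &r, r + 1))
			return true;
	}
	return false;
}

static void model_rwbase_read_unlock(struct model_rwbase *rwb)
{
	int old = atomic_fetch_sub(&rwb->readers, 1);

	assert(old != MODEL_READER_BIAS); /* at least one reader was active */
}

/* A writer takes the bias away (C11 atomics wrap), so new read trylocks fail. */
static void model_rwbase_write_begin(struct model_rwbase *rwb)
{
	atomic_fetch_sub(&rwb->readers, MODEL_READER_BIAS);
}

static bool model_owner_trylock(struct model_owner_lock *l)
{
	if (l->owner != 0)
		return false;
	/* From hardirq context this names the interrupted task, not the IRQ. */
	l->owner = current_task;
	return true;
}
```

The read trylock touches only the counter, whereas the owner-tracking lock stamps whichever task is current, which from hardirq context is an unrelated interrupted task.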
Sebastian
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: David Woodhouse @ 2026-05-07 13:43 UTC
To: Sebastian Andrzej Siewior
Cc: Mauricio Faria de Oliveira, Paul Durrant, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Thu, 2026-05-07 at 15:21 +0200, Sebastian Andrzej Siewior wrote:
> On 2026-05-07 14:00:49 [+0100], David Woodhouse wrote:
> > On Thu, 2026-05-07 at 09:12 +0200, Sebastian Andrzej Siewior wrote:
> > >
> > > So the cited patch does not look bad. That read-trylock should be fine
> > > on RT (as in I don't see anything wrong with it). What did happen to it?
> >
> > The read_trylock() may be fine... but does a read_unlock() try to
> > "re-enable" hardirqs?
>
> Yes, I missed that when I looked. This could become an _irqsave().
>
> We already do this for spinlock_t (rt_spin_lock()/rt_spin_unlock())
> because it is used during early boot, when interrupts are disabled and
> locks are acquired, so it has to preserve the IRQ state. Since rwlock_t
> is not (yet) used during early boot, this was neither done nor observed
> there.
>
> A trylock on the read side just increments a counter to acquire the
> lock; that is it. A spinlock_t, on the other hand, records the current
> context as the owner, which can lead to a mess. Therefore a trylock on a
> spinlock_t from hardirq context is wrong, but it should be doable for
> rwlock_t. I don't see anything wrong with it (except for this one thing,
> which can be corrected).
Thanks. Something like this?
From 7199173d543d25333418537bcf07d6b2fce18ff1 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Thu, 7 May 2026 14:39:22 +0100
Subject: [PATCH] locking/rt: Use raw_spin_lock_irqsave() in
__rwbase_read_unlock()
__rwbase_read_unlock() uses raw_spin_lock_irq()/raw_spin_unlock_irq(),
which unconditionally disable and re-enable interrupts. When
read_unlock() is called from hardirq context (e.g. after a successful
read_trylock() in a timer callback), the raw_spin_unlock_irq()
incorrectly re-enables interrupts within the hardirq handler.
This causes lockdep warnings ('hardirqs_on_prepare' from hardirq
context) and can lead to IRQ state corruption.
Switch to raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore() to
preserve the caller's IRQ state, matching the pattern already used
for rt_spin_lock/unlock.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
kernel/locking/rwbase_rt.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 82e078c0665a..25744862d627 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -153,8 +153,9 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
struct rt_mutex_base *rtm = &rwb->rtmutex;
struct task_struct *owner;
DEFINE_RT_WAKE_Q(wqh);
+ unsigned long flags;
- raw_spin_lock_irq(&rtm->wait_lock);
+ raw_spin_lock_irqsave(&rtm->wait_lock, flags);
/*
* Wake the writer, i.e. the rtmutex owner. It might release the
* rtmutex concurrently in the fast path (due to a signal), but to
@@ -167,7 +168,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
/* Pairs with the preempt_enable in rt_mutex_wake_up_q() */
preempt_disable();
- raw_spin_unlock_irq(&rtm->wait_lock);
+ raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
rt_mutex_wake_up_q(&wqh);
}
--
2.43.0
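The difference between the two unlock patterns in the patch above can be shown with a toy model in which a global flag stands in for the CPU's interrupt-enable state. The names (irqs_enabled, model_irq_save(), model_irq_restore(), read_unlock_slowpath_irq(), read_unlock_slowpath_irqsave()) are hypothetical; this only mimics the raw_spin_lock_irq() and raw_spin_lock_irqsave() semantics and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool irqs_enabled = true;

static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void model_irq_restore(unsigned long flags)
{
	irqs_enabled = flags != 0;
}

/* Old pattern, like raw_spin_lock_irq()/raw_spin_unlock_irq(). */
static void read_unlock_slowpath_irq(void)
{
	irqs_enabled = false; /* take wait_lock with IRQs off */
	/* ... wake the writer ... */
	irqs_enabled = true;  /* drop wait_lock: unconditionally re-enable */
}

/* Fixed pattern, like raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore(). */
static void read_unlock_slowpath_irqsave(void)
{
	unsigned long flags = model_irq_save();

	assert(!irqs_enabled); /* wait_lock is held with IRQs off */
	/* ... wake the writer ... */
	model_irq_restore(flags); /* drop wait_lock: restore the caller's state */
}
```

Called with interrupts already disabled, as in a hardirq handler, the _irq variant leaves them enabled on return, which is the condition lockdep flagged in the trace earlier in this thread; the _irqsave variant returns the flag to whatever the caller had.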
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: Sebastian Andrzej Siewior @ 2026-05-07 14:54 UTC
To: David Woodhouse
Cc: Mauricio Faria de Oliveira, Paul Durrant, Sean Christopherson,
Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On 2026-05-07 14:43:20 [+0100], David Woodhouse wrote:
> Thanks. Something like this?
Yes. Could you add the paragraph below to the commit message, so it does
not look like the patch is only fixing symptoms?
> From 7199173d543d25333418537bcf07d6b2fce18ff1 Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@amazon.co.uk>
> Date: Thu, 7 May 2026 14:39:22 +0100
> Subject: [PATCH] locking/rt: Use raw_spin_lock_irqsave() in
> __rwbase_read_unlock()
>
> __rwbase_read_unlock() uses raw_spin_lock_irq()/raw_spin_unlock_irq(),
> which unconditionally disable and re-enable interrupts. When
> read_unlock() is called from hardirq context (e.g. after a successful
> read_trylock() in a timer callback), the raw_spin_unlock_irq()
> incorrectly re-enables interrupts within the hardirq handler.
>
> This causes lockdep warnings ('hardirqs_on_prepare' from hardirq
> context) and can lead to IRQ state corruption.
Using read_trylock() in hardirq context on PREEMPT_RT is safe because it
does not record the lock owner. The read_unlock() acquires the wait_lock,
which is a raw spinlock and therefore hardirq safe. This change
additionally allows rwlock_t to be used during early boot.
>
> Switch to raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore() to
> preserve the caller's IRQ state, matching the pattern already used
> for rt_spin_lock/unlock.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Sebastian
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: Mauricio Faria de Oliveira @ 2026-05-07 14:56 UTC
To: David Woodhouse
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On 2026-05-07 03:58, David Woodhouse wrote:
> On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
>> kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
>> on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
>> by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
>>
>> Check for that case, and bail out early.
>>
>> Note: there is previous work and discussion on this [1] (~2 years ago),
>> which involved continuing to execute the function with changes, but it
>> was not merged. That was a different, more complex approach.
>>
>> [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
>
> ...
>
>> + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
>> + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
>> + goto out;
>
>
> The approach in Paul's earlier patch was better; we absolutely *want*
> to deliver the interrupt to the guest immediately whenever we can, and
> only fall back to the workqueue in the rare case that the shared info
> page has been invalidated.
Certainly, that was better. This was a simple workaround, but with this
clarification, it indeed doesn't fit.
> We should switch to plain read_trylock(), *without* the
> local_irq_save(). And since this was the *only* case where the GPC lock
> was ever taken under IRQ¹, all the GPC locking can drop the _irq part.
Ok, I can take a look. Or do you plan to work on it yourself (as you
hit the issue with read_unlock later in this thread)?
> Sean's concern was:
>
>>> I am not comfortable applying this patch. As shown by the need for the next patch
>>> to optimize unrelated invalidations, switching to read_trylock() is more subtle
>>> than it seems at first glance. Specifically, there are no fairness guarantees.
>
> I'm OK with that in this case. Because kvm_xen_set_evtchn_fast(), as
> with *everything* called from kvm_arch_set_irq_inatomic(), is
> explicitly designed to be a 'best effort' and allowed to return
> -EWOULDBLOCK when it's too hard.
>
> And the write lock being held here should be a *rare* case, as the GPC for
> the shared_info and vcpu_info pages should basically *never* get
> invalidated while the guest is running.
>
> I've taken the same read_trylock() approach in
> https://lore.kernel.org/all/1d6712ed413ea66ef376d1410811997c3b416e99.camel@infradead.org/
Thanks for the pointers.
--
Mauricio
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: David Woodhouse @ 2026-05-07 15:22 UTC
To: Mauricio Faria de Oliveira
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Thu, 2026-05-07 at 11:56 -0300, Mauricio Faria de Oliveira wrote:
> On 2026-05-07 03:58, David Woodhouse wrote:
> > On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
> > > kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
> > > on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
> > > by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
> > >
> > > Check for that case, and bail out early.
> > >
> > > Note: there is previous work and discussion on this [1] (~2 years ago),
> > > which involved continuing to execute the function with changes, but it
> > > was not merged. That was a different, more complex approach.
> > >
> > > [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
> >
> > ...
> >
> > > + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
> > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
> > > + goto out;
> >
> >
> > The approach in Paul's earlier patch was better; we absolutely *want*
> > to deliver the interrupt to the guest immediately whenever we can, and
> > only fall back to the workqueue in the rare case that the shared info
> > page has been invalidated.
>
> Certainly, that was better. This was a simple workaround, but with this
> clarification, it indeed doesn't fit.
>
> > We should switch to plain read_trylock(), *without* the
> > local_irq_save(). And since this was the *only* case where the GPC lock
> > was ever taken under IRQ¹, all the GPC locking can drop the _irq part.
>
> Ok, I can take a look. Or do you plan to work on it yourself (as you
> hit the issue with read_unlock later in this thread)?
I'm working on it now; thanks.
Currently confused by the fact that the read_trylock() seems to fail
more often than it should under RT, causing the fallback path to be
taken... and the fallback path doesn't seem to work properly...
Your version should have seen this too, surely? Did the selftest in
linux/tools/testing/selftests/kvm/x86/xen_shinfo_test work with your
patch?
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: Mauricio Faria de Oliveira @ 2026-05-07 16:02 UTC
To: David Woodhouse
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On 2026-05-07 12:22, David Woodhouse wrote:
> On Thu, 2026-05-07 at 11:56 -0300, Mauricio Faria de Oliveira wrote:
>> On 2026-05-07 03:58, David Woodhouse wrote:
>> > On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
>> > > kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
>> > > on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
>> > > by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
>> > >
>> > > Check for that case, and bail out early.
>> > >
>> > > Note: there is previous work and discussion on this [1] (~2 years ago),
>> > > which involved continuing to execute the function with changes, but it
>> > > was not merged. That was a different, more complex approach.
>> > >
>> > > [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
>> >
>> > ...
>> >
>> > > + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
>> > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
>> > > + goto out;
>> >
>> >
>> > The approach in Paul's earlier patch was better; we absolutely *want*
>> > to deliver the interrupt to the guest immediately whenever we can, and
>> > only fall back to the workqueue in the rare case that the shared info
>> > page has been invalidated.
>>
>> Certainly, that was better. This was a simple workaround, but with this
>> clarification, it indeed doesn't fit.
>>
>> > We should switch to plain read_trylock(), *without* the
>> > local_irq_save(). And since this was the *only* case where the GPC lock
>> > was ever taken under IRQ¹, all the GPC locking can drop the _irq part.
>>
>> Ok, I can take a look. Or do you plan to work on it yourself (as you
>> hit the issue with read_unlock later in this thread)?
>
> I'm working on it now; thanks.
Ok, thanks; I'll drop this. Could you please Cc me when you send it out?
> Currently confused by the fact that the read_trylock() seems to fail
> more often than it should under RT, causing the fallback path to be
> taken... and the fallback path doesn't seem to work properly...
>
> Your version should have seen this too, surely? Did the selftest in
> linux/tools/testing/selftests/kvm/x86/xen_shinfo_test work with your
> patch?
Yes, it works, apparently. Timing is similar with/without PREEMPT_RT.
# dmesg -C
# time ./xen_shinfo_test
Random seed: 0x6b8b4567
real 0m4.064s
user 0m1.654s
sys 0m4.915s
# echo $?
0
# dmesg
#
--
Mauricio
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
From: David Woodhouse @ 2026-05-07 16:15 UTC
To: Mauricio Faria de Oliveira
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Thu, 2026-05-07 at 13:02 -0300, Mauricio Faria de Oliveira wrote:
> On 2026-05-07 12:22, David Woodhouse wrote:
> > On Thu, 2026-05-07 at 11:56 -0300, Mauricio Faria de Oliveira wrote:
> > > On 2026-05-07 03:58, David Woodhouse wrote:
> > > > On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
> > > > > kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
> > > > > on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
> > > > > by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
> > > > >
> > > > > Check for that case, and bail out early.
> > > > >
> > > > > Note: there is previous work and discussion on this [1] (~2 years ago),
> > > > > which involved continuing to execute the function with changes, but it
> > > > > was not merged. That was a different, more complex approach.
> > > > >
> > > > > [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
> > > >
> > > > ...
> > > >
> > > > > + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
> > > > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
> > > > > + goto out;
> > > >
> > > >
> > > > The approach in Paul's earlier patch was better; we absolutely *want*
> > > > to deliver the interrupt to the guest immediately whenever we can, and
> > > > only fall back to the workqueue in the rare case that the shared info
> > > > page has been invalidated.
> > >
> > > Certainly, that was better. This was a simple workaround, but with this
> > > clarification, it indeed doesn't fit.
> > >
> > > > We should switch to plain read_trylock(), *without* the
> > > > local_irq_save(). And since this was the *only* case where the GPC lock
> > > > was ever taken under IRQ¹, all the GPC locking can drop the _irq part.
> > >
> > > Ok, I can take a look. Or do you plan to work on it yourself (as you
> > > hit the issue with read_unlock later in this thread)?
> >
> > I'm working on it now; thanks.
>
> Ok, thanks; I'll drop this. Could you please Cc me when you send it out?
>
> > Currently confused by the fact that the read_trylock() seems to fail
> > more often than it should under RT, causing the fallback path to be
> > taken... and the fallback path doesn't seem to work properly...
> >
> > Your version should have seen this too, surely? Did the selftest in
> > linux/tools/testing/selftests/kvm/x86/xen_shinfo_test work with your
> > patch?
>
> Yes, it works, apparently. Timing is similar with/without PREEMPT_RT.

Huh. Something weird going on for me. Would you mind confirming it
still works with my version:

https://git.infradead.org/?p=users/dwmw2/linux.git;a=shortlog;h=refs/heads/gpc-stealtime
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
2026-05-07 16:15 ` David Woodhouse
@ 2026-05-07 16:34 ` Mauricio Faria de Oliveira
2026-05-07 17:45 ` David Woodhouse
0 siblings, 1 reply; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2026-05-07 16:34 UTC (permalink / raw)
To: David Woodhouse
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On 2026-05-07 13:15, David Woodhouse wrote:
> On Thu, 2026-05-07 at 13:02 -0300, Mauricio Faria de Oliveira wrote:
>> On 2026-05-07 12:22, David Woodhouse wrote:
>> > On Thu, 2026-05-07 at 11:56 -0300, Mauricio Faria de Oliveira wrote:
>> > > On 2026-05-07 03:58, David Woodhouse wrote:
>> > > > On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
>> > > > > kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
>> > > > > on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
>> > > > > by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
>> > > > >
>> > > > > Check for that case, and bail out early.
>> > > > >
>> > > > > Note: there is previous work and discussion on this [1] (~2 years ago),
>> > > > > which involved continuing to execute the function with changes, but it
>> > > > > was not merged. That was a different, more complex approach.
>> > > > >
>> > > > > [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
>> > > >
>> > > > ...
>> > > >
>> > > > > + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
>> > > > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
>> > > > > + goto out;
>> > > >
>> > > >
>> > > > The approach in Paul's earlier patch was better; we absolutely *want*
>> > > > to deliver the interrupt to the guest immediately whenever we can, and
>> > > > only fall back to the workqueue in the rare case that the shared info
>> > > > page has been invalidated.
>> > >
>> > > Certainly, that was better. This was a simple workaround, but with this
>> > > clarification, it indeed doesn't fit.
>> > >
>> > > > We should switch to plain read_trylock(), *without* the
>> > > > local_irq_save(). And since this was the *only* case where the GPC lock
>> > > > was ever taken under IRQ¹, all the GPC locking can drop the _irq part.
>> > >
>> > > Ok, I can take a look. Or do you plan to work on it yourself (as you
>> > > hit the issue with read_unlock later in this thread)?
>> >
>> > I'm working on it now; thanks.
>>
>> Ok, thanks; I'll drop this. Could you please Cc me when you send it out?
>>
>> > Currently confused by the fact that the read_trylock() seems to fail
>> > more often than it should under RT, causing the fallback path to be
>> > taken... and the fallback path doesn't seem to work properly...
>> >
>> > Your version should have seen this too, surely? Did the selftest in
>> > linux/tools/testing/selftests/kvm/x86/xen_shinfo_test work with your
>> > patch?
>>
>> Yes, it works, apparently. Timing is similar with/without PREEMPT_RT.
>
> Huh. Something weird going on for me. Would you mind confirming it
> still works with my version:
>
> https://git.infradead.org/?p=users/dwmw2/linux.git;a=shortlog;h=refs/heads/gpc-stealtime

Yes, it also works with that tree [1].

Is PREEMPT_RT enabled in the Xen guest as well? In earlier tests, enabling
it in the guest caused interrupt issues on boot (a few 'nobody cared, try
irqpoll' and 'Disabled IRQ #' messages), but that also happened without
changes to the Xen host kernel, so it was not a regression, and I disabled
it since I couldn't look into it further at the time.

[1] HEAD at commit 916875860566 ("locking/rt: Use
raw_spin_lock_irqsave() in __rwbase_read_unlock()")

Hope this helps,
--
Mauricio
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
2026-05-07 16:34 ` Mauricio Faria de Oliveira
@ 2026-05-07 17:45 ` David Woodhouse
0 siblings, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2026-05-07 17:45 UTC (permalink / raw)
To: Mauricio Faria de Oliveira
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Thu, 2026-05-07 at 13:34 -0300, Mauricio Faria de Oliveira wrote:
> On 2026-05-07 13:15, David Woodhouse wrote:
> > On Thu, 2026-05-07 at 13:02 -0300, Mauricio Faria de Oliveira wrote:
> > > On 2026-05-07 12:22, David Woodhouse wrote:
> > > > On Thu, 2026-05-07 at 11:56 -0300, Mauricio Faria de Oliveira wrote:
> > > > > On 2026-05-07 03:58, David Woodhouse wrote:
> > > > > > On Wed, 2026-05-06 at 23:36 -0300, Mauricio Faria de Oliveira wrote:
> > > > > > > kvm_xen_set_evtchn_fast() calls read_lock_irqsave(), which might block
> > > > > > > on PREEMPT_RT, but that is invalid in IRQ context, as when it's called
> > > > > > > by xen_timer_callback() (even on PREEMPT_RT per HRTIMER_MODE_ABS_HARD).
> > > > > > >
> > > > > > > Check for that case, and bail out early.
> > > > > > >
> > > > > > > Note: there is previous work and discussion on this [1] (~2 years ago),
> > > > > > > which involved continuing to execute the function with changes, but it
> > > > > > > was not merged. That was a different, more complex approach.
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/lkml/ZdPQVP7eejq3eFjc@google.com/
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > + /* Bail in IRQ context on PREEMPT_RT; read_lock_irqsave() might block */
> > > > > > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_hardirq())
> > > > > > > + goto out;
> > > > > >
> > > > > >
> > > > > > The approach in Paul's earlier patch was better; we absolutely *want*
> > > > > > to deliver the interrupt to the guest immediately whenever we can, and
> > > > > > only fall back to the workqueue in the rare case that the shared info
> > > > > > page has been invalidated.
> > > > >
> > > > > Certainly, that was better. This was a simple workaround, but with this
> > > > > clarification, it indeed doesn't fit.
> > > > >
> > > > > > We should switch to plain read_trylock(), *without* the
> > > > > > local_irq_save(). And since this was the *only* case where the GPC lock
> > > > > > was ever taken under IRQ¹, all the GPC locking can drop the _irq part.
> > > > >
> > > > > Ok, I can take a look. Or do you plan to work on it yourself (as you
> > > > > hit the issue with read_unlock later in this thread)?
> > > >
> > > > I'm working on it now; thanks.
> > >
> > > Ok, thanks; I'll drop this. Could you please Cc me when you send it out?
> > >
> > > > Currently confused by the fact that the read_trylock() seems to fail
> > > > more often than it should under RT, causing the fallback path to be
> > > > taken... and the fallback path doesn't seem to work properly...
> > > >
> > > > Your version should have seen this too, surely? Did the selftest in
> > > > linux/tools/testing/selftests/kvm/x86/xen_shinfo_test work with your
> > > > patch?
> > >
> > > Yes, it works, apparently. Timing is similar with/without PREEMPT_RT.
> >
> > Huh. Something weird going on for me. Would you mind confirming it
> > still works with my version:
> >
> > https://git.infradead.org/?p=users/dwmw2/linux.git;a=shortlog;h=refs/heads/gpc-stealtime
>
> Yes, it also works with that tree [1].

Thanks.

> Is PREEMPT_RT enabled in the Xen guest as well? In earlier tests, that
> caused interrupt issues on boot (a few 'nobody cared, try irqpoll' and
> 'Disabled IRQ #'), but that also happened without changes in the Xen
> host kernel, thus not a regression, and I disabled it as couldn't look
> further then.

I haven't got as far as actually running guest kernels under
PREEMPT_RT; I'm only using the selftest for now.
* Re: [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast()
2026-05-07 16:02 ` Mauricio Faria de Oliveira
2026-05-07 16:15 ` David Woodhouse
@ 2026-05-08 17:48 ` David Woodhouse
1 sibling, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2026-05-08 17:48 UTC (permalink / raw)
To: Mauricio Faria de Oliveira
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
kernel-dev, kvm, linux-kernel, linux-rt-devel,
syzbot+208f7f3e5f59c11aeb90
On Thu, 2026-05-07 at 13:02 -0300, Mauricio Faria de Oliveira wrote:
>
> Ok, thanks; I'll drop this. Could you please Cc me when you send it out?
>
> > Currently confused by the fact that the read_trylock() seems to fail
> > more often than it should under RT, causing the fallback path to be
> > taken... and the fallback path doesn't seem to work properly...
> >
> > Your version should have seen this too, surely? Did the selftest in
> > linux/tools/testing/selftests/kvm/x86/xen_shinfo_test work with your
> > patch?
>
> Yes, it works, apparently. Timing is similar with/without PREEMPT_RT.

Turns out my issue wasn't about the locking at all; hrtimer delivery
was broken by commit 15dd3a948855 and is fixed by
https://lore.kernel.org/all/20260423155611.216805954@infradead.org/

All that staring at the delivery fallback path did lead me to find an
issue with the latency of the slow path, though.

Patch series incoming.
end of thread

Thread overview: 15+ messages
2026-05-07 2:36 [PATCH] KVM: x86/xen: bail in IRQ context on PREEMPT_RT in kvm_xen_set_evtchn_fast() Mauricio Faria de Oliveira
2026-05-07 6:58 ` David Woodhouse
2026-05-07 7:12 ` Sebastian Andrzej Siewior
2026-05-07 7:30 ` David Woodhouse
2026-05-07 13:00 ` David Woodhouse
2026-05-07 13:21 ` Sebastian Andrzej Siewior
2026-05-07 13:43 ` David Woodhouse
2026-05-07 14:54 ` Sebastian Andrzej Siewior
2026-05-07 14:56 ` Mauricio Faria de Oliveira
2026-05-07 15:22 ` David Woodhouse
2026-05-07 16:02 ` Mauricio Faria de Oliveira
2026-05-07 16:15 ` David Woodhouse
2026-05-07 16:34 ` Mauricio Faria de Oliveira
2026-05-07 17:45 ` David Woodhouse
2026-05-08 17:48 ` David Woodhouse