public inbox for kvm@vger.kernel.org
From: Thomas Gleixner <tglx@kernel.org>
To: Binbin Wu <binbin.wu@linux.intel.com>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	"peterz@infradead.org" <peterz@infradead.org>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"Wu, Binbin" <binbin.wu@intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: CPU Lockups in KVM with deferred hrtimer rearming
Date: Tue, 21 Apr 2026 09:39:14 +0200
Message-ID: <87eck8daot.ffs@tglx>
In-Reply-To: <770ae152-c3fd-4068-8462-23064de02238@linux.intel.com>

On Tue, Apr 21 2026 at 12:51, Binbin Wu wrote:
> On 4/20/2026 11:00 PM, Thomas Gleixner wrote:
>>>  static inline void xfer_to_guest_mode_prepare(void)
>>>  {
>>>         lockdep_assert_irqs_disabled();
>>> +       hrtimer_rearm_deferred();
>>>         tick_nohz_user_enter_prepare();
>> 
>> 
>> This code should never be reached with a rearm pending. Something else
>> went wrong earlier. So while the patch "works" it papers over the
>> underlying problem.
>
> IIUC, the problem might be:
>
> HRTimer -> VMExit:
> [IRQ is disabled]
>     kvm_x86_call(handle_exit_irqoff)(vcpu)
>         vmx_handle_exit_irqoff
>             handle_external_interrupt_irqoff
>                 sysvec_apic_timer_interrupt
>                     irqentry_enter
>                     ...
>                     irqentry_exit
>                         irqentry_exit_to_kernel_mode
>                             if (!regs_irqs_disabled(regs))  // <-- This is false, so the
>                                 hrtimer_rearm_deferred()    //     hrtimer rearm is skipped!
>
>
> This issue is triggered on TDX since TDX can't use preemption timer while normal
> VMX VM uses preemption timer by default.

Kinda.

The issue is that vmx_handle_exit_irqoff() always hands in regs with
regs->flags.X86_EFLAGS_IF == 0. That has absolutely nothing to do with
TDX and the preemption timer.

The patch below solves the problem right there in the exit code, which
is unfortunate as there might be a NEED_RESCHED pending. But that can't
be taken into account as KVM enables interrupts _before_ reaching the
exit work point.
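
To illustrate the failure mode, here is a minimal userspace sketch, not
kernel code. All model_* names and struct model_regs are hypothetical
stand-ins; only the X86_EFLAGS_IF bit position (bit 9) matches the real
definition. It models the exit path decision with and without the patch
below, assuming the timer interrupt handler left a rearm pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define X86_EFLAGS_IF	(1UL << 9)	/* Interrupt enable flag, bit 9 */

/* Hypothetical stand-in for struct pt_regs, reduced to the flags word */
struct model_regs {
	unsigned long flags;
};

/* Models the "rearm pending" state left behind by the timer interrupt */
static bool rearm_pending;

/* Models hrtimer_rearm_deferred(): rearm only when something is pending */
static void model_rearm_deferred(void)
{
	if (rearm_pending)
		rearm_pending = false;
}

/* Same idea as regs_irqs_disabled(): IF clear in the saved flags */
static bool model_regs_irqs_disabled(const struct model_regs *regs)
{
	return !(regs->flags & X86_EFLAGS_IF);
}

/*
 * Models the kernel mode exit path. @patched selects whether the
 * interrupts disabled branch rearms as well, as the patch does.
 */
static void model_irqentry_exit(const struct model_regs *regs, bool patched)
{
	assert(regs != NULL);

	if (!model_regs_irqs_disabled(regs)) {
		/* Regular case: interrupted context had interrupts enabled */
		model_rearm_deferred();
	} else if (patched) {
		/* KVM case: handler invoked with IF == 0 in @regs */
		model_rearm_deferred();
	}
}

/*
 * Models one timer interrupt with the given saved flags. Returns true
 * when the rearm is still pending after the exit path has run.
 */
static bool model_timer_irq(unsigned long flags, bool patched)
{
	struct model_regs regs = { .flags = flags };

	rearm_pending = true;	/* Assumption: handler deferred the rearm */
	model_irqentry_exit(&regs, patched);
	return rearm_pending;
}
```

In this model, model_timer_irq(0, false) leaves the rearm pending, which
corresponds to the KVM case, while model_timer_irq(0, true) does not.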

Yet another proof that virt creates more problems than it solves.

Thanks,

        tglx
---
Subject: entry: Enforce hrtimer rearming in the irqentry_exit path
From: Thomas Gleixner <tglx@kernel.org>
Date: Tue, 21 Apr 2026 09:00:52 +0200

irqentry_exit_to_kernel_mode_after_preempt() invokes
hrtimer_rearm_deferred() only when the interrupted context had interrupts
enabled. That's a correct decision because the timer interrupt can only be
delivered in interrupt enabled contexts. The interrupt disabled path is
used by exceptions and traps which never touch the hrtimer mechanics.

So much for the theory, but then there is VIRT which ruins everything.

KVM invokes regular interrupts with pt_regs which have interrupts
disabled. That's correct from the KVM point of view, but completely
violates the obviously correct expectations of the interrupt entry/exit
code.

Cure this by adding a hrtimer_rearm_deferred() invocation to the path of
irqentry_exit_to_kernel_mode_after_preempt() which handles an interrupted
context with interrupts disabled.

That's unfortunate when there is an actual reschedule pending, but it can't
be avoided because KVM invokes a lot of code and also reenables interrupts
_before_ reaching the point where the reschedule condition is handled. That
can delay the rearming significantly, which in turn can cause artificial
latencies.

Fixes: 0e98eb14814e ("entry: Prepare for deferred hrtimer rearming")
Reported-by: "Verma, Vishal L" <vishal.l.verma@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Closes: https://lore.kernel.org/70cd3e97fbb796e2eb2ff8cd4b7614ada05a5f24.camel@intel.com
---
 include/linux/irq-entry-common.h |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -516,6 +516,14 @@ irqentry_exit_to_kernel_mode_after_preem
 		instrumentation_end();
 	} else {
 		/*
+		 * This is sadly required due to KVM, which invokes regular
+		 * interrupt handlers with interrupt disabled state in @regs.
+		 */
+		instrumentation_begin();
+		hrtimer_rearm_deferred();
+		instrumentation_end();
+
+		/*
 		 * IRQ flags state is correct already. Just tell RCU if it
 		 * was not watching on entry.
 		 */

