All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>,
	Jim Mattson <jmattson@google.com>,
	 Binbin Wu <binbin.wu@linux.intel.com>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	 "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Rick P Edgecombe <rick.p.edgecombe@intel.com>,
	 Binbin Wu <binbin.wu@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	Paolo Bonzini <bonzini@redhat.com>
Subject: Re: CPU Lockups in KVM with deferred hrtimer rearming
Date: Wed, 22 Apr 2026 08:26:50 -0700	[thread overview]
Message-ID: <aejouuIqsyOuICSj@google.com> (raw)
In-Reply-To: <20260422140831.GR3102624@noisy.programming.kicks-ass.net>

On Wed, Apr 22, 2026, Peter Zijlstra wrote:
> On Wed, Apr 22, 2026 at 09:46:46AM +0200, Peter Zijlstra wrote:
> > +		instrumentation_begin();
> > +		/*
> > +		 * KVM/VMX will dispatch from IRQ-disabled but for a context
> > +		 * that will have IRQs-enabled. This confuses the entry code
> > +		 * and it will not have reprogrammed the timer (or do
> > +		 * preemption). Minimal fixup for now.
> > +		 */
> > +		hrtimer_rearm_deferred();
> > +		instrumentation_end();
> 
> So I've been looking at this preemption thing. After having gotten my
> head in a twist a few times around, I think this is done by
> vcpu_enter_guest() like:
> 
> 	preempt_disable();
> 	local_irq_disable();
> 	...
> 	kvm_x86_call(handle_exit_irqoff)(vcpu); <--- all of this VMX nonsense

To be fair, there's some SVM nonsense too. 

> 	...
> 	local_irq_enable();
> 				<--- WTF goes here :-)

All IRQs on AMD, and tick IRQs on VMX that arrive _just_ after the VM-Exit.

Because IRQ exits on AMD/SVM are purely a notification, KVM needs to enable IRQs
in order to service the exit.  The early enabling exists to get the timeslice
accounting correct.  With the comments...

	/*
	 * Consume any pending interrupts, including the possible source of
	 * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
	 * An instruction is required after local_irq_enable() to fully unblock
	 * interrupts on processors that implement an interrupt shadow, the
	 * stat.exits increment will do nicely.
	 */
	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
	local_irq_enable();
	++vcpu->stat.exits;
	local_irq_disable();
	kvm_after_interrupt(vcpu);

	/*
	 * Wait until after servicing IRQs to account guest time so that any
	 * ticks that occurred while running the guest are properly accounted
	 * to the guest.  Waiting until IRQs are enabled degrades the accuracy
	 * of accounting via context tracking, but the loss of accuracy is
	 * acceptable for all known use cases.
	 */
	guest_timing_exit_irqoff();

When NOT precisely accounting virtual CPU usage, accounting is done by setting
PF_VCPU on current->flags on the way into the guest, and then clearing it on the
way back out.  The latter is done by guest_timing_exit_irqoff().  If KVM waits
to enable IRQs until after guest_timing_exit_irqoff(), then the tick IRQ handler
will account the tick to the host, not that guest, even if 99.99999% of the time
was spent in the guest.

As to why this is in common code, i.e. isn't AMD/SVM specific, if the guest and
host are running with the same tick frequency, it's suprisingly easy to get in
a state where the host tick IRQ almost always arrives just after VM-Exit, before
KVM fully enables IRQs.  Specifically, the guest's programmed tick will trigger
a VMX Preemption Timer VM-Exit at the same frequency the host's tick triggers an
IRQ.

> 	++vcpu->stat.exits;
> 	local_irq_disable();
> 	...
> 	local_irq_enable();
> 	preempt_enable(); <--- here we finally preempt
> 
> This earlier IRQ-enable makes my head hurt and I had to go buy a new
> WTF'o'meter (again!).

  reply	other threads:[~2026-04-22 15:26 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-16 20:50 CPU Lockups in KVM with deferred hrtimer rearming Verma, Vishal L
2026-04-20 15:00 ` Thomas Gleixner
2026-04-20 15:22   ` Thomas Gleixner
2026-04-20 20:57   ` Verma, Vishal L
2026-04-20 22:19     ` Thomas Gleixner
2026-04-20 22:24       ` Verma, Vishal L
2026-04-21  6:29         ` Thomas Gleixner
2026-04-21  4:51   ` Binbin Wu
2026-04-21  7:39     ` Thomas Gleixner
2026-04-21 11:18       ` Peter Zijlstra
2026-04-21 11:32         ` Peter Zijlstra
2026-04-21 11:34           ` Peter Zijlstra
2026-04-21 11:49             ` Peter Zijlstra
2026-04-21 12:05               ` Peter Zijlstra
2026-04-21 13:19                 ` Peter Zijlstra
2026-04-21 13:29                   ` Peter Zijlstra
2026-04-21 16:36                     ` Thomas Gleixner
2026-04-21 18:11                     ` Verma, Vishal L
2026-04-21 17:11               ` Thomas Gleixner
2026-04-21 17:20                 ` Jim Mattson
2026-04-21 18:29                   ` Thomas Gleixner
2026-04-21 18:55                     ` Sean Christopherson
2026-04-21 20:06                       ` Peter Zijlstra
2026-04-21 20:46                         ` Peter Zijlstra
2026-04-21 20:57                         ` Sean Christopherson
2026-04-21 21:02                           ` Peter Zijlstra
2026-04-21 21:42                             ` Sean Christopherson
2026-04-22  6:55                               ` Peter Zijlstra
2026-04-22  7:46                                 ` Peter Zijlstra
2026-04-22 14:08                                   ` Peter Zijlstra
2026-04-22 15:26                                     ` Sean Christopherson [this message]
2026-04-22 19:13                                   ` Verma, Vishal L
2026-04-22 22:57                                   ` Thomas Gleixner
2026-04-23 15:23                                     ` Peter Zijlstra
2026-04-22 13:47                                 ` Sean Christopherson
2026-04-21 20:39                       ` Paolo Bonzini
2026-04-21 21:02                         ` Sean Christopherson
2026-04-21 22:48                         ` Thomas Gleixner
2026-04-21 23:15                           ` Paolo Bonzini
2026-04-21 23:34                             ` Jim Mattson
2026-04-21 23:37                               ` Paolo Bonzini
2026-04-22  2:10                             ` Thomas Gleixner
2026-04-21 21:49                       ` Thomas Gleixner
2026-04-21 22:07                         ` Sean Christopherson
2026-04-21 22:24                         ` Paolo Bonzini
2026-04-21 19:18                 ` Verma, Vishal L
2026-04-21 16:30           ` Thomas Gleixner
2026-04-21 16:11       ` Verma, Vishal L

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aejouuIqsyOuICSj@google.com \
    --to=seanjc@google.com \
    --cc=binbin.wu@intel.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=bonzini@redhat.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=tglx@kernel.org \
    --cc=vishal.l.verma@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.