From: Sean Christopherson <seanjc@google.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Paolo Bonzini <pbonzini@redhat.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCH v2 0/9] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM
Date: Thu, 2 Apr 2026 16:43:13 -0700
Message-ID: <ac7_EcgDhs5s4s0P@google.com>
In-Reply-To: <CANRm+Cy4MxrtvufkCeefFZm2ycT5WjUmHLuiG1xPthbf58VDjQ@mail.gmail.com>

On Wed, Apr 01, 2026, Wanpeng Li wrote:
> Hi Sean,
> On Fri, 13 Mar 2026 at 09:13, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Dec 19, 2025, Wanpeng Li wrote:
> > > Part 2: KVM IPI-Aware Directed Yield (patches 6-9)
> > >
> > > Enhance kvm_vcpu_on_spin() with lightweight IPI tracking to improve
> > > directed yield candidate selection. Track sender/receiver relationships
> > > when IPIs are delivered and use this information to prioritize yield
> > > targets.
> > >
> > > The tracking mechanism:
> > >
> > > - Hooks into kvm_irq_delivery_to_apic() to detect unicast fixed IPIs (the
> > > common case for inter-processor synchronization). When exactly one
> > > destination vCPU receives an IPI, record the sender->receiver relationship
> > > with a monotonic timestamp.
> > >
> > > In high VM density scenarios, software-based IPI tracking via interrupt
> > > delivery interception becomes particularly valuable: it captures precise
> > > sender/receiver relationships that the host can use to pick better yield
> > > targets, providing benefits that complement, and in overcommitted
> > > environments can even exceed, those of hardware-accelerated interrupt
> > > delivery.
> > >
> > > - Uses lockless READ_ONCE/WRITE_ONCE accessors for minimal overhead. The
> > > per-vCPU ipi_context structure is carefully designed to avoid cache line
> > > bouncing.
> > >
> > > - Implements a short recency window (50ms default) to avoid stale IPI
> > > information inflating boost priority on throughput-sensitive workloads.
> > > Old IPI relationships are naturally aged out.
> > >
> > > - Clears IPI context on EOI with two-stage precision: unconditionally clear
> > > the receiver's context (it processed the interrupt), but only clear the
> > > sender's pending flag if the receiver matches and the IPI is recent. This
> > > prevents unrelated EOIs from prematurely clearing valid IPI state.
> >
> > That all relies on lack of IPI and EOI virtualization, which seems very
> > counter-productive given the way hardware is headed.
>
> I think there is an important distinction here. APICv / posted
> interrupts accelerate IPI *delivery*, but they do not help with the
> host-side *scheduling decision* in kvm_vcpu_on_spin().
I know, but that doesn't change the reality of where hardware is headed (or rather,
already is).
> A posted interrupt can land in a not-yet-scheduled vCPU's PIR, but that vCPU
> still won't process it until it actually gets CPU time. IPI tracking targets
> exactly this gap: which vCPU should we yield to right now.
>
> In high VM density / overcommitted scenarios, APICv's advantage narrows
> precisely because the bottleneck shifts from IPI delivery latency to
> *scheduling latency* — the target vCPU may have its posted interrupt sitting
> in PIR but cannot process it because it is competing for physical CPU time
> with many other vCPUs. In that regime, making a better yield-to decision on
> the host side has a more direct impact on end-to-end IPI response time than
> faster hardware delivery to a vCPU that isn't running.
>
> So I would not characterize IPI tracking as a workaround for lack of hardware
> virtualization support. It addresses an orthogonal problem — host-side
> scheduling decisions — that hardware IPI acceleration does not solve. The two
> are complementary: APICv makes delivery fast when the target is running;
> IPI-aware directed yield makes scheduling better when the target is not
> running.
Except they aren't complementary in the sense that, as implemented, they are
mutually exclusive. The x86 changes here rely on tracking IPIs, and unless I'm
missing something in the series, that code falls apart when IPI virtualization
is enabled.
2025-12-19 3:53 [PATCH v2 0/9] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 1/9] sched: Add vCPU debooster infrastructure Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 2/9] sched/fair: Add rate-limiting and validation helpers Wanpeng Li
2025-12-22 21:12 ` kernel test robot
2026-01-04 4:09 ` Hillf Danton
2025-12-19 3:53 ` [PATCH v2 3/9] sched/fair: Add cgroup LCA finder for hierarchical yield Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 4/9] sched/fair: Add penalty calculation and application logic Wanpeng Li
2025-12-22 23:36 ` kernel test robot
2025-12-19 3:53 ` [PATCH v2 5/9] sched/fair: Wire up yield deboost in yield_to_task_fair() Wanpeng Li
2025-12-22 7:06 ` kernel test robot
2025-12-22 9:31 ` kernel test robot
2025-12-19 3:53 ` [PATCH v2 6/9] KVM: x86: Add IPI tracking infrastructure Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 7/9] KVM: x86/lapic: Integrate IPI tracking with interrupt delivery Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 8/9] KVM: Implement IPI-aware directed yield candidate selection Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 9/9] KVM: Relaxed boost as safety net Wanpeng Li
2026-01-04 2:40 ` [PATCH v2 0/9] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM Wanpeng Li
2026-01-05 6:26 ` K Prateek Nayak
2026-03-13 1:13 ` Sean Christopherson
2026-04-01 9:48 ` Wanpeng Li
2026-04-02 23:43 ` Sean Christopherson [this message]
2026-03-26 14:41 ` Christian Borntraeger
2026-04-01 9:34 ` Wanpeng Li