From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>,
Jan Beulich <JBeulich@suse.com>,
Anthony PERARD <anthony.perard@vates.tech>,
Michal Orzel <michal.orzel@amd.com>,
Julien Grall <julien@xen.org>,
Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: [PATCH 4/6] x86/idle: Implement a new MWAIT IPI-elision algorithm
Date: Fri, 4 Jul 2025 09:24:06 +0200 [thread overview]
Message-ID: <aGeBlmi7-KX8gF7X@macbook.local> (raw)
In-Reply-To: <7d51879c-87c8-4e36-a7d7-66ba02ef2886@citrix.com>
On Thu, Jul 03, 2025 at 06:48:23PM +0100, Andrew Cooper wrote:
> On 03/07/2025 5:36 pm, Roger Pau Monné wrote:
> > On Wed, Jul 02, 2025 at 03:41:19PM +0100, Andrew Cooper wrote:
> >> In order elide IPIs, we must be able to identify whether a target CPU is in
> >> MWAIT at the point it is woken up. i.e. the store to wake it up must also
> >> identify the state.
> >>
> >> Create a new in_mwait variable beside __softirq_pending, so we can use a
> >> CMPXCHG to set the softirq while also observing the status safely. Implement
> >> an x86 version of arch_pend_softirq() which does this.
> >>
> >> In mwait_idle_with_hints(), advertise in_mwait, with an explanation of
> >> precisely what it means. X86_BUG_MONITOR can be accounted for simply by not
> >> advertising in_mwait.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Roger Pau Monné <roger.pau@citrix.com>
> >> CC: Anthony PERARD <anthony.perard@vates.tech>
> >> CC: Michal Orzel <michal.orzel@amd.com>
> >> CC: Julien Grall <julien@xen.org>
> >> CC: Stefano Stabellini <sstabellini@kernel.org>
> >>
> >> This is modelled after Linux's TIF_NEED_RESCHED (single bit equivelent of all
> >> of __softirq_pending), and TIF_POLLING_NRFLAG (arch-neutral "in_mwait").
> >>
> >> In Linux, they're both in the same flags field, which adds complexity. In
> >> Xen, __softirq_pending is already unsigned long for everything other than x86,
> >> so adding an arch-neutral "in_mwait" would need wider changes.
> >> ---
> >> xen/arch/x86/acpi/cpu_idle.c | 20 +++++++++++++++++-
> >> xen/arch/x86/include/asm/hardirq.h | 14 +++++++++++-
> >> xen/arch/x86/include/asm/softirq.h | 34 ++++++++++++++++++++++++++++++
> >> 3 files changed, 66 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> >> index caa0fef0b3b1..13040ef467a0 100644
> >> --- a/xen/arch/x86/acpi/cpu_idle.c
> >> +++ b/xen/arch/x86/acpi/cpu_idle.c
> >> @@ -439,7 +439,21 @@ __initcall(cpu_idle_key_init);
> >> void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
> >> {
> >> unsigned int cpu = smp_processor_id();
> >> - const unsigned int *this_softirq_pending = &softirq_pending(cpu);
> >> + irq_cpustat_t *stat = &irq_stat[cpu];
> >> + const unsigned int *this_softirq_pending = &stat->__softirq_pending;
> >> +
> >> + /*
> >> + * By setting in_mwait, we promise to other CPUs that we'll notice changes
> >> + * to __softirq_pending without being sent an IPI. We achieve this by
> >> + * either not going to sleep, or by having hardware notice on our behalf.
> >> + *
> >> + * Some errata exist where MONITOR doesn't work properly, and the
> >> + * workaround is to force the use of an IPI. Cause this to happen by
> >> + * simply not advertising outselves as being in_mwait.
> >> + */
> >> + alternative_io("movb $1, %[in_mwait]",
> >> + "", X86_BUG_MONITOR,
> >> + [in_mwait] "=m" (stat->in_mwait));
> >>
> >> monitor(this_softirq_pending, 0, 0);
> >> smp_mb();
> >> @@ -452,6 +466,10 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
> >> mwait(eax, ecx);
> >> spec_ctrl_exit_idle(info);
> >> }
> >> +
> >> + alternative_io("movb $0, %[in_mwait]",
> >> + "", X86_BUG_MONITOR,
> >> + [in_mwait] "=m" (stat->in_mwait));
> > Isn't it a bit overkill to use alternatives here when you could have a
> > conditional based on a global variable to decide whether to set/clear
> > in_mwait?
>
> I view it differently. Why should the common case suffer overhead
> (extra memory read and conditional branch) to work around 3 buggy pieces
> of hardware in a common path?
TBH I don't think the extra read and conditional would make any
difference in this specific path performance wise. Either the CPU is
going to sleep and has nothing to do, or the cost of getting back from
idle will shadow the overhead of an extra read and conditional.
It's all a question of taste I guess. If it was me I would set/clear
stat->in_mwait unconditionally in mwait_idle_with_hints(), and then in
arch_pend_softirq() I would return:
return new & (1UL << 32) || force_mwait_ipi_wakeup;
Or AFAICT you could possibly skip the cmpxchg() in the
force_mwait_ipi_wakeup case in arch_pend_softirq(), and just do:
if ( force_mwait_ipi_wakeup )
return test_and_set_bit(nr, &softirq_pending(cpu));
Because in that case Xen doesn't care about the in_mwait status. It
would be an optimization for the buggy hardware that already has to
deal with the extra cost of always sending an IPI.
> >> + };
> >> + uint64_t softirq_mwait_raw;
> >> + };
> > This could be a named union type ...
> >
> >> +
> >> unsigned int __local_irq_count;
> >> unsigned int nmi_count;
> >> unsigned int mce_count;
> >> diff --git a/xen/arch/x86/include/asm/softirq.h b/xen/arch/x86/include/asm/softirq.h
> >> index e4b194f069fb..069e5716a68d 100644
> >> --- a/xen/arch/x86/include/asm/softirq.h
> >> +++ b/xen/arch/x86/include/asm/softirq.h
> >> @@ -1,6 +1,8 @@
> >> #ifndef __ASM_SOFTIRQ_H__
> >> #define __ASM_SOFTIRQ_H__
> >>
> >> +#include <asm/system.h>
> >> +
> >> #define NMI_SOFTIRQ (NR_COMMON_SOFTIRQS + 0)
> >> #define TIME_CALIBRATE_SOFTIRQ (NR_COMMON_SOFTIRQS + 1)
> >> #define VCPU_KICK_SOFTIRQ (NR_COMMON_SOFTIRQS + 2)
> >> @@ -9,4 +11,36 @@
> >> #define HVM_DPCI_SOFTIRQ (NR_COMMON_SOFTIRQS + 4)
> >> #define NR_ARCH_SOFTIRQS 5
> >>
> >> +/*
> >> + * Ensure softirq @nr is pending on @cpu. Return true if an IPI can be
> >> + * skipped, false if the IPI cannot be skipped.
> >> + *
> >> + * We use a CMPXCHG covering both __softirq_pending and in_mwait, in order to
> >> + * set softirq @nr while also observing in_mwait in a race-free way.
> >> + */
> >> +static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int cpu)
> >> +{
> >> + uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
> >> + uint64_t old, new;
> > ... so that you also use it here?
>
> No, it cant. The of softirq_pending() in common code requires no
> intermediate field names, and I'm not untangling that mess in a series
> wanting backporting.
Oh, I see. Never mind then, something for a later cleanup if
anything.
Thanks, Roger.
next prev parent reply other threads:[~2025-07-04 7:24 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-02 14:41 [PATCH 0/6] x86/idle: Multiple MWAIT fixes Andrew Cooper
2025-07-02 14:41 ` [PATCH 1/6] x86/idle: Remove broken MWAIT implementation Andrew Cooper
2025-07-03 16:01 ` Roger Pau Monné
2025-07-03 16:19 ` Andrew Cooper
2025-07-02 14:41 ` [PATCH 2/6] x86/idle: Convert force_mwait_ipi_wakeup to X86_BUG_MONITOR Andrew Cooper
2025-07-02 14:41 ` [PATCH 3/6] xen/softirq: Rework arch_skip_send_event_check() into arch_pend_softirq() Andrew Cooper
2025-07-03 8:11 ` Jan Beulich
2025-07-03 10:36 ` Andrew Cooper
2025-07-03 11:24 ` Jan Beulich
2025-07-03 16:21 ` Roger Pau Monné
2025-07-04 7:23 ` Jan Beulich
2025-07-04 7:55 ` Roger Pau Monné
2025-07-04 8:25 ` Jan Beulich
2025-07-04 16:00 ` Andrew Cooper
2025-07-02 14:41 ` [PATCH 4/6] x86/idle: Implement a new MWAIT IPI-elision algorithm Andrew Cooper
2025-07-03 9:01 ` Jan Beulich
2025-07-03 11:59 ` Andrew Cooper
2025-07-03 14:07 ` Jan Beulich
2025-07-03 17:29 ` Andrew Cooper
2025-07-03 16:36 ` Roger Pau Monné
2025-07-03 17:48 ` Andrew Cooper
2025-07-04 7:24 ` Roger Pau Monné [this message]
2025-07-04 16:13 ` Andrew Cooper
2025-07-04 7:52 ` Roger Pau Monné
2025-07-04 16:14 ` Andrew Cooper
2025-07-02 14:41 ` [PATCH 5/6] x86/idle: Drop incorrect smp_mb() in mwait_idle_with_hints() Andrew Cooper
2025-07-03 9:24 ` Jan Beulich
2025-07-03 12:37 ` Andrew Cooper
2025-07-03 13:30 ` Jan Beulich
2025-07-04 15:45 ` Andrew Cooper
2025-07-02 14:41 ` [PATCH 6/6] x86/idle: Fix buggy "x86/mwait-idle: enable interrupts before C1 on Xeons" Andrew Cooper
2025-07-03 9:35 ` Jan Beulich
2025-07-03 9:43 ` Jan Beulich
2025-07-03 12:10 ` Andrew Cooper
2025-07-03 13:11 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aGeBlmi7-KX8gF7X@macbook.local \
--to=roger.pau@citrix.com \
--cc=JBeulich@suse.com \
--cc=andrew.cooper3@citrix.com \
--cc=anthony.perard@vates.tech \
--cc=julien@xen.org \
--cc=michal.orzel@amd.com \
--cc=sstabellini@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.