From: Marc Zyngier <maz@kernel.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Christian Loehle <christian.loehle@arm.com>,
Linux PM <linux-pm@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Artem Bityutskiy <artem.bityutskiy@linux.intel.com>,
Aboorva Devarajan <aboorvad@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Mark Rutland <mark.rutland@arm.com>
Subject: Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Date: Tue, 05 Aug 2025 17:00:08 +0100
Message-ID: <86ectpahdj.wl-maz@kernel.org>
In-Reply-To: <CAJZ5v0g=eSeAp96mHCOm+C9jis3uNRXgPhNgtT0SgP9kZ1emvw@mail.gmail.com>

On Tue, 05 Aug 2025 14:23:56 +0100,
"Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
> On Mon, Aug 4, 2025 at 6:54 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > [+ Thomas, Mark]
> >
> > On Thu, 06 Feb 2025 14:29:05 +0000,
> > "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > When giving up on making a high-confidence prediction,
> > > get_typical_interval() always returns UINT_MAX which means that the
> > > next idle interval prediction will be based entirely on the time till
> > > the next timer. However, the information represented by the most
> > > recent intervals may not be completely useless in those cases.
> > >
> > > Namely, the largest recent idle interval is an upper bound on the
> > > recently observed idle duration, so it is reasonable to assume that
> > > the next idle duration is unlikely to exceed it. Moreover, this is
> > > still true after eliminating the suspected outliers if the sample
> > > set still under consideration is at least as large as 50% of the
> > > maximum sample set size.
> > >
> > > Accordingly, make get_typical_interval() return the current maximum
> > > recent interval value in that case instead of UINT_MAX.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > > drivers/cpuidle/governors/menu.c | 13 ++++++++++++-
> > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > --- a/drivers/cpuidle/governors/menu.c
> > > +++ b/drivers/cpuidle/governors/menu.c
> > > @@ -190,8 +190,19 @@
> > > * This can deal with workloads that have long pauses interspersed
> > > * with sporadic activity with a bunch of short pauses.
> > > */
> > > - if ((divisor * 4) <= INTERVALS * 3)
> > > + if (divisor * 4 <= INTERVALS * 3) {
> > > + /*
> > > + * If there are sufficiently many data points still under
> > > + * consideration after the outliers have been eliminated,
> > > + * returning without a prediction would be a mistake because it
> > > + * is likely that the next interval will not exceed the current
> > > + * maximum, so return the latter in that case.
> > > + */
> > > + if (divisor >= INTERVALS / 2)
> > > + return max;
> > > +
> > > return UINT_MAX;
> > > + }
> > >
> > > /* Update the thresholds for the next round. */
> > > if (avg - min > max - avg)
> >
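[Editorial sketch, not part of the original message.] The branch added by the hunk above can be illustrated in isolation. This is a minimal sketch that assumes INTERVALS is 8, as defined in drivers/cpuidle/governors/menu.c; the helper name low_confidence_result() and its out-parameter are hypothetical and do not exist in the kernel. Here "divisor" stands for the number of samples still under consideration after outlier elimination and "max" for the largest of them.

```c
#include <limits.h>

/* Assumption: menu.c tracks the 8 most recent idle intervals. */
#define INTERVALS 8

/*
 * Hypothetical helper mirroring only the early-exit branch of the hunk.
 * Returns 1 and stores the value that get_typical_interval() would
 * return if the branch is taken; returns 0 if the branch is not taken
 * and the function would carry on with its analysis instead.
 */
int low_confidence_result(unsigned int divisor, unsigned int max,
			  unsigned int *result)
{
	if (divisor * 4 <= INTERVALS * 3) {
		/* At least half of the samples survived: use their maximum. */
		if (divisor >= INTERVALS / 2) {
			*result = max;
			return 1;
		}

		/* Too few samples left: no prediction at all. */
		*result = UINT_MAX;
		return 1;
	}

	return 0;
}
```

Under that assumption the branch is taken for divisor <= 6, and it returns max rather than UINT_MAX when divisor is 4, 5 or 6. Before the patch, all of those cases returned UINT_MAX, leaving the prediction to the time till the next timer.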
> > It appears that this patch, which made it in 6.15, results in *a lot*
> > of extra interrupts on one of my arm64 test machines.
> >
> > * Without this patch:
> >
> > maz@big-leg-emma:~$ vmstat -y 1
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > r b swpd free buff cache si so bi bo in cs us sy id wa st
> > 1 0 0 65370828 29244 106088 0 0 0 0 66 26 0 0 100 0 0
> > 1 0 0 65370828 29244 106088 0 0 0 0 103 66 0 0 100 0 0
> > 1 0 0 65370828 29244 106088 0 0 0 0 34 12 0 0 100 0 0
> > 1 0 0 65370828 29244 106088 0 0 0 0 25 12 0 0 100 0 0
> > 1 0 0 65370828 29244 106088 0 0 0 0 28 14 0 0 100 0 0
> >
> > we're idling at only a few interrupts per second, which isn't bad for
> > a 24 CPU toy.
> >
> > * With this patch:
> >
> > maz@big-leg-emma:~$ vmstat -y 1
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > r b swpd free buff cache si so bi bo in cs us sy id wa st
> > 1 0 0 65361024 28420 105388 0 0 0 0 3710 27 0 0 100 0 0
> > 1 0 0 65361024 28420 105388 0 0 0 0 3399 20 0 0 100 0 0
> > 1 0 0 65361024 28420 105388 0 0 0 0 4439 78 0 0 100 0 0
> > 1 0 0 65361024 28420 105388 0 0 0 0 5634 14 0 0 100 0 0
> > 1 0 0 65361024 28420 105388 0 0 0 0 5575 14 0 0 100 0 0
> >
> > we're idling at anywhere between 3k and 6k interrupts per second. Not
> > exactly what you want. This appears to be caused by the broadcast
> > timer IPI.
> >
> > Reverting this patch on top of 6.16 restores sanity on this machine.
>
> I don't know what is going on here, but it looks highly suspicious to me.

What does? My observation? The likelihood of this patch being the
source (or the trigger) for an unwanted behaviour? Something else?

> The only effect of the change in question should be selecting a
> shallower idle state occasionally and why would this alone cause the
> number of wakeup interrupts to increase?

You tell me. I'm the messenger here.

> Arguably, it might interfere with the tick stopping logic if
> predicted_ns happened to be less than TICK_NSEC sufficiently often,
> but that is not expected to happen on an idle system because in that
> case the average interval between genuine wakeups is relatively large.
> The tick itself is not counted as a wakeup event, so returning a
> shallower state at one point shouldn't affect future predictions, but
> the data above suggests that it actually does affect them.
>
> It looks like selecting a shallower idle state by the governor at one
> point causes more wakeup interrupts to occur in the future which is
> really not expected to happen.
>
> Christian, what do you think?
>
> > I suspect that we're entering some deep idle state in a much more
> > aggressive way,
>
> The change actually goes the other way around. It causes shallower
> idle states to be more likely to be selected overall.

Another proof that I don't understand a thing, and that I should go
play music instead of worrying about kernel issues.

>
> > leading to a global timer firing as a wake-up mechanism,
>
> What timer and why would it fire?

The arch_timer_mem timer, which is used as a backup timer when the
CPUs lose their timer context while going into a deep enough idle
state.

>
> > and the broadcast IPI being used to kick everybody else
> > back. This is further confirmed by seeing the broadcast IPI almost
> > disappearing completely if I load the system a bit.
> >
> > Daniel, you should be able to reproduce this on a Synquacer box (this
> > is what I used here).
> >
> > I'm happy to test things that could help restore some sanity.
>
> Before anything can be tested, I need to understand what exactly is going on.
>
> What cpuidle driver is used on this platform?

psci_idle.

> Any chance to try the teo governor on it to see if this problem can
> also be observed?

Neither ladder nor teo have this issue. The number of broadcast timer
IPIs is minimal, and so is the number of interrupts delivered from the
backup timer. Only menu exhibits the IPI-hose behaviour on this box
(and only this one).

> Please send the output of
>
> $ grep -r '.*' /sys/devices/system/cpu/cpu*/cpuidle
>
> collected after a period of idleness from the kernel in which the
> change in question is present and from a kernel without it?

* with the change present: https://pastebin.com/Cb45Rysy
* with the change reverted: https://pastebin.com/qRy2xzeT

	M.

--
Without deviation from the norm, progress is not possible.