Re: [PATCH v2 0/5] s390/idle: CPU idle driver

Linux Power Management development
 help / color / mirror / Atom feed

From: Mete Durlu <meted@linux.ibm.com>
To: Christian Loehle <christian.loehle@arm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Daniel Lezcano <daniel.lezcano@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>,
	Ilya Leoshkevich <iii@linux.ibm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org
Subject: Re: [PATCH v2 0/5] s390/idle: CPU idle driver
Date: Wed, 17 Jun 2026 22:08:07 +0200	[thread overview]
Message-ID: <8d4e52c4-081f-4e2a-a940-cb76d578324b@linux.ibm.com> (raw)
In-Reply-To: <445a6ab5-3574-4493-9187-701ee7ea7c28@arm.com>

On 6/17/26 9:15 AM, Christian Loehle wrote:
> On 6/10/26 21:23, Mete Durlu wrote:
>> On 6/9/26 5:47 PM, Christian Loehle wrote:
>>> On 6/9/26 14:24, Mete Durlu wrote:
>>>> v1 -> v2:
>>>>
>>>> * Add idle driver enteries to MAINTAINERS file (Christian Loehle)
>>>> * Remove extra line break left in drivers/cpuidle/Kconfig.s390
>>>>     (Christian Loehle)
>>>>
>>>> This patch series introduces a CPU idle driver for s390
>>>> architecture that leverages the existing cpu idle infrastructure and
>>>> TEO (Timer Events Oriented) governor to optimize idle state selection
>>>> based on timer events and interrupt patterns.
>>>

...

>>
>>>> - Configuration
>>>> -----------------------------------------------------------------------
>>>>
>>>> Idle state parameters are tuned per hypervisor type after benchmarks:
>>>>
>>>> **LPAR:**
>>>> - Polling: 5us target residency, 0us exit latency
>>>> - EW: 5us target residency, 5us exit latency
>>>>
>>>> **KVM/z/VM:**
>>>> - Polling: 1us target residency, 0us exit latency
>>>> - EW: 1us target residency, 1us exit latency
>>>>
>>> I think this would also be useful in cpuidle-s390.c in particular the
>>> different residency+latency values for LPAR and KVM/z/VM and what they aim
>>> to achieve for you.
>>
>> We can put down a comment like below;
>>
>> /*
>>   * After various benchmark runs the tuneables for idle driver has shown
>>   * the best performance with the following values;
>>   * for LPAR:
>>   * - Polling: 5us target residency, 0us exit latency
>>   * - EW: 5us target residency, 5us exit latency
>>   *
>>   * for KVM/z/VM:
>>   * - Polling: 1us target residency, 0us exit latency
>>   * - EW: 1us target residency, 1us exit latency
>>   */
>>
>> Is that what you are looking for or something more extensive to cover
>> what sort of behavior it causes and why it benefits the performance?
>> I wouldn't really like to put down lengthy comments here to be honest.
> 
> Why not?
> At least to me the values are hard to follow and without knowing what they
> represent.
> For example is this actually the worst case exit latency for these systems?
> Also from a cpuidle perspective residency == latency isn't outright wrong
> but certainly always a little suspicious.

Since s390 is running virtualized machines, the values are not
deterministic. The values picked here are the best estimates backed by
testing. That is why I didn't want to put anything down initially but
of course the comment can also describe that. I'll add that in the
next version.

>>
>>> Additionally polling is initialised to 0/0 by poll_state.c, so I don't know
>>> where you're taking these values from?
>>
>> Having a look at the implementation of poll_idle() in poll_state.c.
>> The polling time limit (target_residency) is acquired from
>> cpuidle_poll_time(), which tries to find an enabled state deeper than
>> polling state and returns its target_residency. Since we only have two
>> states, it automatically means EW state's target_residency, or
>> IDLE_POLL_MAX if it is disabled.
> 
> I agree that cpuidle_poll_time() can derive the polling time limit from a
> deeper enabled state, but it only does that for states with
> target_residency_ns >= CPUIDLE_POLL_MIN (10us). With the proposed s390 EW
> values of 1us/5us, that state is skipped and the poll limit falls
> back to CPUIDLE_POLL_MAX (TICK_NSEC / 16).

Ah missed that CPUIDLE_POLL_MIN is 10us and had it wrong as less or
equal to 1us. For 1000Hz CPUIDLE_POLL_MAX calculates to 6250ns
which is close enough for LPAR (5us) but for KVM and ZVM case (1us)
it is a bit off.

Although, the benchmarks didn't reflect a problem. To verify I did
a quick test run with the following applied on a KVM and a ZVM guest;

--- a/drivers/cpuidle/cpuidle-s390.c
+++ b/drivers/cpuidle/cpuidle-s390.c
@@ -72,6 +72,20 @@ static int s390_cpuidle_cpu_dead(unsigned int cpu)
         return 0;
  }

+static void __init s390_cpuidle_poll_tune(void)
+{
+       struct cpuidle_device *dev;
+       int cpu;
+
+       for_each_present_cpu(cpu) {
+               dev = &per_cpu(cpuidle_dev, cpu);
+               if (machine_is_lpar())
+                       dev->poll_limit_ns = 5000;
+               else
+                       dev->poll_limit_ns = 1000;
+       }
+}
+

This causes cpuidle_poll_time() to exit early as poll_limit_ns is
already set for the polling cpu.

The results are quite similar to original benchmark results I collected,
therefore I don't think it is worth adding it to the patch series.

...

That said, one could argue that the value of CPUIDLE_POLL_MIN could be
reconsidered. Looking at the comments above and also the introducing
commit 7a25759eaa04 ("cpuidle: Select polling interval based on a 
c-state with a longer target residency")

```
/*
  * Min polling interval of 10usec is a guess. It is assuming that
  * for most users, the time for a single ping-pong workload like
  * perf bench pipe would generally complete within 10usec but
  * this is hardware dependent. Actual time can be estimated with
  *
  * perf bench sched pipe -l 10000
  *
...
#define CPUIDLE_POLL_MIN 10000
```

- On s390 "perf bench sched pipe -l 10000" results in usecs/op values
in between 0.9-3.5 usecs.

- On my x86 laptop this range is between 1-2 usecs/op.

Based on that observation and reading the commit message I'd say
there should be some improvements possible there but, this should
be tackled with a different patch series. In fact I'd like to give
it a go after this patch series.

Thanks for having a look. I'll send a new version shortly.

     prev parent reply	other threads:[~2026-06-17 20:08 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-09 13:24 [PATCH v2 0/5] s390/idle: CPU idle driver Mete Durlu
2026-06-09 13:24 ` [PATCH v2 1/5] s390/tick: Remove CIF_NOHZ_DELAY flag Mete Durlu
2026-06-09 13:24 ` [PATCH v2 2/5] tick: Remove arch_needs_cpu Mete Durlu
2026-06-09 15:16   ` Thomas Gleixner
2026-06-09 13:24 ` [PATCH v2 3/5] s390: Enable TIF_POLLING_NRFLAG Mete Durlu
2026-06-09 13:24 ` [PATCH v2 4/5] s390/idle: Introduce cpuidle for s390 Mete Durlu
2026-06-09 13:24 ` [PATCH v2 5/5] s390/configs: Enable cpuidle driver on s390 Mete Durlu
2026-06-09 14:11 ` [PATCH v2 0/5] s390/idle: CPU idle driver Heiko Carstens
2026-06-09 15:47 ` Christian Loehle
2026-06-10 20:23   ` Mete Durlu
2026-06-15 12:24     ` Mete Durlu
2026-06-17  7:15     ` Christian Loehle
2026-06-17 20:08       ` Mete Durlu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d4e52c4-081f-4e2a-a940-cb76d578324b@linux.ibm.com \
    --to=meted@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=anna-maria@linutronix.de \
    --cc=borntraeger@linux.ibm.com \
    --cc=christian.loehle@arm.com \
    --cc=daniel.lezcano@kernel.org \
    --cc=frederic@kernel.org \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=iii@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=rafael@kernel.org \
    --cc=svens@linux.ibm.com \
    --cc=tglx@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox