All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ankur Arora <ankur.a.arora@oracle.com>
To: "Okanovic, Haris" <harisokn@amazon.com>
Cc: "mihai.carabas@oracle.com" <mihai.carabas@oracle.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
	"dianders@chromium.org" <dianders@chromium.org>,
	"ankur.a.arora@oracle.com" <ankur.a.arora@oracle.com>,
	"mic@digikod.net" <mic@digikod.net>,
	"pmladek@suse.com" <pmladek@suse.com>,
	"wanpengli@tencent.com" <wanpengli@tencent.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"will@kernel.org" <will@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"npiggin@gmail.com" <npiggin@gmail.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>,
	"juerg.haefliger@canonical.com" <juerg.haefliger@canonical.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v4 7/8] cpuidle/poll_state: replace cpu_relax with smp_cond_load_relaxed
Date: Fri, 05 Apr 2024 16:14:49 -0700	[thread overview]
Message-ID: <87r0fjtn9y.fsf@oracle.com> (raw)
In-Reply-To: <7f3e540ad30f40ae51f1abda24b1bea2c8b648ea.camel@amazon.com>


Okanovic, Haris <harisokn@amazon.com> writes:

> On Thu, 2024-02-15 at 09:41 +0200, Mihai Carabas wrote:
>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>> smp_cond_load_relaxed which basically does a "wfe".
>>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>> ---
>>  drivers/cpuidle/poll_state.c | 15 ++++++++++-----
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> index 9b6d90a72601..1e45be906e72 100644
>> --- a/drivers/cpuidle/poll_state.c
>> +++ b/drivers/cpuidle/poll_state.c
>> @@ -13,6 +13,7 @@
>>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>  			       struct cpuidle_driver *drv, int index)
>>  {
>> +	unsigned long ret;
>>  	u64 time_start;
>>
>>  	time_start = local_clock_noinstr();
>> @@ -26,12 +27,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>
>>  		limit = cpuidle_poll_time(drv, dev);
>>
>> -		while (!need_resched()) {
>> -			cpu_relax();
>> -			if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> -				continue;
>> -
>> +		for (;;) {
>>  			loop_count = 0;
>> +
>> +			ret = smp_cond_load_relaxed(&current_thread_info()->flags,
>> +						    VAL & _TIF_NEED_RESCHED ||
>> +						    loop_count++ >= POLL_IDLE_RELAX_COUNT);
>
> Is it necessary to repeat this 200 times with a wfe poll?

The POLL_IDLE_RELAX_COUNT is there because on x86 each cpu_relax()
iteration is much shorter.

With WFE, it makes less sense.

> Does kvm not implement a timeout period?

Not yet, but it does become more useful after a WFE haltpoll is
available on ARM64.

Haltpoll does have a timeout, which you should be able to tune via
/sys/module/haltpoll/parameters/ but that, of course, won't help here.

> Could you make it configurable? This patch improves certain workloads
> on AWS Graviton instances as well, but blocks up to 6ms in 200 * 30us
> increments before going to wfi, which is a bit excessive.

Yeah, this looks like a problem. We could solve it by making it an
architectural parameter. Though I worry about ARM platforms with
much smaller default timeouts.
The other possibility is using WFET in the primitive, but then we
have that dependency and that's a bigger change.

Will address this in the next version.

Thanks for pointing this out.

--
ankur

WARNING: multiple messages have this Message-ID (diff)
From: Ankur Arora <ankur.a.arora@oracle.com>
To: "Okanovic, Haris" <harisokn@amazon.com>
Cc: "mihai.carabas@oracle.com" <mihai.carabas@oracle.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
	"dianders@chromium.org" <dianders@chromium.org>,
	"ankur.a.arora@oracle.com" <ankur.a.arora@oracle.com>,
	"mic@digikod.net" <mic@digikod.net>,
	"pmladek@suse.com" <pmladek@suse.com>,
	"wanpengli@tencent.com" <wanpengli@tencent.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"will@kernel.org" <will@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"npiggin@gmail.com" <npiggin@gmail.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>,
	"juerg.haefliger@canonical.com" <juerg.haefliger@canonical.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v4 7/8] cpuidle/poll_state: replace cpu_relax with smp_cond_load_relaxed
Date: Fri, 05 Apr 2024 16:14:49 -0700	[thread overview]
Message-ID: <87r0fjtn9y.fsf@oracle.com> (raw)
In-Reply-To: <7f3e540ad30f40ae51f1abda24b1bea2c8b648ea.camel@amazon.com>


Okanovic, Haris <harisokn@amazon.com> writes:

> On Thu, 2024-02-15 at 09:41 +0200, Mihai Carabas wrote:
>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>> smp_cond_load_relaxed which basically does a "wfe".
>>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>> ---
>>  drivers/cpuidle/poll_state.c | 15 ++++++++++-----
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> index 9b6d90a72601..1e45be906e72 100644
>> --- a/drivers/cpuidle/poll_state.c
>> +++ b/drivers/cpuidle/poll_state.c
>> @@ -13,6 +13,7 @@
>>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>  			       struct cpuidle_driver *drv, int index)
>>  {
>> +	unsigned long ret;
>>  	u64 time_start;
>>
>>  	time_start = local_clock_noinstr();
>> @@ -26,12 +27,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>
>>  		limit = cpuidle_poll_time(drv, dev);
>>
>> -		while (!need_resched()) {
>> -			cpu_relax();
>> -			if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> -				continue;
>> -
>> +		for (;;) {
>>  			loop_count = 0;
>> +
>> +			ret = smp_cond_load_relaxed(&current_thread_info()->flags,
>> +						    VAL & _TIF_NEED_RESCHED ||
>> +						    loop_count++ >= POLL_IDLE_RELAX_COUNT);
>
> Is it necessary to repeat this 200 times with a wfe poll?

The POLL_IDLE_RELAX_COUNT is there because on x86 each cpu_relax()
iteration is much shorter.

With WFE, it makes less sense.

> Does kvm not implement a timeout period?

Not yet, but it does become more useful after a WFE haltpoll is
available on ARM64.

Haltpoll does have a timeout, which you should be able to tune via
/sys/module/haltpoll/parameters/ but that, of course, won't help here.

> Could you make it configurable? This patch improves certain workloads
> on AWS Graviton instances as well, but blocks up to 6ms in 200 * 30us
> increments before going to wfi, which is a bit excessive.

Yeah, this looks like a problem. We could solve it by making it an
architectural parameter. Though I worry about ARM platforms with
much smaller default timeouts.
The other possibility is using WFET in the primitive, but then we
have that dependency and that's a bigger change.

Will address this in the next version.

Thanks for pointing this out.

--
ankur

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-04-05 23:15 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-15  7:41 [PATCH v4] Enable haltpoll for arm64 Mihai Carabas
2024-02-15  7:41 ` Mihai Carabas
2024-02-15  7:41 ` [PATCH v4 1/8] x86: Move ARCH_HAS_CPU_RELAX to arch Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas
2024-02-15  7:41 ` [PATCH v4 2/8] x86/kvm: Move haltpoll_want() to be arch defined Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas
2024-02-15  7:41 ` [PATCH v4 3/8] governors/haltpoll: Drop kvm_para_available() check Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas
2024-02-15  7:41 ` [PATCH v4 4/8] arm64: Select ARCH_HAS_CPU_RELAX Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas
2024-02-15  7:41 ` [PATCH v4 5/8] arm64: Define TIF_POLLING_NRFLAG Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas
2024-02-15  7:41 ` [PATCH v4 6/8] cpuidle-haltpoll: ARM64 support Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas
2024-02-26  8:30   ` Tomohiro Misono (Fujitsu)
2024-02-26  8:30     ` Tomohiro Misono (Fujitsu)
2024-02-15  7:41 ` [PATCH v4 7/8] cpuidle/poll_state: replace cpu_relax with smp_cond_load_relaxed Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas
2024-02-26  8:36   ` Tomohiro Misono (Fujitsu)
2024-02-26  8:36     ` Tomohiro Misono (Fujitsu)
2024-02-28  4:36     ` Ankur Arora
2024-02-28  4:36       ` Ankur Arora
2024-04-05 21:51   ` Okanovic, Haris
2024-04-05 21:51     ` Okanovic, Haris
2024-04-05 23:14     ` Ankur Arora [this message]
2024-04-05 23:14       ` Ankur Arora
2024-04-06 18:42       ` Okanovic, Haris
2024-04-06 18:42         ` Okanovic, Haris
2024-04-08 18:46         ` Ankur Arora
2024-04-08 18:46           ` Ankur Arora
2024-04-08 20:04           ` Okanovic, Haris
2024-04-08 20:04             ` Okanovic, Haris
2024-02-15  7:41 ` [PATCH v4 8/8] cpuidle: replace with HAS_CPU_RELAX with HAS_WANTS_IDLE_POLL Mihai Carabas
2024-02-15  7:41   ` Mihai Carabas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r0fjtn9y.fsf@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dianders@chromium.org \
    --cc=harisokn@amazon.com \
    --cc=hpa@zytor.com \
    --cc=joao.m.martins@oracle.com \
    --cc=juerg.haefliger@canonical.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mic@digikod.net \
    --cc=mihai.carabas@oracle.com \
    --cc=mingo@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rafael@kernel.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.