From: Ankur Arora <ankur.a.arora@oracle.com>
To: "Okanovic, Haris" <harisokn@amazon.com>
Cc: "ankur.a.arora@oracle.com" <ankur.a.arora@oracle.com>,
"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"dianders@chromium.org" <dianders@chromium.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"pmladek@suse.com" <pmladek@suse.com>,
"wanpengli@tencent.com" <wanpengli@tencent.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
"mihai.carabas@oracle.com" <mihai.carabas@oracle.com>,
"arnd@arndb.de" <arnd@arndb.de>,
"will@kernel.org" <will@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"mic@digikod.net" <mic@digikod.net>,
"vkuznets@redhat.com" <vkuznets@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"npiggin@gmail.com" <npiggin@gmail.com>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
"rafael@kernel.org" <rafael@kernel.org>,
"juerg.haefliger@canonical.com" <juerg.haefliger@canonical.com>,
"x86@kernel.org" <x86@kernel.org>,
"rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH v4 7/8] cpuidle/poll_state: replace cpu_relax with smp_cond_load_relaxed
Date: Mon, 08 Apr 2024 11:46:11 -0700 [thread overview]
Message-ID: <87il0rsnf0.fsf@oracle.com> (raw)
In-Reply-To: <aada0beae0b3479bfa311eea94a3b595bb8e5835.camel@amazon.com>
Okanovic, Haris <harisokn@amazon.com> writes:
> On Fri, 2024-04-05 at 16:14 -0700, Ankur Arora wrote:
>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>
>>
>>
>> Okanovic, Haris <harisokn@amazon.com> writes:
>>
>> > On Thu, 2024-02-15 at 09:41 +0200, Mihai Carabas wrote:
>> > > cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>> > > smp_cond_load_relaxed which basically does a "wfe".
>> > >
>> > > Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> > > Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>> > > ---
>> > > drivers/cpuidle/poll_state.c | 15 ++++++++++-----
>> > > 1 file changed, 10 insertions(+), 5 deletions(-)
>> > >
>> > > diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> > > index 9b6d90a72601..1e45be906e72 100644
>> > > --- a/drivers/cpuidle/poll_state.c
>> > > +++ b/drivers/cpuidle/poll_state.c
>> > > @@ -13,6 +13,7 @@
>> > > static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> > > struct cpuidle_driver *drv, int index)
>> > > {
>> > > + unsigned long ret;
>> > > u64 time_start;
>> > >
>> > > time_start = local_clock_noinstr();
>> > > @@ -26,12 +27,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> > >
>> > > limit = cpuidle_poll_time(drv, dev);
>> > >
>> > > - while (!need_resched()) {
>> > > - cpu_relax();
>> > > - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> > > - continue;
>> > > -
>> > > + for (;;) {
>> > > loop_count = 0;
>> > > +
>> > > + ret = smp_cond_load_relaxed(¤t_thread_info()->flags,
>> > > + VAL & _TIF_NEED_RESCHED ||
>> > > + loop_count++ >= POLL_IDLE_RELAX_COUNT);
>> >
>> > Is it necessary to repeat this 200 times with a wfe poll?
>>
>> The POLL_IDLE_RELAX_COUNT is there because on x86 each cpu_relax()
>> iteration is much shorter.
>>
>> With WFE, it makes less sense.
>>
>> > Does kvm not implement a timeout period?
>>
>> Not yet, but it does become more useful after a WFE haltpoll is
>> available on ARM64.
>
> Note that kvm conditionally traps WFE and WFI based on number of host
> CPU tasks. VMs will sometimes see hardware behavior - potentially
> polling for a long time before entering WFI.
>
> https://elixir.bootlin.com/linux/latest/source/arch/arm64/kvm/arm.c#L459
Yeah. There was a discussion on this
https://lore.kernel.org/lkml/871qc6qufy.fsf@oracle.com/.
>> Haltpoll does have a timeout, which you should be able to tune via
>> /sys/module/haltpoll/parameters/ but that, of course, won't help here.
>>
>> > Could you make it configurable? This patch improves certain workloads
>> > on AWS Graviton instances as well, but blocks up to 6ms in 200 * 30us
>> > increments before going to wfi, which is a bit excessive.
>>
>> Yeah, this looks like a problem. We could solve it by making it an
>> architectural parameter. Though I worry about ARM platforms with
>> much smaller default timeouts.
>> The other possibility is using WFET in the primitive, but then we
>> have that dependency and that's a bigger change.
>
> See arm64's delay() for inspiration:
>
> https://elixir.bootlin.com/linux/v6.9-rc2/source/arch/arm64/lib/delay.c#L26
Sure, that part is straight-forward enough. However, this will need a fallback
the case when WFET is not available. And, because this path is used on x86,
so we need a cross platform smp_cond*timeout(). Though given that the x86
version is based on cpu_relax() then that could just fold the sched_clock()
check in.
Maybe another place to do this would be by KVM forcing a WFE timeout. Arguably
that is needed regardless of whether we use a smp_cond*timeout() or not.
--
ankur
WARNING: multiple messages have this Message-ID (diff)
From: Ankur Arora <ankur.a.arora@oracle.com>
To: "Okanovic, Haris" <harisokn@amazon.com>
Cc: "ankur.a.arora@oracle.com" <ankur.a.arora@oracle.com>,
"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"dianders@chromium.org" <dianders@chromium.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"pmladek@suse.com" <pmladek@suse.com>,
"wanpengli@tencent.com" <wanpengli@tencent.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
"mihai.carabas@oracle.com" <mihai.carabas@oracle.com>,
"arnd@arndb.de" <arnd@arndb.de>,
"will@kernel.org" <will@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"mic@digikod.net" <mic@digikod.net>,
"vkuznets@redhat.com" <vkuznets@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"npiggin@gmail.com" <npiggin@gmail.com>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
"rafael@kernel.org" <rafael@kernel.org>,
"juerg.haefliger@canonical.com" <juerg.haefliger@canonical.com>,
"x86@kernel.org" <x86@kernel.org>,
"rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH v4 7/8] cpuidle/poll_state: replace cpu_relax with smp_cond_load_relaxed
Date: Mon, 08 Apr 2024 11:46:11 -0700 [thread overview]
Message-ID: <87il0rsnf0.fsf@oracle.com> (raw)
In-Reply-To: <aada0beae0b3479bfa311eea94a3b595bb8e5835.camel@amazon.com>
Okanovic, Haris <harisokn@amazon.com> writes:
> On Fri, 2024-04-05 at 16:14 -0700, Ankur Arora wrote:
>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>
>>
>>
>> Okanovic, Haris <harisokn@amazon.com> writes:
>>
>> > On Thu, 2024-02-15 at 09:41 +0200, Mihai Carabas wrote:
>> > > cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>> > > smp_cond_load_relaxed which basically does a "wfe".
>> > >
>> > > Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> > > Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>> > > ---
>> > > drivers/cpuidle/poll_state.c | 15 ++++++++++-----
>> > > 1 file changed, 10 insertions(+), 5 deletions(-)
>> > >
>> > > diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> > > index 9b6d90a72601..1e45be906e72 100644
>> > > --- a/drivers/cpuidle/poll_state.c
>> > > +++ b/drivers/cpuidle/poll_state.c
>> > > @@ -13,6 +13,7 @@
>> > > static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> > > struct cpuidle_driver *drv, int index)
>> > > {
>> > > + unsigned long ret;
>> > > u64 time_start;
>> > >
>> > > time_start = local_clock_noinstr();
>> > > @@ -26,12 +27,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> > >
>> > > limit = cpuidle_poll_time(drv, dev);
>> > >
>> > > - while (!need_resched()) {
>> > > - cpu_relax();
>> > > - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> > > - continue;
>> > > -
>> > > + for (;;) {
>> > > loop_count = 0;
>> > > +
>> > > + ret = smp_cond_load_relaxed(¤t_thread_info()->flags,
>> > > + VAL & _TIF_NEED_RESCHED ||
>> > > + loop_count++ >= POLL_IDLE_RELAX_COUNT);
>> >
>> > Is it necessary to repeat this 200 times with a wfe poll?
>>
>> The POLL_IDLE_RELAX_COUNT is there because on x86 each cpu_relax()
>> iteration is much shorter.
>>
>> With WFE, it makes less sense.
>>
>> > Does kvm not implement a timeout period?
>>
>> Not yet, but it does become more useful after a WFE haltpoll is
>> available on ARM64.
>
> Note that kvm conditionally traps WFE and WFI based on number of host
> CPU tasks. VMs will sometimes see hardware behavior - potentially
> polling for a long time before entering WFI.
>
> https://elixir.bootlin.com/linux/latest/source/arch/arm64/kvm/arm.c#L459
Yeah. There was a discussion on this
https://lore.kernel.org/lkml/871qc6qufy.fsf@oracle.com/.
>> Haltpoll does have a timeout, which you should be able to tune via
>> /sys/module/haltpoll/parameters/ but that, of course, won't help here.
>>
>> > Could you make it configurable? This patch improves certain workloads
>> > on AWS Graviton instances as well, but blocks up to 6ms in 200 * 30us
>> > increments before going to wfi, which is a bit excessive.
>>
>> Yeah, this looks like a problem. We could solve it by making it an
>> architectural parameter. Though I worry about ARM platforms with
>> much smaller default timeouts.
>> The other possibility is using WFET in the primitive, but then we
>> have that dependency and that's a bigger change.
>
> See arm64's delay() for inspiration:
>
> https://elixir.bootlin.com/linux/v6.9-rc2/source/arch/arm64/lib/delay.c#L26
Sure, that part is straight-forward enough. However, this will need a fallback
the case when WFET is not available. And, because this path is used on x86,
so we need a cross platform smp_cond*timeout(). Though given that the x86
version is based on cpu_relax() then that could just fold the sched_clock()
check in.
Maybe another place to do this would be by KVM forcing a WFE timeout. Arguably
that is needed regardless of whether we use a smp_cond*timeout() or not.
--
ankur
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2024-04-08 18:46 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-15 7:41 [PATCH v4] Enable haltpoll for arm64 Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-15 7:41 ` [PATCH v4 1/8] x86: Move ARCH_HAS_CPU_RELAX to arch Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-15 7:41 ` [PATCH v4 2/8] x86/kvm: Move haltpoll_want() to be arch defined Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-15 7:41 ` [PATCH v4 3/8] governors/haltpoll: Drop kvm_para_available() check Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-15 7:41 ` [PATCH v4 4/8] arm64: Select ARCH_HAS_CPU_RELAX Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-15 7:41 ` [PATCH v4 5/8] arm64: Define TIF_POLLING_NRFLAG Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-15 7:41 ` [PATCH v4 6/8] cpuidle-haltpoll: ARM64 support Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-26 8:30 ` Tomohiro Misono (Fujitsu)
2024-02-26 8:30 ` Tomohiro Misono (Fujitsu)
2024-02-15 7:41 ` [PATCH v4 7/8] cpuidle/poll_state: replace cpu_relax with smp_cond_load_relaxed Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
2024-02-26 8:36 ` Tomohiro Misono (Fujitsu)
2024-02-26 8:36 ` Tomohiro Misono (Fujitsu)
2024-02-28 4:36 ` Ankur Arora
2024-02-28 4:36 ` Ankur Arora
2024-04-05 21:51 ` Okanovic, Haris
2024-04-05 21:51 ` Okanovic, Haris
2024-04-05 23:14 ` Ankur Arora
2024-04-05 23:14 ` Ankur Arora
2024-04-06 18:42 ` Okanovic, Haris
2024-04-06 18:42 ` Okanovic, Haris
2024-04-08 18:46 ` Ankur Arora [this message]
2024-04-08 18:46 ` Ankur Arora
2024-04-08 20:04 ` Okanovic, Haris
2024-04-08 20:04 ` Okanovic, Haris
2024-02-15 7:41 ` [PATCH v4 8/8] cpuidle: replace with HAS_CPU_RELAX with HAS_WANTS_IDLE_POLL Mihai Carabas
2024-02-15 7:41 ` Mihai Carabas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87il0rsnf0.fsf@oracle.com \
--to=ankur.a.arora@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=daniel.lezcano@linaro.org \
--cc=dianders@chromium.org \
--cc=harisokn@amazon.com \
--cc=hpa@zytor.com \
--cc=joao.m.martins@oracle.com \
--cc=juerg.haefliger@canonical.com \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mic@digikod.net \
--cc=mihai.carabas@oracle.com \
--cc=mingo@redhat.com \
--cc=npiggin@gmail.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rafael@kernel.org \
--cc=rick.p.edgecombe@intel.com \
--cc=tglx@linutronix.de \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.