From: Dave Hansen <dave.hansen@intel.com>
To: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>,
Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>,
x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
rafael.j.wysocki@intel.com, len.brown@intel.com,
dave.hansen@linux.intel.com
Subject: Re: [PATCH v2 2/3] x86/smp: Allow forcing the mwait hint for play dead loop
Date: Wed, 30 Oct 2024 12:32:34 -0700 [thread overview]
Message-ID: <000fd68e-2b24-4eb3-b2d7-e4856b403212@intel.com> (raw)
In-Reply-To: <35946efe3b8b8b686ba4ea0ed5c9f15c50ca6ef8.camel@linux.intel.com>
On 10/30/24 02:58, Artem Bityutskiy wrote:
> On Tue, 2024-10-29 at 11:30 -0700, Dave Hansen wrote:
> 1. Could we at least set up a few rules here? Like, say what the hints
> are, what values can they have?
>
> The hints are 8-bit values, lower 4 bits define "sub-state", higher 4 bits
> define the state.
>
> The state value (higher 4 bits) correspond to the state enumerated by CPUID leaf
> 5 (Value 0 is C0, value 1 is C1, etc). The sub-state value is an opaque number.
>
> The hint is provided to the mwait instruction via EAX.
OK, so can you distill that down to something succinct and get it in a
comment above the new function, please?
> 2. Where do they come from?
>
> Hardware C-states are defined by the specific platform (e.g., C1, C1E, CC6,
> PC6). Then they are "mapped" to the SDM C-states (C0, C1, C2, etc). The specific
> platform defines the hint values.
>
> Intel typically provides the hint values in the EDS (External Design
> Specification) document. It is typically non-public.
>
> Intel also discloses the hint values for open-source projects like Linux, and
> then Intel engineers submit them to the intel_idle driver.
>
> Some of the hints may also be found via ACPI _CST table.
What about the mwait_play_dead() loop that calculates the hint? Doesn't
that derive the hint from CPUID?
> 3. Can this get called more than once?
>
> It is not supposed to. The idea is that if a driver like intel_idle is used, it
> can call 'set_mwait_play_dead_hint()' and provide the most optimal hint number
> for the offline code.
There are two important nuggets in there:
First, an idle driver can but is not required to set the hint. This
would be good comment material.
Second, only one thing is supposed to set the hint. This is a good
thing to WARN() about.
> 4. Does it _need_ to be set?
>
> No. It is more of an optimization. But it is an important optimization which may
> result in saving a lot of money in a datacenter.
>
> Typically using a "wrong" hint value is non-fatal, at least I did not see it
> being fatal so far. The CPU will map it to some hardware C-state request, but
> which one - depends on the "wrong" value and the CPU. It just may be sub-
> optimal.
OK, so this tells me we *don't* want some kind of:
WARN_ON(play_dead_mwait_hint == PLAY_DEAD_MWAIT_HINT_UNSET);
warning.
> 5. What's the behavior when it is not set?
>
> The offline code will fall-back to the generic non-architectural algorithm,
> which provides correct results for all server platforms I dealt with since 2017.
> It should provide the correct hint for most client platforms, as far as I am
> aware.
>
> Sierra Forest Xeon is the first platform where the generic algorithm provides a
> sub-optimal value 0x21. It is not fatal, just sub-optimal.
What is the non-architectural algorithm? Which Linux code are you
referring to?
> Note: I am working with Intel firmware team on having the FW "re-mapping" hint
> 0x21 to hint 0x23, so that "unaware" Linux kernel also ends up with requesting
> the deepest C-state for an offline CPU.
That would be great as well. Thanks for doing that!
> 6. Who is responsible for calling this?
>
> The idea for now is that the intel_idle driver calls it.
>
> But in theory, in the future, any driver/platform code may call it if it "knows"
> what's the most optimal hint, I suppose. I do not have a good example though.
So let's look at how this works:
void native_play_dead(void)
{
...
mwait_play_dead();
if (cpuidle_play_dead())
hlt_play_dead();
}
This _existing_ code has three different ways of playing dead (in this
order of preference):
1. mwait
2. cpuidle
3. hlt
It has (at least) two different mechanisms for telling which of these to
call:
1. mwait has a bunch of built-in logic that will ensure the CPU
should use for playing dead. If not, it does nothing and returns.
2. cpuidle_play_dead() (only used by acpi_idle_driver as far as I can
tell) will return an error if it does not support playing dead
If 1 and 2 fail, then hlt_play_dead() gets called.
But the really fun part of this is that idle driver is *called* here.
The driver that is also responsible for overriding the mwait hint.
So this series opts to have the boot code plumb the hint back into a
basically undocumented global variable while also assuming that the
system is *going* to use mwait. It then does *nothing* with the
callback just adjacent to the code it wants to modify.
Seems rather spaghetti-like to me.
To make it worse, go look at da6fa7ef67f0 ("x86/smpboot: Don't use
mwait_play_dead() on AMD systems"). It hacks AMD-specific code in
mwait_play_dead() just to force the cpuidle code to get called.
What if we did this? First, introduce a helper:
bool mwait_play_dead_with_hint(u32 hint)
and then restructure native_play_dead() to look like this:
static mwait_play_dead_generic(void)
{
u32 hint = get_deepest_mwait_hint();
return mwait_play_dead_with_hint(hint);
}
void native_play_dead(void)
{
bool used;
used = cpuidle_play_dead();
if (used)
return;
used = mwait_play_dead_generic();
if (used)
return;
hlt_play_dead();
}
If the cpuidle drivers want to use mwait with a different hint, they
override the *EXISTING* drv->states[].enter_dead() functionality and
call mwait_play_dead_with_hint() with their new hint. Then they don't
need to pass anything _over_ to the mwait code.
Wouldn't something like that makes this all much more straightforward?
next prev parent reply other threads:[~2024-10-30 19:32 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-29 10:15 [PATCH v2 0/3] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
2024-10-29 10:15 ` [PATCH v2 1/3] x86/smp: Move mwait hint computation out of mwait_play_dead Patryk Wlazlyn
2024-10-29 10:15 ` [PATCH v2 2/3] x86/smp: Allow forcing the mwait hint for play dead loop Patryk Wlazlyn
2024-10-29 18:30 ` Dave Hansen
2024-10-30 9:58 ` Artem Bityutskiy
2024-10-30 19:32 ` Dave Hansen [this message]
2024-10-30 19:53 ` Rafael J. Wysocki
2024-10-30 20:11 ` Dave Hansen
2024-10-30 20:14 ` Rafael J. Wysocki
2024-11-06 8:14 ` Artem Bityutskiy
2024-11-06 14:46 ` Dave Hansen
2024-10-30 13:33 ` Patryk Wlazlyn
2024-10-30 22:55 ` Dave Hansen
2024-10-29 10:15 ` [PATCH v2 3/3] intel_idle: Identify the deepest cstate for SRF Patryk Wlazlyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=000fd68e-2b24-4eb3-b2d7-e4856b403212@intel.com \
--to=dave.hansen@intel.com \
--cc=artem.bityutskiy@linux.intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=patryk.wlazlyn@linux.intel.com \
--cc=rafael.j.wysocki@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.