From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
To: Dave Hansen <dave.hansen@intel.com>,
Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>,
x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
rafael.j.wysocki@intel.com, len.brown@intel.com,
dave.hansen@linux.intel.com
Subject: Re: [PATCH v2 2/3] x86/smp: Allow forcing the mwait hint for play dead loop
Date: Wed, 30 Oct 2024 11:58:12 +0200
Message-ID: <35946efe3b8b8b686ba4ea0ed5c9f15c50ca6ef8.camel@linux.intel.com>
In-Reply-To: <e332a243-5a98-49ed-81be-b6db305d5dc5@intel.com>
On Tue, 2024-10-29 at 11:30 -0700, Dave Hansen wrote:
> On 10/29/24 03:15, Patryk Wlazlyn wrote:
> > +void smp_set_mwait_play_dead_hint(unsigned int hint)
> > +{
> > + WRITE_ONCE(play_dead_mwait_hint, hint);
> > +}
>
> This all feels a bit hacky and unstructured to me.
>
> Could we at least set up a few rules here? Like, say what the hints
> are, what values can they have? Where do they come from? Can this get
> called more than once? Does it _need_ to be set? What's the behavior
> when it is not set? Who is responsible for calling this?
>
> What good does the smp_ prefix do? I don't think _callers_ care whether
> this is getting optimized out or not.
>
The goal of 'get_deepest_mwait_hint()' is to find the mwait hint of the deepest
available C-state, so that it can be requested for the offline CPU. On Intel
CPUs, the C-states and their mwait hint values are platform-specific.

Generally, there is no architectural way to enumerate mwait hints on Intel
CPUs. In the idle path (as opposed to the CPU offline path), idle drivers (if
enabled) enumerate and request C-states using either ACPI mechanisms or a
compiled-in, per-platform custom C-states table, provided by Intel for specific
platforms.

In the CPU offline path, only the deepest C-state hint is needed. Historically,
it was determined using a simple algorithm, which happened to provide the
correct result on most Intel platforms. This algorithm scans the CPUID leaf 5
EDX bits and builds the hint value from the C-state and sub-state numbers.
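
To make the historical algorithm concrete, here is a minimal user-space sketch
in C. It assumes that each 4-bit field of CPUID leaf 5 EDX holds the number of
sub-states of one C-state, that the first field (C0) is skipped, and that the
sub-state part of the hint is the sub-state count minus one. The function name
and the sample EDX value are made up for illustration; this is not the kernel
code itself.

```c
#include <assert.h>
#include <stdint.h>

#define SUBSTATE_SIZE 4     /* width of one field in CPUID.05H:EDX */
#define SUBSTATE_MASK 0xfu

/*
 * Hypothetical helper: derive the "deepest" mwait hint from a raw
 * CPUID leaf 5 EDX value by picking the last non-empty field.
 */
uint32_t deepest_hint_from_cpuid5_edx(uint32_t edx)
{
	unsigned int i, highest_cstate = 0, highest_subcstate = 0;
	uint32_t hint;

	edx >>= SUBSTATE_SIZE;	/* assumption: skip the C0 field */
	for (i = 0; i < 7 && edx; i++, edx >>= SUBSTATE_SIZE) {
		if (edx & SUBSTATE_MASK) {
			highest_cstate = i;
			highest_subcstate = edx & SUBSTATE_MASK;
		}
	}

	/* Sketch-only guard: nothing enumerated, fall back to hint 0. */
	if (!highest_subcstate)
		return 0;

	hint = (highest_cstate << SUBSTATE_SIZE) | (highest_subcstate - 1);
	assert(hint <= 0xff);	/* state and sub-state each fit in 4 bits */
	return hint;
}
```

With the made-up input 0x2220 (two sub-states in each of the three fields
after C0), the sketch yields 0x21. It cannot produce a hint such as 0x23,
because the sub-state part is derived purely from the enumerated count.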

Generally speaking, mwait hints are opaque numbers, and the algorithm is not
architectural. While it produces the correct result for most Intel CPUs, it
produces a sub-optimal result for some. For example, on the Intel Sierra Forest
Xeon CPU, the algorithm produces hint 0x21, while the actual deepest C-state
hint is 0x23. If hint 0x21 is used, the offline CPU does not enter the deepest
available C-state. While this is not fatal, the CPU ends up saving less energy
than it could.

The 'set_mwait_play_dead_hint()' function provides a mechanism for defining the
mwait hint for the offline CPU. It can be used on platforms where the generic
non-architectural algorithm provides a sub-optimal result.

Q&A.

1. Could we at least set up a few rules here? Like, say what the hints
are, what values can they have?

The hints are 8-bit values: the lower 4 bits define the "sub-state", the higher
4 bits define the state.

The state value (higher 4 bits) corresponds to the state enumerated by CPUID
leaf 5 (value 0 is C0, value 1 is C1, etc). The sub-state value is an opaque
number.

The hint is provided to the mwait instruction via EAX.
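
The 8-bit layout described in this answer can be sketched in C as follows. The
helper names are hypothetical and only show how the two 4-bit fields are packed
and unpacked; the sketch says nothing about which hardware C-state a given
value maps to.

```c
#include <assert.h>
#include <stdint.h>

#define HINT_FIELD_BITS 4
#define HINT_FIELD_MASK 0xfu

/* Higher 4 bits of the hint: the state. */
unsigned int hint_state(uint8_t hint)
{
	return (hint >> HINT_FIELD_BITS) & HINT_FIELD_MASK;
}

/* Lower 4 bits of the hint: the (opaque) sub-state. */
unsigned int hint_substate(uint8_t hint)
{
	return hint & HINT_FIELD_MASK;
}

/* Pack a state and a sub-state into one 8-bit hint value. */
uint8_t make_hint(unsigned int state, unsigned int substate)
{
	assert(state <= HINT_FIELD_MASK && substate <= HINT_FIELD_MASK);
	return (uint8_t)((state << HINT_FIELD_BITS) | substate);
}
```

For the two values mentioned in this thread, 0x21 and 0x23 share the same
state field (2) and differ only in the sub-state field (1 versus 3).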

2. Where do they come from?

Hardware C-states are defined by the specific platform (e.g., C1, C1E, CC6,
PC6). They are then "mapped" to the SDM C-states (C0, C1, C2, etc). The specific
platform defines the hint values.

Intel typically provides the hint values in the EDS (External Design
Specification) document, which is typically non-public.

Intel also discloses the hint values for open-source projects like Linux, and
Intel engineers then submit them to the intel_idle driver.

Some of the hints may also be found via the ACPI _CST table.

3. Can this get called more than once?

It is not supposed to. The idea is that if a driver like intel_idle is used, it
can call 'set_mwait_play_dead_hint()' and provide the optimal hint for the
offline code.

4. Does it _need_ to be set?

No. It is more of an optimization, but an important one, which may result in
saving a lot of money in a datacenter.

Typically, using a "wrong" hint value is non-fatal; at least I have not seen it
being fatal so far. The CPU will map it to some hardware C-state request, but
which one depends on the "wrong" value and the CPU. It may just be sub-optimal.

5. What's the behavior when it is not set?

The offline code will fall back to the generic non-architectural algorithm,
which has provided correct results for all server platforms I have dealt with
since 2017. It should provide the correct hint for most client platforms, as
far as I am aware.

Sierra Forest Xeon is the first platform where the generic algorithm provides a
sub-optimal value, 0x21. It is not fatal, just sub-optimal.

Note: I am working with the Intel firmware team on having the FW "re-map" hint
0x21 to hint 0x23, so that an "unaware" Linux kernel also ends up requesting
the deepest C-state for an offline CPU.
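
The "forced hint, else generic fallback" behavior can be illustrated with a
small user-space sketch in C. All names here are hypothetical, and the plain
variables stand in for whatever the real patch uses (the quoted setter uses
WRITE_ONCE()); the sketch is not safe against concurrent callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch-only state: whether a driver has supplied a hint, and its value. */
static bool hint_forced;
static uint32_t forced_hint;

/* Called (at most once, by convention) by a driver that knows better. */
void set_play_dead_hint(uint32_t hint)
{
	assert(hint <= 0xff);	/* hints are 8-bit values */
	forced_hint = hint;
	hint_forced = true;
}

/*
 * Pick the hint for the offline CPU: the forced value if one was set,
 * otherwise the result of the generic algorithm passed in by the caller.
 */
uint32_t play_dead_hint(uint32_t generic_hint)
{
	return hint_forced ? forced_hint : generic_hint;
}
```

In this sketch, a platform where nothing calls set_play_dead_hint() keeps using
the generic value (for example 0x21), while a driver that supplies 0x23
overrides it for all later lookups.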

6. Who is responsible for calling this?

The idea for now is that the intel_idle driver calls it.

But in theory, any driver or platform code may call it in the future if it
"knows" the optimal hint, I suppose. I do not have a good example though.

Artem.
Thread overview: 14+ messages
2024-10-29 10:15 [PATCH v2 0/3] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
2024-10-29 10:15 ` [PATCH v2 1/3] x86/smp: Move mwait hint computation out of mwait_play_dead Patryk Wlazlyn
2024-10-29 10:15 ` [PATCH v2 2/3] x86/smp: Allow forcing the mwait hint for play dead loop Patryk Wlazlyn
2024-10-29 18:30 ` Dave Hansen
2024-10-30 9:58 ` Artem Bityutskiy [this message]
2024-10-30 19:32 ` Dave Hansen
2024-10-30 19:53 ` Rafael J. Wysocki
2024-10-30 20:11 ` Dave Hansen
2024-10-30 20:14 ` Rafael J. Wysocki
2024-11-06 8:14 ` Artem Bityutskiy
2024-11-06 14:46 ` Dave Hansen
2024-10-30 13:33 ` Patryk Wlazlyn
2024-10-30 22:55 ` Dave Hansen
2024-10-29 10:15 ` [PATCH v2 3/3] intel_idle: Identify the deepest cstate for SRF Patryk Wlazlyn