From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>,
	Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	linux-pm@vger.kernel.org
Cc: x86@kernel.org, linux-doc@vger.kernel.org,
	Len Brown <len.brown@intel.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Amit Kucheria <amitk@kernel.org>, Andi Kleen <ak@linux.intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] x86/Documentation: Describe the Intel Hardware Feedback Interface
Date: Tue, 30 Nov 2021 02:20:16 -0800
Message-ID: <49299b089f553ef2878aaa7eaf60f8c3600b939d.camel@linux.intel.com>
In-Reply-To: <81bca26d-eac8-31ed-e5ec-81812664d671@linaro.org>

Hi Daniel,

On Tue, 2021-11-30 at 10:24 +0100, Daniel Lezcano wrote:
> Hi Ricardo,
> > 

[...]

> > +The Hardware Feedback Interface
> > +-------------------------------
> > +
> > +The Hardware Feedback Interface provides the operating system with
> > +information about the performance and energy efficiency of each CPU in the
> > +system. Each capability is given as a unit-less quantity in the range [0-255].
> > +Higher values indicate higher capability. Energy efficiency and performance
> > +are reported in separate capabilities.
> 
> Are they linked together (eg. higher energy efficiency => lower
> performance)?

Generally true. But for some workloads and conditions, a more
energy-efficient operating point doesn't mean lower performance.
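Because the two capabilities arrive as separate values, a consumer
should not derive one from the other. A minimal sketch, assuming a
hypothetical struct hfi_caps holding one CPU's pair of values and a
hypothetical pick_efficient_cpu() helper (neither is part of the patch):
it picks the most energy-efficient CPU that still meets a performance
floor, and skips CPUs where either capability is 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record: one unit-less value in [0, 255] per capability. */
struct hfi_caps {
	uint8_t perf;	/* performance capability */
	uint8_t eff;	/* energy efficiency capability */
};

/*
 * Return the CPU with the highest energy efficiency capability among CPUs
 * whose performance capability is at least min_perf, or -1 if none qualifies.
 * CPUs with either capability at 0 are skipped, since the hardware advises
 * not to schedule tasks there. Ties keep the lowest CPU number.
 */
int pick_efficient_cpu(const struct hfi_caps *caps, size_t ncpus,
		       uint8_t min_perf)
{
	int best = -1;

	for (size_t i = 0; i < ncpus; i++) {
		if (caps[i].perf == 0 || caps[i].eff == 0)
			continue;	/* hardware says: avoid this CPU */
		if (caps[i].perf < min_perf)
			continue;	/* below the requested performance floor */
		if (best < 0 || caps[i].eff > caps[best].eff)
			best = (int)i;
	}
	return best;
}
```

For example, with {perf, eff} values {200, 50}, {100, 180}, {40, 250}
and {0, 255} and a floor of 90, it returns CPU 1, even though CPU 2 is
more efficient: the two values are weighed independently.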

> 
> > +These capabilities may change at runtime as a result of changes in the
> > +operating conditions of the system or the action of external factors.
> 
> Is it possible to give examples?

For example, a server farm may decide to save power and reduce cooling
costs by lowering the TDP. This can be done remotely. It will result in
an HFI notification of a lower performance value, or even perf = eff = 0
on some CPUs. Intel CPUs have the capability to change the TDP level at
runtime.

Or, if the system is overheating, the firmware can indicate lower
performance so that OSPM can take action.

> 
> > The rate
> > +at which these capabilities are updated is specific to each processor model. On
> > +some models, capabilities are set at boot time and never change. On others,
> > +capabilities may change every tens of milliseconds.
> > +
> > +The kernel or a userspace policy daemon can use these capabilities to modify
> > +task placement decisions. For instance, if either the performance or the energy
> > +efficiency capability of a given logical processor becomes zero, it is an
> > +indication that the hardware recommends that the operating system not schedule
> > +any tasks on that processor for performance or energy efficiency reasons,
> > +respectively.
> 
> How can userspace be involved in these decisions? If the performance
> is impacted, then that should be reflected in the CPU capacity. The
> scheduler will avoid placing tasks on a CPU with low capacity, no?
>
> I'm also worried about the overhead of the userspace notifications.
>
> That sounds similar to thermal pressure? Wouldn't it make sense to
> create a generic component where HFI, cpufreq cooling, LMh, etc. are
> the backends?

The problem is the treatment of perf/eff == 0 for a CPU. We can indicate
it to the scheduler as capacity = 0, but this doesn't prevent the
scheduler from using that CPU on an overloaded system. We could offline
that CPU in the kernel, but that would be intrusive without notifying
user space, and it may fail for CPU0. We tried CPU idle injection and
removing the CPU from cpusets, but neither works when interrupts are
affined to that CPU or when softirqs or timers are scheduled there.
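To make the capacity point concrete, here is a minimal sketch assuming a
linear mapping of the [0-255] HFI performance value onto the scheduler's
capacity scale of 1024; the mapping and both function names are
hypothetical, not what the patch does. A performance value of 0 maps to
capacity 0, but that only biases placement, so a policy daemon would
still need an explicit set of CPUs to avoid (for example, to remove from
cpusets), which hfi_avoid_mask() computes. As noted above, that still
does not cover interrupts, softirqs or timers on those CPUs.

```c
#include <stddef.h>
#include <stdint.h>

#define HFI_CAP_MAX		255U	/* top of the HFI [0, 255] range */
#define SCHED_CAPACITY_SCALE	1024U	/* capacity of the biggest CPU, as in the kernel scheduler */

/* Hypothetical: linearly scale an HFI performance capability to the
 * scheduler capacity range, rounding to nearest (0 -> 0, 255 -> 1024). */
unsigned int hfi_perf_to_capacity(uint8_t perf)
{
	return ((unsigned int)perf * SCHED_CAPACITY_SCALE + HFI_CAP_MAX / 2) /
	       HFI_CAP_MAX;
}

/* Hypothetical: bit i is set when CPU i reports a performance or energy
 * efficiency capability of 0, i.e. the hardware advises not to use it.
 * Only the first 64 CPUs fit in the mask; the rest are ignored. */
uint64_t hfi_avoid_mask(const uint8_t *perf, const uint8_t *eff, size_t ncpus)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < ncpus && i < 64; i++)
		if (perf[i] == 0 || eff[i] == 0)
			mask |= UINT64_C(1) << i;
	return mask;
}
```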

Here the notifications are on the order of several ms (in reality, they
are on the order of seconds for current use cases). These are not
emergency events. As with other thermal notifications, if something is
urgent, the firmware can already force the lowest performance without
even notifying user space.
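A minimal sketch of the kind of rate limiting the patch describes
(updates relayed to user space at most once every CONFIG_HZ jiffies,
i.e. once per second), assuming a hypothetical struct hfi_throttle and
helpers. Jiffies are passed in as plain unsigned long so the sketch
builds outside the kernel; updates that arrive too early are folded into
one pending notification.

```c
#include <stdbool.h>

/* Hypothetical throttle state for relaying HFI updates to user space. */
struct hfi_throttle {
	unsigned long next;	/* earliest jiffy at which the next relay may happen */
	unsigned long interval;	/* minimum gap between relays, in jiffies (e.g. HZ) */
	bool pending;		/* an update arrived but has not been relayed yet */
};

/* Wrap-safe "a is at or after b", in the spirit of the kernel's time_after_eq(). */
static bool jiffies_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Deliver a folded update if one is pending and the interval has elapsed.
 * Returns true if the caller should send a notification now. */
bool hfi_throttle_flush(struct hfi_throttle *t, unsigned long now)
{
	if (!t->pending || !jiffies_after_eq(now, t->next))
		return false;
	t->pending = false;
	t->next = now + t->interval;
	return true;
}

/* Called for every HFI update (e.g. from the interrupt path): mark the
 * update pending and relay it only if the interval has elapsed. */
bool hfi_throttle_update(struct hfi_throttle *t, unsigned long now)
{
	t->pending = true;
	return hfi_throttle_flush(t, now);
}
```

In a real driver, hfi_throttle_flush() would be called from deferred
work (for example, a delayed work item) so that the last folded update
is not lost when no further interrupt arrives.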


Thanks,
Srinivas

> 
> 
> 
> > +Implementation details for Linux
> > +--------------------------------
> > +
> > +The infrastructure to handle thermal event interrupts has two parts. In the
> > +Local Vector Table of a CPU's local APIC, there exists a Thermal Monitor
> > +Register. This register controls how interrupts are delivered to a CPU when
> > +the thermal monitor generates an interrupt. Further details can be found in
> > +the Intel SDM Vol. 3 Section 10.5 [1]_.
> > +
> > +The thermal monitor may generate interrupts per CPU or per package. The HFI
> > +generates package-level interrupts. This monitor is configured and initialized
> > +via a set of machine-specific registers. Specifically, the HFI interrupt and
> > +status are controlled via designated bits in the IA32_PACKAGE_THERM_INTERRUPT
> > +and IA32_PACKAGE_THERM_STATUS registers, respectively. There exists one HFI
> > +table per package. Further details can be found in the Intel SDM Vol. 3
> > +Section 14.9 [1]_.
> > +
> > +The hardware issues an HFI interrupt after it has updated the HFI table and
> > +the table is ready for the operating system to consume. CPUs receive such an
> > +interrupt via the thermal entry in the Local APIC's Local Vector Table.
> > +
> > +When servicing such an interrupt, the HFI driver parses the updated table and
> > +relays the update to userspace using the thermal notification framework. Given
> > +that there may be many HFI updates every second, the updates relayed to
> > +userspace are throttled to at most one every CONFIG_HZ jiffies (one second).
> > +
> > +References
> > +----------
> > +
> > +.. [1] https://www.intel.com/sdm
> > 
> 
> 


