public inbox for linux-kernel@vger.kernel.org
From: Dave Martin <Dave.Martin@arm.com>
To: Peter Newman <peternewman@google.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Fenghua Yu <fenghuay@nvidia.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	Drew Fustini <dfustini@baylibre.com>,
	Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>,
	Chen Yu <yu.c.chen@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev
Subject: Re: [PATCH v4 13/31] fs/resctrl: Add support for additional monitor event display formats
Date: Fri, 9 May 2025 17:43:12 +0100
Message-ID: <aB4woCcnPC5Mz7cf@e133380.arm.com>
In-Reply-To: <CALPaoCjzrGMTEYmTpH=9o_=N24apE0U057p6Mt6Knt9PoyFmzw@mail.gmail.com>

Hi,

On Fri, May 09, 2025 at 04:46:30PM +0200, Peter Newman wrote:
> Hi Dave,
> 
> On Fri, May 9, 2025 at 1:29 PM Dave Martin <Dave.Martin@arm.com> wrote:

[...]

> > For example: scaling memory bandwidth percentages for MPAM is a
> > nuisance because the hardware uses fixed-point values scaled by a power
> > of 2, not by 100: the two scales can never match up anywhere except at
> > multiples of 25%, leading to irregular increments when rounded to an
> > integer percentage value and uncertainty about what the bandwidth_gran
> > parameter means.  Round-trip conversions between the two
> > representations become error-prone due to repeated rounding -- this
> > proved quite fiddly to get right.  Precision beyond 1% increments may
> > also be available in the hardware, but is not accessible through the
> > resctrl interface.
> 
> Google users got annoyed with these rounding errors very quickly and
> asked me to change the MBA interface to the raw, fixed-point value
> used by the MPAM register interface. (but at least shifted down, since
> the MBW_MIN/MAX fields are left-justified)

That's interesting.

Do you find a need to do things like step the bandwidth allocation for
a control group?  So, as part of a tuning regime, the bandwidth value
is read out, stepped to the next distinct hardware value and written
back in?

That kind of thing does not map in a convenient way onto the current
interface, although fire-and-forget programming of a predetermined
percentage works fine.
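
To make the mismatch concrete, here is a small sketch.  It assumes a
6-bit field in which raw value r means r/64 of the total bandwidth, and
round-to-nearest in both directions.  The helper names and the rounding
policy are illustrative assumptions, not what the resctrl or MPAM code
actually does.

```python
# Illustrative only: a 6-bit fixed-point field and round-to-nearest
# conversions are assumptions made for this sketch.
FRACTION_BITS = 6
SCALE = 1 << FRACTION_BITS  # 64 raw steps cover the full range


def percent_to_raw(percent: int) -> int:
    """Nearest raw value for an integer percentage, clamped to 1..SCALE."""
    raw = (percent * SCALE + 50) // 100
    return max(1, min(SCALE, raw))


def raw_to_percent(raw: int) -> int:
    """Nearest integer percentage for a raw value."""
    return (raw * 100 + SCALE // 2) // SCALE


# Percentages that correspond exactly to some raw value.
exact = [p for p in range(1, 101) if (p * SCALE) % 100 == 0]

# Distinct percentage increments seen when stepping the raw value by one.
steps = sorted({raw_to_percent(r + 1) - raw_to_percent(r)
                for r in range(1, SCALE)})

# Percentages that change after a percent -> raw -> percent round trip.
unstable = [p for p in range(1, 101)
            if raw_to_percent(percent_to_raw(p)) != p]

print(exact)  # [25, 50, 75, 100]
print(steps, len(unstable))
```

Under these assumptions only the multiples of 25% are exact, and a
single hardware step shows up as either 1% or 2% depending on where you
are in the range, so a read/step/write-back loop expressed in
percentages cannot tell which increment will land on the next distinct
hardware value.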

Extending my model outline, a 6-bit MPAM MBW_PART implementation might
be described by:

	min: 1
	max: 64
	step size: 1
	multiplier: 1
	divisor: 64

How easy / difficult do you think it would be for userspace to work
with this, if resctrlfs were to expose the raw control (minus the
ignored bits) with that metadata?

Needless to say, the max and divisor values would depend on the
hardware and possibly other factors.  They would be fixed for the
lifetime of a single resctrl instance at the very least.
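
As a rough idea of what userspace would have to do with that metadata,
here is a sketch.  The ControlInfo class, its field names and the
round-down policy in from_fraction() are all hypothetical: nothing like
this exists in resctrlfs today, and how the metadata would actually be
exposed is left open.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ControlInfo:
    """Hypothetical per-control metadata, mirroring the outline above."""
    min_raw: int
    max_raw: int
    step: int
    multiplier: int
    divisor: int

    def to_fraction(self, raw: int) -> Fraction:
        """Exact share of total bandwidth that a raw value represents."""
        return Fraction(raw * self.multiplier, self.divisor)

    def from_fraction(self, share: Fraction) -> int:
        """Largest valid raw value whose share does not exceed `share`."""
        raw = (share * self.divisor) // self.multiplier
        raw -= (raw - self.min_raw) % self.step  # snap down to the step grid
        return max(self.min_raw, min(self.max_raw, raw))

    def step_up(self, raw: int) -> int:
        """Next distinct raw value, saturating at the maximum."""
        return min(self.max_raw, raw + self.step)


# The 6-bit MBW_PART example: min 1, max 64, step 1, scaled by 1/64.
mbw_part = ControlInfo(min_raw=1, max_raw=64, step=1,
                       multiplier=1, divisor=64)

current = mbw_part.from_fraction(Fraction(1, 2))  # raw value for 50%
stepped = mbw_part.step_up(current)               # next hardware value
print(current, stepped, mbw_part.to_fraction(stepped))
```

With exact rational arithmetic on the userspace side, the
read/step/write-back case becomes a plain integer increment, and no
rounding is involved until a human-readable value is wanted.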

> > For backwards compatibility we probably shouldn't change that
> > particular interface, but if we can avoid new instances of the same
> > kind of problem then that would be a benefit: i.e., explicitly tell
> > userspace how to scale a given parameter.
> 
> MBA is not programmed by percentage on AMD, so I'm not sure why this
> is considered necessary for backwards compatibility.

I presumed scripts (or pre-tuned data fed through them) are in practice
pretty platform-specific, so that it will upset people if the interface
changes between kernel versions at least on a given hardware family.

The divergence between AMD and Intel in this area is unfortunate, but
absolute and proportional bandwidth measures do not really seem to be
interchangeable -- so a truly unified interface may not be easy to
achieve either.

Having two control names in the interface might work, say:

	MBP: proportion of total available memory bandwidth (%)

	MBA: absolute memory bandwidth (B/s)

Then just expose the one that the hardware implements natively (while
still exposing MB as a backwards compatible alias if necessary).

Cheers
---Dave
