* Re: Region aware RDT options for resctrl
       [not found] <Z_mBcnAcGzMMvfxV@agluck-desk3>
@ 2025-04-11 20:56 ` Luck, Tony
  2025-04-14 17:30   ` Reinette Chatre
  0 siblings, 1 reply; 3+ messages in thread
From: Luck, Tony @ 2025-04-11 20:56 UTC
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy
  Cc: linux-kernel

On Fri, Apr 11, 2025 at 01:54:12PM -0700, Luck, Tony wrote:

Add Cc: lkml

> A future CPU from Intel will implement "region aware" memory bandwidth
> monitoring and bandwidth allocation. This will provide for more granular
> monitoring and control for heterogeneous memory configurations. BIOS
> will populate an ACPI table that describes which system physical address
> ranges belong to each region. E.g. for a two socket system with both
> DDR and CXL memory, regions could be assigned like this:
> 
> Region 0: Local DDR
> Region 1: Remote DDR
> Region 2: Local CXL
> Region 3: Remote CXL
> 
> Details of the ACPI tables and MMIO registers in the "Intel(R)
> Resource Director Technology Architecture Specification" here:
> https://cdrdv2.intel.com/v1/dl/getContent/789566
> 
> The existing Linux resctrl user interface will need some extensions
> to handle these new hardware monitors and controls. Here are some
> options for discussion with the goal of aligning on some user interface
> that meets now and near future needs of all architectures.
> 
> Memory bandwidth monitoring
> ---------------------------
> 
> The existing interface provides two files in each of the per-domain
> directories under "mon_data":
> 
> 	mbm_local_bytes: Count of bytes transferred to/from "local" memory
> 	mbm_total_bytes: Count of bytes transferred to/from all memory
> 
> Proposal is to provide a new file to report traffic for each region
> for however many regions are implemented on a system:
> 
> 	mbm_region_0_bytes
> 	...
> 	mbm_region_N_bytes
> 
> Potentially a compatibility file:
> 
> 	mbm_total_bytes
> 
> could be included which provides data for the sum across all regions.
> 
> Providing a similar mbm_local_bytes file would be challenging as the
> BIOS controls the region numbering and it may be difficult/impossible
> for Linux to determine which regions report "local" memory traffic.
> A future implementation may allow the OS to define the region mapping
> which makes things even more complex as the mappings could be changed
> at run time.
> 
> Memory bandwidth allocation
> ---------------------------
> 
> This is more complex as there are some additional capability improvements
> in addition to providing separate controls for each region. Resctrl
> already has support to control bandwidth to "slow" memory on AMD systems
> providing separate controls for "regular" and "slow" memory in the schemata file:
> 
> 	$ cat schemata
> 	MB:  0=100;1=100
> 	SMBA:0=100;1=100
> 
> It would be tricky for resctrl to build on this for regions for the same
> reason the mbm_local_bytes would be difficult. No way for Linux to determine
> which regions are CXL vs. DDR. This approach would also lose ability to
> control local vs. remote bandwidth. Also not extensible for future memory
> configuration options.
> 
> Option 1: Per-memory regions might be described individually like this:
> 
> 	$ cat schemata
> 	RMB0:0=100;1=100
> 	RMB1:0=75;1=75
> 	RMB2:0=25;1=25
> 
> Option 2: Add to schemata per-line syntax to keep one line, but specify each region
> in some comma separated list:
> 
> 	$ cat schemata
> 	RMB:0=100,75,50,25;1=100,50,25
> 
> But there are additional capabilities that would be useful to expose that
> may influence decisions.
> 
> 1) Better than 1% throttle granularity
> 
> Existing Intel implementations provide throttle controls in 10% steps. The
> architectural enumeration allows for at best 1% steps. But this may still be
> inadequate to provide distinct controls when very high levels of throttling
> are needed for low priority workloads. The RDT architecture specification
> allows for bandwidth limits to be specified from 1 (maximum throttle) to 511
> (no throttle) though implementations may provide other ranges, e.g. 1..255.
> 
> Option 1: Specify bandwidth in schemata with floating point values
> 
> 	$ cat info/MB/min_bandwidth
> 	0.1957
> 	$ cat info/MB/bandwidth_gran
> 	0.1957
> 	$ cat schemata
> 	RMB0:0=100;1=100
> 	RMB1:0=0.75;1=1.25
> 
> Option 2: Change from "percentage" to some enumerated range
> 
> 	$ cat schemata
> 	RMB0:0=511;1=511
> 
> 2) Min/max ranges for bandwidth
> 
> When a single fixed value for bandwidth limits is provided, users are
> forced to be overly conservative when assigning limits in the schemata
> file in order to keep memory controllers within capacity limits. This
> can result in jobs being throttled unnecessarily at times when there is
> plenty of bandwidth capacity available.
> 
> The latest RDT architecture specification allows for setting a minimum
> and maximum bandwidth in addition to the normal limit. Example usage
> would be to set a higher maximum value for low priority jobs to allow
> them to run faster when the system has available memory bandwidth capacity.
> High priority jobs can have a minimum bandwidth setting so that when
> the system is running close to capacity limits, those jobs are not
> throttled as much (or at all) while lower priority jobs are throttled.
> 
> Syntax option:
> 
> 	$ cat schemata
> 	RMB0:0=25<50<100;1=25<50<100
> 
> Combining some of these options for new capabilities we could have:
> 
> 	$ cat schemata
> 	RMB0:0=25<50<100;1=25<50<100
> 	RMB1:0=2.5<30<40;1=2.5<30<40
> 	RMB2:0=80<90<100;1=80<90<100
> 
> -Tony
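
The combined syntax above could be parsed along these lines. This is only
a sketch under stated assumptions: the "RMBn" resource names and the "<"
separator are proposals from this thread, not an existing resctrl
interface; the field order is assumed to be min<limit<max; and the linear
mapping from percent to a 1..511 hardware step is an assumption chosen
because 100/511 matches the 0.1957 granularity shown earlier. The names
Limits, parse_line and to_hw_step are hypothetical.

```python
from typing import NamedTuple

HW_MAX = 511  # assumed range: 1 (maximum throttle) .. 511 (no throttle)


class Limits(NamedTuple):
    minimum: float
    limit: float
    maximum: float


def parse_value(text):
    """Parse '50' or '25<50<100' (assumed order: min<limit<max)."""
    parts = [float(p) for p in text.split("<")]
    if len(parts) == 1:
        # A single value is treated as min == limit == max.
        return Limits(parts[0], parts[0], parts[0])
    if len(parts) != 3 or not parts[0] <= parts[1] <= parts[2]:
        raise ValueError(f"expected min<limit<max, got {text!r}")
    return Limits(*parts)


def parse_line(line):
    """Parse 'RMB1:0=2.5<30<40;1=2.5<30<40' into (resource, {domain: Limits})."""
    resource, _, rest = line.strip().partition(":")
    domains = {}
    for item in rest.split(";"):
        domain, _, value = item.partition("=")
        domains[int(domain)] = parse_value(value)
    return resource.strip(), domains


def to_hw_step(percent, hw_max=HW_MAX):
    """Map a percentage to a hardware step, assuming a linear scale."""
    step = round(percent * hw_max / 100)
    return min(max(step, 1), hw_max)  # clamp into 1..hw_max
```

With these assumptions, to_hw_step(100) gives 511 and to_hw_step(2.5)
gives 13, so a value such as 2.5 in the schemata file stays distinct from
its neighbours, which is the point of the finer granularity.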


* Re: Region aware RDT options for resctrl
  2025-04-11 20:56 ` Region aware RDT options for resctrl Luck, Tony
@ 2025-04-14 17:30   ` Reinette Chatre
  2025-04-14 17:56     ` Luck, Tony
  0 siblings, 1 reply; 3+ messages in thread
From: Reinette Chatre @ 2025-04-14 17:30 UTC
  To: Luck, Tony, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy
  Cc: linux-kernel

Hi Tony,

Could you please help clarify how memory regions should be viewed?

On 4/11/25 1:56 PM, Luck, Tony wrote:
> On Fri, Apr 11, 2025 at 01:54:12PM -0700, Luck, Tony wrote:
> 
> Add Cc: lkml
> 
>> A future CPU from Intel will implement "region aware" memory bandwidth
>> monitoring and bandwidth allocation. This will provide for more granular
>> monitoring and control for heterogeneous memory configurations. BIOS
>> will populate an ACPI table that describes which system physical address
>> ranges belong to each region. E.g. for a two socket system with both
>> DDR and CXL memory, regions could be assigned like this:
>>
>> Region 0: Local DDR
>> Region 1: Remote DDR
>> Region 2: Local CXL
>> Region 3: Remote CXL

If considering an assignment like above ...

...

>> Option 1: Per-memory regions might be described individually like this:
>>
>> 	$ cat schemata
>> 	RMB0:0=100;1=100
>> 	RMB1:0=75;1=75
>> 	RMB2:0=25;1=25
>>

... I assume "RMB0" represents "Region 0" and so forth. In this case, what do
the "domain IDs" used in above option represent?

Thank you.

Reinette


* Re: Region aware RDT options for resctrl
  2025-04-14 17:30   ` Reinette Chatre
@ 2025-04-14 17:56     ` Luck, Tony
  0 siblings, 0 replies; 3+ messages in thread
From: Luck, Tony @ 2025-04-14 17:56 UTC
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	linux-kernel

On Mon, Apr 14, 2025 at 10:30:18AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> Could you please help clarify how memory regions should be viewed?
> 
> On 4/11/25 1:56 PM, Luck, Tony wrote:
> > On Fri, Apr 11, 2025 at 01:54:12PM -0700, Luck, Tony wrote:
> > 
> > Add Cc: lkml
> > 
> >> A future CPU from Intel will implement "region aware" memory bandwidth
> >> monitoring and bandwidth allocation. This will provide for more granular
> >> monitoring and control for heterogeneous memory configurations. BIOS
> >> will populate an ACPI table that describes which system physical address
> >> ranges belong to each region. E.g. for a two socket system with both
> >> DDR and CXL memory, regions could be assigned like this:
> >>
> >> Region 0: Local DDR
> >> Region 1: Remote DDR
> >> Region 2: Local CXL
> >> Region 3: Remote CXL
> 
> If considering an assignment like above ...
> 
> ...
> 
> >> Option 1: Per-memory regions might be described individually like this:
> >>
> >> 	$ cat schemata
> >> 	RMB0:0=100;1=100
> >> 	RMB1:0=75;1=75
> >> 	RMB2:0=25;1=25
> >>
> 
> ... I assume "RMB0" represents "Region 0" and so forth. In this case, what do
> the "domain IDs" used in above option represent?

Measurement and control are still done at the scope of each L3 cache. So
the domain ids in the schemata file and in the names of the directories
under "mon_data" are the Linux L3 cache ids.
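
To see which CPUs belong to each of those domain ids on a running system,
something like the following sketch could be used. It assumes the kernel
exposes cache ids in sysfs under cpuN/cache/indexM/id (older kernels may
not), and the function name l3_domains is made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def l3_domains(cpu_root="/sys/devices/system/cpu"):
    """Return {l3_cache_id: sorted list of CPU numbers sharing that L3}."""
    domains = defaultdict(list)
    for level_file in Path(cpu_root).glob("cpu[0-9]*/cache/index*/level"):
        # Only level 3 caches define the resctrl domains discussed here.
        if level_file.read_text().strip() != "3":
            continue
        id_file = level_file.with_name("id")
        if not id_file.exists():  # cache id not exposed by this kernel
            continue
        cpu = int(level_file.parts[-4][3:])  # "cpu12" -> 12
        domains[int(id_file.read_text())].append(cpu)
    return {cid: sorted(cpus) for cid, cpus in sorted(domains.items())}
```

Calling l3_domains() with no arguments reads the live sysfs tree. The keys
of the returned dict should then line up with the domain ids in the
schemata file.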

Here's a different example of a schemata file (with different throttle
values in all positions to make the explanation below easier):

	$ cat schemata
	RMB0:0=100;1=75
	RMB1:0=50;1=25

This is a two socket system with just two regions (DDR local and remote).

0=100	CPUs in the domain of L3 instance 0 are not throttled for
	accesses to their local DDR
1=75	CPUs in the domain of L3 instance 1 are throttled to 75% for
	accesses to their local DDR
0=50	CPUs in the domain of L3 instance 0 are throttled to 50% for
	accesses to remote DDR
1=25	CPUs in the domain of L3 instance 1 are throttled to 25% for
	accesses to remote DDR
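
The same reading of the file can be written as a short sketch. It assumes
the per-region "RMBn" line format proposed in this thread (not something
resctrl provides today) and the two-region local/remote layout of this
example. The names parse_schemata and REGION_NAMES are illustrative only.

```python
SCHEMATA = "RMB0:0=100;1=75\nRMB1:0=50;1=25\n"

# Assumed meaning of each region in this two socket, DDR-only example.
REGION_NAMES = {"RMB0": "local DDR", "RMB1": "remote DDR"}


def parse_schemata(text):
    """Return {resource: {domain_id: percent}} for 'RES:dom=val;dom=val' lines."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        resource, _, domains = line.partition(":")
        result[resource.strip()] = {
            int(dom): int(val)
            for dom, val in (item.split("=") for item in domains.split(";"))
        }
    return result


def describe(text):
    """Yield one sentence per (region, L3 domain) pair."""
    for resource, domains in parse_schemata(text).items():
        for dom, pct in sorted(domains.items()):
            target = REGION_NAMES.get(resource, resource)
            yield f"L3 domain {dom}: {pct}% for accesses to {target}"
```

Printing each string from describe(SCHEMATA) gives four lines that match
the four entries listed above.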

The ACPI MRRM table describes the memory ranges in each region. Each
range has two region numbers associated with it. One for local access,
the other for remote access. A dump of one entry looks like this:

[0002]                       Memory Range : 0000
[0002]                             Length : 0020
[0004]                           Reserved : 00000000
[0008]                System Address Base : 0000000000000000
[0008]              System Address Length : 00000000E0000000
[0002]                 Region Valid Flags : 0003
[0001]             Static Local Region ID : 00
[0001]            Static Remote Region ID : 01
[0004]                           Reserved : 00000000

It shows that the range from 0 GB to 3.5 GB will be counted/controlled
in region 0 when accessed by a CPU where this is local memory, but as
region 1 when accessed by a CPU where this range is remote.
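
As a rough illustration of that lookup, the sketch below parses the dump
text shown above and picks the region id for a given physical address.
It works only from the field names in this dump. It does not interpret
the "Region Valid Flags" bits, and the helper names parse_entry and
region_for_access are hypothetical.

```python
import re

DUMP = """\
[0002]                       Memory Range : 0000
[0002]                             Length : 0020
[0004]                           Reserved : 00000000
[0008]                System Address Base : 0000000000000000
[0008]              System Address Length : 00000000E0000000
[0002]                 Region Valid Flags : 0003
[0001]             Static Local Region ID : 00
[0001]            Static Remote Region ID : 01
[0004]                           Reserved : 00000000
"""

FIELD_RE = re.compile(r"\[[0-9A-Fa-f]+\]\s+(.+?)\s+:\s+([0-9A-Fa-f]+)\s*$")


def parse_entry(text):
    """Turn '[size] name : hexvalue' lines into {name: int}."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            # Keep the first value for repeated names such as "Reserved".
            fields.setdefault(m.group(1), int(m.group(2), 16))
    return fields


def region_for_access(entry, addr, local):
    """Return the region id that counts an access, or None if out of range."""
    base = entry["System Address Base"]
    length = entry["System Address Length"]
    if not base <= addr < base + length:
        return None
    key = "Static Local Region ID" if local else "Static Remote Region ID"
    return entry[key]
```

For this entry the length 0xE0000000 is 3.5 * 2^30 bytes, and
region_for_access(parse_entry(DUMP), 0x1000, local=True) returns 0 while
the same address with local=False returns 1.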

> 
> Thank you.
> 
> Reinette

-Tony

