* [ACPI Code First ECN v4] "Extended-linear" addressing for direct-mapped memory-side caches
@ 2024-07-01 22:54 Dan Williams
2024-07-02 12:31 ` Jonathan Cameron
0 siblings, 1 reply; 3+ messages in thread
From: Dan Williams @ 2024-07-01 22:54 UTC (permalink / raw)
To: linux-cxl, linux-acpi; +Cc: mahesh.natu, rafael, Jonathan.Cameron
# Title: "Extended-linear" addressing for direct-mapped memory-side caches
# Status: v4
# Document: ACPI Specification 6.6
# License
SPDX-License Identifier: CC-BY-4.0
# Submitter:
* Sponsor: Dan Williams, Intel
* Creators/Contributors:
* Andy Rudoff, retired
* Mahesh Natu, Intel
* Ishwar Agarwal, Intel
# Changelog
* v4: Drop "improved cache utilization" claim (Jonathan)
* v4: Clarify SPA vs HPA terminology
* v4: Clarify possibility and difficulty of multiple CPER records to
communicate aliases (Jonathan)
* v4: Clarify that N is expected to be an integer ration of "near" to
"far" memory. (Jonathan)
* v3: Replace "Inclusive Linear" with "Extended-linear" term, and
clarify the SPA vs HPA behavior of this cache addressing mode.
(Jonathan)
* v2: Clarify the "Inclusive" term as "including the capacity of the cache
in the SRAT range length"
* v2: Clarify that 0 is an undeclared / transparent Address Mode, and
that Address Mode values other than 1 are Reserved.
v3: http://lore.kernel.org/6650e4f835a0e_195e294a8@dwillia2-mobl3.amr.corp.intel.com.notmuch
v2: http://lore.kernel.org/663ea70884cfd_db82d29414@dwillia2-xfh.jf.intel.com.notmuch
# Summary of the Change
Recall that one of the modes available with persistent memory (PMEM) was a
direct-mapped memory-side cache where DDR-memory transparently cached
PMEM. This article has more details:
https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
...but the main takeaway of that article that is relevant for this ECN
is:
"[PMEM] is paired with a DRAM that behaves as a cache, and,
like a cache, it is invisible to the user. [..] A typical system
might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
total memory size will appear to the software as only 512GB."
Instead, this new "extended-linear" direct-mapped memory-side cache
addressing mode would make the memory-size that appears to software in
the above example as 576GB, an increase of addressable capacity.
A primary motivation for updating HMAT to explicitly enumerate this
addressing mode is due to the OSPM's increased role for RAS and
address-translation with CXL topologies. With CXL and OS native RAS
flows OSPM is responsible for understanding and navigating the
relationship between System-Physical-Address (SPA) ranges published
ACPI.SRAT.MemoryAffinity, Host-Physical-Address ranges (HPA) published
in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
endpoints.
Enable an OSPM to enumerate that the capacity for a memory-side cache
extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
in the HMAT is "excluded" from the SRAT range length because it is a
transparent cache of the SRAT capacity. I.e. typical systems present 1:1
SPA:HPA, extended-linear memory-side-cache systems alias multiple SPAs
to a single CXL HPA. The enumeration of this addressing mode enables
OSPM-memory-RAS (Reliability, Availability, and Serviceability) flows.
# Benefits of the Change
Without this change an OSPM that encounters a memory-side cache
configuration of DDR fronting CXL may not understand that an SRAT range
extended by cache capacity should be maintained as one contiguous SPA
range even though the CXL HPA decode configuration only maps a subset of
the SRAT SPA range. In other words the memory-side-cache dynamically
maps access to that SPA range to either a CXL or DDR HPA.
Without this change the only way for system software to become aware of
the fact that one memory poison event implicates multiple address
locations would be for multiple error records (CPER) to be emitted
per-poison consumption event. That may surprise existing OSPM
implementations.
With this change, the OSPM can enumerate aliases and take actions like
quarantine / offline all the impacted aliased pages to prevent further
consumption of poison, or run repair operations on all the affected
targets. The OSPM is enabled to identify the HPA associated with a given
CXL endpoint DPA event, and understand how SPAs map to CXL HPAs.
# Impact of the Change
The proposed "Address Mode" field consumes the 2 Reserved bytes
following the "Cache Attributes" field in the "Memory Side Cache
Information Structure". The default reserved value of 0 indicates the
status quo of an undeclared addressing mode where the expectation is
that it is safe to assume the cache-capacity is transparent to the SRAT
range capacity. An OSPM that knows about new values can consider SPA to
HPA relationships according to the address-layout definition proposed
below. A legacy OSPM will ignore it as a Reserved field.
# References
* Compute Express Link Specification v3.1,
<https://www.computeexpresslink.org/>
# Detailed Description of the Change
* Section Table 5.149: Memory Side Cache Information Structure redefine
the 2 Reserved bytes starting at offset 28 as "Address Mode":
* 0 - Reserved (Unknown Address Mode)
* 1 - Extended-linear (N direct-map aliases linearly mapped)
* 2..65535 - Reserved (Unknown Address Mode)
* Extend the implementation note after Table 5.149 to explain how to
interpret the "Extended-linear" mode.
* When Address Mode is 1 'Extended-Linear' it indicates that the
associated address range (SRAT.MemoryAffinityStructure.Length) is
comprised of the backing store capacity extended by the cache
capacity. It is arranged such that there are N directly addressable
aliases of a given cacheline where N is an integer ratio of target memory
proximity domain size and the memory side cache size. Where the N
aliased addresses for a given cacheline all share the same result
for the operation 'address modulo cache size'. This setting is only
allowed when 'Cache Associativity' is 'Direct Map'."
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [ACPI Code First ECN v4] "Extended-linear" addressing for direct-mapped memory-side caches
2024-07-01 22:54 [ACPI Code First ECN v4] "Extended-linear" addressing for direct-mapped memory-side caches Dan Williams
@ 2024-07-02 12:31 ` Jonathan Cameron
2024-07-03 23:04 ` Dan Williams
0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Cameron @ 2024-07-02 12:31 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
On Mon, 1 Jul 2024 15:54:41 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> # Title: "Extended-linear" addressing for direct-mapped memory-side caches
>
> # Status: v4
>
> # Document: ACPI Specification 6.6
>
> # License
> SPDX-License Identifier: CC-BY-4.0
>
> # Submitter:
> * Sponsor: Dan Williams, Intel
> * Creators/Contributors:
> * Andy Rudoff, retired
> * Mahesh Natu, Intel
> * Ishwar Agarwal, Intel
>
> # Changelog
> * v4: Drop "improved cache utilization" claim (Jonathan)
> * v4: Clarify SPA vs HPA terminology
> * v4: Clarify possibility and difficulty of multiple CPER records to
> communicate aliases (Jonathan)
> * v4: Clarify that N is expected to be an integer ration of "near" to
> "far" memory. (Jonathan)
> * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
> clarify the SPA vs HPA behavior of this cache addressing mode.
> (Jonathan)
> * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> in the SRAT range length"
> * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> that Address Mode values other than 1 are Reserved.
>
> v3: http://lore.kernel.org/6650e4f835a0e_195e294a8@dwillia2-mobl3.amr.corp.intel.com.notmuch
> v2: http://lore.kernel.org/663ea70884cfd_db82d29414@dwillia2-xfh.jf.intel.com.notmuch
>
Trivial suggestions inline.
Otherwise seems fine to me.
Jonathan
> # Summary of the Change
> Recall that one of the modes available with persistent memory (PMEM) was a
> direct-mapped memory-side cache where DDR-memory transparently cached
> PMEM. This article has more details:
>
> https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
>
> ...but the main takeaway of that article that is relevant for this ECN
> is:
>
> "[PMEM] is paired with a DRAM that behaves as a cache, and,
> like a cache, it is invisible to the user. [..] A typical system
> might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
> total memory size will appear to the software as only 512GB."
>
> Instead, this new "extended-linear" direct-mapped memory-side cache
> addressing mode would make the memory-size that appears to software in
> the above example as 576GB, an increase of addressable capacity.
>
> A primary motivation for updating HMAT to explicitly enumerate this
> addressing mode is due to the OSPM's increased role for RAS and
> address-translation with CXL topologies. With CXL and OS native RAS
> flows OSPM is responsible for understanding and navigating the
> relationship between System-Physical-Address (SPA) ranges published
> ACPI.SRAT.MemoryAffinity, Host-Physical-Address ranges (HPA) published
> in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
> endpoints.
>
> Enable an OSPM to enumerate that the capacity for a memory-side cache
> extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
> in the HMAT is "excluded" from the SRAT range length because it is a
> transparent cache of the SRAT capacity. I.e. typical systems present 1:1
> SPA:HPA, extended-linear memory-side-cache systems alias multiple SPAs
> to a single CXL HPA. The enumeration of this addressing mode enables
> OSPM-memory-RAS (Reliability, Availability, and Serviceability) flows.
>
> # Benefits of the Change
> Without this change an OSPM that encounters a memory-side cache
> configuration of DDR fronting CXL may not understand that an SRAT range
> extended by cache capacity should be maintained as one contiguous SPA
> range even though the CXL HPA decode configuration only maps a subset of
> the SRAT SPA range. In other words the memory-side-cache dynamically
> maps access to that SPA range to either a CXL or DDR HPA.
>
> Without this change the only way for system software to become aware of
> the fact that one memory poison event implicates multiple address
> locations would be for multiple error records (CPER) to be emitted
multiple error record sections (one CPER can contain a bunch of those).
> per-poison consumption event. That may surprise existing OSPM
> implementations.
I'd be tempted to add one line say that obviously this is impossible
for OS first handling as no one there to generate the records. That
is kind of implied, but might as well say it.
>
> With this change, the OSPM can enumerate aliases and take actions like
> quarantine / offline all the impacted aliased pages to prevent further
> consumption of poison, or run repair operations on all the affected
> targets. The OSPM is enabled to identify the HPA associated with a given
> CXL endpoint DPA event, and understand how SPAs map to CXL HPAs.
>
> # Impact of the Change
> The proposed "Address Mode" field consumes the 2 Reserved bytes
> following the "Cache Attributes" field in the "Memory Side Cache
> Information Structure". The default reserved value of 0 indicates the
> status quo of an undeclared addressing mode where the expectation is
> that it is safe to assume the cache-capacity is transparent to the SRAT
> range capacity. An OSPM that knows about new values can consider SPA to
> HPA relationships according to the address-layout definition proposed
> below. A legacy OSPM will ignore it as a Reserved field.
>
> # References
> * Compute Express Link Specification v3.1,
> <https://www.computeexpresslink.org/>
>
> # Detailed Description of the Change
>
> * Section Table 5.149: Memory Side Cache Information Structure redefine
> the 2 Reserved bytes starting at offset 28 as "Address Mode":
>
> * 0 - Reserved (Unknown Address Mode)
> * 1 - Extended-linear (N direct-map aliases linearly mapped)
> * 2..65535 - Reserved (Unknown Address Mode)
>
> * Extend the implementation note after Table 5.149 to explain how to
> interpret the "Extended-linear" mode.
>
> * When Address Mode is 1 'Extended-Linear' it indicates that the
> associated address range (SRAT.MemoryAffinityStructure.Length) is
> comprised of the backing store capacity extended by the cache
> capacity. It is arranged such that there are N directly addressable
> aliases of a given cacheline where N is an integer ratio of target memory
> proximity domain size and the memory side cache size. Where the N
> aliased addresses for a given cacheline all share the same result
> for the operation 'address modulo cache size'. This setting is only
> allowed when 'Cache Associativity' is 'Direct Map'."
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [ACPI Code First ECN v4] "Extended-linear" addressing for direct-mapped memory-side caches
2024-07-02 12:31 ` Jonathan Cameron
@ 2024-07-03 23:04 ` Dan Williams
0 siblings, 0 replies; 3+ messages in thread
From: Dan Williams @ 2024-07-03 23:04 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
Jonathan Cameron wrote:
> On Mon, 1 Jul 2024 15:54:41 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > # Title: "Extended-linear" addressing for direct-mapped memory-side caches
> >
> > # Status: v4
> >
> > # Document: ACPI Specification 6.6
> >
> > # License
> > SPDX-License Identifier: CC-BY-4.0
> >
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> > * Andy Rudoff, retired
> > * Mahesh Natu, Intel
> > * Ishwar Agarwal, Intel
> >
> > # Changelog
> > * v4: Drop "improved cache utilization" claim (Jonathan)
> > * v4: Clarify SPA vs HPA terminology
> > * v4: Clarify possibility and difficulty of multiple CPER records to
> > communicate aliases (Jonathan)
> > * v4: Clarify that N is expected to be an integer ration of "near" to
> > "far" memory. (Jonathan)
> > * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
> > clarify the SPA vs HPA behavior of this cache addressing mode.
> > (Jonathan)
> > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> > in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> > that Address Mode values other than 1 are Reserved.
> >
> > v3: http://lore.kernel.org/6650e4f835a0e_195e294a8@dwillia2-mobl3.amr.corp.intel.com.notmuch
> > v2: http://lore.kernel.org/663ea70884cfd_db82d29414@dwillia2-xfh.jf.intel.com.notmuch
> >
> Trivial suggestions inline.
>
> Otherwise seems fine to me.
>
> Jonathan
[..]
> > # Benefits of the Change
> > Without this change an OSPM that encounters a memory-side cache
> > configuration of DDR fronting CXL may not understand that an SRAT range
> > extended by cache capacity should be maintained as one contiguous SPA
> > range even though the CXL HPA decode configuration only maps a subset of
> > the SRAT SPA range. In other words the memory-side-cache dynamically
> > maps access to that SPA range to either a CXL or DDR HPA.
> >
> > Without this change the only way for system software to become aware of
> > the fact that one memory poison event implicates multiple address
> > locations would be for multiple error records (CPER) to be emitted
>
> multiple error record sections (one CPER can contain a bunch of those).
Noted.
> > per-poison consumption event. That may surprise existing OSPM
> > implementations.
>
> I'd be tempted to add one line say that obviously this is impossible
> for OS first handling as no one there to generate the records. That
> is kind of implied, but might as well say it.
>
As I understand this proposal is already under ASWG consideration so I
am pausing on any other editorial updates here awaiting that result.
Thanks for the feedback Jonathan.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-07-03 23:04 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-01 22:54 [ACPI Code First ECN v4] "Extended-linear" addressing for direct-mapped memory-side caches Dan Williams
2024-07-02 12:31 ` Jonathan Cameron
2024-07-03 23:04 ` Dan Williams
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox