* [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches
@ 2024-05-24 19:05 Dan Williams
2024-06-05 9:10 ` Jonathan Cameron
0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-05-24 19:05 UTC (permalink / raw)
To: linux-cxl, linux-acpi; +Cc: mahesh.natu, rafael, Jonathan.Cameron
# Title: "Extended-linear" addressing for direct-mapped memory-side caches
# Status: v3
# Document: ACPI Specification 6.6
# License
SPDX-License-Identifier: CC-BY-4.0
# Submitter:
* Sponsor: Dan Williams, Intel
* Creators/Contributors:
* Andy Rudoff, retired
* Mahesh Natu, Intel
* Ishwar Agarwal, Intel
# Changelog
* v3: Replace "Inclusive Linear" with "Extended-linear" term, and
clarify the SPA vs HPA behavior of this cache addressing mode.
(Jonathan Cameron)
* v2: Clarify the "Inclusive" term as "including the capacity of the cache
in the SRAT range length"
* v2: Clarify that 0 is an undeclared / transparent Address Mode, and
that Address Mode values other than 1 are Reserved.
# Summary of the Change
Recall that one of the modes available with persistent memory (PMEM) was a
direct-mapped memory-side cache where DDR-memory transparently cached
PMEM. This article has more details:
https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
...but the main takeaway of that article that is relevant for this ECN
is:
"[PMEM] is paired with a DRAM that behaves as a cache, and,
like a cache, it is invisible to the user. [..] A typical system
might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
total memory size will appear to the software as only 512GB."
Instead, this new "extended-linear" direct-mapped memory-side cache
addressing mode would make the memory size that appears to software in
the above example 576GB. Including the DDR capacity to extend the
capacity visible to software may also improve cache utilization.
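As a back-of-the-envelope sketch (sizes taken from the example above,
purely illustrative), the capacity arithmetic works out as:

```python
# Example sizes from the article quoted above (illustrative only).
GIB = 1 << 30
cache_size = 64 * GIB      # DDR DIMM acting as the memory-side cache
backing_size = 512 * GIB   # PMEM / CXL backing capacity

# Legacy "memory mode": cache is transparent, software sees only backing.
legacy_visible = backing_size                  # 512 GiB

# Extended-linear: the SRAT range length includes the cache capacity.
extended_visible = backing_size + cache_size   # 576 GiB

# N direct-map aliases per cacheline: SRAT range length / cache size.
n_aliases = extended_visible // cache_size     # 9

print(legacy_visible // GIB, extended_visible // GIB, n_aliases)
# → 512 576 9
```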
A primary motivation for updating HMAT to explicitly enumerate this
addressing mode is the OSPM's increased role in RAS and
address-translation with CXL topologies. With CXL and OS-native RAS
flows, OSPM is responsible for understanding and navigating the
relationship between System-Physical-Address (SPA) ranges published in
ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges published
in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
endpoints.
Enable an OSPM to enumerate that the capacity for a memory-side cache
extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
in the HMAT is "excluded" from the SRAT range length because it is a
transparent cache of the SRAT capacity. The enumeration of this
addressing mode enables OSPM-memory-RAS (Reliability, Availability, and
Serviceability) flows.
# Benefits of the Change
Without this change an OSPM that encounters a memory-side cache
configuration of DDR fronting CXL may not understand that an SRAT range
extended by cache capacity should be maintained as one contiguous SPA
range, even though the CXL HPA decode configuration only maps a subset of
the SRAT SPA range. In other words, the memory-side cache dynamically
maps accesses in that SPA range to either a CXL or a DDR HPA.
When the OSPM knows about this relationship it can take actions like
quarantining / offlining all the impacted aliased pages to prevent
further consumption of poison, or running repair operations on all the
affected targets. Without this change an OSPM may not accurately
identify the HPA associated with a given CXL endpoint DPA event, or it
may misunderstand the SPAs that map to CXL HPAs.
# Impact of the Change
The proposed "Address Mode" field consumes the 2 Reserved bytes
following the "Cache Attributes" field in the "Memory Side Cache
Information Structure". The default reserved value of 0 indicates the
status quo of an undeclared addressing mode where the expectation is
that it is safe to assume the cache-capacity is transparent to the SRAT
range capacity. An OSPM that knows about new values can consider SPA to
HPA relationships according to the address-layout definition proposed
below. A legacy OSPM will ignore it as a Reserved field.
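As an illustration (hypothetical parser; field placement per the
proposal above: a 2-byte little-endian value at offset 28 of the Memory
Side Cache Information Structure), an OSPM might read the new field
like so:

```python
import struct

# Address Mode values proposed by this ECN.
ADDR_MODE_UNDECLARED = 0       # legacy / transparent cache addressing
ADDR_MODE_EXTENDED_LINEAR = 1  # N direct-map aliases linearly mapped

def parse_address_mode(msci: bytes) -> int:
    """Extract the proposed Address Mode field from a raw HMAT Memory
    Side Cache Information Structure (offsets per this ECN sketch)."""
    # Offset 28: formerly 2 Reserved bytes, now "Address Mode" (u16, LE).
    (mode,) = struct.unpack_from("<H", msci, 28)
    return mode

# A legacy table leaves the Reserved bytes zeroed: undeclared mode.
assert parse_address_mode(bytes(32)) == ADDR_MODE_UNDECLARED

# A table produced per this ECN sets the field to 1 for extended-linear.
buf = bytearray(32)
struct.pack_into("<H", buf, 28, ADDR_MODE_EXTENDED_LINEAR)
assert parse_address_mode(bytes(buf)) == ADDR_MODE_EXTENDED_LINEAR
```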
# References
* Compute Express Link Specification v3.1,
<https://www.computeexpresslink.org/>
# Detailed Description of the Change
* In Table 5.149 (Memory Side Cache Information Structure), redefine
the 2 Reserved bytes starting at offset 28 as "Address Mode":
* 0 - Reserved (OSPM may assume transparent cache addressing)
* 1 - Extended-linear (N direct-map aliases linearly mapped)
* 2..65535 - Reserved (Unknown Address Mode)
* Extend the implementation note after Table 5.149 to explain how to
interpret the "Extended-linear" mode.
* When Address Mode is 1 'Extended-Linear' it indicates that the
associated address range (SRAT.MemoryAffinityStructure.Length)
comprises the backing store capacity extended by the cache
capacity. It is arranged such that there are N directly addressable
aliases of a given cacheline, where N is the ratio of the target
memory proximity domain size to the memory side cache size. The N
aliased addresses for a given cacheline all share the same result
for the operation 'address modulo cache size'. This setting is only
allowed when 'Cache Associativity' is 'Direct Map'."
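A sketch of the rule above (hypothetical helper, illustrative sizes):
the N aliases of a given SPA are exactly the in-range addresses that
share its 'address modulo cache size' result, which is what an
OSPM poison-handling flow would walk to quarantine every affected page:

```python
def extended_linear_aliases(spa, base, length, cache_size):
    """Return every SPA in [base, base + length) that aliases `spa`
    under extended-linear direct-map addressing, i.e. every address
    sharing the same 'address modulo cache size' result."""
    assert base <= spa < base + length
    residue = spa % cache_size
    # First address at or above `base` that shares spa's residue.
    first = base + (residue - base) % cache_size
    return list(range(first, base + length, cache_size))

# 576G extended-linear range (512G backing + 64G cache): each
# cacheline has N = 576G / 64G = 9 directly addressable aliases.
GIB = 1 << 30
aliases = extended_linear_aliases(100 * GIB, 0, 576 * GIB, 64 * GIB)
assert len(aliases) == 9
assert all(a % (64 * GIB) == (100 * GIB) % (64 * GIB) for a in aliases)
```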
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches
2024-05-24 19:05 [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches Dan Williams
@ 2024-06-05 9:10 ` Jonathan Cameron
2024-06-05 10:51 ` Jonathan Cameron
2024-06-17 23:24 ` Dan Williams
0 siblings, 2 replies; 7+ messages in thread
From: Jonathan Cameron @ 2024-06-05 9:10 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
On Fri, 24 May 2024 12:05:28 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> # Title: "Extended-linear" addressing for direct-mapped memory-side caches
>
> # Status: v3
>
> # Document: ACPI Specification 6.6
>
> # License
> SPDX-License-Identifier: CC-BY-4.0
>
> # Submitter:
> * Sponsor: Dan Williams, Intel
> * Creators/Contributors:
> * Andy Rudoff, retired
> * Mahesh Natu, Intel
> * Ishwar Agarwal, Intel
>
> # Changelog
> * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
> clarify the SPA vs HPA behavior of this cache addressing mode.
> (Jonathan Cameron)
> * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> in the SRAT range length"
> * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> that Address Mode values other than 1 are Reserved.
>
> # Summary of the Change
> Recall that one of the modes available with persistent memory (PMEM) was a
> direct-mapped memory-side cache where DDR-memory transparently cached
> PMEM. This article has more details:
>
> https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
>
> ...but the main takeaway of that article that is relevant for this ECN
> is:
>
> "[PMEM] is paired with a DRAM that behaves as a cache, and,
> like a cache, it is invisible to the user. [..] A typical system
> might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
> total memory size will appear to the software as only 512GB."
>
> Instead, this new "extended-linear" direct-mapped memory-side cache
> addressing mode would make the memory-size that appears to software in
> the above example as 576GB. The inclusion of the DDR capacity to extend
> the capacity visible to software may improve cache utilization.
I'd skip the cache utilization point as even with 'may' it might just
end up rat holing! Capacity seems a sufficient justification to me and
requires a lot less explanation.
Perhaps something like
"The inclusion of the DDR increases the available capacity whilst still
providing benefits of a lower latency cache."
Up to you though as I'll not have to explain that utilization point
to anyone whereas you might.
>
> A primary motivation for updating HMAT to explicitly enumerate this
> addressing mode is the OSPM's increased role in RAS and
> address-translation with CXL topologies. With CXL and OS-native RAS
> flows, OSPM is responsible for understanding and navigating the
> relationship between System-Physical-Address (SPA) ranges published in
> ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges published
> in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
> endpoints.
>
> Enable an OSPM to enumerate that the capacity for a memory-side cache
> extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
> in the HMAT is "excluded" from the SRAT range length because it is a
> transparent cache of the SRAT capacity. The enumeration of this
> addressing mode enables OSPM-memory-RAS (Reliability, Availability, and
> Serviceability) flows.
>
> # Benefits of the Change
> Without this change an OSPM that encounters a memory-side cache
> configuration of DDR fronting CXL may not understand that an SRAT range
> extended by cache capacity should be maintained as one contiguous SPA
> range even though the CXL HPA decode configuration only maps a subset of
> the SRAT SPA range. In other words the memory-side-cache dynamically
> maps access to that SPA range to either a CXL or DDR HPA.
>
> When the OSPM knows about this relationship it can take actions like
> quarantine / offline all the impacted aliased pages to prevent further
> consumption of poison, or run repair operations on all the affected
> targets. Without this change an OSPM may not accurately identify the HPA
> associated with a given CXL endpoint DPA event, or it may misunderstand
> the SPAs that map to CXL HPAs.
I'd like something here on the impact on firmware-first error reporting.
Given we'd like that to work on a system that is neither CXL aware nor
aware of this feature at all, I'd propose multiple CPER records, one for
each alias. That assumes the firmware has no path to establish the alias.
Can certainly conceive of ways to implement a probe-type setup to allow
the discovery of which alias has been poisoned etc.
Perhaps needs a note somewhere in 18.3. Something along the lines of
"For any error with SPA originating in a range, where a memory-side cache
with address mode extended-linear is present, multiple error records
should be presented to cover any potentially affected aliases."
Maybe an OS could opt out of that multiple reporting via _OSC or similar,
but I'm not sure why it would bother. Easier to just allow for
multiple events.
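A sketch of that multiple-record policy (record shape is illustrative
only, not a real CPER encoding; sizes hypothetical): firmware that
cannot establish which alias was hit just reports all of them:

```python
def records_for_error(error_spa, base, length, cache_size):
    """Firmware-first sketch: emit one error record per potentially
    affected alias, since firmware may not know which alias the error
    actually hit. The dict shape stands in for a real CPER record."""
    residue = error_spa % cache_size
    first = base + (residue - base) % cache_size
    return [{"physical_address": spa, "note": "possible alias"}
            for spa in range(first, base + length, cache_size)]

# 576G extended-linear range with a 64G cache: one reported error
# fans out to 9 records, one per alias.
G = 1 << 30
records = records_for_error(100 * G, 0, 576 * G, 64 * G)
assert len(records) == 9
```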
>
> # Impact of the Change
> The proposed "Address Mode" field consumes the 2 Reserved bytes
> following the "Cache Attributes" field in the "Memory Side Cache
> Information Structure". The default reserved value of 0 indicates the
> status quo of an undeclared addressing mode where the expectation is
> that it is safe to assume the cache-capacity is transparent to the SRAT
> range capacity. An OSPM that knows about new values can consider SPA to
> HPA relationships according to the address-layout definition proposed
> below. A legacy OSPM will ignore it as a Reserved field.
>
> # References
> * Compute Express Link Specification v3.1,
> <https://www.computeexpresslink.org/>
>
> # Detailed Description of the Change
Probably need to up rev HMAT as well.
>
> * Section Table 5.149: Memory Side Cache Information Structure redefine
> the 2 Reserved bytes starting at offset 28 as "Address Mode":
>
> * 0 - Reserved (OSPM may assume transparent cache addressing)
Can we make that assumption? What are today's firmwares doing for this?
I'd drop the 'may assume'. Also, after this change it's not reserved:
0 explicitly means transparent cache addressing.
> * 1 - Extended-linear (N direct-map aliases linearly mapped)
> * 2..65535 - Reserved (Unknown Address Mode)
>
> * Extend the implementation note after Table 5.149 to explain how to
> interpret the "Extended-linear" mode.
>
> * When Address Mode is 1 'Extended-Linear' it indicates that the
> associated address range (SRAT.MemoryAffinityStructure.Length) is
> comprised of the backing store capacity extended by the cache
> capacity. It is arranged such that there are N directly addressable
> aliases of a given cacheline where N is the ratio of target memory
> proximity domain size and the memory side cache size. Where the N
> aliased addresses for a given cacheline all share the same result
> for the operation 'address modulo cache size'.
Probably need more here. What if someone has two such ranges of sizes
(512G + 64G) and (1024G + 128G), starting at address 0,
and decides to pack them for some reason?
The second one will be aligned to 64G, not 128G, so the modulo needs to
take the base address into account.
Do we need an explicit statement that N is an integer? Probably works
anyway but having 2.5 aliases is an unusual concept.
> This setting is only
> allowed when 'Cache Associativity' is 'Direct Map'."
Other than these corner cases looks good to me and the new terminology and
clarifications help a lot.
Thanks,
Jonathan
* Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches
2024-06-05 9:10 ` Jonathan Cameron
@ 2024-06-05 10:51 ` Jonathan Cameron
2024-06-17 23:24 ` Dan Williams
1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2024-06-05 10:51 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
On Wed, 5 Jun 2024 10:10:12 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> On Fri, 24 May 2024 12:05:28 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > # Title: "Extended-linear" addressing for direct-mapped memory-side caches
> >
> > # Status: v3
> >
> > # Document: ACPI Specification 6.6
> >
> > # License
> > SPDX-License-Identifier: CC-BY-4.0
> >
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> > * Andy Rudoff, retired
> > * Mahesh Natu, Intel
> > * Ishwar Agarwal, Intel
> >
> > # Changelog
> > * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
> > clarify the SPA vs HPA behavior of this cache addressing mode.
> > (Jonathan Cameron)
> > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> > in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> > that Address Mode values other than 1 are Reserved.
> >
> > # Summary of the Change
> > Recall that one of the modes available with persistent memory (PMEM) was a
> > direct-mapped memory-side cache where DDR-memory transparently cached
> > PMEM. This article has more details:
> >
> > https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
> >
> > ...but the main takeaway of that article that is relevant for this ECN
> > is:
> >
> > "[PMEM] is paired with a DRAM that behaves as a cache, and,
> > like a cache, it is invisible to the user. [..] A typical system
> > might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
> > total memory size will appear to the software as only 512GB."
> >
> > Instead, this new "extended-linear" direct-mapped memory-side cache
> > addressing mode would make the memory-size that appears to software in
> > the above example as 576GB. The inclusion of the DDR capacity to extend
> > the capacity visible to software may improve cache utilization.
>
> I'd skip the cache utilization point as even with 'may' it might just
> end up rat holing!. Capacity seems enough a justification to me and
> requires a lot less justification.
>
> Perhaps something like
> "The inclusion of the DDR increases the available capacity whilst still
> providing benefits of a lower latency cache."
>
> Up to you though as I'll not have to explain that utilization point
> to anyone whereas you might.
>
> >
> > A primary motivation for updating HMAT to explicitly enumerate this
> > addressing mode is the OSPM's increased role in RAS and
> > address-translation with CXL topologies. With CXL and OS-native RAS
> > flows, OSPM is responsible for understanding and navigating the
> > relationship between System-Physical-Address (SPA) ranges published in
> > ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges published
> > in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
> > endpoints.
> >
> > Enable an OSPM to enumerate that the capacity for a memory-side cache
> > extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
> > in the HMAT is "excluded" from the SRAT range length because it is a
> > transparent cache of the SRAT capacity. The enumeration of this
> > addressing mode enables OSPM-memory-RAS (Reliability, Availability, and
> > Serviceability) flows.
> >
> > # Benefits of the Change
> > Without this change an OSPM that encounters a memory-side cache
> > configuration of DDR fronting CXL may not understand that an SRAT range
> > extended by cache capacity should be maintained as one contiguous SPA
> > range even though the CXL HPA decode configuration only maps a subset of
> > the SRAT SPA range. In other words the memory-side-cache dynamically
> > maps access to that SPA range to either a CXL or DDR HPA.
> >
> > When the OSPM knows about this relationship it can take actions like
> > quarantine / offline all the impacted aliased pages to prevent further
> > consumption of poison, or run repair operations on all the affected
> > targets. Without this change an OSPM may not accurately identify the HPA
> > associated with a given CXL endpoint DPA event, or it may misunderstand
> > the SPAs that map to CXL HPAs.
>
> I'd like something here on impacts on firmware first error reporting.
> Given we'd like that to work on a non CXL aware system not aware of this
> feature at all, I'd propose multiple CPER records, one for each alias.
> That assumes the firmware has no path to establish the alias.
>
> Can certainly conceive of ways to implement a probe-type setup to allow
> the discovery of which alias has been poisoned etc.
>
> Perhaps needs a note somewhere in 18.3. Something along lines of
> "For any error with SPA originating in a range, where a memory-side cache
> with address mode extended-linear is present, multiple error records
> should be presented to cover any potentially affected aliases."
>
> Maybe an OS could opt out of that multiple reporting via _OSC or similar
> but I'm not sure why it would bother though. Easier to just allow for
> multiple events.
>
> >
> > # Impact of the Change
> > The proposed "Address Mode" field consumes the 2 Reserved bytes
> > following the "Cache Attributes" field in the "Memory Side Cache
> > Information Structure". The default reserved value of 0 indicates the
> > status quo of an undeclared addressing mode where the expectation is
> > that it is safe to assume the cache-capacity is transparent to the SRAT
> > range capacity. An OSPM that knows about new values can consider SPA to
> > HPA relationships according to the address-layout definition proposed
> > below. A legacy OSPM will ignore it as a Reserved field.
> >
> > # References
> > * Compute Express Link Specification v3.1,
> > <https://www.computeexpresslink.org/>
> >
> > # Detailed Description of the Change
>
> Probably need to up rev HMAT as well.
>
> >
> > * Section Table 5.149: Memory Side Cache Information Structure redefine
> > the 2 Reserved bytes starting at offset 28 as "Address Mode":
> >
> > * 0 - Reserved (OSPM may assume transparent cache addressing)
>
> Can we make that assumption? What are today's firmware's doing for this?
> I'd drop the 'may assume' Also after this change it's not reserved.
> 0 explicitly means transparent cache addressing.
>
> > * 1 - Extended-linear (N direct-map aliases linearly mapped)
> > * 2..65535 - Reserved (Unknown Address Mode)
> >
> > * Extend the implementation note after Table 5.149 to explain how to
> > interpret the "Extended-linear" mode.
> >
> > * When Address Mode is 1 'Extended-Linear' it indicates that the
> > associated address range (SRAT.MemoryAffinityStructure.Length) is
> > comprised of the backing store capacity extended by the cache
> > capacity. It is arranged such that there are N directly addressable
> > aliases of a given cacheline where N is the ratio of target memory
> > proximity domain size and the memory side cache size. Where the N
> > aliased addresses for a given cacheline all share the same result
> > for the operation 'address modulo cache size'.
>
> Probably need more here. What if someone has two such ranges of size
>
> Address 0, (512G + 64G) , (1024G + 128G)
> And decides to pack them for some reason.
> The second one will be aligned to 64G not, 128G so modulo needs to take
> into account the base address.
Ignore this one. The maths works fine as is.
More coffee needed.
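For the record, a quick brute-force check (hypothetical sizes from the
packed example above, coarse 1G sampling) shows why the maths works:
every 'address modulo cache size' residue occurs exactly N times in the
range, regardless of base alignment:

```python
from collections import Counter

G = 1 << 30
# Hypothetical packed layout: the second extended-linear range
# (1024G backing + 128G cache = 1152G) starts right after the first
# (576G), so its base is aligned to 64G rather than to its 128G cache.
base = 576 * G
cache = 128 * G
length = 9 * cache  # 1152G, i.e. N = 9

# Sample one address per 1G granule and count how often each
# 'address modulo cache size' residue occurs across the range.
counts = Counter(addr % cache for addr in range(base, base + length, G))

# Every residue class appears exactly 9 times: the alias count is
# independent of the base address alignment.
assert set(counts.values()) == {9}
```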
>
> Do we need explicit statement that N is an integer? Probably works anyway
> but having 2.5 aliases is an unusual concept.
>
> > This setting is only
> > allowed when 'Cache Associativity' is 'Direct Map'."
>
> Other than these corner cases looks good to me and the new terminology and
> clarifications help a lot.
>
> Thanks,
>
> Jonathan
* Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches
2024-06-05 9:10 ` Jonathan Cameron
2024-06-05 10:51 ` Jonathan Cameron
@ 2024-06-17 23:24 ` Dan Williams
2024-06-20 17:37 ` Jonathan Cameron
1 sibling, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-06-17 23:24 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
Jonathan Cameron wrote:
> On Fri, 24 May 2024 12:05:28 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > # Title: "Extended-linear" addressing for direct-mapped memory-side caches
> >
> > # Status: v3
> >
> > # Document: ACPI Specification 6.6
> >
> > # License
> > SPDX-License-Identifier: CC-BY-4.0
> >
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> > * Andy Rudoff, retired
> > * Mahesh Natu, Intel
> > * Ishwar Agarwal, Intel
> >
> > # Changelog
> > * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
> > clarify the SPA vs HPA behavior of this cache addressing mode.
> > (Jonathan Cameron)
> > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> > in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> > that Address Mode values other than 1 are Reserved.
> >
> > # Summary of the Change
> > Recall that one of the modes available with persistent memory (PMEM) was a
> > direct-mapped memory-side cache where DDR-memory transparently cached
> > PMEM. This article has more details:
> >
> > https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
> >
> > ...but the main takeaway of that article that is relevant for this ECN
> > is:
> >
> > "[PMEM] is paired with a DRAM that behaves as a cache, and,
> > like a cache, it is invisible to the user. [..] A typical system
> > might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
> > total memory size will appear to the software as only 512GB."
> >
> > Instead, this new "extended-linear" direct-mapped memory-side cache
> > addressing mode would make the memory-size that appears to software in
> > the above example as 576GB. The inclusion of the DDR capacity to extend
> > the capacity visible to software may improve cache utilization.
>
> I'd skip the cache utilization point as even with 'may' it might just
> end up rat holing!. Capacity seems enough a justification to me and
> requires a lot less justification.
Sure.
> Perhaps something like
> "The inclusion of the DDR increases the available capacity whilst still
> providing benefits of a lower latency cache."
Per above will just keep it dryly associated with capacity and not make
any performance claims.
> Up to you though as I'll not have to explain that utilization point
> to anyone whereas you might.
>
> >
> > A primary motivation for updating HMAT to explicitly enumerate this
> > addressing mode is the OSPM's increased role in RAS and
> > address-translation with CXL topologies. With CXL and OS-native RAS
> > flows, OSPM is responsible for understanding and navigating the
> > relationship between System-Physical-Address (SPA) ranges published in
> > ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges published
> > in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
> > endpoints.
> >
> > Enable an OSPM to enumerate that the capacity for a memory-side cache
> > extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
> > in the HMAT is "excluded" from the SRAT range length because it is a
> > transparent cache of the SRAT capacity. The enumeration of this
> > addressing mode enables OSPM-memory-RAS (Reliability, Availability, and
> > Serviceability) flows.
> >
> > # Benefits of the Change
> > Without this change an OSPM that encounters a memory-side cache
> > configuration of DDR fronting CXL may not understand that an SRAT range
> > extended by cache capacity should be maintained as one contiguous SPA
> > range even though the CXL HPA decode configuration only maps a subset of
> > the SRAT SPA range. In other words the memory-side-cache dynamically
> > maps access to that SPA range to either a CXL or DDR HPA.
> >
> > When the OSPM knows about this relationship it can take actions like
> > quarantine / offline all the impacted aliased pages to prevent further
> > consumption of poison, or run repair operations on all the affected
> > targets. Without this change an OSPM may not accurately identify the HPA
> > associated with a given CXL endpoint DPA event, or it may misunderstand
> > the SPAs that map to CXL HPAs.
>
> I'd like something here on impacts on firmware first error reporting.
> Given we'd like that to work on a non CXL aware system not aware of this
> feature at all, I'd propose multiple CPER records, one for each alias.
> That assumes the firmware has no path to establish the alias.
>
> Can certainly conceive of ways to implement a probe-type setup to allow
> the discovery of which alias has been poisoned etc.
>
> Perhaps needs a note somewhere in 18.3. Something along lines of
> "For any error with SPA originating in a range, where a memory-side cache
> with address mode extended-linear is present, multiple error records
> should be presented to cover any potentially affected aliases."
>
> Maybe an OS could opt out of that multiple reporting via _OSC or similar
> but I'm not sure why it would bother though. Easier to just allow for
> multiple events.
Makes sense to add a note about the "multiple CPER record" expectation.
Effectively this ECN is about allowing native-error-handling to do the
same.
> > # Impact of the Change
> > The proposed "Address Mode" field consumes the 2 Reserved bytes
> > following the "Cache Attributes" field in the "Memory Side Cache
> > Information Structure". The default reserved value of 0 indicates the
> > status quo of an undeclared addressing mode where the expectation is
> > that it is safe to assume the cache-capacity is transparent to the SRAT
> > range capacity. An OSPM that knows about new values can consider SPA to
> > HPA relationships according to the address-layout definition proposed
> > below. A legacy OSPM will ignore it as a Reserved field.
> >
> > # References
> > * Compute Express Link Specification v3.1,
> > <https://www.computeexpresslink.org/>
> >
> > # Detailed Description of the Change
>
> Probably need to up rev HMAT as well.
I'd let the ACPI working group make that determination. I am not clear
on whether repurposing a reserved field mandates a version bump.
> >
> > * Section Table 5.149: Memory Side Cache Information Structure redefine
> > the 2 Reserved bytes starting at offset 28 as "Address Mode":
> >
> > * 0 - Reserved (OSPM may assume transparent cache addressing)
>
> Can we make that assumption? What are today's firmware's doing for this?
The only shipping example I know of was for PMEM.
> I'd drop the 'may assume' Also after this change it's not reserved.
> 0 explicitly means transparent cache addressing.
I am just going to switch the parenthetical to "(Unknown Address Mode)"
because "transparent" does not give any actionable information about
alias layout in the SRAT address space. So system-software can make no
assumptions about layout without consulting implementation specific
documentation.
> > * 1 - Extended-linear (N direct-map aliases linearly mapped)
> > * 2..65535 - Reserved (Unknown Address Mode)
> >
> > * Extend the implementation note after Table 5.149 to explain how to
> > interpret the "Extended-linear" mode.
> >
> > * When Address Mode is 1 'Extended-Linear' it indicates that the
> > associated address range (SRAT.MemoryAffinityStructure.Length) is
> > comprised of the backing store capacity extended by the cache
> > capacity. It is arranged such that there are N directly addressable
> > aliases of a given cacheline where N is the ratio of target memory
> > proximity domain size and the memory side cache size. Where the N
> > aliased addresses for a given cacheline all share the same result
> > for the operation 'address modulo cache size'.
>
> Probably need more here. What if someone has two such ranges of size
>
> Address 0, (512G + 64G) , (1024G + 128G)
> And decides to pack them for some reason.
> The second one will be aligned to 64G not, 128G so modulo needs to take
> into account the base address.
Decides to pack them how? My expectation in this situation is 2
proximity domains / memory-side cache descriptions.
> Do we need explicit statement that N is an integer? Probably works anyway
> but having 2.5 aliases is an unusual concept.
Easy enough to add "(integer)" after the first reference of "N".
> > This setting is only
> > allowed when 'Cache Associativity' is 'Direct Map'."
>
> Other than these corner cases looks good to me and the new terminology and
> clarifications help a lot.
Thanks for the feedback.
* Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches
2024-06-17 23:24 ` Dan Williams
@ 2024-06-20 17:37 ` Jonathan Cameron
2024-06-20 18:51 ` Dan Williams
0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2024-06-20 17:37 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
>
>
> > > # Impact of the Change
> > > The proposed "Address Mode" field consumes the 2 Reserved bytes
> > > following the "Cache Attributes" field in the "Memory Side Cache
> > > Information Structure". The default reserved value of 0 indicates the
> > > status quo of an undeclared addressing mode where the expectation is
> > > that it is safe to assume the cache-capacity is transparent to the SRAT
> > > range capacity. An OSPM that knows about new values can consider SPA to
> > > HPA relationships according to the address-layout definition proposed
> > > below. A legacy OSPM will ignore it as a Reserved field.
> > >
> > > # References
> > > * Compute Express Link Specification v3.1,
> > > <https://www.computeexpresslink.org/>
> > >
> > > # Detailed Description of the Change
> >
> > Probably need to up rev HMAT as well.
>
> I'd let the ACPI working group make that determination. I am not clear
> on whether repurposing a reserved field mandates a version bump.
Normally it does, but sure, ASWG can figure it out.
>
> > >
> > > * Section Table 5.149: Memory Side Cache Information Structure redefine
> > > the 2 Reserved bytes starting at offset 28 as "Address Mode":
> > >
> > > * 0 - Reserved (OSPM may assume transparent cache addressing)
> >
> > Can we make that assumption? What are today's firmware's doing for this?
>
> The only shipping example I know of was for PMEM.
I gather that there are others, though that is the most common one.
>
> > I'd drop the 'may assume' Also after this change it's not reserved.
> > 0 explicitly means transparent cache addressing.
>
> I am just going to switch the parenthetical to "(Unknown Address Mode)"
> because "transparent" does not give any actionable information about
> alias layout in the SRAT address space. So system-software can make no
> assumptions about layout without consulting implementation specific
> documentation.
I'd like an option to indicate that we know reported errors will not
involve problems with aliases. Something like...
0 - Unknown (all bets are off, read the manual).
1 - No aliases.
2 - your one.
A simple write-through or write-back cache would not result in aliases
for errors reported by the backing memory.
Assuming we don't get an address corruption (in which case everything is
dead anyway as an uncontainable error), poison can come from:
1) poison happens in the memory itself (fine, the DPA in CXL is enough)
2) poison happens in cache and is written back to memory (fine, the DPA
in CXL is enough)
3) poison happens in cache and is read by host. Synchronous handling, and
the HPA is available and enough.
Not much we can do with 0, but 1 at least lets us know we have the
single right answer.
>
> > > * 1 - Extended-linear (N direct-map aliases linearly mapped)
> > > * 2..65535 - Reserved (Unknown Address Mode)
> > >
> > > * Extend the implementation note after Table 5.149 to explain how to
> > > interpret the "Extended-linear" mode.
> > >
> > > * When Address Mode is 1 'Extended-Linear' it indicates that the
> > > associated address range (SRAT.MemoryAffinityStructure.Length) is
> > > comprised of the backing store capacity extended by the cache
> > > capacity. It is arranged such that there are N directly addressable
> > > aliases of a given cacheline where N is the ratio of target memory
> > > proximity domain size and the memory side cache size. Where the N
> > > aliased addresses for a given cacheline all share the same result
> > > for the operation 'address modulo cache size'.
> >
> > Probably need more here. What if someone has two such ranges of size
> >
> > Address 0, (512G + 64G) , (1024G + 128G)
> > And decides to pack them for some reason.
> > The second one will be aligned to 64G, not 128G, so the modulo needs
> > to take the base address into account.
>
> Decides to pack them how? My expectation in this situation is 2
> proximity domains / memory-side cache descriptions.
>
I was wrongly thinking the modulo maths failed if not aligned to the
cache size. Ignore this bit.
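For concreteness, here is a sketch of the alias math as I now understand
it (the helper name and parameters are illustrative, not from the ECN).
Taking the offset relative to the range base is what makes the
base-alignment concern above moot:

```python
def extended_linear_aliases(addr, base, range_length, cache_size):
    # N is the ratio of the proximity domain (SRAT range) size to the
    # memory-side cache size; the ECN text implies N is an integer.
    assert range_length % cache_size == 0
    n = range_length // cache_size
    # All aliases of a cacheline share (addr - base) % cache_size, so a
    # base that is not itself aligned to the cache size still works out.
    offset = (addr - base) % cache_size
    return [base + i * cache_size + offset for i in range(n)]

# Example: 512G backing extended by a 64G cache -> 576G range, 9 aliases.
GiB = 1 << 30
aliases = extended_linear_aliases(5 * GiB, 0, 576 * GiB, 64 * GiB)
```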
> > Do we need explicit statement that N is an integer? Probably works anyway
> > but having 2.5 aliases is an unusual concept.
>
> Easy enough to add "(integer)" after the first reference of "N".
>
> > > This setting is only
> > > allowed when 'Cache Associativity' is 'Direct Map'."
> >
> > Other than these corner cases looks good to me and the new terminology and
> > clarifications help a lot.
>
> Thanks for the feedback.
No problem.
J
>
* Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches
2024-06-20 17:37 ` Jonathan Cameron
@ 2024-06-20 18:51 ` Dan Williams
2024-06-26 8:38 ` Jonathan Cameron
0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-06-20 18:51 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
Jonathan Cameron wrote:
[..]
> > > I'd drop the 'may assume'. Also, after this change it's not reserved.
> > > 0 explicitly means transparent cache addressing.
> >
> > I am just going to switch the parenthetical to "(Unknown Address Mode)"
> > because "transparent" does not give any actionable information about
> > alias layout in the SRAT address space. So system-software can make no
> > assumptions about layout without consulting implementation specific
> > documentation.
>
> I'd like an option to indicate that we know reported errors will not
> involve problems with aliases. Something like...
>
> 0 - Unknown (all bets are off, read the manual).
> 1 - No aliases.
> 2 - your one.
>
> A simple write-through or write-back cache would not result in aliases
> for errors reported by the backing memory.
This seems like a separate proposal, and it needs more discussion because
there *are* aliases. While there is no HPA aliasing, there is FRU
(field-replaceable-unit) aliasing. So if system-software wants to
determine which indicators to fire (i.e. replace cache-mem, replace
backing-mem, or both) for the technician servicing the node, it needs
some ACPI help.
I would be ok to do:
0 - Unknown (all bets are off, read the manual).
1 - Reserved
2 - Extended linear
...just to try to keep the list ordered by complexity for now.
However, I am also worried about the case where folks want to do "noisy
neighbor mitigation", which is something that has been attempted with
PMEM caches. That involves knowing the layout of cache conflicts, which
need not be linear and requires reading the manual anyway. So, I am not
sure that defining a "no aliases" indicator now improves the Extended
Linear proposal, or that it is an improvement upon "read the manual".
> Assuming we don't get an address corruption (in which case everything is
> dead anyway as an uncontainable error), poison can come from:
> 1) poison happens in the memory itself (fine, the DPA in CXL is enough)
> 2) poison happens in cache and is written back to memory (fine, the DPA
> in CXL is enough)
> 3) poison happens in cache and is read by host. Synchronous handling, and
> the HPA is available and enough.
>
> Not much we can do with 0, but 1 at least lets us know we have the
> single right answer.
That is, assuming that this is caching CXL. With CXL, the DPA
information is available to disambiguate the source of the poison, but
for memory-side-caches that are not backed by CXL, what does
system-software do with that "1" case?
* Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches
2024-06-20 18:51 ` Dan Williams
@ 2024-06-26 8:38 ` Jonathan Cameron
0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2024-06-26 8:38 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
On Thu, 20 Jun 2024 11:51:09 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> [..]
> > > > I'd drop the 'may assume'. Also, after this change it's not reserved.
> > > > 0 explicitly means transparent cache addressing.
> > >
> > > I am just going to switch the parenthetical to "(Unknown Address Mode)"
> > > because "transparent" does not give any actionable information about
> > > alias layout in the SRAT address space. So system-software can make no
> > > assumptions about layout without consulting implementation specific
> > > documentation.
> >
> > I'd like an option to indicate that we know reported errors will not
> > involve problems with aliases. Something like...
> >
> > 0 - Unknown (all bets are off, read the manual).
> > 1 - No aliases.
> > 2 - your one.
> >
> > A simple write-through or write-back cache would not result in aliases
> > for errors reported by the backing memory.
>
> This seems like a separate proposal, and it needs more discussion because
> there *are* aliases. While there is no HPA aliasing, there is FRU
> (field-replaceable-unit) aliasing. So if system-software wants to
> determine which indicators to fire (i.e. replace cache-mem, replace
> backing-mem, or both) for the technician servicing the node, it needs
> some ACPI help.
There is a case for FW-first CPER etc. (or a memory-side-cache specific
driver, ideally binding to a suitable ACPIXXXX, but that's a different
ECN :)) having to identify errors coming from a memory-side cache, but I
don't see it as an issue that sits in this place in the spec (or even in
this spec).
For the CXL case, the event record tells you enough info on where poison
originated to rule the CXL device in or out as the problem. There is a
gap, I think, in error reporting for memory-side caches, and agreed,
that's a different ECN.
>
> I would be ok to do:
>
> 0 - Unknown (all bets are off, read the manual).
> 1 - Reserved
> 2 - Extended linear
>
> ...just to try to keep the list ordered by complexity for now.
>
> However, I am also worried about the case where folks want to do "noisy
> neighbor mitigation", which is something that has been attempted with
> PMEM caches. This involves knowing the layout of cache conflicts which
> need not be linear and involves reading the manual. So, I am not sure
> defining a "no aliases" indicator now improves the Extended Linear
> proposal, or is an improvement upon "read the manual".
It tells you that, if you are trying to do poison repair, you only need
to write one 'cacheline etc' from the host, not several. I wouldn't
attempt to take it any further than that due to the sort of trickery you
mention.
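Concretely, the practical difference I mean is something like this
(illustrative sketch only, not proposed spec text; mode names stand in
for the Address Mode values discussed above):

```python
def poison_repair_targets(hpa, mode, base, range_length, cache_size):
    # Which host addresses a poison-repair write would need to touch,
    # per the modes discussed in this thread (illustrative sketch).
    if mode == "no-aliases":          # proposed value 1
        return [hpa]                  # one cacheline write is enough
    if mode == "extended-linear":     # the ECN's mode
        # every alias shares (hpa - base) % cache_size
        offset = (hpa - base) % cache_size
        n = range_length // cache_size
        return [base + i * cache_size + offset for i in range(n)]
    raise ValueError("unknown address mode: read the manual")
```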
>
> > Assuming we don't get an address corruption (in which case everything is
> > dead anyway as an uncontainable error), poison can come from:
> > 1) poison happens in the memory itself (fine, the DPA in CXL is enough)
> > 2) poison happens in cache and is written back to memory (fine, the DPA
> > in CXL is enough)
> > 3) poison happens in cache and is read by host. Synchronous handling, and
> > the HPA is available and enough.
> >
> > Not much we can do with 0, but 1 at least lets us know we have the
> > single right answer.
>
> That is, assuming that this is caching CXL. With CXL, the DPA
> information is available to disambiguate the source of the poison, but
> for memory-side-caches that are not backed by CXL, what does
> system-software do with that "1" case?
If it got an HPA, it does an arch-specific poison clear on the HPA
address, or isolates the page with that single address. If it didn't, you
have no useful info; wait for synchronous poison.
Jonathan
Thread overview: 7+ messages
2024-05-24 19:05 [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches Dan Williams
2024-06-05 9:10 ` Jonathan Cameron
2024-06-05 10:51 ` Jonathan Cameron
2024-06-17 23:24 ` Dan Williams
2024-06-20 17:37 ` Jonathan Cameron
2024-06-20 18:51 ` Dan Williams
2024-06-26 8:38 ` Jonathan Cameron