* [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
@ 2024-05-10 23:00 Dan Williams
2024-05-17 16:45 ` Jonathan Cameron
0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2024-05-10 23:00 UTC (permalink / raw)
To: linux-cxl, linux-acpi; +Cc: mahesh.natu, rafael
# Title: Enumerate "Inclusive Linear Address Mode" memory-side caches
# Status: Draft v2
# Document: ACPI Specification 6.6
# License
SPDX-License-Identifier: CC-BY-4.0
# Submitter:
* Sponsor: Dan Williams, Intel
* Creators/Contributors:
* Andy Rudoff, retired
* Mahesh Natu, Intel
* Ishwar Agarwal, Intel
# Changelog
* v2: Clarify the "Inclusive" term as "including the capacity of the cache
in the SRAT range length"
* v2: Clarify that 0 is an undeclared / transparent Address Mode, and
that Address Mode values other than 1 are Reserved.
# Summary of the Change
Enable an OSPM to enumerate that the capacity for a memory-side cache is
"included" in an SRAT range. Typically the "Memory Side Cache Size"
enumerated in the HMAT is "excluded" from the SRAT range length because
it is a transparent cache of the SRAT capacity. The enumeration of this
addressing mode enables OSPM memory RAS (Reliability, Availability, and
Serviceability) flows.
Recall that the CXL specification allows for platform address ranges to
be interleaved across CXL and non-CXL targets. CXL 3.1 Table 9-22 CFMWS
Structure states "If the Interleave Set spans non-CXL domains, this list
may contain values that do not match \_UID field in any CHBS structures.
These entries represent Interleave Targets that are not CXL Host
Bridges". For an OSPM this means address translation needs to be
prepared for non-CXL targets. Now consider the case when that CXL
address range is flagged as a memory side cache in the ACPI HMAT.
Address translation needs to consider that the decode for an error may
impact multiple components (FRUs, field replaceable units).
Now consider the implications of ["Flat Memory Mode" (Intel presentation
at Hot Chips
2023)](https://cdrdv2-public.intel.com/787386/Hot%20Chips%20-%20Aug%2023%20-%20BHS%20and%20Granite%20Rapid%20-%20Xeon%20-%20Architecture%20-%20Public.pdf).
This cache geometry implies an address space that includes the
memory-side cache size in the reported address range. For example, a
typical address space layout for a memory-side-cache of 32GB of DDR
fronting 64GB of CXL would report 64GB in the "Length" field of the
SRAT's "Memory Affinity Structure" and 32GB in the "Memory Side Cache
Size" field of the HMAT's "Memory Side Cache Information Structure". An
inclusive address-space layout of the same configuration would report
96GB in the "Length" field of the SRAT's "Memory Affinity Structure" and
32GB in the "Memory Side Cache Size" field of the HMAT's "Memory Side
Cache Information Structure". The implication for address translation in
the inclusive case is that there are N potential aliased addresses
impacted by a memory error, where N is the ratio of:
SRAT.MemoryAffinityStructure.Length /
HMAT.MemorySideCacheInformation.CacheSize
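For illustration, the 96GB example above works out as follows (a minimal
sketch; the variable names are mine and stand in for the SRAT/HMAT fields,
this is not ACPI table parsing):

```python
GIB = 1 << 30

# Values from the inclusive layout example above (illustrative only):
srat_length = 96 * GIB  # SRAT Memory Affinity Structure "Length"
cache_size = 32 * GIB   # HMAT "Memory Side Cache Size"

# N directly addressable aliases per cacheline in the inclusive case
n_aliases = srat_length // cache_size
print(n_aliases)  # 3
```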
This change request is not exclusive to CXL; the concept is applicable
to any memory-side-cache configuration that the HMAT+SRAT can describe.
However, CXL is a primary motivator given the OSPM role in address
translation for device-physical-address (DPA) events being translated to
impacted host-physical-address (HPA) events.
# Benefits of the Change
An OSPM, when it knows about inclusive cache address space, can take
actions like quarantine / offline all the impacted aliased pages to
prevent further consumption of poison, or run repair operations on all
the affected targets. Without this change an OSPM may not accurately
identify the HPA associated with a given CXL FRU event, or it may
misunderstand that an SRAT memory affinity range is an amalgam of CXL
and cache capacity.
# Impact of the Change
The proposed "Address Mode" field to convey this configuration consumes
the 2 Reserved bytes following the "Cache Attributes" field in the
"Memory Side Cache Information Structure". The default reserved value of
0 indicates the status quo of an undeclared addressing mode, where the
expectation is that it is safe to assume a transparent cache whose
capacity is not included in the SRAT range capacity. An OSPM that
knows about new values can perform address decode according to the
proposed details below and a legacy OSPM will ignore it as a Reserved
field.
# References
* Compute Express Link Specification v3.1,
<https://www.computeexpresslink.org/>
# Detailed Description of the Change
* In Table 5.149: Memory Side Cache Information Structure, redefine
the 2 Reserved bytes starting at offset 28 as "Address Mode":
* 0 - Reserved (OSPM may assume transparent cache addressing)
* 1 - Inclusive linear (N aliases linearly mapped)
* 2..65535 - Reserved (Unknown Address Mode)
* Extend the implementation note after Table 5.149 to explain how to
interpret the "Inclusive linear" mode.
* "When Address Mode is 1 'Inclusive Linear' it indicates that there
are N directly addressable aliases of a given cacheline
where N is the ratio of target memory proximity domain size and
the memory side cache size. Where the N aliased addresses for a
given cacheline all share the same result for the operation
'address modulo cache size'."
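The modulo rule in the proposed note can be sketched as follows (a
hypothetical helper, not spec text; the function and parameter names are
illustrative assumptions):

```python
GIB = 1 << 30

def aliases(hpa, range_base, range_length, cache_size):
    """Return every HPA in the SRAT range that aliases `hpa` under the
    proposed 'Inclusive Linear' mode, i.e. every address in the range
    sharing the same 'address modulo cache size' result."""
    offset = (hpa - range_base) % cache_size
    return [range_base + offset + i * cache_size
            for i in range(range_length // cache_size)]

# 96GB inclusive SRAT range fronted by a 32GB cache: 3 aliases per line
print(aliases(0x1000, 0, 96 * GIB, 32 * GIB))
```

An OSPM RAS flow could, for example, quarantine each page returned here
when poison is reported at any one of the aliased addresses.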
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-10 23:00 [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches Dan Williams
@ 2024-05-17 16:45 ` Jonathan Cameron
2024-05-17 20:20 ` Dan Williams
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2024-05-17 16:45 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
On Fri, 10 May 2024 16:00:24 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> # Title: Enumerate "Inclusive Linear Address Mode" memory-side caches
So pretty much all my feedback is about using inclusive in anything
to do with caches. The term usually means something very different from
what you have here and it confused me. As an example, consider a dataset
that fits entirely in CPU L3 / L2 capacity.
In that case the situation you describe here looks like an exclusive L2 / L3
cache (line sits in one or the other but not both).
Maybe just describe the problem and skip the exact cause?
Enumerate "Unrecoverable aliases in direct mapped memory-side caches"
Whilst the CXL side of things (and I assume your hardware migration engine)
don't provide a way to recover this, it would be possible to build
a system that otherwise looked like you describe that did provide access
to the tag bits and so wouldn't present the aliasing problem.
>
> # Status: Draft v2
>
> # Document: ACPI Specification 6.6
>
> # License
> SPDX-License Identifier: CC-BY-4.0
>
> # Submitter:
> * Sponsor: Dan Williams, Intel
> * Creators/Contributors:
> * Andy Rudoff, retired
> * Mahesh Natu, Intel
> * Ishwar Agarwal, Intel
>
> # Changelog
> * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> in the SRAT range length"
> * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> that Address Mode values other than 1 are Reserved.
>
> # Summary of the Change
> Enable an OSPM to enumerate that the capacity for a memory-side cache is
> "included" in an SRAT range. Typically the "Memory Side Cache Size"
> enumerated in the HMAT is "excluded" from the SRAT range length because
> it is a transparent cache of the SRAT capacity. The enumeration of this
> addressing mode enables OSPM memory RAS (Reliability, Availability, and
> Serviceability) flows.
'excluded' somehow implies it exists as something we might include but
we don't. 'Not relevant' would be clearer wording I think.
>
> Recall that the CXL specification allows for platform address ranges to
> be interleaved across CXL and non-CXL targets. CXL 3.1 Table 9-22 CFMWS
> Structure states "If the Interleave Set spans non-CXL domains, this list
> may contain values that do not match \_UID field in any CHBS structures.
> These entries represent Interleave Targets that are not CXL Host
> Bridges". For an OSPM this means address translation needs to be
> prepared for non-CXL targets. Now consider the case when that CXL
> address range is flagged as a memory side cache in the ACPI HMAT.
A CXL address range can be flagged as having a memory-side cache in
front of it, but as you've stated it normally wouldn't have separate HPA
ranges. The interleave stuff doesn't get you to what you describe
here as it's well defined, not a transparent cache like a
memory-side cache. A given cacheline is in a known FRU, not potentially
multiple ones. Hence I'm not sure this paragraph is particularly useful.
> Address translation needs to consider that the decode for an error may
> impact multiple components (FRUs, field replaceable units).
>
> Now consider the implications of ["Flat Memory Mode" (Intel presentation
> at Hot Chips
> 2023)](https://cdrdv2-public.intel.com/787386/Hot%20Chips%20-%20Aug%2023%20-%20BHS%20and%20Granite%20Rapid%20-%20Xeon%20-%20Architecture%20-%20Public.pdf).
Other than telling us someone put it on a slide, that slide provides
very little useful info!
> This cache geometry implies an address space that includes the
> memory-side cache size in the reported address range. For example, a
> typical address space layout for a memory-side-cache of 32GB of DDR
> fronting 64GB of CXL would report 64GB in the "Length" field of the
> SRAT's "Memory Affinity Structure" and 32GB in the "Memory Side Cache
> Size" field of the HMAT's "Memory Side Cache Information Structure".
> An
> inclusive address-space layout of the same configuration would report
> 96GB in the "Length" field of the SRAT's "Memory Affinity Structure" and
> 32GB in the "Memory Side Cache Size" field of the HMAT's "Memory Side
> Cache Information Structure". The implication for address translation in
> the inclusive case is that there are N potential aliased addresses
> impacted by a memory error, where N is the ratio of:
>
> SRAT.MemoryAffinityStructure.Length /
> HMAT.MemorySideCacheInformation.CacheSize
So in your example a memory error can affect any of 3 addresses.
That feels like it is assuming a particular caching strategy without
expressly stating it. Let us take it to extreme. Make it a fully
associative non-inclusive DDR cache (sure that is insane, but bare
with me). Now any potential problem affects all addresses as a given error
in the memory-side cache might affect anything - given it's fully associative
it's also possible an error in the CXL memory might also be any cacheline
in the system.
The memory-side cache description does include the option of specifying
the cache is direct mapped so if that is set your assumed mapping is valid.
If someone set the 'complex cache indexing' option then I think all bets
are off. To be useful you should rule that out in your spec change.
>
> This change request is not exclusive to CXL, the concept is applicable
> to any memory-side-cache configuration that the HMAT+SRAT can describe.
> However, CXL is a primary motivator given the OSPM role in address
> translation for device-physical-address (DPA) events being translated to
> impacted host-physical-address (HPA) events.
>
> # Benefits of the Change
> An OSPM, when it knows about inclusive cache address space, can take
> actions like quarantine / offline all the impacted aliased pages to
> prevent further consumption of poison, or run repair operations on all
> the affected targets. Without this change an OSPM may not accurately
> identify the HPA associated with a given CXL FRU event, or it may
> misunderstand that an SRAT memory affinity range is an amalgam of CXL
> and cache capacity.
Could you add a cache attribute to say it's a non-inclusive / exclusive
cache? That combined with direct-mapped would I think provide the relevant
indication. It still runs into the problem that advanced hardware
could still resolve which alias is the problem. So maybe we are better
off sticking to describing the fact that there is an alias issue for any
reported errors that cannot be resolved (presumably you can poke the
aliased entries and see which one gives poison via synchronous access).
Note that I'm not keen on the use of inclusive for your range description
because that terminology means the exact opposite of what you intend
when applied to a normal cache! I can't think of a better term though
but the bikeshed should not be blue.
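Jonathan's synchronous-probe suggestion might look like the following
sketch; `read_raises_poison` is a purely hypothetical callback (no such
Linux or CXL interface is implied):

```python
def find_poisoned_aliases(alias_list, read_raises_poison):
    """Given the aliased HPAs for one error record, return the subset
    that actually carries poison, discovered by synchronously reading
    each alias and seeing which access faults."""
    return [hpa for hpa in alias_list if read_raises_poison(hpa)]

# Illustrative only: pretend the middle of three aliases is poisoned
probe = lambda hpa: hpa == 0x800001000
print(find_poisoned_aliases([0x1000, 0x800001000, 0x1000001000], probe))
```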
>
> # Impact of the Change
> The proposed "Address Mode" field to convey this configuration consumes
> the 2 Reserved bytes following the "Cache Attributes" field in the
> "Memory Side Cache Information Structure". The default reserved value of
> 0 indicates the status quo of an undeclared addressing mode where the
> expectation is that it is safe to assume a transparent cache where the
> cache-capacity is not included in the SRAT range capacity. An OSPM that
> knows about new values can perform address decode according to the
> proposed details below and a legacy OSPM will ignore it as a Reserved
> field.
>
> # References
> * Compute Express Link Specification v3.1,
> <https://www.computeexpresslink.org/>
>
> # Detailed Description of the Change
>
> * Section Table 5.149: Memory Side Cache Information Structure redefine
> the 2 Reserved bytes starting at offset 28 as "Address Mode":
>
> * 0 - Reserved (OSPM may assume transparent cache addressing)
> * 1 - Inclusive linear (N aliases linearly mapped)
> * 2..65535 - Reserved (Unknown Address Mode)
As with the fields in the earlier Cache Attributes perhaps better to just
give a few bits and reserve the rest for other uses.
>
> * Extend the implementation note after Table 5.149 to explain how to
> interpret the "Inclusive linear" mode.
>
> * "When Address Mode is 1 'Inclusive Linear' it indicates that there
> are N directly addressable aliases of a given cacheline
> where N is the ratio of target memory proximity domain size and
> the memory side cache size. Where the N aliased addresses for a
> given cacheline all share the same result for the operation
> 'address modulo cache size'."
That description is somewhat tighter than the free form one in the intro
so answered a lot of questions I had before getting this far.
Jonathan
>
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-17 16:45 ` Jonathan Cameron
@ 2024-05-17 20:20 ` Dan Williams
2024-05-20 11:53 ` Jonathan Cameron
0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2024-05-17 20:20 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
Jonathan Cameron wrote:
> On Fri, 10 May 2024 16:00:24 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > # Title: Enumerate "Inclusive Linear Address Mode" memory-side caches
>
> So pretty much all my feedback is about using inclusive in anything
> to do with caches. Term usually means something very different from
> what you have here and it confused me. As an example, consider a dataset
> that fits entirely in CPU L3 / L2 capacity.
> In that case the situation you describe here looks like an exclusive L2 / L3
> cache (line sits in one or the other but not both).
I clarify in the text that this term is an attribute of the *address-space* not
the cache hierarchy. If HMAT ever needed to describe that a multi-level
memory-side cache was Inclusive or Exclusive then I would likely steal bit3 of
Cache Attributes field to enumerate that detail, but it is not clear that detail
matters to any OS mechanism or policy.
> Maybe just describe the problem and skip the exact cause?
>
> Enumerate "Unrecoverable aliases in direct mapped memory-side caches"
I have read that several times and I cannot map that title back to the property
this "address mode" enumeration is trying to describe.
I would prefer to just pile on more explicit clarifications to overcome that
instinct to map the word "Inclusive" to a multi-level cache attribute. Something
like "note, 'Inclusive Linear' address-mode not to be confused with
'Inclusive/Exclusive' multi-level cache organization".
> Whilst the CXL side of things (and I assume your hardware migration engine)
> don't provide a way to recover this, it would be possible to build
> a system that otherwise looked like you describe that did provide access
> to the tag bits and so wouldn't present the aliasing problem.
Aliasing problem? All direct-mapped caches have aliases, it just happens that
this address mode allows direct-addressability of at least one alias.
> >
> > # Status: Draft v2
> >
> > # Document: ACPI Specification 6.6
> >
> > # License
> > SPDX-License Identifier: CC-BY-4.0
> >
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> > * Andy Rudoff, retired
> > * Mahesh Natu, Intel
> > * Ishwar Agarwal, Intel
> >
> > # Changelog
> > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> > in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> > that Address Mode values other than 1 are Reserved.
> >
> > # Summary of the Change
> > Enable an OSPM to enumerate that the capacity for a memory-side cache is
> > "included" in an SRAT range. Typically the "Memory Side Cache Size"
> > enumerated in the HMAT is "excluded" from the SRAT range length because
> > it is a transparent cache of the SRAT capacity. The enumeration of this
> > addressing mode enables OSPM memory RAS (Reliability, Availability, and
> > Serviceability) flows.
>
> 'excluded' somehow implies it exists as something we might include but
> we don't. 'Not relevant' would be clearer wording I think.
But it is relevant. If the near memory (cache memory) is 64GB and the far memory
(backing store) is 64GB then the SRAT range is 64GB (cache-excluded). With this
new mode the SRAT range is 128GB.
> > Recall that the CXL specification allows for platform address ranges to
> > be interleaved across CXL and non-CXL targets. CXL 3.1 Table 9-22 CFMWS
> > Structure states "If the Interleave Set spans non-CXL domains, this list
> > may contain values that do not match \_UID field in any CHBS structures.
> > These entries represent Interleave Targets that are not CXL Host
> > Bridges". For an OSPM this means address translation needs to be
> > prepared for non-CXL targets. Now consider the case when that CXL
> > address range is flagged as a memory side cache in the ACPI HMAT.
>
> A CXL address range can be flagged as having a memory-side cache in
> front of it, but as you've stated it normally wouldn't have separate HPA
> ranges. The interleave stuff doesn't get you to what you describe
> here as it's well defined, not a transparent cache like a
> memory-side cache. A given cacheline is in a known FRU, not potentially
> multiple ones. Hence I'm not sure this paragraph is particularly useful.
It was an attempt to show precedent for why Linux needs to care about the memory
organization and how CFMWS does not achieve this description. That said, as this
is text that only appears in the justification for the ECN I do not mind
dropping it.
> > Address translation needs to consider that the decode for an error may
> > impact multiple components (FRUs, field replaceable units).
> >
> > Now consider the implications of ["Flat Memory Mode" (Intel presentation
> > at Hot Chips
> > 2023)](https://cdrdv2-public.intel.com/787386/Hot%20Chips%20-%20Aug%2023%20-%20BHS%20and%20Granite%20Rapid%20-%20Xeon%20-%20Architecture%20-%20Public.pdf).
>
> Other than telling us someone put it on a slide, that slide provides
> very little useful info!
Hence this write-up in the ECN; I felt it was better than nothing to include a
picture for reference.
> > This cache geometry implies an address space that includes the
> > memory-side cache size in the reported address range. For example, a
> > typical address space layout for a memory-side-cache of 32GB of DDR
> > fronting 64GB of CXL would report 64GB in the "Length" field of the
> > SRAT's "Memory Affinity Structure" and 32GB in the "Memory Side Cache
> > Size" field of the HMAT's "Memory Side Cache Information Structure".
>
> > An
> > inclusive address-space layout of the same configuration would report
> > 96GB in the "Length" field of the SRAT's "Memory Affinity Structure" and
> > 32GB in the "Memory Side Cache Size" field of the HMAT's "Memory Side
> > Cache Information Structure". The implication for address translation in
> > the inclusive case is that there are N potential aliased addresses
> > impacted by a memory error, where N is the ratio of:
> >
> > SRAT.MemoryAffinityStructure.Length /
> > HMAT.MemorySideCacheInformation.CacheSize
>
> So in your example a memory error can affect any of 3 addresses.
>
> That feels like it is assuming a particular caching strategy without
> expressly stating it. Let us take it to an extreme. Make it a fully
> associative non-inclusive DDR cache (sure that is insane, but bear
> with me). Now any potential problem affects all addresses as a given error
> in the memory-side cache might affect anything - given it's fully associative
> it's also possible an error in the CXL memory might also be any cacheline
> in the system.
>
> The memory-side cache description does include the option of specifying
> the cache is direct mapped so if that is set your assumed mapping is valid.
> If someone set the 'complex cache indexing' option then I think all bets
> are off. To be useful you should rule that out in your spec change.
Sure, "Linear" implies direct-mapped since fully set-associative is a
non-linear arrangement.
> > This change request is not exclusive to CXL, the concept is applicable
> > to any memory-side-cache configuration that the HMAT+SRAT can describe.
> > However, CXL is a primary motivator given the OSPM role in address
> > translation for device-physical-address (DPA) events being translated to
> > impacted host-physical-address (HPA) events.
> >
> > # Benefits of the Change
> > An OSPM, when it knows about inclusive cache address space, can take
> > actions like quarantine / offline all the impacted aliased pages to
> > prevent further consumption of poison, or run repair operations on all
> > the affected targets. Without this change an OSPM may not accurately
> > identify the HPA associated with a given CXL FRU event, or it may
> > misunderstand that an SRAT memory affinity range is an amalgam of CXL
> > and cache capacity.
>
> Could you add a cache attribute to say it's a non-inclusive / exclusive
> cache? That combined with direct-mapped would I think provide the relevant
> indication. It still runs into the problem that advanced hardware
> could still resolve which alias is the problem. So maybe we are better
> off sticking to describing the fact that there is an alias issue for any
> reported errors that cannot be resolved (presumably you can poke the
> aliased entries and see which one gives poison via synchronous access).
I still disagree with the implication that "inclusion" is a property of the
cache and not the address layout for this ECN.
> Note that I'm not keen on the use of inclusive for your range description
> because that terminology means the exact opposite of what you intend
> when applied to a normal cache! I can't think of a better term though
> but the bikeshed should not be blue.
I am sticking with "include" since cache capacity is included in the SRAT
range, and will move off that term when/if someone comes up with something
better.
[..]
> >
> > * Extend the implementation note after Table 5.149 to explain how to
> > interpret the "Inclusive linear" mode.
> >
> > * "When Address Mode is 1 'Inclusive Linear' it indicates that there
> > are N directly addressable aliases of a given cacheline
> > where N is the ratio of target memory proximity domain size and
> > the memory side cache size. Where the N aliased addresses for a
> > given cacheline all share the same result for the operation
> > 'address modulo cache size'."
>
> That description is somewhat tighter than the free form one in the intro
> so answered a lot of questions I had before getting this far.
Happy to delete all of the text outside of "Detailed Description of the Change"
since none of it will be included in ACPI spec.
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-17 20:20 ` Dan Williams
@ 2024-05-20 11:53 ` Jonathan Cameron
2024-05-21 15:54 ` Dan Williams
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2024-05-20 11:53 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
On Fri, 17 May 2024 13:20:06 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> > On Fri, 10 May 2024 16:00:24 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > # Title: Enumerate "Inclusive Linear Address Mode" memory-side caches
> >
> > So pretty much all my feedback is about using inclusive in anything
> > to do with caches. Term usually means something very different from
> > what you have here and it confused me. As an example, consider a dataset
> > that fits entirely in CPU L3 / L2 capacity.
> > In that case the situation you describe here looks like an exclusive L2 / L3
> > cache (line sits in one or the other but not both).
>
> I clarify in the text that this term is an attribute of the *address-space* not
> the cache hierarchy. If HMAT ever needed to describe that a multi-level
> memory-side cache was Inclusive or Exclusive then I would likely steal bit3 of
> Cache Attributes field to enumerate that detail, but it is not clear that detail
> matters to any OS mechanism or policy.
>
> > Maybe just describe the problem and skip the exact cause?
> >
> > Enumerate "Unrecoverable aliases in direct mapped memory-side caches"
>
> I have read that several times and I cannot map that title back to the property
> this "address mode" enumeration is trying to describe.
>
> I would prefer to just pile on more explicit clarifications to overcome that
> instinct to map the word "Inclusive" to a multi-level cache attribute. Something
> like "note, 'Inclusive Linear' address-mode not to be confused with
> 'Inclusive/Exclusive' multi-level cache organization".
Hmm. I'll go with 'maybe'. When you have to add a bunch of distinctions of
a term applying to a 'cache' to say that you don't mean the standard
meaning for caches, it feels like a new term is a better path.
Possibly as I suggest later a hyphen might avoid misreads.
inclusive-linear address mode.
>
> > Whilst the CXL side of things (and I assume your hardware migration engine)
> > don't provide a way to recover this, it would be possible to build
> > a system that otherwise looked like you describe that did provide access
> > to the tag bits and so wouldn't present the aliasing problem.
>
> Aliasing problem? All direct-mapped caches have aliases, it just happens that
> this address mode allows direct-addressability of at least one alias.
As I understand this the problem is you get address A in the error record,
but that actually means any of A, A + N, A + 2N etc and the issue is you
have no way of recovering which alias you have.
Another implementation might have the same aliasing in the cache, but allow
for establishing which one you have (the hardware inherently has to know that
but I presume in this case doesn't provide a way to look it up - or if it
does, then the issue here is that the OS querying of the CXL device doesn't know
about that interface?). So I think the critical point here is that the
information is not available, not that aliasing occurs.
>
> > >
> > > # Status: Draft v2
> > >
> > > # Document: ACPI Specification 6.6
> > >
> > > # License
> > > SPDX-License Identifier: CC-BY-4.0
> > >
> > > # Submitter:
> > > * Sponsor: Dan Williams, Intel
> > > * Creators/Contributors:
> > > * Andy Rudoff, retired
> > > * Mahesh Natu, Intel
> > > * Ishwar Agarwal, Intel
> > >
> > > # Changelog
> > > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> > > in the SRAT range length"
> > > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> > > that Address Mode values other than 1 are Reserved.
> > >
> > > # Summary of the Change
> > > Enable an OSPM to enumerate that the capacity for a memory-side cache is
> > > "included" in an SRAT range. Typically the "Memory Side Cache Size"
> > > enumerated in the HMAT is "excluded" from the SRAT range length because
> > > it is a transparent cache of the SRAT capacity. The enumeration of this
> > > addressing mode enables OSPM memory RAS (Reliability, Availability, and
> > > Serviceability) flows.
> >
> > 'excluded' somehow implies it exists as something we might include but
> > we don't. 'Not relevant' would be clearer wording I think.
>
> But it is relevant. If the near memory (cache memory) is 64GB and the far memory
> (backing store) is 64GB then the SRAT range is 64GB (cache-excluded). With this
> new mode the SRAT range is 128GB.
Sure, but this is a cache, not normal memory, so for what people normally expect
from a memory-side cache (write-through / write-back etc.) there is no
reason to ever include it in SRAT. Hence it's not 'excluded'; it just has
nothing to do with SRAT, which is about memory, not caches. I'm not
arguing it is irrelevant in your case (where it clearly is relevant
because it is part of the memory), but it is irrelevant in, for example,
a write-through cache. Saying it was excluded implies a lot more than
'not-included' would, for example.
>
> > > Recall that the CXL specification allows for platform address ranges to
> > > be interleaved across CXL and non-CXL targets. CXL 3.1 Table 9-22 CFMWS
> > > Structure states "If the Interleave Set spans non-CXL domains, this list
> > > may contain values that do not match \_UID field in any CHBS structures.
> > > These entries represent Interleave Targets that are not CXL Host
> > > Bridges". For an OSPM this means address translation needs to be
> > > prepared for non-CXL targets. Now consider the case when that CXL
> > > address range is flagged as a memory side cache in the ACPI HMAT.
> >
> > A CXL address range can be flagged as having a memory-side cache in
> > front of it, but as you've stated it normally wouldn't have separate HPA
> > ranges. The interleave stuff doesn't get you to what you describe
> > here as it's well defined, not a transparent cache like a
> > memory-side cache. A given cacheline is in a known FRU, not potentially
> > multiple ones. Hence I'm not sure this paragraph is particularly useful.
>
> It was an attempt to show precedent for why Linux needs to care about the memory
> organization and how CFMWS does not achieve this description. That said, as this
> is text that only appears in the justification for the ECN I do not mind
> dropping it.
I think it risks confusion given it's not directly relevant to this.
>
> > > Address translation needs to consider that the decode for an error may
> > > impact multiple components (FRUs, field replaceable units).
> > >
> > > Now consider the implications of ["Flat Memory Mode" (Intel presentation
> > > at Hot Chips
> > > 2023)](https://cdrdv2-public.intel.com/787386/Hot%20Chips%20-%20Aug%2023%20-%20BHS%20and%20Granite%20Rapid%20-%20Xeon%20-%20Architecture%20-%20Public.pdf).
> >
> > Other than telling us someone put it on a slide, that slide provides
> > very little useful info!
>
> Hence this write-up in the ECN, felt it was better than nothing to include a
> picture for reference.
I think that left me more confused than anything ;)
>
> > > This cache geometry implies an address space that includes the
> > > memory-side cache size in the reported address range. For example, a
> > > typical address space layout for a memory-side-cache of 32GB of DDR
> > > fronting 64GB of CXL would report 64GB in the "Length" field of the
> > > SRAT's "Memory Affinity Structure" and 32GB in the "Memory Side Cache
> > > Size" field of the HMAT's "Memory Side Cache Information Structure".
> >
> > > An
> > > inclusive address-space layout of the same configuration would report
> > > 96GB in the "Length" field of the SRAT's "Memory Affinity Structure" and
> > > 32GB in the "Memory Side Cache Size" field of the HMAT's "Memory Side
> > > Cache Information Structure". The implication for address translation in
> > > the inclusive case is that there are N potential aliased addresses
> > > impacted by a memory error, where N is the ratio of:
> > >
> > > SRAT.MemoryAffinityStructure.Length /
> > > HMAT.MemorySideCacheInformation.CacheSize
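To make the quoted ratio concrete, here is a minimal Python sketch of the alias math (function and constant names are illustrative, not taken from SRAT/HMAT):

```python
GB = 1 << 30

def aliases(addr, srat_base, srat_len, cache_size):
    # N = SRAT range length / memory-side cache size, and all N aliases
    # share the same result for 'address modulo cache size'.
    n = srat_len // cache_size
    offset = (addr - srat_base) % cache_size
    return [srat_base + offset + k * cache_size for k in range(n)]

# The 96GB SRAT range fronted by a 32GB memory-side cache => N == 3
addrs = aliases(0x10_0000, srat_base=0, srat_len=96 * GB, cache_size=32 * GB)
```

For the 96GB/32GB example above this yields three candidate addresses, one per 32GB-aligned stride.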
> >
> > So in your example a memory error can affect any of 3 addresses.
> >
> > That feels like it is assuming a particular caching strategy without
> > expressly stating it. Let us take it to an extreme. Make it a fully
> > associative non-inclusive DDR cache (sure, that is insane, but bear
> > with me). Now any potential problem affects all addresses, as a given error
> > in the memory-side cache might affect anything - given it's fully associative
> > it's also possible an error in the CXL memory might also be any cacheline
> > in the system.
> >
> > The memory-side cache description does include the option of specifying
> > the cache is direct mapped so if that is set your assumed mapping is valid.
> > If someone set the 'complex cache indexing' option then I think all bets
> > are off. To be useful you should rule that out in your spec change.
>
> Sure, "Linear" implies direct-mapped since fully set-associative is a
> non-linear arrangement.
Sure, but you haven't introduced linear yet so this reads as more general
than intended. I'd call out explicitly that if your new mode is set then
direct-mapped must also be set.
>
> > > This change request is not exclusive to CXL, the concept is applicable
> > > to any memory-side-cache configuration that the HMAT+SRAT can describe.
> > > However, CXL is a primary motivator given the OSPM role in address
> > > translation for device-physical-address (DPA) events being translated to
> > > impacted host-physical-address (HPA) events.
> > >
> > > # Benefits of the Change
> > > An OSPM, when it knows about inclusive cache address space, can take
> > > actions like quarantine / offline all the impacted aliased pages to
> > > prevent further consumption of poison, or run repair operations on all
> > > the affected targets. Without this change an OSPM may not accurately
> > > identify the HPA associated with a given CXL FRU event, or it may
> > > misunderstand that an SRAT memory affinity range is an amalgam of CXL
> > > and cache capacity.
> >
> > Could you add a cache attribute to say it's a non-inclusive / exclusive
> > cache? That combined with direct-mapped would I think provide the relevant
> > indication. It still runs into the problem that advanced hardware
> > could still resolve which alias is the problem. So maybe we are better
> > off sticking to describing that fact there is an alias issue for any
> > reported errors that cannot be resolved (presumably you can poke the
> > aliased entries and see which one gives poison via synchronous access)
>
> I still disagree with the implication that "inclusion" is a property of the
> cache and not the address layout for this ECN.
It's an ECN about caches - the chance of misunderstanding is high.
Maybe there isn't a better option, but it definitely makes me feel uncomfortable.
>
> > Note that I'm not keen on the use of inclusive for your range description
> > because that terminology means the exact opposite of what you intend
> > when applied to a normal cache! I can't think of a better term though
> > but the bikeshed should not be blue.
>
> I am sticking with "include" since cache capacity is included in the SRAT
> range, and will move off that term when/if someone comes up with something
> better.
Maybe hyphen will help? Inclusive-linear Address mode?
to avoid reading this as separate adjectives as in that this is an
'inclusive' cache that has a 'linear address' mode?
>
> [..]
> > >
> > > * Extend the implementation note after Table 5.149 to explain how to
> > > interpret the "Inclusive linear" mode.
> > >
> > > * "When Address Mode is 1 'Inclusive Linear' it indicates that there
> > > are N directly addressable aliases of a given cacheline
> > > where N is the ratio of target memory proximity domain size and
> > > the memory side cache size. Where the N aliased addresses for a
> > > given cacheline all share the same result for the operation
> > > 'address modulo cache size'."
> >
> > That description is somewhat tighter than the free form one in the intro
> > so answered a lot of questions I had before getting this far.
>
> Happy to delete all of the text outside of "Detailed Description of the Change"
> since none of it will be included in ACPI spec.
ASWG always like an explanation though and that stuff is helpful when
we are trying to figure out intent in years to come.
Jonathan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-20 11:53 ` Jonathan Cameron
@ 2024-05-21 15:54 ` Dan Williams
2024-05-23 11:49 ` Jonathan Cameron
0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2024-05-21 15:54 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
Jonathan Cameron wrote:
> On Fri, 17 May 2024 13:20:06 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
[..]
> > I would prefer to just pile on more explicit clarifications to overcome that
> > instinct to map the word "Inclusive" to a multi-level cache attribute. Something
> > like "note, 'Inclusive Linear' address-mode not to be confused with
> > 'Inclusive/Exclusive' multi-level cache organization".
>
> Hmm. I'll go with 'maybe'. When you have to add a bunch of distinctions of
> a term that is applying to a 'cache' to say that you don't mean the standard
> meaning for caches, it feels like a new term is a better path.
Right, but that's the crux: I am talking about the address organization and not
the cache policy, and it seems all too easy to fall into the trap of considering
this to be a caching-policy attribute.
> Possibly as I suggest later a hyphen might avoid misreads.
> inclusive-linear address mode.
Yeah, that is a good idea.
> > > Whilst the CXL side of things (and I assume your hardware migration engine)
> > > don't provide a way to recover this, it would be possible to build
> > > a system that otherwise looked like you describe that did provide access
> > > to the tag bits and so wouldn't present the aliasing problem.
> >
> > Aliasing problem? All direct-mapped caches have aliases, it just happens that
> > this address mode allows direct-addressability of at least one alias.
>
> As I understand this the problem is you get address A in the error record,
> but that actually means any of A, A + N, A + 2N etc and the issue is you
> have no way of recovering which alias you have.
>
> Another implementation might have the same aliasing in the cache, but allow
> for establishing which one you have (the hardware inherently has to know that
> but I presume in this case doesn't provide a way to look it up - or if it
> does, then issue here is that the OS querying of the CXL device doesn't know
> about that interface?). So I think the critical here is that information is
> not available, not that aliasing occurs.
The critical information is that the address range is extended by the cache
capacity compared to the typical case. Maybe "extended-linear" is the term I was
searching for last Friday when I could not think of a better bikeshed color?
The reason an "extended-linear" indicator is important is for the driver to
recognize that the CXL address range programmed into the decoders is only a
subset of the system-physical address ranges that may route traffic to CXL. So
when the memory-side-cache is in this "extended" mode there are more addresses
that may route to CXL.
Again, whether the address mode is extended-linear or "non-extended"-linear, the
math to find aliases is the same. Rather, Linux needs this indication to break
its assumptions around which system-physical-address ranges may decode to CXL,
and avoid misinterpretations of ACPI SRAT/HMAT and CEDT.CFMWS.
[..]
> > But it is relevant. If the near memory (cache memory) is 64GB and the far memory
> > (backing store) is 64GB then the SRAT range is 64GB (cache-excluded). With this
> > new mode the SRAT range is 128GB.
>
> Sure, but this is a cache, not normal memory so in what people normally expect
> from a memory-side cache (write through / write back etc) there is no
> reason to ever include it in SRAT. Hence it's not 'excluded' it just has
> nothing to do with SRAT which is about memory, not caches. I'm not
> arguing it is irrelevant in your case (where it clearly is because it is part
> of the memory), but it is irrelevant in, for example, a write-through cache.
> Saying it was excluded is implying a lot more than 'not-included' would, for
> example.
Ok, that's starting to get through, and it seems to support the proposal to call
this address-mode "extended-linear".
[..]
> > It was an attempt to show precedent for why Linux needs to care about the memory
> > organization and how CFMWS does not achieve this description. That said, as this
> > is text that only appears in the justification for the ECN I do not mind
> > dropping it.
>
> I think it risks confusion given it's not directly relevant to this.
...to be deleted for the next rev.
> > > > Address translation needs to consider that the decode for an error may
> > > > impact multiple components (FRUs, field replaceable units).
> > > >
> > > > Now consider the implications of ["Flat Memory Mode" (Intel presentation
> > > > at Hot Chips
> > > > 2023)](https://cdrdv2-public.intel.com/787386/Hot%20Chips%20-%20Aug%2023%20-%20BHS%20and%20Granite%20Rapid%20-%20Xeon%20-%20Architecture%20-%20Public.pdf).
> > >
> > > Other than telling us someone put it on a slide, that slide provides
> > > very little useful info!
> >
> > Hence this write-up in the ECN, felt it was better than nothing to include a
> > picture for reference.
>
> I think that left me more confused than anything ;)
...to be deleted for the next rev.
[..]
> > Sure, "Linear" implies direct-mapped since fully set-associative is a
> > non-linear arrangement.
>
> Sure, but you haven't introduced linear yet so this reads as more general
> than intended. I'd call out explicitly that if your new mode is set then
> direct-mapped must also be set.
Will do.
[..]
> > I still disagree with the implication that "inclusion" is a property of the
> > cache and not the address layout for this ECN.
>
> It's an ECN about caches - the chance of misunderstanding is high.
> Maybe there isn't a better option, but it definitely makes me feel uncomfortable.
[..]
> Maybe hyphen will help? Inclusive-linear Address mode?
> to avoid reading this as separate adjectives as in that this is an
> 'inclusive' cache that has a 'linear address' mode?
Try this on for size:
* "When Address Mode is 1 'Extended-Linear' it indicates that the associated
address range (SRAT.MemoryAffinityStructure.Length) is comprised of the
backing store capacity extended by the cache capacity. It is arranged such
that there are N directly addressable aliases of a given cacheline where N is
the ratio of target memory proximity domain size and the memory side cache
size. Where the N aliased addresses for a given cacheline all share the same
result for the operation 'address modulo cache size'. This setting is only
allowed when 'Cache Associativity' is 'Direct Map'."
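As a hedged sketch of how an OSPM parser might enforce the constraints in that proposed wording (the field encodings here are assumptions for illustration, not actual HMAT values):

```python
GB = 1 << 30
DIRECT_MAP = 1            # assumed encoding for HMAT 'Cache Associativity'
ADDR_MODE_EXT_LINEAR = 1  # Address Mode value 1 from the proposed wording

def check_extended_linear(addr_mode, associativity, srat_len, cache_size):
    """Validate the proposed constraints: 'Extended-Linear' requires a
    direct-mapped cache, and the SRAT length (backing store + cache) must
    be an integer multiple of the cache size so the alias ratio N is well
    defined. Sketch only; encodings are assumptions."""
    if addr_mode != ADDR_MODE_EXT_LINEAR:
        return True   # mode 0 is undeclared/transparent, nothing to enforce
    if associativity != DIRECT_MAP:
        return False  # the proposal only allows 'Direct Map'
    return cache_size > 0 and srat_len % cache_size == 0

ok = check_extended_linear(ADDR_MODE_EXT_LINEAR, DIRECT_MAP, 96 * GB, 32 * GB)
```

The 96GB range / 32GB cache example passes; a non-direct-mapped cache or a length that is not a multiple of the cache size would be rejected.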
[..]
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-21 15:54 ` Dan Williams
@ 2024-05-23 11:49 ` Jonathan Cameron
2024-05-23 16:36 ` Dan Williams
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2024-05-23 11:49 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
>
> > > > Whilst the CXL side of things (and I assume your hardware migration engine)
> > > > don't provide a way to recover this, it would be possible to build
> > > > a system that otherwise looked like you describe that did provide access
> > > > to the tag bits and so wouldn't present the aliasing problem.
> > >
> > > Aliasing problem? All direct-mapped caches have aliases, it just happens that
> > > this address mode allows direct-addressability of at least one alias.
> >
> > As I understand this the problem is you get address A in the error record,
> > but that actually means any of A, A + N, A + 2N etc and the issue is you
> > have no way of recovering which alias you have.
> >
> > Another implementation might have the same aliasing in the cache, but allow
> > for establishing which one you have (the hardware inherently has to know that
> > but I presume in this case doesn't provide a way to look it up - or if it
> > does, then issue here is that the OS querying of the CXL device doesn't know
> > about that interface?). So I think the critical here is that information is
> > not available, not that aliasing occurs.
>
> The critical information is that the address range is extended by the cache
> capacity compared to the typical case. Maybe "extended-linear" is the term I was
> searching for last Friday when I could not think of a better bikeshed color?
>
> The reason an "extended-linear" indicator is important is for the driver to
> recognize that the CXL address range programmed into the decoders is only a
> subset of the system-physical address ranges that may route traffic to CXL. So
> when the memory-side-cache is in this "extended" mode there are more addresses
> that may route to CXL.
I think we need to be careful with decoders here because the extra translation in the
path means they aren't in HPA space as such. They are in a new HPA+ space.
In your case I think the translation is such that addresses are the bottom of the
HPA window, but they could just as easily be the top of the HPA window or not
within it at all...
| HPA window 1 - Length = Cache + CXL |
| HPA+ window 1 - Length = CXL only |
or
| HPA window 1 - Length = Cache + CXL |
| HPA+ window 1 - Length = CXL only |
or for giggles
| HPA window 1 - Length = Cache + CXL |
| HPA+ window - Length = CXL only |
last one might seem odd but if you are packing multiple of these you might get
| HPA window 1 - Length = Cache + CXL | HPA window 2 Ln = Cache + CXL |
| HPA+ window 1 - Length = CXL only | HPA+ window 2 Len = CXL only|
To reduce decoder costs in the fabric (yeah we don't do this today but the
bios might :)
So should the text say anything about decoder address vs (SRAT / HMAT addressing)
Maybe reasonable to say it's contained and aligned so modulo maths works?
This is a bit odd as HMAT wouldn't typically provide this info, but this addressing
mode already incorporates it sort of...
>
> Again, whether the address mode is extended-linear, or "non-extended"-linear the
> math to find aliases is the same. Rather, Linux needs this indication to break
> its assumptions around which system-physical-address ranges may decode to CXL,
> and avoid misinterpretations of ACPI SRAT/HMAT and CEDT.CFMWS.
>
> [..]
> > > But it is relevant. If the near memory (cache memory) is 64GB and the far memory
> > > (backing store) is 64GB then the SRAT range is 64GB (cache-excluded). With this
> > > new mode the SRAT range is 128GB.
> >
> > Sure, but this is a cache, not normal memory so in what people normally expect
> > from a memory-side cache (write through / write back etc) there is no
> > reason to ever include it in SRAT. Hence it's not 'excluded' it just has
> > nothing to do with SRAT which is about memory, not caches. I'm not
> > arguing it is irrelevant in your case (where it clearly is because it is part
> > of the memory), but it is irrelevant in, for example, a write-through cache.
> > Saying it was excluded is implying a lot more than 'not-included' would, for
> > example.
>
> Ok, that's starting to get through, and it seems to support the proposal to call
> this address-mode "extended-linear".
That works.
> [..]
> > > I still disagree with the implication that "inclusion" is a property of the
> > > cache and not the address layout for this ECN.
> >
> > It's an ECN about caches - the chance of misunderstanding is high.
> > Maybe there isn't a better option, but it definitely makes me feel uncomfortable.
> [..]
> > Maybe hyphen will help? Inclusive-linear Address mode?
> > to avoid reading this as separate adjectives as in that this is an
> > 'inclusive' cache that has a 'linear address' mode?
>
> Try this on for size:
>
> * "When Address Mode is 1 'Extended-Linear' it indicates that the associated
> address range (SRAT.MemoryAffinityStructure.Length) is comprised of the
> backing store capacity extended by the cache capacity. It is arranged such
> that there are N directly addressable aliases of a given cacheline where N is
> the ratio of target memory proximity domain size and the memory side cache
> size. Where the N aliased addresses for a given cacheline all share the same
> result for the operation 'address modulo cache size'. This setting is only
> allowed when 'Cache Associativity' is 'Direct Map'."
>
>
I don't promise not to change my mind, but today LGTM.
> [..]
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-23 11:49 ` Jonathan Cameron
@ 2024-05-23 16:36 ` Dan Williams
2024-05-24 11:31 ` Jonathan Cameron
0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2024-05-23 16:36 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
Jonathan Cameron wrote:
> >
> > > > > Whilst the CXL side of things (and I assume your hardware migration engine)
> > > > > don't provide a way to recover this, it would be possible to build
> > > > > a system that otherwise looked like you describe that did provide access
> > > > > to the tag bits and so wouldn't present the aliasing problem.
> > > >
> > > > Aliasing problem? All direct-mapped caches have aliases, it just happens that
> > > > this address mode allows direct-addressability of at least one alias.
> > >
> > > As I understand this the problem is you get address A in the error record,
> > > but that actually means any of A, A + N, A + 2N etc and the issue is you
> > > have no way of recovering which alias you have.
> > >
> > > Another implementation might have the same aliasing in the cache, but allow
> > > for establishing which one you have (the hardware inherently has to know that
> > > but I presume in this case doesn't provide a way to look it up - or if it
> > > does, then issue here is that the OS querying of the CXL device doesn't know
> > > about that interface?). So I think the critical here is that information is
> > > not available, not that aliasing occurs.
> >
> > The critical information is that the address range is extended by the cache
> > capacity compared to the typical case. Maybe "extended-linear" is the term I was
> > searching for last Friday when I could not think of a better bikeshed color?
> >
> > The reason an "extended-linear" indicator is important is for the driver to
> > recognize that the CXL address range programmed into the decoders is only a
> > subset of the system-physical address ranges that may route traffic to CXL. So
> > when the memory-side-cache is in this "extended" mode there are more addresses
> > that may route to CXL.
>
> I think we need to be careful with decoders here because the extra translation in the
> path means they aren't in HPA space as such. They are in a new HPA+ space.
> In your case I think the translation is such that addresses are the bottom of the
> HPA window, but they could just as easily be the top of the HPA window or not
> within it at all...
No need for an HPA+ concept. This is just an HPA vs SPA distinction,
similar to what we dealt with here:
0cab68720598 cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
Typically HPA and SPA are a 1:1 relationship, but in this case there is
a memory-side cache that sometimes translates the SRAT SPA to CXL HPA vs
DDR HPA. For any given SPA in the SRAT range there is no way to know
whether it is currently dynamically mapped to CXL or DDR.
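A minimal sketch of the RAS consequence, using the alias math from earlier in the thread and invented helper names: because the OSPM cannot tell which alias the cache currently holds, a quarantine flow has to cover every alias of the poisoned line.

```python
GB = 1 << 30
PAGE_SIZE = 4096  # assumed page size, for illustration only

def pages_to_quarantine(error_spa, srat_base, srat_len, cache_size):
    # Every alias of the poisoned cacheline may be (or may become) affected,
    # since any SPA in the range may be dynamically mapped to cache or
    # backing store. Return the page-aligned SPA of each alias.
    n = srat_len // cache_size
    offset = (error_spa - srat_base) % cache_size
    return [(srat_base + offset + k * cache_size) & ~(PAGE_SIZE - 1)
            for k in range(n)]

pages = pages_to_quarantine(0x1234_5678, 0, 96 * GB, 32 * GB)
```

This mirrors the "quarantine / offline all the impacted aliased pages" benefit described in the ECN justification.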
> | HPA window 1 - Length = Cache + CXL |
> | HPA+ window 1 - Length = CXL only |
HPA windows are never impacted by this memory side cache addressing.
>
> or
> | HPA window 1 - Length = Cache + CXL |
> | HPA+ window 1 - Length = CXL only |
>
> or for giggles
>
> | HPA window 1 - Length = Cache + CXL |
> | HPA+ window - Length = CXL only |
>
> last one might seem odd but if you are packing multiple of these you might get
> | HPA window 1 - Length = Cache + CXL | HPA window 2 Ln = Cache + CXL |
> | HPA+ window 1 - Length = CXL only | HPA+ window 2 Len = CXL only|
>
> To reduce decoder costs in the fabric (yeah we don't do this today but the
> bios might :)
No, BIOS should have no opportunity to confuse the "HPA" layout. Let me see
if I can cut off this line of confusion in the next rev and explicitly
call out SPA vs HPA expectations.
> So should the text say anything about decoder address vs (SRAT / HMAT addressing)
> Maybe reasonable to say it's contained and aligned so modulo maths works?
> This is a bit odd as HMAT wouldn't typically provide this info, but this addressing
> mode already incorporates it sort of...
SRAT portrays capacity, HMAT portrays cache and address organization.
There is no need for bringing CXL decoder concepts into the HMAT.
[..]
> > > > I still disagree with the implication that "inclusion" is a property of the
> > > > cache and not the address layout for this ECN.
> > >
> > > It's an ECN about caches - the chance of misunderstanding is high.
> > > Maybe there isn't a better option, but it definitely makes me feel uncomfortable.
> > [..]
> > > Maybe hyphen will help? Inclusive-linear Address mode?
> > > to avoid reading this as separate adjectives as in that this is an
> > > 'inclusive' cache that has a 'linear address' mode?
> >
> > Try this on for size:
> >
> > * "When Address Mode is 1 'Extended-Linear' it indicates that the associated
> > address range (SRAT.MemoryAffinityStructure.Length) is comprised of the
> > backing store capacity extended by the cache capacity. It is arranged such
> > that there are N directly addressable aliases of a given cacheline where N is
> > the ratio of target memory proximity domain size and the memory side cache
> > size. Where the N aliased addresses for a given cacheline all share the same
> > result for the operation 'address modulo cache size'. This setting is only
> > allowed when 'Cache Associativity' is 'Direct Map'."
> >
> >
> I don't promise not to change my mind, but today LGTM.
This sounds very similar to the voice that is always in my mind when
reviewing code; it reminds me of one of my favorite Star Wars quotes, "I am
altering the deal, pray I do not alter it any further."
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-23 16:36 ` Dan Williams
@ 2024-05-24 11:31 ` Jonathan Cameron
2024-05-24 17:49 ` Dan Williams
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2024-05-24 11:31 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
On Thu, 23 May 2024 09:36:01 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> > >
> > > > > > Whilst the CXL side of things (and I assume your hardware migration engine)
> > > > > > don't provide a way to recover this, it would be possible to build
> > > > > > a system that otherwise looked like you describe that did provide access
> > > > > > to the tag bits and so wouldn't present the aliasing problem.
> > > > >
> > > > > Aliasing problem? All direct-mapped caches have aliases, it just happens that
> > > > > this address mode allows direct-addressability of at least one alias.
> > > >
> > > > As I understand this the problem is you get address A in the error record,
> > > > but that actually means any of A, A + N, A + 2N etc and the issue is you
> > > > have no way of recovering which alias you have.
> > > >
> > > > Another implementation might have the same aliasing in the cache, but allow
> > > > for establishing which one you have (the hardware inherently has to know that
> > > > but I presume in this case doesn't provide a way to look it up - or if it
> > > > does, then issue here is that the OS querying of the CXL device doesn't know
> > > > about that interface?). So I think the critical here is that information is
> > > > not available, not that aliasing occurs.
> > >
> > > The critical information is that the address range is extended by the cache
> > > capacity compared to the typical case. Maybe "extended-linear" is the term I was
> > > searching for last Friday when I could not think of a better bikeshed color?
> > >
> > > The reason an "extended-linear" indicator is important is for the driver to
> > > recognize that the CXL address range programmed into the decoders is only a
> > > subset of the system-physical address ranges that may route traffic to CXL. So
> > > when the memory-side-cache is in this "extended" mode there are more addresses
> > > that may route to CXL.
> >
> > I think we need to be careful with decoders here because the extra translation in the
> > path means they aren't in HPA space as such. They are in a new HPA+ space.
> > In your case I think the translation is such that addresses are the bottom of the
> > HPA window, but they could just as easily be the top of the HPA window or not
> > within it at all...
>
> No need for an HPA+ concept. This is just an HPA vs SPA distinction,
> similar to what we dealt with here:
>
> 0cab68720598 cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
Sure, if we can avoid a reference to 'subset' then I think this is
fine - or avoid relating this to decoders at all.
>
> Typically HPA and SPA are a 1:1 relationship, but in this case there is
> a memory-side cache that sometimes translates the SRAT SPA to CXL HPA vs
> DDR HPA. For any given SPA in the SRAT range there is no way to know
> whether it is currently dynamically mapped to CXL or DDR.
>
> > | HPA window 1 - Length = Cache + CXL |
> > | HPA+ window 1 - Length = CXL only |
>
> HPA windows are never impacted by this memory side cache addressing.
>
>
> >
> > or
> > | HPA window 1 - Length = Cache + CXL |
> > | HPA+ window 1 - Length = CXL only |
> >
> > or for giggles
> >
> > | HPA window 1 - Length = Cache + CXL |
> > | HPA+ window - Length = CXL only |
> >
> > last one might seem odd but if you are packing multiple of these you might get
> > | HPA window 1 - Length = Cache + CXL | HPA window 2 Ln = Cache + CXL |
> > | HPA+ window 1 - Length = CXL only | HPA+ window 2 Len = CXL only|
> >
> > To reduce decoder costs in the fabric (yeah we don't do this today but the
> > bios might :)
>
> No, BIOS should have no opporunity to confuse "HPA" layout. Let me see
> if I can cutoff this line of confusion in the next rev and explicitly
> call out SPA vs HPA expectations.
>
> > So should the text say anything about decoder address vs (SRAT / HMAT addressing)
> > Maybe reasonable to say it's contained and aligned so modulo maths works?
> > This is a bit odd as HMAT wouldn't typically provide this info, but this addressing
> > mode already incorporates it sort of...
>
> SRAT portrays capacity, HMAT portrays cache and address organization.
> There is no need for bringing CXL decoder concepts into the HMAT.
Absolutely - avoid any reference to decoders and we are fine.
>
> [..]
> > > > > I still disagree with the implication that "inclusion" is a property of the
> > > > > cache and not the address layout for this ECN.
> > > >
> > > > It's an ECN about caches - the chance of misunderstanding is high.
> > > > Maybe there isn't a better option, but it definitely makes me feel uncomfortable.
> > > [..]
> > > > Maybe hyphen will help? Inclusive-linear Address mode?
> > > > to avoid reading this as separate adjectives as in that this is an
> > > > 'inclusive' cache that has a 'linear address' mode?
> > >
> > > Try this on for size:
> > >
> > > * "When Address Mode is 1 'Extended-Linear' it indicates that the associated
> > > address range (SRAT.MemoryAffinityStructure.Length) is comprised of the
> > > backing store capacity extended by the cache capacity. It is arranged such
> > > that there are N directly addressable aliases of a given cacheline where N is
> > > the ratio of target memory proximity domain size and the memory side cache
> > > size. Where the N aliased addresses for a given cacheline all share the same
> > > result for the operation 'address modulo cache size'. This setting is only
> > > allowed when 'Cache Associativity' is 'Direct Map'."
> > >
> > >
> > I don't promise not to change my mind, but today LGTM.
>
> This sounds very similar to the voice that is always in my mind when
> reviewing code, reminds me of one of my favorite Star Wars quotes, "I am
> altering the deal, pray I do not alter it any further."
:)
Jonathan
* Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
2024-05-24 11:31 ` Jonathan Cameron
@ 2024-05-24 17:49 ` Dan Williams
0 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2024-05-24 17:49 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, linux-acpi, mahesh.natu, rafael
Jonathan Cameron wrote:
[..]
> > > So should the text say anything about decoder address vs (SRAT / HMAT addressing)
> > > Maybe reasonable to say it's contained and aligned so modulo maths works?
> > > This is a bit odd as HMAT wouldn't typically provide this info, but this addressing
> > > mode already incorporates it sort of...
> >
> > SRAT portrays capacity, HMAT portrays cache and address organization.
> > There is no need for bringing CXL decoder concepts into the HMAT.
>
> Absolutely - avoid any reference to decoders and we are fine.
Well no, because the implications of this addressing mode relative to
CXL decoder settings are the whole reason why "we", the Linux CXL community,
are motivated to submit a code-first ECN proposal to explicitly
advertise this addressing mode. The Linux CXL subsystem needs to know
about this addressing mode because of the mismatch between endpoint
decoders and SPA ranges relative to endpoint HPA decode ranges.
Will try to make this point clear because I do not see a path to
describing the motivation for this ECN without talking about the CXL
problem.
end of thread, other threads:[~2024-05-24 17:50 UTC | newest]
Thread overview: 9+ messages
2024-05-10 23:00 [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches Dan Williams
2024-05-17 16:45 ` Jonathan Cameron
2024-05-17 20:20 ` Dan Williams
2024-05-20 11:53 ` Jonathan Cameron
2024-05-21 15:54 ` Dan Williams
2024-05-23 11:49 ` Jonathan Cameron
2024-05-23 16:36 ` Dan Williams
2024-05-24 11:31 ` Jonathan Cameron
2024-05-24 17:49 ` Dan Williams