From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
<mahesh.natu@intel.com>, <rafael@kernel.org>
Subject: Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches
Date: Fri, 24 May 2024 12:31:55 +0100 [thread overview]
Message-ID: <20240524123155.00005b0a@Huawei.com> (raw)
In-Reply-To: <664f707125537_18c429438@dwillia2-mobl3.amr.corp.intel.com.notmuch>
On Thu, 23 May 2024 09:36:01 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> > >
> > > > > > Whilst the CXL side of things (and I assume your hardware migration engine)
> > > > > > don't provide a way to recover this, it would be possible to build
> > > > > > a system that otherwise looked like you describe that did provide access
> > > > > > to the tag bits and so wouldn't present the aliasing problem.
> > > > >
> > > > > Aliasing problem? All direct-mapped caches have aliases, it just happens that
> > > > > this address mode allows direct-addressability of at least one alias.
> > > >
> > > > As I understand this the problem is you get address A in the error record,
> > > > but that actually means any of A, A + N, A + 2N etc and the issue is you
> > > > have no way of recovering which alias you have.
> > > >
> > > > Another implementation might have the same aliasing in the cache, but allow
> > > > for establishing which one you have (the hardware inherently has to know that
> > > > but I presume in this case doesn't provide a way to look it up - or if it
> > > > does, then the issue here is that the OS querying of the CXL device doesn't
> > > > know about that interface?). So I think the critical point here is that the
> > > > information is not available, not that aliasing occurs.
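
To illustrate the ambiguity with made-up sizes (a 4 GiB direct-mapped
memory-side cache fronting a 12 GiB backing store, so a 16 GiB extended
range), the alias set for a reported address can be sketched as:

```python
GIB = 1 << 30

# Hypothetical sizes for illustration only.
cache_size = 4 * GIB            # direct-mapped memory-side cache
range_len = 16 * GIB            # backing store (12 GiB) + cache (4 GiB)
base = 0                        # assumed range base

def alias_candidates(addr):
    """All addresses in the range sharing addr's cache set; an error
    record reporting any one of them cannot distinguish the others."""
    offset = (addr - base) % cache_size
    return [base + offset + k * cache_size
            for k in range(range_len // cache_size)]

print([hex(a) for a in alias_candidates(0x1234)])
# -> ['0x1234', '0x100001234', '0x200001234', '0x300001234']
```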
> > >
> > > The critical information is that the address range is extended by the cache
> > > capacity compared to the typical case. Maybe "extended-linear" is the term I was
> > > searching for last Friday when I could not think of a better bikeshed color?
> > >
> > > The reason an "extended-linear" indicator is important is for the driver to
> > > recognize that the CXL address range programmed into the decoders is only a
> > > subset of the system-physical address ranges that may route traffic to CXL. So
> > > when the memory-side-cache is in this "extended" mode there are more addresses
> > > that may route to CXL.
> >
> > I think we need to be careful with decoders here because the extra translation in the
> > path means they aren't in HPA space as such. They are in a new HPA+ space.
> > In your case I think the translation is such that addresses are the bottom of the
> > HPA window, but they could just as easily be the top of the HPA window or not
> > within it at all...
>
> No need for an HPA+ concept. This is just an HPA vs SPA distinction,
> similar to what we dealt with here:
>
> 0cab68720598 cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
Sure, if we can avoid a reference to 'subset' then I think this is
fine - or avoid relating this to decoders at all.
>
> Typically HPA and SPA are a 1:1 relationship, but in this case there is
> a memory-side cache that sometimes translates the SRAT SPA to CXL HPA vs
> DDR HPA. For any given SPA in the SRAT range there is no way to know
> whether it is currently dynamically mapped to CXL or DDR.
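
Put as arithmetic (capacities invented for the example), the usual 1:1
case vs this mode looks like:

```python
GIB = 1 << 30

# Invented capacities for illustration.
cxl_hpa_len = 12 * GIB                      # range programmed into CXL decoders
cache_len = 4 * GIB                         # DDR acting as memory-side cache

typical_spa_len = cxl_hpa_len               # usual 1:1 HPA:SPA relationship
extended_spa_len = cxl_hpa_len + cache_len  # extended-linear SRAT range

# The SPA range carries more addresses than the CXL HPA range alone, and
# any given SPA may at any moment be backed by either DDR or CXL.
print((extended_spa_len - typical_spa_len) // GIB)  # -> 4
```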
>
> > | HPA window 1 - Length = Cache + CXL |
> > | HPA+ window 1 - Length = CXL only |
>
> HPA windows are never impacted by this memory side cache addressing.
>
>
> >
> > or
> > | HPA window 1 - Length = Cache + CXL |
> >        | HPA+ window 1 - Length = CXL only |
> >
> > or for giggles
> >
> > | HPA window 1 - Length = Cache + CXL |
> >                                           | HPA+ window - Length = CXL only |
> >
> > last one might seem odd but if you are packing multiple of these you might get
> > | HPA window 1 - Length = Cache + CXL | HPA window 2 Len = Cache + CXL |
> > | HPA+ window 1 - Length = CXL only   | HPA+ window 2 Len = CXL only   |
> >
> > to reduce decoder costs in the fabric (yeah, we don't do this today but the
> > BIOS might :)
>
> No, BIOS should have no opportunity to confuse "HPA" layout. Let me see
> if I can cut off this line of confusion in the next rev and explicitly
> call out SPA vs HPA expectations.
>
> > So should the text say anything about decoder addresses vs (SRAT / HMAT addressing)?
> > Maybe it's reasonable to say it's contained and aligned so the modulo maths works?
> > This is a bit odd as HMAT wouldn't typically provide this info, but this addressing
> > mode already incorporates it, sort of...
>
> SRAT portrays capacity, HMAT portrays cache and address organization.
> There is no need for bringing CXL decoder concepts into the HMAT.
Absolutely - avoid any reference to decoders and we are fine.
>
> [..]
> > > > > I still disagree with the implication that "inclusion" is a property of the
> > > > > cache and not the address layout for this ECN.
> > > >
> > > > It's an ECN about caches - the chance of misunderstanding is high.
> > > > Maybe there isn't a better option, but it definitely makes me feel uncomfortable.
> > > [..]
> > > > Maybe hyphen will help? Inclusive-linear Address mode?
> > > > to avoid reading this as separate adjectives as in that this is an
> > > > 'inclusive' cache that has a 'linear address' mode?
> > >
> > > Try this on for size:
> > >
> > > * "When Address Mode is 1 'Extended-Linear', it indicates that the associated
> > > address range (SRAT.MemoryAffinityStructure.Length) comprises the backing
> > > store capacity extended by the cache capacity. It is arranged such that
> > > there are N directly addressable aliases of a given cacheline, where N is
> > > the ratio of the target memory proximity domain size to the memory side
> > > cache size, and the N aliased addresses for a given cacheline all share the
> > > same result for the operation 'address modulo cache size'. This setting is
> > > only allowed when 'Cache Associativity' is 'Direct Map'."
> > >
> > >
> > I don't promise not to change my mind, but today LGTM.
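
As a sanity check on that wording (sizes are made up: a 16 GiB
extended-linear SRAT range over a 4 GiB direct-mapped cache), the alias
set and the shared modulo result work out as:

```python
GIB = 1 << 30

# Made-up sizes for illustration only.
srat_base = 0x10_0000_0000          # 64 GiB base, cache-size aligned (assumed)
srat_len = 16 * GIB
cache_size = 4 * GIB

spa = srat_base + 5 * GIB           # some address inside the range
first = srat_base + (spa - srat_base) % cache_size
group = list(range(first, srat_base + srat_len, cache_size))

# All N aliases share the same 'address modulo cache size' result.
assert len({a % cache_size for a in group}) == 1
print(len(group), [hex(a) for a in group])
```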
>
> This sounds very similar to the voice that is always in my mind when
> reviewing code, reminds me of one of my favorite Star Wars quotes, "I am
> altering the deal, pray I do not alter it any further."
:)
Jonathan
Thread overview: 9+ messages
2024-05-10 23:00 [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches Dan Williams
2024-05-17 16:45 ` Jonathan Cameron
2024-05-17 20:20 ` Dan Williams
2024-05-20 11:53 ` Jonathan Cameron
2024-05-21 15:54 ` Dan Williams
2024-05-23 11:49 ` Jonathan Cameron
2024-05-23 16:36 ` Dan Williams
2024-05-24 11:31 ` Jonathan Cameron [this message]
2024-05-24 17:49 ` Dan Williams