Linux CXL
From: Gregory Price <gregory.price@memverge.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org, Dave Jiang <dave.jiang@intel.com>
Subject: Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
Date: Sun, 16 Apr 2023 00:05:15 -0400
Message-ID: <ZDtz++B25suuoYoY@memverge.com>
In-Reply-To: <ZCUPlnQYVRCumy2O@memverge.com>

On Thu, Mar 30, 2023 at 12:27:02AM -0400, Gregory Price wrote:
> On Wed, Mar 29, 2023 at 11:33:05PM -0700, Dan Williams wrote:
> > Gregory Price wrote:
> > > On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > > > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > > > not being programmed correctly?
> > > > 
> > > > This debug experiment makes me think perhaps the *device* is at fault,
> > > > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > > > to set up the decode as expected, but reads return 0? That's the only
> > > > way that I can see that forcing that offset results in successfully
> > > > talking to memory.
> > > > 
> > > 
> > > Oh, i meant to add that i tested whether the memory is accessible via
> > > numactl --membind=1 with both memhog and a python prompt, and things
> > > worked just fine.  So memory works.
> > 
> > One other theory is that the device is correct, but the platform CXL
> > window accepts transactions at an offset and then removes that offset
> > when transmitting the address down the CXL port. So device thinks its
> > decoding 0x0 and never sees the offset removed by the host bridge.
> 
> Wouldn't that be against the spec?  I thought the device intended to
> receive HPA and do its decode accordingly.
> 
> Otherwise you could have multiple devices programmed capable of decoding
> 0x0, which is already the device address, so there's nothing to
> "decode".
> 
> I'll follow up as I learn more, this is concerning.  Certainly explains
> why every time I switch hardware nothing seems to work quite right.
> 
> ~Gregory


Follow up on this:  I attempted to actually write Memory_Base_Lo/Hi with
the appropriate values, validated that the write went through, and then
tried to use the memory.

This test led to a total system lockup, implying a major bus error
and/or device failure.  There was no stack trace or any other indication
of exactly what went wrong, but it's pretty obvious there's a device issue.

If you remember, my initial test was to simply shift the base to the
base of the CXL Fixed Memory Window:

@@ -342,6 +344,7 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,

                base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;

+               base += 0x4050000000;
                info->dvsec_range[i] = (struct range) {
                        .start = base,
                        .end = base + size - 1


To this I added some writebacks to the registers to set them to what
they "should be" (i.e. what the BIOS should have done).
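
For reference, a rough sketch of how a 64-bit base splits into the Hi/Lo
register pair.  This is not the code that was run; it assumes that
Memory_Base_Low carries address bits 31:28 (as CXL_DVSEC_MEM_BASE_LOW_MASK
suggests) and that Memory_Base_High carries bits 63:32.  The struct and
function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: only bits 31:28 of Memory_Base_Low hold address bits. */
#define DVSEC_MEM_BASE_LOW_MASK 0xF0000000u

/* Hypothetical container for the two 32-bit register values. */
struct dvsec_base_regs {
	uint32_t hi;	/* Memory_Base_High: address bits 63:32 */
	uint32_t lo;	/* Memory_Base_Low:  address bits 31:28 */
};

/* Split a 64-bit HPA base into the values to write to Hi/Lo. */
struct dvsec_base_regs dvsec_split_base(uint64_t base)
{
	struct dvsec_base_regs r = {
		.hi = (uint32_t)(base >> 32),
		.lo = (uint32_t)base & DVSEC_MEM_BASE_LOW_MASK,
	};
	return r;
}

/* Rebuild the base the way the decode path does: high word, then masked low. */
uint64_t dvsec_join_base(struct dvsec_base_regs r)
{
	uint64_t base = (uint64_t)r.hi << 32;

	base |= r.lo & DVSEC_MEM_BASE_LOW_MASK;
	return base;
}
```

With the 0x4050000000 window base from the hack above, this gives
Hi = 0x40 and Lo = 0x50000000, and reading those back reproduces the same
base.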


The other thing I noticed was a combination of DVSEC flags that seemed
interesting:

DVSEC CXL Capability: c0de
Mem_HwInit_Mode == 1  (hardware + firmware on device do init)

DVSEC CXL Range 1 Size Low: 804
memory_info_valid == 0
memory_active == 0

So this appears to have been BOTH a BIOS *and* a device issue, but we
should also be checking these bits before using the info from these
registers.
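
A minimal sketch of the kind of check meant here, decoding the two raw
values above.  The bit positions are assumptions based on my reading of
the DVSEC layout (Mem_HwInit_Mode at bit 3 of the capability register,
Memory_Info_Valid at bit 0 and Memory_Active at bit 1 of Range Size Low);
the macro and function names are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the spec before relying on them. */
#define DVSEC_CAP_MEM_HWINIT_MODE	(1u << 3)	/* DVSEC CXL Capability */
#define DVSEC_MEM_INFO_VALID		(1u << 0)	/* Range Size Low */
#define DVSEC_MEM_ACTIVE		(1u << 1)	/* Range Size Low */

/* True if the device claims its own hardware/firmware initializes memory. */
bool dvsec_mem_hwinit(uint16_t cap)
{
	return (cap & DVSEC_CAP_MEM_HWINIT_MODE) != 0;
}

/* True only if the range info is both valid and the memory is active. */
bool dvsec_range_usable(uint32_t size_low)
{
	return (size_low & DVSEC_MEM_INFO_VALID) &&
	       (size_low & DVSEC_MEM_ACTIVE);
}
```

Fed the values from this device, dvsec_mem_hwinit(0xc0de) is true while
dvsec_range_usable(0x804) is false, which is the combination that looks
wrong: the device says it initializes itself, yet never reports the range
as valid or active.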



Something to note from the Spec: RCD Discovery (9.11.6)

4. If Mem_HwInit_Mode=1
... snip ...

- Each HDM range is later exposed to the OS as a separate, memory-only
  NUMA node via ACPI SRAT.

- System Firmware obtains CDAT from the UEFI device driver or directly
  from the device via Table Access DOE (see Section 8.1.11) and then
  uses this information during construction of the memory map, ACPI
  SRAT, and ACPI HMAT. See ACPI Specification, CDAT Specification, and
  UEFI Specification for further details.

It also sounds like trying to place this device in dax-device mode isn't
really the intended use-case, and if that's the case maybe that should
be disallowed?



All this considered:  It's still head-scratching why shifting the HPA
read from the range register successfully produces a "working device",
but I suppose that's the definition of "undefined behavior" :]

~Gregory
