Linux CXL
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Gregory Price <gregory.price@memverge.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>, Dave Jiang <dave.jiang@intel.com>
Subject: Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
Date: Mon, 17 Apr 2023 22:51:41 -0700	[thread overview]
Message-ID: <643e2fed87bd3_556e2941b@dwillia2-mobl3.amr.corp.intel.com.notmuch> (raw)
In-Reply-To: <ZDtz++B25suuoYoY@memverge.com>

Gregory Price wrote:
> On Thu, Mar 30, 2023 at 12:27:02AM -0400, Gregory Price wrote:
> > On Wed, Mar 29, 2023 at 11:33:05PM -0700, Dan Williams wrote:
> > > Gregory Price wrote:
> > > > On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > > > > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > > > > not being programmed correctly?
> > > > > 
> > > > > This debug experiment makes me think perhaps the *device* is at fault,
> > > > > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > > > > to set up the decode as expected, but reads return 0? That's the only
> > > > > way that I can see that forcing that offset results in successfully
> > > > > talking to memory.
> > > > > 
> > > > 
> > > > Oh, i meant to add that i tested whether the memory is accessible via
> > > > numactl --membind=1 with both memhog and a python prompt, and things
> > > > worked just fine.  So memory works.
> > > 
> > > One other theory is that the device is correct, but the platform CXL
> > > window accepts transactions at an offset and then removes that offset
> > > when transmitting the address down the CXL port. So device thinks its
> > > decoding 0x0 and never sees the offset removed by the host bridge.
> > 
> > Wouldn't that be against the spec?  I thought the device intended to
> > receive HPA and do its decode accordingly.
> > 
> > Otherwise you could have multiple devices programmed capable of decoding
> > 0x0, which is already the device address, so there's nothing to
> > "decode".
> > 
> > I'll follow up as I learn more, this is concerning.  Certainly explains
> > why every time I switch hardware nothing seems to work quite right.
> > 
> > ~Gregory
> 
> 
> Follow up on this:  I attempted to actually write Memory_Base_Lo/Hi with
> the appriopriate values, validated the write went through, and then
> utilize the memory.
> 
> This test lead to a total system lockup, implying a major bus error
> and/or device failure.  No stack trace or any indication of exactly what
> went wrong - but it's pretty obvious there's a device issue.
> 
> if you remember, my initial test was to simply shift the base to the
> base of the CXL Fixed Memory Window:
> 
> @@ -342,6 +344,7 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
> 
>                 base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> 
> +               base += 0x4050000000;
>                 info->dvsec_range[i] = (struct range) {
>                         .start = base,
>                         .end = base + size - 1
> 
> 
> To this I added some writebacks to the registers to set them what they
> "should be" (i.e. what the BIOS should have done).
> 
> 
> The other thing i noticed was a combination of DVSEC flags that seemed
> interesting:
> 
> DVSEC CXL Capability: c0de
> Mem_HwInit_mode == 1  (harware + firmware on device do init)
> 
> DVSEC CXL Range 1 Size Low: 804
> memory_info_valid == 0
> memory_active == 0

Is this CXL_DVSEC_RANGE_SIZE_LOW(1) or literally "DVSEC CXL Range 1 Size
Low" which Linux calls the offset CXL_DVSEC_RANGE_SIZE_LOW(0)? If its
the former, it's valid for the second range to be disabled. Linux stack
should have failed cxl_await_media_ready() otherwise if this is
referring to CXL_DVSEC_RANGE_SIZE_LOW(0).

> 
> So this appears to have been BOTH a BIOS *and* a device issue, but we
> should also be checking these bits before using the info from these
> registers.
> 
> 
> 
> Something to note from the Spec: RCD Discovery (9.11.6)
> 
> 4. If Mem_HwInit_Mode=1
> ... snip ...
> 
> - Each HDM range is later exposed to the OS as a separate, memory-only
>   NUMA node via ACPI SRAT.
> 
> — System Firmware obtains CDAT from the UEFI device driver or directly
>   from the device via Table Access DOE (see Section 8.1.11) and then
>   uses this information during construction of the memory map, ACPI
>   SRAT, and ACPI HMAT. See ACPI Specification, CDAT Specification, and
>   UEFI Specification for further details.
> 
> It also sounds like trying to place this device in dax-device mode isn't
> really the intended use-case, and if that's the case maybe that should
> be disallowed?

I don't follow. Any CXL.mem in the end is just a physical address range,
any physical address range can be passed to dax or the core-mm.

> All this considered:  It's still head-scratching why shifting the HPA
> read from the range register successfully produces a "working device",
> but I suppose that's the definition of "undefined behavior" :]

The needs the platform vendor to weigh in, but that may not be possible
if this is evaluation hardware.

  reply	other threads:[~2023-04-18  5:51 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
2023-03-29 16:04 ` Gregory Price
2023-03-30  4:21   ` Dan Williams
2023-03-29 17:20     ` Gregory Price
2023-05-16  6:43       ` Gregory Price
2023-03-29 17:21     ` Gregory Price
2023-03-30  6:33       ` Dan Williams
2023-03-30  4:27         ` Gregory Price
2023-04-16  4:05           ` Gregory Price
2023-04-18  5:51             ` Dan Williams [this message]
2023-04-20  0:50               ` Gregory Price
2023-03-30  0:06 ` Dave Jiang
2023-03-30 17:00 ` Jonathan Cameron
2023-03-30 18:24   ` Dan Williams
2023-04-03 23:06 ` [PATCH v2] " Dan Williams
2023-04-03 23:44   ` Dave Jiang
2023-04-04  0:08     ` Dan Williams
2023-04-04  0:16       ` Dave Jiang
2023-04-04  9:19   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=643e2fed87bd3_556e2941b@dwillia2-mobl3.amr.corp.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=gregory.price@memverge.com \
    --cc=linux-cxl@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox