From: Dan Williams <dan.j.williams@intel.com>
To: Gregory Price <gregory.price@memverge.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>, Dave Jiang <dave.jiang@intel.com>
Subject: Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
Date: Mon, 17 Apr 2023 22:51:41 -0700 [thread overview]
Message-ID: <643e2fed87bd3_556e2941b@dwillia2-mobl3.amr.corp.intel.com.notmuch> (raw)
In-Reply-To: <ZDtz++B25suuoYoY@memverge.com>
Gregory Price wrote:
> On Thu, Mar 30, 2023 at 12:27:02AM -0400, Gregory Price wrote:
> > On Wed, Mar 29, 2023 at 11:33:05PM -0700, Dan Williams wrote:
> > > Gregory Price wrote:
> > > > On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > > > > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > > > > not being programmed correctly?
> > > > >
> > > > > This debug experiment makes me think perhaps the *device* is at fault,
> > > > > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > > > > to set up the decode as expected, but reads return 0? That's the only
> > > > > way that I can see that forcing that offset results in successfully
> > > > > talking to memory.
> > > > >
> > > >
> > > > Oh, i meant to add that i tested whether the memory is accessible via
> > > > numactl --membind=1 with both memhog and a python prompt, and things
> > > > worked just fine. So memory works.
> > >
> > > One other theory is that the device is correct, but the platform CXL
> > > window accepts transactions at an offset and then removes that offset
> > > when transmitting the address down the CXL port. So device thinks its
> > > decoding 0x0 and never sees the offset removed by the host bridge.
> >
> > Wouldn't that be against the spec? I thought the device intended to
> > receive HPA and do its decode accordingly.
> >
> > Otherwise you could have multiple devices programmed capable of decoding
> > 0x0, which is already the device address, so there's nothing to
> > "decode".
> >
> > I'll follow up as I learn more, this is concerning. Certainly explains
> > why every time I switch hardware nothing seems to work quite right.
> >
> > ~Gregory
>
>
> Follow up on this: I attempted to actually write Memory_Base_Lo/Hi with
> the appriopriate values, validated the write went through, and then
> utilize the memory.
>
> This test lead to a total system lockup, implying a major bus error
> and/or device failure. No stack trace or any indication of exactly what
> went wrong - but it's pretty obvious there's a device issue.
>
> if you remember, my initial test was to simply shift the base to the
> base of the CXL Fixed Memory Window:
>
> @@ -342,6 +344,7 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
>
> base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
>
> + base += 0x4050000000;
> info->dvsec_range[i] = (struct range) {
> .start = base,
> .end = base + size - 1
>
>
> To this I added some writebacks to the registers to set them what they
> "should be" (i.e. what the BIOS should have done).
>
>
> The other thing i noticed was a combination of DVSEC flags that seemed
> interesting:
>
> DVSEC CXL Capability: c0de
> Mem_HwInit_mode == 1 (harware + firmware on device do init)
>
> DVSEC CXL Range 1 Size Low: 804
> memory_info_valid == 0
> memory_active == 0
Is this CXL_DVSEC_RANGE_SIZE_LOW(1) or literally "DVSEC CXL Range 1 Size
Low" which Linux calls the offset CXL_DVSEC_RANGE_SIZE_LOW(0)? If its
the former, it's valid for the second range to be disabled. Linux stack
should have failed cxl_await_media_ready() otherwise if this is
referring to CXL_DVSEC_RANGE_SIZE_LOW(0).
>
> So this appears to have been BOTH a BIOS *and* a device issue, but we
> should also be checking these bits before using the info from these
> registers.
>
>
>
> Something to note from the Spec: RCD Discovery (9.11.6)
>
> 4. If Mem_HwInit_Mode=1
> ... snip ...
>
> - Each HDM range is later exposed to the OS as a separate, memory-only
> NUMA node via ACPI SRAT.
>
> — System Firmware obtains CDAT from the UEFI device driver or directly
> from the device via Table Access DOE (see Section 8.1.11) and then
> uses this information during construction of the memory map, ACPI
> SRAT, and ACPI HMAT. See ACPI Specification, CDAT Specification, and
> UEFI Specification for further details.
>
> It also sounds like trying to place this device in dax-device mode isn't
> really the intended use-case, and if that's the case maybe that should
> be disallowed?
I don't follow. Any CXL.mem in the end is just a physical address range,
any physical address range can be passed to dax or the core-mm.
> All this considered: It's still head-scratching why shifting the HPA
> read from the range register successfully produces a "working device",
> but I suppose that's the definition of "undefined behavior" :]
The needs the platform vendor to weigh in, but that may not be possible
if this is evaluation hardware.
next prev parent reply other threads:[~2023-04-18 5:51 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
2023-03-29 16:04 ` Gregory Price
2023-03-30 4:21 ` Dan Williams
2023-03-29 17:20 ` Gregory Price
2023-05-16 6:43 ` Gregory Price
2023-03-29 17:21 ` Gregory Price
2023-03-30 6:33 ` Dan Williams
2023-03-30 4:27 ` Gregory Price
2023-04-16 4:05 ` Gregory Price
2023-04-18 5:51 ` Dan Williams [this message]
2023-04-20 0:50 ` Gregory Price
2023-03-30 0:06 ` Dave Jiang
2023-03-30 17:00 ` Jonathan Cameron
2023-03-30 18:24 ` Dan Williams
2023-04-03 23:06 ` [PATCH v2] " Dan Williams
2023-04-03 23:44 ` Dave Jiang
2023-04-04 0:08 ` Dan Williams
2023-04-04 0:16 ` Dave Jiang
2023-04-04 9:19 ` Jonathan Cameron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=643e2fed87bd3_556e2941b@dwillia2-mobl3.amr.corp.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=gregory.price@memverge.com \
--cc=linux-cxl@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox