Linux CXL
 help / color / mirror / Atom feed
From: Gregory Price <gregory.price@memverge.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org, Dave Jiang <dave.jiang@intel.com>
Subject: Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
Date: Wed, 29 Mar 2023 13:20:03 -0400	[thread overview]
Message-ID: <ZCRzQxmafPCCWJOx@memverge.com> (raw)
In-Reply-To: <64250e45bea49_c722294c2@dwillia2-mobl3.amr.corp.intel.com.notmuch>

On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> Gregory Price wrote:
> [..]
> > I have an example host where *dpa_base ends up being 0x0 here, and as a
> > result later down the line the region fails to register with this:
> 
> First off, thanks for testing.
> 
> > [   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff
> 
> So, this debug message is a statement about *HPA*, DPA is the device
> local offset, it just so happens that in this case the HPA base and
> DPA base are aligned.
> 
> [..]
> > (full cxl log with bonus prints i added)
> > [   21.607034] cxl_pci 0000:3f:00.0: No component registers (-19)
> > [   21.641831] cxl_pci 0000:3f:00.0: DOE: [d80] failed to cache protocols : -5
> > [   21.642866] cxl_pci 0000:3f:00.0: Failed to create MB object for MB @ d80
> > [   21.643686] cxl_pci 0000:3f:00.0: Failed to request region 0x0000000000001fff-0x000000000010201e
> 
> I assume that this is the attempt to map the RAS component registers
> given the driver load did not fail. Would be interested to see this
> platform's /proc/iomem, but this is separate from the DPA mapping issue.
> Perhaps this device tries to define the RAS capability with another
> block that was already mapped?
> 

I'll snag more information on this tomorrow when I can do a bit more.

> > [   21.644371] cxl add_dpa_res: (0, 1fffffffff)
> > [   21.645540] cxl add_dpa_res: (2000000000, 1fffffffff)
> 
> Ok, this device has a 128G ram region and no pmem.
> 
> > [   21.965692] cxl hdm dvsec range: (0, 1fffffffff)
> > [   21.967045] cxl emulating decoders: dpa_base(0)   <-  *dpa_base
> 
> That's ok...
> 
> > [   21.967778] cxl_add_to_region: searching for root decoder with address range(0, 1fffffffff)
> 
> Now it switches to search for a CFMWS for the HPA that just to be
> identity mapped to DPA by accident.
> 
> > [   21.972824] cxl match decoder: found root decoder, r1(1050000000, 304fffffff) r2(0, 1fffffffff)
> 
> I am assuming that this means the only CXL window defined for this
> platform is 1050000000-304fffffff?

Yes sorry, r1() is the root decoder which maps to the CFMW, r2 is the
device dvsec defined range.  I actually wrote this out while i was
working towards the second half of the experiment, just left this here
for completeness incase i got something wrong.

> 
> > [   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff
> > [   21.974905] cxl discover_region: failed to add to region: 0x0-0x1fffffffff
> 
> Driver is giving up for cause because either the BIOS failed to program
> the range registers of the device correctly, or it failed to publish a
> window for where the device is decoding.
> 
> > Ultimately having trouble deciding if this is something broken with
> > bios, the setup_hdm_decoder code, or the discover_region code.  I'm not
> > PCI guru, but we should expect the rdm dvsec range to be in the range of
> > the root decoder window / CFMW:  r1(1050000000, 304fffffff)
> 
> Yes, either that or a CFWMS for 0-1fffffffff
>

Given this system is single-socket, and has only a single CXL memory
expander connected to it, and that a CFMWS for 0-1fff.. seems
irrational, I'm going to go with the Range Register is wrong.

> > confirming this is the issue, i *forced* the dvsec range register to read
> > out base+0x1050000000
> 
> Nice!
> 
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > index 7328a2552411..6fc6df0f7b5a 100644
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -340,12 +344,16 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
> >                 if (rc)
> >                         return rc;
> > 
> >                 base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> > 
> >                 info->dvsec_range[i] = (struct range) {
> > -                       .start = base,
> > -                       .end = base + size - 1
> > +                       .start = 0x1050000000 + base,
> > +                       .end = 0x1050000000 + base + size - 1
> >                 };
> > 
> > 
> > And this resulted in everything working "as one would expect"
> > 
> > [user@host0 ~]# ls /sys/bus/cxl/devices/
> > dax_region0  decoder0.0  decoder1.0  endpoint1  mem0  region0  root0
> > 
> > [user@host0 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 1 cpus:
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node distances:
> > node   0   1
> >   0:  10  50
> >   1:  255  10
> > 
> > [user@host0 ~]# echo online_movable > /sys/bus/node/devices/node1/memory33/state
> > [user@host0 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 1 size: 2048 MB
> > node 1 free: 2048 MB
> > node distances:
> > node   0   1
> >   0:  10  50
> >   1:  255  10
> > 
> > 
> > Basically the question is:
> > 
> > Is the DVSEC Range Register expected to be programmed by bios, and are
> > not being programmed correctly?
> 
> This debug experiment makes me think perhaps the *device* is at fault,
> not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> to set up the decode as expected, but reads return 0? That's the only
> way that I can see that forcing that offset results in successfully
> talking to memory.
> 

Noteably, this is the same system in which i discovered a bios bug with
the CEDT.CMFW base.  I also have a patch in the kernel that corrects the
CFMW interleave target indexing.

So it's entirely possible that the bios is not programming things
correctly and simply leaving the device indexed at 0x0, while the device
sets up most everything else.

But I will investigate a bit further.

> > Or is there something else missing here to correct for the CMFW base?
> [..]
> > If it's the former, then this patch set is gtg and i'm happy to add my
> > Tested-by tag.  If it's the latter, can we hotfix it before release?
> 
> Yeah, FWIW, I don't see anything wrong from the kernel side, and would
> be curious what the BIOS reports if it reads back what it wrote to the
> CXL DVSEC range registers.

At this point I concur, you can add my tested-by to each of the patches
on this line.  I will reach out to the platform and device vendors to
investigate further.  Thanks for the input.

Tested-by: Gregory Price <gregory.price@memverge.com>

  reply	other threads:[~2023-03-30  4:35 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
2023-03-29 16:04 ` Gregory Price
2023-03-30  4:21   ` Dan Williams
2023-03-29 17:20     ` Gregory Price [this message]
2023-05-16  6:43       ` Gregory Price
2023-03-29 17:21     ` Gregory Price
2023-03-30  6:33       ` Dan Williams
2023-03-30  4:27         ` Gregory Price
2023-04-16  4:05           ` Gregory Price
2023-04-18  5:51             ` Dan Williams
2023-04-20  0:50               ` Gregory Price
2023-03-30  0:06 ` Dave Jiang
2023-03-30 17:00 ` Jonathan Cameron
2023-03-30 18:24   ` Dan Williams
2023-04-03 23:06 ` [PATCH v2] " Dan Williams
2023-04-03 23:44   ` Dave Jiang
2023-04-04  0:08     ` Dan Williams
2023-04-04  0:16       ` Dave Jiang
2023-04-04  9:19   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZCRzQxmafPCCWJOx@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox