Linux CXL
From: Dan Williams <dan.j.williams@intel.com>
To: Gregory Price <gregory.price@memverge.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	<dave.jiang@intel.com>
Subject: Re: [PATCH 0/2] cxl: DVSEC Range emulation fixups
Date: Wed, 1 Mar 2023 10:46:29 -0800
Message-ID: <63ff9d85215e_495bc294e2@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <Y/gPhE3hDnTEd6PI@memverge.com>

Hi Gregory,

Gregory Price wrote:
> On Tue, Feb 21, 2023 at 05:51:13PM -0800, Dan Williams wrote:
> > Jonathan points out that the kernel is too aggressive in assuming that
> > DVSEC range registers are in use, reliably skip emulation if
> > 'mem_enabled' is not set. The helper devm_cxl_setup_emulated_hdm() is
> > needlessly redoing an allocation, clean that up.
> > 
> > ---
> > 
> > Dan Williams (2):
> >       cxl/hdm: Fix double allocation of @cxlhdm
> >       cxl/hdm: Skip emulation when driver manages mem_enable
> > 
> > 
> >  drivers/cxl/core/hdm.c |   65 ++++++++++++++++++------------------------------
> >  drivers/cxl/cxl.h      |    4 ++-
> >  drivers/cxl/port.c     |    2 +
> >  3 files changed, 28 insertions(+), 43 deletions(-)
> > 
> > base-commit: 23c198e3dfaabbc891681aecb0855b9e0ac791e1
> 
> 
> not *quite* sure what to make of this yet, but i get stack trace on boot
> on real hardware with this patch.  I'm debugging other issues with this
> hardware, so i'm not sure if it's related or not, but prior to this patch
> I did not have a stack trace.
> 
> 
> I think there's two issues here:
> 
> 1) The system I'm on fails to register a CFMW/root port decoder.  I'm
>    not entirely sure why, other than during cxl_decoder_add(), the
>    target map contains "[0,]" as the target id's, and the only
>    registered ports/decoders are the endpoints.
> 
>    I don't know whether this is because the hardware just doesn't have a
>    root decoder, or what.  But it makes the volatile region patches
>    non-functional, and i have to revert back to static configuration to
>    use the real cxl device (i.e. don't mark it EFI_MEMORY_SP).

It looks like the BIOS is trying to report something in the CEDT.CFMWS,
but the entry looks broken.

> 2) Per the second bit - there's no component registers being registered
>    for this cxl device (plus some spurious DOE error).

If the CEDT is broken then for RCH topologies the device component
registers will also be missing.

> 
> 
> The no root decoder thing has been throwing me for a loop, if you can
> help me shed some light on this i'd greatly appreciate it.  If a socket
> has no decoders, should we expect memory expanders to be manageable via
> the volatile region system in the driver?
> 
> 
> relevant dmesg info
> 
> [   21.928436] cxl root0: Failed to populate active decoder targets

Would be interesting to know if decoder_populate_targets() is returning
-EINVAL or -ENXIO.
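
For reference when reading the numeric codes in the quoted dmesg: the
"(-19)" and ": -5" values are negated errno values, the same family as
the -EINVAL / -ENXIO asked about above. A minimal userspace sketch for
decoding them follows. It is not kernel code, errname() and
describe_ret() are hypothetical helper names, and the numeric values
assume the usual Linux errno numbering:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Map a kernel-style negative return value to its symbolic errno name.
 * Only the codes that come up in this thread are covered.
 */
static const char *errname(int ret)
{
	switch (-ret) {
	case EIO:    return "EIO";
	case ENXIO:  return "ENXIO";
	case ENODEV: return "ENODEV";
	case EINVAL: return "EINVAL";
	default:     return "unknown";
	}
}

/* Format a line such as "-19 = -ENODEV (<strerror text>)" into buf. */
static int describe_ret(int ret, char *buf, size_t len)
{
	return snprintf(buf, len, "%d = -%s (%s)",
			ret, errname(ret), strerror(-ret));
}
```

Calling describe_ret(-19, buf, sizeof(buf)) and describe_ret(-5, buf,
sizeof(buf)) decodes the two cxl_pci messages as -ENODEV and -EIO
respectively.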

> [   21.929077] cxl_acpi ACPI0017:00: Failed to add decode range [0x1050000000 - 0x304fffffff]
> [   21.933150]  pci0000:3f: host supports CXL (restricted)

This signals this is an RCH topology.

> [... snip ...]
> [   21.965126] cxl_pci 0000:3f:00.0: No component registers (-19)
> [   22.001597] cxl_pci 0000:3f:00.0: DOE: [d80] failed to cache protocols : -5
> [   22.002351] cxl_pci 0000:3f:00.0: Failed to create MB object for MB @ d80
> [   22.003265] cxl_pci 0000:3f:00.0: Failed to request region 0x0000000000001fff-0x000000000010201e
> [... snip ...]
> [   22.339973] BUG: unable to handle page fault for address: 0000000000001000
> [   22.340584] #PF: supervisor read access in kernel mode
> [   22.346801] #PF: error_code(0x0000) - not-present page
> [   22.349059] PGD 1339ec067 P4D 0
> [   22.350877] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [   22.354558] CPU: 45 PID: 1351 Comm: systemd-udevd Not tainted 6.2.0+ #7
> [   22.358357] RIP: 0010:cxl_probe_component_regs+0x23/0x180 [cxl_core]

Can you send the output of:

scripts/faddr2line drivers/cxl/core/cxl_core.ko cxl_probe_component_regs+0x23

...from your kernel build directory?

I suspect this crash can be avoided with an explicit earlier check for
missing component registers, but that's not really a fix for this
failure.
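
To illustrate what such an earlier check could look like, here is a
rough userspace model, not a patch. All names are hypothetical stand-ins
for the driver's structures, and CM_OFFSET assumes the component
register block is read at offset 0x1000 from the mapped base (which
would be consistent with a fault at address 0x1000 if the base were
NULL, though that is a guess until the faddr2line output is in):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: offset of the component register block from the base. */
#define CM_OFFSET 0x1000

/* Hypothetical stand-in for the driver's mapped register block. */
struct component_regs {
	volatile uint8_t *base;	/* NULL when no component registers exist */
};

/*
 * Hypothetical guarded probe: return -ENODEV when the mapping is
 * missing instead of dereferencing base + CM_OFFSET.
 */
static int probe_component_regs_checked(const struct component_regs *regs,
					uint32_t *cap_hdr)
{
	const volatile uint8_t *p;

	if (!regs || !regs->base)
		return -ENODEV;

	/* Read the 32-bit capability header byte by byte, little endian. */
	p = regs->base + CM_OFFSET;
	*cap_hdr = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
	return 0;
}
```

A guard like this only turns the oops into a clean probe failure; it
does not explain why the component registers are missing in the first
place.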

Can you also send the log without these patches applied for comparison?

Thread overview: 15+ messages
2023-02-22  1:51 [PATCH 0/2] cxl: DVSEC Range emulation fixups Dan Williams
2023-02-22  1:51 ` [PATCH 1/2] cxl/hdm: Fix double allocation of @cxlhdm Dan Williams
2023-02-22 12:53   ` Jonathan Cameron
2023-02-22 16:57   ` Dave Jiang
2023-02-22  1:51 ` [PATCH 2/2] cxl/hdm: Skip emulation when driver manages mem_enable Dan Williams
2023-02-22 13:22   ` Jonathan Cameron
2023-02-23  5:05     ` Dan Williams
2023-02-22 16:59   ` Dave Jiang
2023-03-31 16:33   ` Fan Ni
2023-02-24  1:14 ` [PATCH 0/2] cxl: DVSEC Range emulation fixups Gregory Price
2023-03-01 18:46   ` Dan Williams [this message]
2023-02-26  7:28     ` Gregory Price
2023-03-03 16:43     ` Gregory Price
2023-03-21 17:17     ` Gregory Price
2023-03-23 17:56       ` Dan Williams
