From: Dan Williams <dan.j.williams@intel.com>
To: Gregory Price <gregory.price@memverge.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
<dave.jiang@intel.com>
Subject: Re: [PATCH 0/2] cxl: DVSEC Range emulation fixups
Date: Wed, 1 Mar 2023 10:46:29 -0800
Message-ID: <63ff9d85215e_495bc294e2@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <Y/gPhE3hDnTEd6PI@memverge.com>

Hi Gregory,

Gregory Price wrote:
> On Tue, Feb 21, 2023 at 05:51:13PM -0800, Dan Williams wrote:
> > Jonathan points out that the kernel is too aggressive in assuming that
> > DVSEC range registers are in use, reliably skip emulation if
> > 'mem_enabled' is not set. The helper devm_cxl_setup_emulated_hdm() is
> > needlessly redoing an allocation, clean that up.
> >
> > ---
> >
> > Dan Williams (2):
> > cxl/hdm: Fix double allocation of @cxlhdm
> > cxl/hdm: Skip emulation when driver manages mem_enable
> >
> >
> > drivers/cxl/core/hdm.c | 65 ++++++++++++++++++------------------------------
> > drivers/cxl/cxl.h | 4 ++-
> > drivers/cxl/port.c | 2 +
> > 3 files changed, 28 insertions(+), 43 deletions(-)
> >
> > base-commit: 23c198e3dfaabbc891681aecb0855b9e0ac791e1
>
>
> not *quite* sure what to make of this yet, but i get stack trace on boot
> on real hardware with this patch. I'm debugging other issues with this
> hardware, so i'm not sure if it's related or not, but prior to this patch
> I did not have a stack trace.
>
>
> I think there's two issues here:
>
> 1) The system I'm on fails to register a CFMW/root port decoder. I'm
> not entirely sure why, other than during cxl_decoder_add(), the
> target map contains "[0,]" as the target id's, and the only
> registered ports/decoders are the endpoints.
>
> I don't know whether this is because the hardware just doesn't have a
> root decoder, or what. But it makes the volatile region patches
> non-functional, and i have to revert back to static configuration to
> use the real cxl device (i.e. don't mark it EFI_MEMORY_SP).

It looks like the BIOS is trying to report something in the CEDT.CFMWS
but it looks

> 2) Per the second bit - there's no component registers being registered
> for this cxl device (plus some spurious DOE error).

If the CEDT is broken then for RCH topologies the device component
registers will also be missing.

>
>
> The no root decoder thing has been throwing me for a loop, if you can
> help me shed some light on this i'd greatly appreciate it. If a socket
> has no decoders, should we expect memory expanders to be manageable via
> the volatile region system in the driver?
>
>
> relevant dmesg info
>
> [ 21.928436] cxl root0: Failed to populate active decoder targets

Would be interesting to know if decoder_populate_targets() is returning
-EINVAL or -ENXIO.

> [ 21.929077] cxl_acpi ACPI0017:00: Failed to add decode range [0x1050000000 - 0x304fffffff]
> [ 21.933150] pci0000:3f: host supports CXL (restricted)

This signals this is an RCH topology.

> [... snip ...]
> [ 21.965126] cxl_pci 0000:3f:00.0: No component registers (-19)
> [ 22.001597] cxl_pci 0000:3f:00.0: DOE: [d80] failed to cache protocols : -5
> [ 22.002351] cxl_pci 0000:3f:00.0: Failed to create MB object for MB @ d80
> [ 22.003265] cxl_pci 0000:3f:00.0: Failed to request region 0x0000000000001fff-0x000000000010201e
> [... snip ...]
> [ 22.339973] BUG: unable to handle page fault for address: 0000000000001000
> [ 22.340584] #PF: supervisor read access in kernel mode
> [ 22.346801] #PF: error_code(0x0000) - not-present page
> [ 22.349059] PGD 1339ec067 P4D 0
> [ 22.350877] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 22.354558] CPU: 45 PID: 1351 Comm: systemd-udevd Not tainted 6.2.0+ #7
> [ 22.358357] RIP: 0010:cxl_probe_component_regs+0x23/0x180 [cxl_core]

Can you send the output of:

    scripts/faddr2line drivers/cxl/core/cxl_core.ko cxl_probe_component_regs+0x23

...from your kernel build directory?

I suspect this crash can be avoided with an explicit earlier check for
missing component registers, but that's not really a fix for this
failure.

Can you also send the log without these patches applied for comparison?
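
For reading the numbers in the quoted dmesg output, here is a minimal
sketch, assuming a Linux host with Python 3. It maps the negative error
codes in the log to their symbolic errno names and computes the size of
the decode range that failed to register. The helper names decode_errno()
and range_size() are hypothetical and exist only for this illustration;
they are not part of the kernel or any CXL tooling.

```python
import errno
import os


def decode_errno(code: int) -> str:
    """Return the symbolic errno name for a kernel-style negative code."""
    return errno.errorcode.get(abs(code), "UNKNOWN")


def range_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start - end] address range."""
    return end - start + 1


# Codes seen in the log (-19, -5) and the two candidates asked about
# for decoder_populate_targets() (-EINVAL, -ENXIO).
for code in (-19, -5, -errno.EINVAL, -errno.ENXIO):
    print(code, decode_errno(code), os.strerror(abs(code)))

# Decode range from "Failed to add decode range [0x1050000000 - 0x304fffffff]".
size = range_size(0x1050000000, 0x304FFFFFFF)
print(hex(size), size // 2**30, "GiB")
```

On Linux, -19 decodes to ENODEV and -5 to EIO, and the rejected decode
range spans 0x2000000000 bytes (128 GiB).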