From: Alison Schofield <alison.schofield@intel.com>
To: Itaru Kitayama <itaru.kitayama@linux.dev>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Jiang <dave.jiang@intel.com>, <linux-cxl@vger.kernel.org>
Subject: Re: Internal error: Oops: 0000000096000044 [#11] SMP
Date: Thu, 22 May 2025 20:28:24 -0700
Message-ID: <aC_rWIc9TY5F2wGf@aschofie-mobl2.lan>
In-Reply-To: <FD4183E1-162E-4790-B865-E50F20249A74@linux.dev>

On Fri, May 23, 2025 at 06:46:53AM +0900, Itaru Kitayama wrote:
> Hi Jonathan,
> 
> > On May 22, 2025, at 22:56, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > 
> > On Wed, 21 May 2025 16:34:16 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> >> Itaru Kitayama wrote:
> >>> Dave et al.,  
> >> [..]
> >>> Rebuilt the rootfs image and tried today’s cxl/next
> >>> (6.15.0-rc4-00046-g6eed708a5693) again; it boots now and I don’t see
> >>> the splats, so I must have messed up something in my dev environment.
> >>> Sorry about that.
> >>> 
> >>> CXL utility commands work reasonably now and I can execute meson test
> >>> --suite cxl, though most of the tests still fail due to the HPA
> >>> allocation error, which makes me wonder, as the resource requests are
> >>> quite modest.
> >> 
> >> So cxl_test_init() just "hopes" that the top of the system physical
> >> address space is free to use to emulate CXL windows. That might be an
> >> assumption that only works for x86_64, not ARM64. I would double check
> >> that this code in cxl_test_init()
> >> 
> >>        rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
> >>                          SZ_64G, NUMA_NO_NODE);
> >>        if (rc)
> >>                goto err_gen_pool_add;
> >> 
> >> ...is not setting up CXL Windows that overlap with existing resources in
> >> that range.
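
A minimal sketch of the overlap test implied above, assuming inclusive start/end addresses in the same shape as the kernel's struct range. The names `struct span` and `spans_overlap()` are hypothetical and exist only for illustration; the real check would walk the iomem resource tree rather than compare two hand-built ranges.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive address range: [start, end]. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Two inclusive ranges overlap when each one starts at or before
 * the point where the other one ends.
 */
static bool spans_overlap(struct span a, struct span b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

Because both ends are inclusive, two ranges that share only a single address still count as overlapping.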
> >> 
> > 
> > I think there are checks that block use of ranges up there.
> > 
> > The print I'm seeing is
> > Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff]
> > 
> > I think the right answer is to use mhp_get_pluggable_range(true) to check
> > for limits on the range we can use.
> > 
> > On architectures that don't define arch_get_mappable_range(), the end of
> > that range ends up as (unsigned long)-1, which I think would work,
> > though there may be other stuff up there.  Maybe
> >     min(iomem_resource.end + 1 - SZ_64G,
> >         mappable_range.end + 1 - SZ_64G)
> > or something like that, adapted to avoid wrap around.
> > 
> > I haven't yet sanity checked this doesn't break x86 but I think it should
> > end up making no difference to the locations on x86.
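
One way the "adapted to avoid wrap around" variant could look, as a standalone sketch rather than the code that was merged. It takes the lower of the two inclusive end addresses first and only then subtracts, so no intermediate `end + 1` can wrap to zero. `mock_window_base()` is a hypothetical helper, and `SZ_64G` is redefined locally so the block compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_64G (64ULL << 30)	/* local stand-in for the kernel macro */

/*
 * Hypothetical helper: choose the base of a 'size'-byte window that
 * ends at the lower of two inclusive end addresses. Returns false
 * when no window of that size fits below the limit.
 */
static bool mock_window_base(uint64_t iomem_end, uint64_t mappable_end,
			     uint64_t size, uint64_t *base)
{
	uint64_t limit = iomem_end < mappable_end ? iomem_end : mappable_end;

	assert(size > 0);

	/* Compare before subtracting so 'limit - (size - 1)' cannot wrap. */
	if (limit < size - 1)
		return false;

	*base = limit - (size - 1);
	return true;
}
```

With the addressable limit from the print above (0xf80003fffffff) and a 64G window, this yields a base of 0xf7ff040000000. When both ends are all-ones it yields 0xfffffff000000000, the same base the existing code produces.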
> > 
> > 
> > With the below - all 11 tests in ndctl cxl test suite pass for me.
> > 
> > From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Date: Thu, 22 May 2025 14:20:42 +0100
> > Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range
> > 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> > tools/testing/cxl/test/cxl.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> > index 8a5815ca870d..b4e6c7659ac4 100644
> > --- a/tools/testing/cxl/test/cxl.c
> > +++ b/tools/testing/cxl/test/cxl.c
> > @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void)
> >  static __init int cxl_test_init(void)
> >  {
> >  	int rc, i;
> > +	struct range mappable;
> > 
> >  	cxl_acpi_test();
> >  	cxl_core_test();
> > @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void)
> >  		rc = -ENOMEM;
> >  		goto err_gen_pool_create;
> >  	}
> > +	mappable = mhp_get_pluggable_range(true);
> > 
> > -	rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
> > +	rc = gen_pool_add(cxl_mock_pool,
> > +			  min(iomem_resource.end + 1 - SZ_64G,
> > +			      mappable.end + 1 - SZ_64G),
> >  			  SZ_64G, NUMA_NO_NODE);
> >  	if (rc)
> >  		goto err_gen_pool_add;
> > -- 
> > 2.43.0
> > 
> 
> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
> 
> # meson test --suite cxl
> ninja: Entering directory `/root/ndctl/build'
> [1/82] Generating version.h with a custom command
>  1/12 ndctl:cxl / cxl-topology.sh                OK              33.96s
>  2/12 ndctl:cxl / cxl-region-sysfs.sh            OK              18.00s
>  3/12 ndctl:cxl / cxl-labels.sh                  OK              23.78s
>  4/12 ndctl:cxl / cxl-create-region.sh           OK              43.03s
>  5/12 ndctl:cxl / cxl-xor-region.sh              OK              19.30s
>  6/12 ndctl:cxl / cxl-events.sh                  FAIL             6.40s   exit status 1
> >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MALLOC_PERTURB_=45 TEST_PATH=/root/ndctl/build/test UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-events.sh
> 
>  7/12 ndctl:cxl / cxl-sanitize.sh                OK              14.77s
>  8/12 ndctl:cxl / cxl-destroy-region.sh          OK              13.69s
>  9/12 ndctl:cxl / cxl-qos-class.sh               OK              14.31s
> 10/12 ndctl:cxl / cxl-poison.sh                  FAIL             3.46s   exit status 1
> >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=80 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 TEST_PATH=/root/ndctl/build/test MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-poison.sh
> 
> 11/12 ndctl:cxl / cxl-update-firmware.sh         OK              66.23s
> 12/12 ndctl:cxl / cxl-security.sh                SKIP             0.34s   exit status 77
> 
> Ok:                 9
> Expected Fail:      0
> Fail:               2
> Unexpected Pass:    0
> Skipped:            1
> Timeout:            0
> 
> My understanding is that these CXL tests use mock CFMWs, not the actual physical memory regions at their fixed locations. So I wonder whether it matters if this set of tests is run on a “sane” CXL emulation setup (the one run_qemu.sh creates) that the Intel folks are using.

Right - these tests run on the mock CFMWs that the cxl-test module
creates. As far as running on a 'sane' CXL emulation setup, like
run_qemu.sh, I may not be understanding the question, but I'll take
a shot. The qemu-defined CXL devices do not matter at all for the cxl
unit test run. The unit tests only use the mock cxl/test environment
provided by the cxl-test module. The qemu CXL devices are irrelevant.

Let me know if I missed the point you were making.

I noticed the FAIL cases in your test output, probably because
CONFIG_TRACING is not enabled, and posted a patch to turn those into SKIPs.

--Alison

> 
> Itaru.
