Linux CXL
From: Itaru Kitayama <itaru.kitayama@linux.dev>
To: Alison Schofield <alison.schofield@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	linux-cxl@vger.kernel.org
Subject: Re: Internal error: Oops: 0000000096000044 [#11] SMP
Date: Fri, 23 May 2025 13:56:04 +0900	[thread overview]
Message-ID: <F7FA1D5E-9BCB-44C0-8E18-3313D490D6F7@linux.dev> (raw)
In-Reply-To: <aC_rWIc9TY5F2wGf@aschofie-mobl2.lan>

Hi Alison,

> On May 23, 2025, at 12:28, Alison Schofield <alison.schofield@intel.com> wrote:
> 
> On Fri, May 23, 2025 at 06:46:53AM +0900, Itaru Kitayama wrote:
>> Hi Jonathan,
>> 
>>> On May 22, 2025, at 22:56, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>> 
>>> On Wed, 21 May 2025 16:34:16 -0700
>>> Dan Williams <dan.j.williams@intel.com> wrote:
>>> 
>>>> Itaru Kitayama wrote:
>>>>> Dave et al.,  
>>>> [..]
>>>>> Rebuilt the rootfs image and tried booting today’s cxl/next
>>>>> (6.15.0-rc4-00046-g6eed708a5693) again; now I don’t see the
>>>>> splats, so it was something I had messed up in my dev environment.
>>>>> Sorry about that.
>>>>> 
>>>>> CXL utility commands work reasonably now and I can execute meson test
>>>>> --suite cxl, though most of the tests still fail due to the HPA allocation
>>>>> error, which puzzles me since the resource requests are quite modest.
>>>> 
>>>> So cxl_test_init() just "hopes" that the top of the system physical
>>>> address space is free to use to emulate CXL windows. That might be an
>>>> assumption that only works for x86_64, not ARM64. I would double check
>>>> that this code in cxl_test_init()
>>>> 
>>>>       rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
>>>>                         SZ_64G, NUMA_NO_NODE);
>>>>       if (rc)
>>>>               goto err_gen_pool_add;
>>>> 
>>>> ...is not setting up CXL Windows that overlap with existing resources in
>>>> that range.
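The overlap concern described above can be sketched in plain userspace C (struct mock_range, ranges_overlap(), and the values below are illustrative stand-ins, not the kernel's struct resource API):

```c
#include <stdint.h>

/* Minimal stand-in for a resource range; the kernel's struct resource
 * carries more state, but only start/end matter for overlap checks. */
struct mock_range {
	uint64_t start;
	uint64_t end; /* inclusive, as in struct resource */
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
static int ranges_overlap(struct mock_range a, struct mock_range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

A candidate emulated CXL window would have to be rejected if such a check returned true against any existing entry under iomem_resource.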
>>>> 
>>> 
>>> I think there are checks that block use of ranges up there.
>>> 
>>> Print I'm seeing is
>>> Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff]
>>> 
>>> I think the right answer is to use mhp_get_pluggable_range(true) to check
>>> for limits on the range we can use.
>>> 
>>> On architectures that don't define arch_get_mappable_range(),
>>> that ends up as (unsigned long)-1, which I think would work,
>>> though there may be other stuff up there.  Maybe min(iomem_resource.end + 1 - SZ_64G,
>>>    mappable_range.end + 1 - SZ_64G)
>>> or something like that, adapted to avoid wrap-around.
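A wrap-around-safe version of that min() calculation can be sketched as follows (pick_base() and its zero-on-failure convention are illustrative, not the kernel API):

```c
#include <stdint.h>

#define SZ_64G (64ULL << 30)

/* Take the lower of the two inclusive end limits, then step back one
 * 64G window, bailing out when the limit is too low to fit one. */
static uint64_t pick_base(uint64_t iomem_end, uint64_t mappable_end)
{
	uint64_t end = iomem_end < mappable_end ? iomem_end : mappable_end;

	if (end < SZ_64G - 1)
		return 0; /* no room below the limit; caller should fail */
	return end + 1 - SZ_64G;
}
```

With the ARM64 limit from the log above (mappable end 0xf80003fffffff) this places the 64G window just below the hotplug-addressable ceiling instead of at the top of iomem_resource.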
>>> 
>>> I haven't yet sanity checked this doesn't break x86 but I think it should
>>> end up making no difference to the locations on x86.
>>> 
>>> 
>>> With the below, all 11 tests in the ndctl cxl test suite pass for me.
>>> 
>>> From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001
>>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Date: Thu, 22 May 2025 14:20:42 +0100
>>> Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range
>>> 
>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> ---
>>> tools/testing/cxl/test/cxl.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
>>> index 8a5815ca870d..b4e6c7659ac4 100644
>>> --- a/tools/testing/cxl/test/cxl.c
>>> +++ b/tools/testing/cxl/test/cxl.c
>>> @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void)
>>>  static __init int cxl_test_init(void)
>>>  {
>>>  	int rc, i;
>>> +	struct range mappable;
>>>  
>>>  	cxl_acpi_test();
>>>  	cxl_core_test();
>>> @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void)
>>>  		rc = -ENOMEM;
>>>  		goto err_gen_pool_create;
>>>  	}
>>> +	mappable = mhp_get_pluggable_range(true);
>>>  
>>> -	rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
>>> +	rc = gen_pool_add(cxl_mock_pool,
>>> +			  min(iomem_resource.end + 1 - SZ_64G,
>>> +			      mappable.end + 1 - SZ_64G),
>>>  			  SZ_64G, NUMA_NO_NODE);
>>>  	if (rc)
>>>  		goto err_gen_pool_add;
>>> -- 
>>> 2.43.0
>>> 
>> 
>> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
>> 
>> # meson test --suite cxl
>> ninja: Entering directory `/root/ndctl/build'
>> [1/82] Generating version.h with a custom command
>> 1/12 ndctl:cxl / cxl-topology.sh                OK              33.96s
>> 2/12 ndctl:cxl / cxl-region-sysfs.sh            OK              18.00s
>> 3/12 ndctl:cxl / cxl-labels.sh                  OK              23.78s
>> 4/12 ndctl:cxl / cxl-create-region.sh           OK              43.03s
>> 5/12 ndctl:cxl / cxl-xor-region.sh              OK              19.30s
>> 6/12 ndctl:cxl / cxl-events.sh                  FAIL             6.40s   exit status 1
>>>>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MALLOC_PERTURB_=45 TEST_PATH=/root/ndctl/build/test UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-events.sh
>> 
>> 7/12 ndctl:cxl / cxl-sanitize.sh                OK              14.77s
>> 8/12 ndctl:cxl / cxl-destroy-region.sh          OK              13.69s
>> 9/12 ndctl:cxl / cxl-qos-class.sh               OK              14.31s
>> 10/12 ndctl:cxl / cxl-poison.sh                  FAIL             3.46s   exit status 1
>>>>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=80 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 TEST_PATH=/root/ndctl/build/test MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-poison.sh
>> 
>> 11/12 ndctl:cxl / cxl-update-firmware.sh         OK              66.23s
>> 12/12 ndctl:cxl / cxl-security.sh                SKIP             0.34s   exit status 77
>> 
>> Ok:                 9
>> Expected Fail:      0
>> Fail:               2
>> Unexpected Pass:    0
>> Skipped:            1
>> Timeout:            0
>> 
>> My understanding is that these CXL tests use mock CFMWS entries, not actual physical memory regions at fixed locations. So I wonder whether executing this set of tests on a "sane" CXL emulation setup (the one run_qemu.sh creates) that the Intel folks are using matters or not.
> 
> Right - these tests run on the mock CFMWS that the cxl-test module
> creates. As far as running on a 'sane' CXL emulation setup, like
> run_qemu.sh, I may not be understanding the question, but I'll take
> a shot. The qemu-defined CXL devices do not matter at all for the cxl
> unit test run. The unit tests only use the mock cxl/test environment
> provided by the cxl-test module; the qemu CXL devices are irrelevant.

Ah, I see; thanks for the clarification. That’s what I needed to know.

> 
> Let me know if I missed the point you were making.
> 
> I noticed your test output FAIL cases, probably for CONFIG_TRACING not
> enabled, and posted a patch to turn those into SKIPs.

Indeed, by looking at the test logs I figured that out. Now, as Jonathan confirmed, I see the same results:

 1/12 ndctl:cxl / cxl-topology.sh                OK             106.48s
 2/12 ndctl:cxl / cxl-region-sysfs.sh            OK              55.90s
 3/12 ndctl:cxl / cxl-labels.sh                  OK              54.95s
 4/12 ndctl:cxl / cxl-create-region.sh           OK             141.98s
 5/12 ndctl:cxl / cxl-xor-region.sh              OK              66.00s
 6/12 ndctl:cxl / cxl-events.sh                  OK              33.82s
 7/12 ndctl:cxl / cxl-sanitize.sh                OK              34.92s
 8/12 ndctl:cxl / cxl-destroy-region.sh          OK              41.08s
 9/12 ndctl:cxl / cxl-qos-class.sh               OK              40.55s
10/12 ndctl:cxl / cxl-poison.sh                  OK              82.08s
11/12 ndctl:cxl / cxl-update-firmware.sh         OK              99.39s
12/12 ndctl:cxl / cxl-security.sh                SKIP             1.03s   exit status 77

Thanks again for your comments.

Itaru.

> 
> --Alison
> 
>> 
>> Itaru.




Thread overview: 15+ messages
2025-05-21  8:39 Internal error: Oops: 0000000096000044 [#11] SMP Itaru Kitayama
2025-05-21 15:31 ` Dave Jiang
2025-05-21 20:38   ` Itaru Kitayama
2025-05-21 20:46     ` Dave Jiang
2025-05-21 23:28       ` Itaru Kitayama
2025-05-21 23:34         ` Dan Williams
2025-05-22 13:56           ` Jonathan Cameron
2025-05-22 18:19             ` Dan Williams
2025-05-22 21:46             ` Itaru Kitayama
2025-05-23  3:28               ` Alison Schofield
2025-05-23  4:56                 ` Itaru Kitayama [this message]
2025-05-23  5:52             ` Marc Herbert
2025-05-21 15:33 ` Alison Schofield
2025-05-21 15:36 ` Jonathan Cameron
2025-05-21 15:41 ` Alison Schofield
