From: Itaru Kitayama <itaru.kitayama@linux.dev>
To: Alison Schofield <alison.schofield@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
linux-cxl@vger.kernel.org
Subject: Re: Internal error: Oops: 0000000096000044 [#11] SMP
Date: Fri, 23 May 2025 13:56:04 +0900 [thread overview]
Message-ID: <F7FA1D5E-9BCB-44C0-8E18-3313D490D6F7@linux.dev> (raw)
In-Reply-To: <aC_rWIc9TY5F2wGf@aschofie-mobl2.lan>
Hi Alison,
> On May 23, 2025, at 12:28, Alison Schofield <alison.schofield@intel.com> wrote:
>
> On Fri, May 23, 2025 at 06:46:53AM +0900, Itaru Kitayama wrote:
>> Hi Jonathan,
>>
>>> On May 22, 2025, at 22:56, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>>
>>> On Wed, 21 May 2025 16:34:16 -0700
>>> Dan Williams <dan.j.williams@intel.com> wrote:
>>>
>>>> Itaru Kitayama wrote:
>>>>> Dave et al.,
>>>> [..]
>>>>> I rebuilt the rootfs image and tried booting today's cxl/next
>>>>> (6.15.0-rc4-00046-g6eed708a5693) again; now I don't see the
>>>>> splats, so something was wrong in my dev environment. Sorry about
>>>>> that.
>>>>>
>>>>> The CXL utility commands work reasonably now and I can run meson test
>>>>> --suite cxl, though most of the tests still fail due to an HPA allocation
>>>>> error, which puzzles me since the resource requests are quite modest.
>>>>
>>>> So cxl_test_init() just "hopes" that the top of the system physical
>>>> address space is free to use to emulate CXL windows. That might be an
>>>> assumption that only works for x86_64, not ARM64. I would double check
>>>> that this code in cxl_test_init()
>>>>
>>>> rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
>>>> SZ_64G, NUMA_NO_NODE);
>>>> if (rc)
>>>> goto err_gen_pool_add;
>>>>
>>>> ...is not setting up CXL Windows that overlap with existing resources in
>>>> that range.
>>>>
>>>
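[Editorial aside: the overlap condition Dan describes can be sketched in plain C. This is a minimal userspace illustration with made-up names, not kernel API; it only shows what "a mock CXL window must not intersect an already-claimed resource range" means.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource entry; both bounds inclusive,
 * matching the [start-end] convention /proc/iomem uses. */
struct res_range {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges overlap iff each starts at or before the
 * other's end. */
static bool ranges_overlap(const struct res_range *a,
			   const struct res_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}
```

In practice the check Dan suggests amounts to running this predicate by hand between the mock CFMWS base picked by cxl_test_init() and each range listed in /proc/iomem.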
>>> I think there are checks that block use of ranges up there.
>>>
>>> Print I'm seeing is
>>> Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff]
>>>
>>> I think the right answer is to use mhp_get_pluggable_range(true); to check
>>> for limits on the range we can use.
>>>
>>> On architectures that don't define arch_get_mappable_range(),
>>> that ends up as (unsigned long)-1, which I think would work,
>>> though there may be other stuff up there. Maybe min(iomem_resource.end + 1 - SZ_64G,
>>> mappable_range.end + 1 - SZ_64G)
>>> or something like that, adapted to avoid wraparound.
>>>
>>> I haven't yet sanity checked this doesn't break x86 but I think it should
>>> end up making no difference to the locations on x86.
>>>
>>>
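[Editorial aside: to make the wraparound concern concrete, here is a minimal userspace sketch of clamping the mock pool base to both limits without letting the unsigned arithmetic wrap. Names echo the discussion but are assumptions for illustration, not the kernel code.]

```c
#include <stdint.h>

#define SZ_64G (64ULL << 30)

/* Pick the highest base for a 64G mock window that fits below both the
 * end of the iomem resource tree and the hotplug-mappable range.
 * Taking min() of the two inclusive end addresses first, then
 * subtracting, means only one subtraction can wrap, and we guard it:
 * limit + 1 - SZ_64G wraps exactly when limit + 1 < SZ_64G. */
static int pick_mock_base(uint64_t iomem_end, uint64_t mappable_end,
			  uint64_t *base)
{
	uint64_t limit = iomem_end < mappable_end ? iomem_end : mappable_end;

	if (limit < SZ_64G - 1)
		return -1;	/* usable span smaller than 64G: would wrap */

	*base = limit + 1 - SZ_64G;
	return 0;
}
```

On x86_64, where the mappable end is typically at or above iomem_resource.end, this reduces to the original iomem_resource.end + 1 - SZ_64G, which is consistent with Jonathan's expectation that the patch makes no difference there.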
>>> With the below - all 11 tests in ndctl cxl test suite pass for me.
>>>
>>> From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001
>>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Date: Thu, 22 May 2025 14:20:42 +0100
>>> Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range
>>>
>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> ---
>>> tools/testing/cxl/test/cxl.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
>>> index 8a5815ca870d..b4e6c7659ac4 100644
>>> --- a/tools/testing/cxl/test/cxl.c
>>> +++ b/tools/testing/cxl/test/cxl.c
>>> @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void)
>>> static __init int cxl_test_init(void)
>>> {
>>> int rc, i;
>>> + struct range mappable;
>>>
>>> cxl_acpi_test();
>>> cxl_core_test();
>>> @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void)
>>> rc = -ENOMEM;
>>> goto err_gen_pool_create;
>>> }
>>> + mappable = mhp_get_pluggable_range(true);
>>>
>>> - rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
>>> + rc = gen_pool_add(cxl_mock_pool,
>>> + min(iomem_resource.end + 1 - SZ_64G,
>>> + mappable.end + 1 - SZ_64G),
>>> SZ_64G, NUMA_NO_NODE);
>>> if (rc)
>>> goto err_gen_pool_add;
>>> --
>>> 2.43.0
>>>
>>
>> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
>>
>> # meson test --suite cxl
>> ninja: Entering directory `/root/ndctl/build'
>> [1/82] Generating version.h with a custom command
>> 1/12 ndctl:cxl / cxl-topology.sh OK 33.96s
>> 2/12 ndctl:cxl / cxl-region-sysfs.sh OK 18.00s
>> 3/12 ndctl:cxl / cxl-labels.sh OK 23.78s
>> 4/12 ndctl:cxl / cxl-create-region.sh OK 43.03s
>> 5/12 ndctl:cxl / cxl-xor-region.sh OK 19.30s
>> 6/12 ndctl:cxl / cxl-events.sh FAIL 6.40s exit status 1
>>>>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MALLOC_PERTURB_=45 TEST_PATH=/root/ndctl/build/test UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-events.sh
>>
>> 7/12 ndctl:cxl / cxl-sanitize.sh OK 14.77s
>> 8/12 ndctl:cxl / cxl-destroy-region.sh OK 13.69s
>> 9/12 ndctl:cxl / cxl-qos-class.sh OK 14.31s
>> 10/12 ndctl:cxl / cxl-poison.sh FAIL 3.46s exit status 1
>>>>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=80 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 TEST_PATH=/root/ndctl/build/test MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-poison.sh
>>
>> 11/12 ndctl:cxl / cxl-update-firmware.sh OK 66.23s
>> 12/12 ndctl:cxl / cxl-security.sh SKIP 0.34s exit status 77
>>
>> Ok: 9
>> Expected Fail: 0
>> Fail: 2
>> Unexpected Pass: 0
>> Skipped: 1
>> Timeout: 0
>>
>> My understanding is that these CXL tests use mock CFMWS, not the actual physical memory regions at their fixed locations. So I wonder whether running this set of tests on a "sane" CXL emulation setup (the one run_qemu.sh creates) that the Intel folks are using matters or not.
>
> Right - these tests run on the mock CFMWS that the cxl-test module
> creates. As far as running on a 'sane' CXL emulation setup, like
> run_qemu.sh, I may not be understanding the question, but I'll take
> a shot. The QEMU-defined CXL devices do not matter at all for the cxl
> unit test run. The unit tests only use the mock cxl/test environment
> provided by the cxl-test module.
Ah, I see; thanks for the clarification. That's what I needed to know.
>
> Let me know if I missed the point you were making.
>
> I noticed the FAIL cases in your test output, probably due to CONFIG_TRACING
> not being enabled, and posted a patch to turn those into SKIPs.
Indeed, I had figured that out by looking at the test logs. And, as Jonathan confirmed, I now see the same results:
1/12 ndctl:cxl / cxl-topology.sh OK 106.48s
2/12 ndctl:cxl / cxl-region-sysfs.sh OK 55.90s
3/12 ndctl:cxl / cxl-labels.sh OK 54.95s
4/12 ndctl:cxl / cxl-create-region.sh OK 141.98s
5/12 ndctl:cxl / cxl-xor-region.sh OK 66.00s
6/12 ndctl:cxl / cxl-events.sh OK 33.82s
7/12 ndctl:cxl / cxl-sanitize.sh OK 34.92s
8/12 ndctl:cxl / cxl-destroy-region.sh OK 41.08s
9/12 ndctl:cxl / cxl-qos-class.sh OK 40.55s
10/12 ndctl:cxl / cxl-poison.sh OK 82.08s
11/12 ndctl:cxl / cxl-update-firmware.sh OK 99.39s
12/12 ndctl:cxl / cxl-security.sh SKIP 1.03s exit status 77
Thanks again for your comments.
Itaru.
>
> --Alison
>
>>
>> Itaru.
2025-05-21 8:39 Internal error: Oops: 0000000096000044 [#11] SMP Itaru Kitayama
2025-05-21 15:31 ` Dave Jiang
2025-05-21 20:38 ` Itaru Kitayama
2025-05-21 20:46 ` Dave Jiang
2025-05-21 23:28 ` Itaru Kitayama
2025-05-21 23:34 ` Dan Williams
2025-05-22 13:56 ` Jonathan Cameron
2025-05-22 18:19 ` Dan Williams
2025-05-22 21:46 ` Itaru Kitayama
2025-05-23 3:28 ` Alison Schofield
2025-05-23 4:56 ` Itaru Kitayama [this message]
2025-05-23 5:52 ` Marc Herbert
2025-05-21 15:33 ` Alison Schofield
2025-05-21 15:36 ` Jonathan Cameron
2025-05-21 15:41 ` Alison Schofield