From: Itaru Kitayama
Date: Fri, 23 May 2025 13:56:04 +0900
Subject: Re: Internal error: Oops: 0000000096000044 [#11] SMP
To: Alison Schofield
Cc: Jonathan Cameron, Dan Williams, Dave Jiang, linux-cxl@vger.kernel.org

Hi Alison,

> On May 23, 2025, at 12:28, Alison Schofield wrote:
>
> On Fri, May 23, 2025 at 06:46:53AM +0900, Itaru Kitayama wrote:
>> Hi Jonathan,
>>
>>> On May 22, 2025, at 22:56, Jonathan Cameron wrote:
>>>
>>> On Wed, 21 May 2025 16:34:16 -0700
>>> Dan Williams wrote:
>>>
>>>> Itaru Kitayama wrote:
>>>>> Dave et al.,
>>>> [..]
>>>>> Rebuilt the rootfs image and tried today’s cxl/next
>>>>> (6.15.0-rc4-00046-g6eed708a5693) again to boot now I don’t see the
>>>>> splats, so something I was messing my dev environment sorry about
>>>>> that.
>>>>>
>>>>> CXL utility commands work reasonably now and I can execute meson test
>>>>> --suite cxl, while most of them still fails due to the HPA allocation
>>>>> error which makes me wonder as the resource requests are quite modest.
>>>>
>>>> So cxl_test_init() just "hopes" that the top of the system physical
>>>> address space is free to use to emulate CXL windows. That might be an
>>>> assumption that only works for x86_64, not ARM64. I would double check
>>>> that this code in cxl_test_init()
>>>>
>>>>         rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
>>>>                           SZ_64G, NUMA_NO_NODE);
>>>>         if (rc)
>>>>                 goto err_gen_pool_add;
>>>>
>>>> ...is not setting up CXL Windows that overlap with existing resources in
>>>> that range.
>>>>
>>>
>>> I think there are checks that block use of ranges up there.
>>>
>>> Print I'm seeing is
>>> Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff]
>>>
>>> I think right answer is to use mhp_get_pluggable_range(true); to check
>>> for limits on the range we can use.
>>>
>>> On architectures that don't define arch_get_mappable_range()
>>> that ends up as (unsigned long)-1 which I think would work
>>> though there may be other stuff up there. Maybe min(iomem_resource.end + 1 - SZ_64G,
>>> mappable_range.end + 1 - SZ_64G)
>>> or something like that adapted to avoid wrap around.
>>>
>>> I haven't yet sanity checked this doesn't break x86 but I think it should
>>> end up making no difference to the locations on x86.
>>>
>>>
>>> With the below - all 11 tests in ndctl cxl test suite pass for me.
>>>
>>> From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001
>>> From: Jonathan Cameron
>>> Date: Thu, 22 May 2025 14:20:42 +0100
>>> Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range
>>>
>>> Signed-off-by: Jonathan Cameron
>>> ---
>>>  tools/testing/cxl/test/cxl.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
>>> index 8a5815ca870d..b4e6c7659ac4 100644
>>> --- a/tools/testing/cxl/test/cxl.c
>>> +++ b/tools/testing/cxl/test/cxl.c
>>> @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void)
>>>  static __init int cxl_test_init(void)
>>>  {
>>>         int rc, i;
>>> +       struct range mappable;
>>>
>>>         cxl_acpi_test();
>>>         cxl_core_test();
>>> @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void)
>>>                 rc = -ENOMEM;
>>>                 goto err_gen_pool_create;
>>>         }
>>> +       mappable = mhp_get_pluggable_range(true);
>>>
>>> -       rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
>>> +       rc = gen_pool_add(cxl_mock_pool,
>>> +                         min(iomem_resource.end + 1 - SZ_64G,
>>> +                             mappable.end + 1 - SZ_64G),
>>>                           SZ_64G, NUMA_NO_NODE);
>>>         if (rc)
>>>                 goto err_gen_pool_add;
>>> --
>>> 2.43.0
>>>
>>
>> Tested-by: Itaru Kitayama
>>
>> # meson test --suite cxl
>> ninja: Entering directory `/root/ndctl/build'
>> [1/82] Generating version.h with a custom command
>>  1/12 ndctl:cxl / cxl-topology.sh              OK              33.96s
>>  2/12 ndctl:cxl / cxl-region-sysfs.sh          OK              18.00s
>>  3/12 ndctl:cxl / cxl-labels.sh                OK              23.78s
>>  4/12 ndctl:cxl / cxl-create-region.sh         OK              43.03s
>>  5/12 ndctl:cxl / cxl-xor-region.sh            OK              19.30s
>>  6/12 ndctl:cxl / cxl-events.sh                FAIL             6.40s   exit status 1
>> >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MALLOC_PERTURB_=45 TEST_PATH=/root/ndctl/build/test UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-events.sh
>>
>>  7/12 ndctl:cxl / cxl-sanitize.sh              OK              14.77s
>>  8/12 ndctl:cxl / cxl-destroy-region.sh        OK              13.69s
>>  9/12 ndctl:cxl / cxl-qos-class.sh             OK              14.31s
>> 10/12 ndctl:cxl / cxl-poison.sh                FAIL             3.46s   exit status 1
>> >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=80 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 TEST_PATH=/root/ndctl/build/test MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-poison.sh
>>
>> 11/12 ndctl:cxl / cxl-update-firmware.sh       OK              66.23s
>> 12/12 ndctl:cxl / cxl-security.sh              SKIP             0.34s   exit status 77
>>
>> Ok:                 9
>> Expected Fail:      0
>> Fail:               2
>> Unexpected Pass:    0
>> Skipped:            1
>> Timeout:            0
>>
>> My understanding is that these CXL tests are using mock CFMWs, not the actual physical memory regions at their fixed locations. So I wonder executing these set of test on a "sane" CXL emulation setup (run_qemu.sh creates) that the Intel folk is using does matter or not.
>
> Right - these test run on the mock CFMW's that the cxl-test module
> creates. As far as running on a 'sane' CXL emulation setup, like
> run_qemu.sh, I may not be understanding the question, but I'll take
> a shot.
> The qemu defined CXL devices do not matter at all for the cxl
> unit test run. The unit tests only uses the mock cxl/test environment
> provided by the cxl-test module. The qemu CXL devices are irrelevant.

Ah, I see, thanks for the clarification. That’s what I needed to know.

>
> Let me know if I missed the point you were making.
>
> I noticed your test output FAIL cases, probably for CONFIG_TRACING not
> enabled, and posted a patch to turn those into SKIPs.

Indeed, by looking at the test logs I figured that out. Now, as Jonathan confirmed, I have just seen the same results:

 1/12 ndctl:cxl / cxl-topology.sh              OK             106.48s
 2/12 ndctl:cxl / cxl-region-sysfs.sh          OK              55.90s
 3/12 ndctl:cxl / cxl-labels.sh                OK              54.95s
 4/12 ndctl:cxl / cxl-create-region.sh         OK             141.98s
 5/12 ndctl:cxl / cxl-xor-region.sh            OK              66.00s
 6/12 ndctl:cxl / cxl-events.sh                OK              33.82s
 7/12 ndctl:cxl / cxl-sanitize.sh              OK              34.92s
 8/12 ndctl:cxl / cxl-destroy-region.sh        OK              41.08s
 9/12 ndctl:cxl / cxl-qos-class.sh             OK              40.55s
10/12 ndctl:cxl / cxl-poison.sh                OK              82.08s
11/12 ndctl:cxl / cxl-update-firmware.sh       OK              99.39s
12/12 ndctl:cxl / cxl-security.sh              SKIP             1.03s   exit status 77

Thanks again for your comments.

Itaru.

>
> --Alison
>
>>
>> Itaru.
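
For reference, the "adapted to avoid wrap around" idea from Jonathan's note can be sketched in plain user-space C. This is only an illustration, not the patch that was posted: mock_window_base() is a hypothetical helper, SZ_64G is redefined locally, and the two end addresses are passed in as plain integers instead of being read from iomem_resource or mhp_get_pluggable_range(). It takes the lower of the two inclusive end addresses first and only then subtracts, so no intermediate value relies on "end + 1" wrapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_64G (64ULL << 30) /* 64 GiB, local stand-in for the kernel's SZ_64G */

/*
 * Hypothetical helper (not a kernel function): choose the base of a
 * 'size'-byte window whose last byte sits at the lower of two inclusive
 * end addresses. Returns 0 on success, -1 if the window cannot fit.
 */
int mock_window_base(uint64_t iomem_end, uint64_t mappable_end,
                     uint64_t size, uint64_t *base)
{
        /* Clamp to the more restrictive limit before doing any arithmetic. */
        uint64_t limit = iomem_end < mappable_end ? iomem_end : mappable_end;

        assert(base != NULL);

        /* Reject windows that would start below address 0. */
        if (size == 0 || limit < size - 1)
                return -1;

        /* Same value as limit + 1 - size, but never overflows. */
        *base = limit - (size - 1);
        return 0;
}
```

Calling it with a mappable end of 0xf80003fffffff (the upper bound in the "exceeds maximum addressable range" message) and an iomem end of all ones (an assumption inferred from the hotplug address in that message) yields a base of 0xf7ff040000000, which lies inside the addressable range.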