* Internal error: Oops: 0000000096000044 [#11] SMP
From: Itaru Kitayama @ 2025-05-21 8:39 UTC (permalink / raw)
To: linux-cxl
Hi,
On arm64/virt QEMU, the cxl/next kernel (as of today) prints an Internal error oops:
[ 80.968299] [ T48] Internal error: Oops: 0000000096000044 [#11] SMP
[ 80.989250] [ T48] Modules linked in: cxl_mock_mem(O) cfg80211 rfkill cxl_test(O) cxl_mem(O) cxl_pmem(O) cxl_acpi(O) cxl_port(O) cxl_mock(O) libnvdimm encrypted_keys trusted caam_jr caam asn1_encoder caamhash_desc caamalg_desc error crypto_engine authenc libdes fuse drm backlight ip_tables x_tables sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 cxl_core(O) fwctl btrfs blake2b_generic xor xor_neon raid6_pq zstd_compress ipv6
[ 80.992210] [ T48] CPU: 1 UID: 0 PID: 48 Comm: kworker/u8:2 Tainted: G D O 6.15.0-rc4-00040-g128ad8fa385b #40 PREEMPT
[ 80.992791] [ T48] Tainted: [D]=DIE, [O]=OOT_MODULE
[ 80.993039] [ T48] Hardware name: QEMU QEMU Virtual Machine, BIOS 2025.02-3ubuntu2 04/04/2025
[ 80.993400] [ T48] Workqueue: async async_run_entry_fn
[ 80.994718] [ T48] pstate: 61402005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 80.995329] [ T48] pc : cxl_mock_mbox_send+0xec/0x12c0 [cxl_mock_mem]
[ 80.995691] [ T48] lr : cxl_internal_send_cmd+0x40/0x104 [cxl_core]
[ 80.996189] [ T48] sp : ffff800080d0b9f0
[ 80.996380] [ T48] x29: ffff800080d0ba70 x28: fff0000008dd2410 x27: fff00000088fb390
[ 80.996714] [ T48] x26: ffff800080d0bb07 x25: 0000000000000100 x24: 0000000000000003
[ 80.997135] [ T48] x23: 0000000000000020 x22: fff0000008dd2410 x21: 0000000000000002
[ 80.998119] [ T48] x20: fff00000088fb080 x19: ffff800080d0bb08 x18: 00000000ffffffff
[ 80.998419] [ T48] x17: 0000000000000000 x16: ffffa8d169128748 x15: ffff800080d0b5ad
[ 80.999243] [ T48] x14: ffff800080d0b400 x13: ffff800080d0b5b8 x12: fff000006f7a0000
[ 81.000519] [ T48] x11: 0000000000000058 x10: 0000000000000018 x9 : fff000006f7a0000
[ 81.001337] [ T48] x8 : ffff800080d0bb48 x7 : fff0000074fa0000 x6 : fff0000074fa0000
[ 81.002497] [ T48] x5 : fff000007f937508 x4 : 0000000000000001 x3 : 0000000000001000
[ 81.003993] [ T48] x2 : 0000000000001000 x1 : 0000000000000000 x0 : 0000000000000088
[ 81.004223] [ T48] Call trace:
[ 81.004795] [ T48]  cxl_mock_mbox_send+0xec/0x12c0 [cxl_mock_mem] (P)
[ 81.005136] [ T48]  cxl_internal_send_cmd+0x40/0x104 [cxl_core]
[ 81.005520] [ T48]  cxl_mem_get_records_log+0xbc/0x198 [cxl_core]
[ 81.006042] [ T48]  cxl_mem_get_event_records+0xb0/0xc0 [cxl_core]
[ 81.006246] [ T48]  cxl_mock_mem_probe+0x568/0x6f0 [cxl_mock_mem]
[ 81.006417] [ T48]  platform_probe+0x68/0xd8
[ 81.008340] [ T48]  really_probe+0xc0/0x39c
[ 81.008885] [ T48]  __driver_probe_device+0xd0/0x14c
[ 81.009539] [ T48]  driver_probe_device+0x3c/0x120
[ 81.010239] [ T48]  __driver_attach_async_helper+0x50/0xec
[ 81.011130] [ T48]  async_run_entry_fn+0x34/0x14c
[ 81.011276] [ T48]  process_one_work+0x148/0x284
[ 81.011420] [ T48]  worker_thread+0x2c4/0x3e0
[ 81.011552] [ T48]  kthread+0x12c/0x204
[ 81.011693] [ T48]  ret_from_fork+0x10/0x20
[ 81.011840] [ T48] Code: 54001b28 a90c6bf9 52801100 f9400a61 (a9007c3f)
[ 81.013772] [ T48] ---[ end trace 0000000000000000 ]---
How serious is this?
Thanks,
Itaru.
^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Internal error: Oops: 0000000096000044 [#11] SMP
From: Dave Jiang @ 2025-05-21 15:31 UTC (permalink / raw)
To: Itaru Kitayama, linux-cxl

On 5/21/25 1:39 AM, Itaru Kitayama wrote:
> Hi,
> On arm64/virt QEMU, the cxl/next (as of today) kernel prints out Internal errors:
>
> [ 80.968299] [ T48] Internal error: Oops: 0000000096000044 [#11] SMP
[..]
> [ 80.995329] [ T48] pc : cxl_mock_mbox_send+0xec/0x12c0 [cxl_mock_mem]

Can you do this in your kernel tree?

./scripts/faddr2line tools/testing/cxl/test/cxl_mock_mem.ko cxl_mock_mbox_send+0xec/0x12c0

I've not seen this issue on x86 running cxl/next. How consistently can
you reproduce this? If it's every time, is it possible for you to do a
git bisect on the kernel and see which commit causes this please?
Thanks!

DJ
* Re: Internal error: Oops: 0000000096000044 [#11] SMP
From: Itaru Kitayama @ 2025-05-21 20:38 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-cxl

Dave,

> On May 22, 2025, at 0:31, Dave Jiang <dave.jiang@intel.com> wrote:
[..]
> Can you do this in your kernel tree?
>
> ./scripts/faddr2line tools/testing/cxl/test/cxl_mock_mem.ko cxl_mock_mbox_send+0xec/0x12c0

realm@machine-1:~/projects/cxl$ ./scripts/faddr2line tools/testing/cxl/test/cxl_mock_mem.ko cxl_mock_mbox_send+0xec/0x12c0
cxl_mock_mbox_send+0xec/0x12c0:
mock_get_event at /home/realm/projects/cxl/tools/testing/cxl/test/mem.c:277
(inlined by) cxl_mock_mbox_send at /home/realm/projects/cxl/tools/testing/cxl/test/mem.c:1571

> I've not seen this issue on x86 running cxl/next. How consistently can you reproduce this? If it's every time, is it possible for you to do a git bisect on the kernel and see which commit causes this please? Thanks!

Fairly reliably (100% of boots, and cxl/fixes did not change this, BTW;
which branch do you folks consider stable?). Yes, I should try git
bisect.

Itaru.
* Re: Internal error: Oops: 0000000096000044 [#11] SMP
From: Dave Jiang @ 2025-05-21 20:46 UTC (permalink / raw)
To: Itaru Kitayama; +Cc: linux-cxl

On 5/21/25 1:38 PM, Itaru Kitayama wrote:
[..]
> Fairly reliably (100% of the boot time, and cxl/fixes did not change this BTW, which branch is seen as stable for you folks?), yes, I should try git bisect.

The current cxl/next is based on 6.15-rc4, which should have everything
that was in cxl/fixes. And you do not see this with 6.14-final? git
bisect would be very helpful. Thank you!
* Re: Internal error: Oops: 0000000096000044 [#11] SMP
From: Itaru Kitayama @ 2025-05-21 23:28 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-cxl

Dave et al.,

> On May 22, 2025, at 5:46, Dave Jiang <dave.jiang@intel.com> wrote:
[..]
> The current cxl/next is based on 6.15-rc4, which should have everything that was in cxl/fixes. And you do not see this with 6.14-final? git bisect would be very helpful. Thank you!

Rebuilt the rootfs image and tried today's cxl/next
(6.15.0-rc4-00046-g6eed708a5693) again; now I don't see the splats, so
I had messed something up in my dev environment. Sorry about that.

CXL utility commands work reasonably now and I can execute meson test
--suite cxl, while most of the tests still fail due to an HPA allocation
error, which puzzles me since the resource requests are quite modest.

Itaru.
* Re: Internal error: Oops: 0000000096000044 [#11] SMP
From: Dan Williams @ 2025-05-21 23:34 UTC (permalink / raw)
To: Itaru Kitayama, Dave Jiang; +Cc: linux-cxl

Itaru Kitayama wrote:
> Dave et al.,
[..]
> CXL utility commands work reasonably now and I can execute meson test
> --suite cxl, while most of them still fails due to the HPA allocation
> error which makes me wonder as the resource requests are quite modest.

So cxl_test_init() just "hopes" that the top of the system physical
address space is free to use to emulate CXL windows. That might be an
assumption that only works for x86_64, not ARM64. I would double check
that this code in cxl_test_init():

	rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
			  SZ_64G, NUMA_NO_NODE);
	if (rc)
		goto err_gen_pool_add;

...is not setting up CXL windows that overlap with existing resources
in that range.
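[Editor's note: the failure mode described above (a mock window carved
from the top of the address space colliding with something already
there) can be illustrated with a small standalone sketch. This is plain
Python, not kernel code; the resource list and addresses are made-up
example values, not taken from the thread.]

```python
SZ_64G = 64 * 2**30  # 64 GiB, mirroring the kernel's SZ_64G constant

def overlaps(a_start, a_end, b_start, b_end):
    """Inclusive-range overlap test for two [start, end] windows."""
    return a_start <= b_end and b_start <= a_end

# Hypothetical existing iomem entries (start, end) -- example values only.
existing = [
    (0x0000000040000000, 0x000000023fffffff),  # System RAM (example)
    (0x000000ff00000000, 0x000000ffffffffff),  # reserved/firmware (example)
]

# Candidate mock window: 64 GiB carved from the top of the address space,
# the same placement strategy cxl_test_init() uses.
iomem_end = 0x000000ffffffffff  # example iomem_resource.end
candidate_start = iomem_end + 1 - SZ_64G
candidate_end = iomem_end

# Any non-empty result here means the mock window would clash.
clashes = [r for r in existing
           if overlaps(candidate_start, candidate_end, *r)]
print(clashes)
```

On this example layout the candidate window lands on the reserved
region, which is exactly the kind of collision the suggestion above
asks to rule out.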
* Re: Internal error: Oops: 0000000096000044 [#11] SMP
From: Jonathan Cameron @ 2025-05-22 13:56 UTC (permalink / raw)
To: Dan Williams; +Cc: Itaru Kitayama, Dave Jiang, linux-cxl

On Wed, 21 May 2025 16:34:16 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
[..]
> So cxl_test_init() just "hopes" that the top of the system physical
> address space is free to use to emulate CXL windows. That might be an
> assumption that only works for x86_64, not ARM64. I would double check
> that this code in cxl_test_init()
> ...is not setting up CXL Windows that overlap with existing resources in
> that range.

I think there are checks that block use of ranges up there.

The print I'm seeing is:

Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff]

I think the right answer is to use mhp_get_pluggable_range(true) to
check for limits on the range we can use.

On architectures that don't define arch_get_mappable_range(), that ends
up as (unsigned long)-1, which I think would work, though there may be
other stuff up there. Maybe

	min(iomem_resource.end + 1 - SZ_64G, mappable_range.end + 1 - SZ_64G)

or something like that, adapted to avoid wrap-around.

I haven't yet sanity checked that this doesn't break x86, but I think it
should end up making no difference to the locations on x86.

With the below, all 11 tests in the ndctl cxl test suite pass for me.

From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Thu, 22 May 2025 14:20:42 +0100
Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/cxl.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 8a5815ca870d..b4e6c7659ac4 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1328,6 +1328,7 @@ static int cxl_mem_init(void)
 static __init int cxl_test_init(void)
 {
 	int rc, i;
+	struct range mappable;
 
 	cxl_acpi_test();
 	cxl_core_test();
@@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void)
 		rc = -ENOMEM;
 		goto err_gen_pool_create;
 	}
+	mappable = mhp_get_pluggable_range(true);
 
-	rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
+	rc = gen_pool_add(cxl_mock_pool,
+			  min(iomem_resource.end + 1 - SZ_64G,
+			      mappable.end + 1 - SZ_64G),
 			  SZ_64G, NUMA_NO_NODE);
 	if (rc)
 		goto err_gen_pool_add;
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: Internal error: Oops: 0000000096000044 [#11] SMP
From: Dan Williams @ 2025-05-22 18:19 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams; +Cc: Itaru Kitayama, Dave Jiang, linux-cxl

Jonathan Cameron wrote:
[..]
> With the below - all 11 tests in ndctl cxl test suite pass for me.

It was whitespace clobbered, but after fixing that up and adding
#include <linux/memory_hotplug.h>, it works here, thanks!

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: Internal error: Oops: 0000000096000044 [#11] SMP 2025-05-22 13:56 ` Jonathan Cameron 2025-05-22 18:19 ` Dan Williams @ 2025-05-22 21:46 ` Itaru Kitayama 2025-05-23 3:28 ` Alison Schofield 2025-05-23 5:52 ` Marc Herbert 2 siblings, 1 reply; 15+ messages in thread From: Itaru Kitayama @ 2025-05-22 21:46 UTC (permalink / raw) To: Jonathan Cameron; +Cc: Dan Williams, Dave Jiang, linux-cxl Hi Jonathan, > On May 22, 2025, at 22:56, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Wed, 21 May 2025 16:34:16 -0700 > Dan Williams <dan.j.williams@intel.com> wrote: > >> Itaru Kitayama wrote: >>> Dave et al., >> [..] >>> Rebuilt the rootfs image and tried today’s cx/next >>> (6.15.0-rc4-00046-g6eed708a5693) again to boot now I don’t see the >>> splats, so something I was messing my dev environment sorry about >>> that. >>> >>> CXL utility commands work reasonably now and I can execute meson test >>> —suite cxl, while most of them still fails due to the HPA allocation >>> error which makes me wonder as the resource requests are quite modest. >> >> So cxl_test_init() just "hopes" that the top of the system physical >> address space is free to use to emulate CXL windows. That might be an >> assumption that only works for x86_64, not ARM64. I would double check >> that this code in cxl_test_init() >> >> rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, >> SZ_64G, NUMA_NO_NODE); >> if (rc) >> goto err_gen_pool_add; >> >> ...is not setting up CXL Windows that overlap with existing resources in >> that range. >> > > I think there are checks that block use of ranges up there. > > Print I'm seeing is > Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff] > > I think right answer is to use mhp_get_pluggable_range(true); to check > for limits on the range we can use. 
> > On architectures that don't define arch_get_mappable_range() > that ends up the as (unsigned long)-1 which I think would work > though there may be other stuff up there. Maybe min(iomem_resource.end + 1 - SZ_64G, > mappable_range.end + 1 - SZ_64G) > or something like that adapted to avoid wrap around. > > I haven't yet sanity checked this doesn't break x86 but I think it should > end up making no difference to the locations on x86. > > > With the below - all 11 tests in ndctl cxl test suite pass for me. > > From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001 > From: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Date: Thu, 22 May 2025 14:20:42 +0100 > Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > tools/testing/cxl/test/cxl.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c > index 8a5815ca870d..b4e6c7659ac4 100644 > --- a/tools/testing/cxl/test/cxl.c > +++ b/tools/testing/cxl/test/cxl.c > @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void) > static __init int cxl_test_init(void) > { > int rc, i; > + struct range mappable; > > cxl_acpi_test(); > cxl_core_test(); > @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void) > rc = -ENOMEM; > goto err_gen_pool_create; > } > + mappable = mhp_get_pluggable_range(true); > > - rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, > + rc = gen_pool_add(cxl_mock_pool, > + min(iomem_resource.end + 1 - SZ_64G, > + mappable.end + 1 - SZ_64G), > SZ_64G, NUMA_NO_NODE); > if (rc) > goto err_gen_pool_add; > -- > 2.43.0 > Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com <mailto:itaru.kitayama@fujitsu.com>> # meson test --suite cxl ninja: Entering directory `/root/ndctl/build' [1/82] Generating version.h with a custom command 1/12 ndctl:cxl / cxl-topology.sh OK 33.96s 2/12 ndctl:cxl / 
cxl-region-sysfs.sh OK 18.00s 3/12 ndctl:cxl / cxl-labels.sh OK 23.78s 4/12 ndctl:cxl / cxl-create-region.sh OK 43.03s 5/12 ndctl:cxl / cxl-xor-region.sh OK 19.30s 6/12 ndctl:cxl / cxl-events.sh FAIL 6.40s exit status 1 >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MALLOC_PERTURB_=45 TEST_PATH=/root/ndctl/build/test UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-events.sh 7/12 ndctl:cxl / cxl-sanitize.sh OK 14.77s 8/12 ndctl:cxl / cxl-destroy-region.sh OK 13.69s 9/12 ndctl:cxl / cxl-qos-class.sh OK 14.31s 10/12 ndctl:cxl / cxl-poison.sh FAIL 3.46s exit status 1 >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=80 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 TEST_PATH=/root/ndctl/build/test MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-poison.sh 11/12 ndctl:cxl / cxl-update-firmware.sh OK 66.23s 12/12 ndctl:cxl / cxl-security.sh SKIP 0.34s exit status 77 Ok: 9 Expected Fail: 0 Fail: 2 Unexpected Pass: 0 Skipped: 1 Timeout: 0 My understanding is that these CXL tests are using mock CFMWs, not the actual physical memory regions at their fixed locations. So I wonder executing these set of test on a “sane" CXL emulation setup (run_qemu.sh creates) that the Intel folk is using does matter or not. Itaru. 
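The clamped pool-base computation in Jonathan's patch above can be modeled in plain C. This is a userspace sketch, not kernel code: `mock_pool_base` and its arguments are illustrative names, and taking the `min` of the two inclusive end addresses before subtracting is arithmetically equivalent to the patch's `min()` of the two already-shifted bases, since `SZ_64G` is a constant. The sketch also shows the wrap-around guard the thread mentions.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64G (64ULL << 30)

/*
 * Sketch of the clamped base selection: place a 64G mock window so it
 * ends at whichever is lower, the top of the iomem resource tree or the
 * top of the hotplug-mappable range. Both "end" values are inclusive,
 * as in the kernel's struct resource / struct range.
 */
static uint64_t mock_pool_base(uint64_t iomem_end, uint64_t mappable_end)
{
	uint64_t limit = iomem_end < mappable_end ? iomem_end : mappable_end;

	/* Guard against unsigned wrap-around when the limit is below 64G. */
	if (limit < SZ_64G - 1)
		return 0;

	/* Equivalent to limit + 1 - SZ_64G, without overflowing at UINT64_MAX. */
	return limit - (SZ_64G - 1);
}
```

With the arm64 limit from the log above (`0xf80003fffffff`), the window lands exactly below the maximum addressable range instead of above it.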
* Re: Internal error: Oops: 0000000096000044 [#11] SMP 2025-05-22 21:46 ` Itaru Kitayama @ 2025-05-23 3:28 ` Alison Schofield 2025-05-23 4:56 ` Itaru Kitayama 0 siblings, 1 reply; 15+ messages in thread From: Alison Schofield @ 2025-05-23 3:28 UTC (permalink / raw) To: Itaru Kitayama; +Cc: Jonathan Cameron, Dan Williams, Dave Jiang, linux-cxl On Fri, May 23, 2025 at 06:46:53AM +0900, Itaru Kitayama wrote: > Hi Jonathan, > > > On May 22, 2025, at 22:56, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > > > On Wed, 21 May 2025 16:34:16 -0700 > > Dan Williams <dan.j.williams@intel.com> wrote: > > > >> Itaru Kitayama wrote: > >>> Dave et al., > >> [..] > >>> Rebuilt the rootfs image and tried today’s cx/next > >>> (6.15.0-rc4-00046-g6eed708a5693) again to boot now I don’t see the > >>> splats, so something I was messing my dev environment sorry about > >>> that. > >>> > >>> CXL utility commands work reasonably now and I can execute meson test > >>> —suite cxl, while most of them still fails due to the HPA allocation > >>> error which makes me wonder as the resource requests are quite modest. > >> > >> So cxl_test_init() just "hopes" that the top of the system physical > >> address space is free to use to emulate CXL windows. That might be an > >> assumption that only works for x86_64, not ARM64. I would double check > >> that this code in cxl_test_init() > >> > >> rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, > >> SZ_64G, NUMA_NO_NODE); > >> if (rc) > >> goto err_gen_pool_add; > >> > >> ...is not setting up CXL Windows that overlap with existing resources in > >> that range. > >> > > > > I think there are checks that block use of ranges up there. > > > > Print I'm seeing is > > Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff] > > > > I think right answer is to use mhp_get_pluggable_range(true); to check > > for limits on the range we can use. 
> > > > On architectures that don't define arch_get_mappable_range() > > that ends up the as (unsigned long)-1 which I think would work > > though there may be other stuff up there. Maybe min(iomem_resource.end + 1 - SZ_64G, > > mappable_range.end + 1 - SZ_64G) > > or something like that adapted to avoid wrap around. > > > > I haven't yet sanity checked this doesn't break x86 but I think it should > > end up making no difference to the locations on x86. > > > > > > With the below - all 11 tests in ndctl cxl test suite pass for me. > > > > From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001 > > From: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > Date: Thu, 22 May 2025 14:20:42 +0100 > > Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > --- > > tools/testing/cxl/test/cxl.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c > > index 8a5815ca870d..b4e6c7659ac4 100644 > > --- a/tools/testing/cxl/test/cxl.c > > +++ b/tools/testing/cxl/test/cxl.c > > @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void) > > static __init int cxl_test_init(void) > > { > > int rc, i; > > + struct range mappable; > > > > cxl_acpi_test(); > > cxl_core_test(); > > @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void) > > rc = -ENOMEM; > > goto err_gen_pool_create; > > } > > + mappable = mhp_get_pluggable_range(true); > > > > - rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, > > + rc = gen_pool_add(cxl_mock_pool, > > + min(iomem_resource.end + 1 - SZ_64G, > > + mappable.end + 1 - SZ_64G), > > SZ_64G, NUMA_NO_NODE); > > if (rc) > > goto err_gen_pool_add; > > -- > > 2.43.0 > > > > Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com <mailto:itaru.kitayama@fujitsu.com>> > > # meson test --suite cxl > ninja: Entering directory `/root/ndctl/build' > [1/82] 
Generating version.h with a custom command > 1/12 ndctl:cxl / cxl-topology.sh OK 33.96s > 2/12 ndctl:cxl / cxl-region-sysfs.sh OK 18.00s > 3/12 ndctl:cxl / cxl-labels.sh OK 23.78s > 4/12 ndctl:cxl / cxl-create-region.sh OK 43.03s > 5/12 ndctl:cxl / cxl-xor-region.sh OK 19.30s > 6/12 ndctl:cxl / cxl-events.sh FAIL 6.40s exit status 1 > >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MALLOC_PERTURB_=45 TEST_PATH=/root/ndctl/build/test UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-events.sh > > 7/12 ndctl:cxl / cxl-sanitize.sh OK 14.77s > 8/12 ndctl:cxl / cxl-destroy-region.sh OK 13.69s > 9/12 ndctl:cxl / cxl-qos-class.sh OK 14.31s > 10/12 ndctl:cxl / cxl-poison.sh FAIL 3.46s exit status 1 > >>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=80 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 TEST_PATH=/root/ndctl/build/test MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-poison.sh > > 11/12 ndctl:cxl / cxl-update-firmware.sh OK 66.23s > 12/12 ndctl:cxl / cxl-security.sh SKIP 0.34s exit status 77 > > Ok: 9 > Expected Fail: 0 > Fail: 2 > Unexpected Pass: 0 > Skipped: 1 > Timeout: 0 > > My understanding is that these CXL tests are using mock CFMWs, not the actual physical memory regions at their fixed locations. 
So I wonder executing these set of test on a “sane" CXL emulation setup (run_qemu.sh creates) that the Intel folk is using does matter or not. Right - these test run on the mock CFMW's that the cxl-test module creates. As far as running on a 'sane' CXL emulation setup, like run_qemu.sh, I may not be understanding the question, but I'll take a shot. The qemu defined CXL devices do not matter at all for the cxl unit test run. The unit tests only uses the mock cxl/test environment provided by the cxl-test module. The qemu CXL devices are irrelevant. Let me know if I missed the point of you were making. I noticed your test output FAIL cases, probably for CONFIG_TRACING not enabled, and posted a patch to turn those into SKIPs. --Alison > > Itaru. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Internal error: Oops: 0000000096000044 [#11] SMP 2025-05-23 3:28 ` Alison Schofield @ 2025-05-23 4:56 ` Itaru Kitayama 0 siblings, 0 replies; 15+ messages in thread From: Itaru Kitayama @ 2025-05-23 4:56 UTC (permalink / raw) To: Alison Schofield; +Cc: Jonathan Cameron, Dan Williams, Dave Jiang, linux-cxl Hi Alison, > On May 23, 2025, at 12:28, Alison Schofield <alison.schofield@intel.com> wrote: > > On Fri, May 23, 2025 at 06:46:53AM +0900, Itaru Kitayama wrote: >> Hi Jonathan, >> >>> On May 22, 2025, at 22:56, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: >>> >>> On Wed, 21 May 2025 16:34:16 -0700 >>> Dan Williams <dan.j.williams@intel.com> wrote: >>> >>>> Itaru Kitayama wrote: >>>>> Dave et al., >>>> [..] >>>>> Rebuilt the rootfs image and tried today’s cx/next >>>>> (6.15.0-rc4-00046-g6eed708a5693) again to boot now I don’t see the >>>>> splats, so something I was messing my dev environment sorry about >>>>> that. >>>>> >>>>> CXL utility commands work reasonably now and I can execute meson test >>>>> —suite cxl, while most of them still fails due to the HPA allocation >>>>> error which makes me wonder as the resource requests are quite modest. >>>> >>>> So cxl_test_init() just "hopes" that the top of the system physical >>>> address space is free to use to emulate CXL windows. That might be an >>>> assumption that only works for x86_64, not ARM64. I would double check >>>> that this code in cxl_test_init() >>>> >>>> rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, >>>> SZ_64G, NUMA_NO_NODE); >>>> if (rc) >>>> goto err_gen_pool_add; >>>> >>>> ...is not setting up CXL Windows that overlap with existing resources in >>>> that range. >>>> >>> >>> I think there are checks that block use of ranges up there. 
>>> >>> Print I'm seeing is >>> Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x40000000-0xf80003fffffff] >>> >>> I think right answer is to use mhp_get_pluggable_range(true); to check >>> for limits on the range we can use. >>> >>> On architectures that don't define arch_get_mappable_range() >>> that ends up the as (unsigned long)-1 which I think would work >>> though there may be other stuff up there. Maybe min(iomem_resource.end + 1 - SZ_64G, >>> mappable_range.end + 1 - SZ_64G) >>> or something like that adapted to avoid wrap around. >>> >>> I haven't yet sanity checked this doesn't break x86 but I think it should >>> end up making no difference to the locations on x86. >>> >>> >>> With the below - all 11 tests in ndctl cxl test suite pass for me. >>> >>> From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001 >>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com> >>> Date: Thu, 22 May 2025 14:20:42 +0100 >>> Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range >>> >>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >>> --- >>> tools/testing/cxl/test/cxl.c | 6 +++++- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c >>> index 8a5815ca870d..b4e6c7659ac4 100644 >>> --- a/tools/testing/cxl/test/cxl.c >>> +++ b/tools/testing/cxl/test/cxl.c >>> @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void) >>> static __init int cxl_test_init(void) >>> { >>> int rc, i; >>> + struct range mappable; >>> >>> cxl_acpi_test(); >>> cxl_core_test(); >>> @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void) >>> rc = -ENOMEM; >>> goto err_gen_pool_create; >>> } >>> + mappable = mhp_get_pluggable_range(true); >>> >>> - rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, >>> + rc = gen_pool_add(cxl_mock_pool, >>> + min(iomem_resource.end + 1 - SZ_64G, >>> + mappable.end + 1 - SZ_64G), >>> 
SZ_64G, NUMA_NO_NODE); >>> if (rc) >>> goto err_gen_pool_add; >>> -- >>> 2.43.0 >>> >> >> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com <mailto:itaru.kitayama@fujitsu.com>> >> >> # meson test --suite cxl >> ninja: Entering directory `/root/ndctl/build' >> [1/82] Generating version.h with a custom command >> 1/12 ndctl:cxl / cxl-topology.sh OK 33.96s >> 2/12 ndctl:cxl / cxl-region-sysfs.sh OK 18.00s >> 3/12 ndctl:cxl / cxl-labels.sh OK 23.78s >> 4/12 ndctl:cxl / cxl-create-region.sh OK 43.03s >> 5/12 ndctl:cxl / cxl-xor-region.sh OK 19.30s >> 6/12 ndctl:cxl / cxl-events.sh FAIL 6.40s exit status 1 >>>>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MALLOC_PERTURB_=45 TEST_PATH=/root/ndctl/build/test UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-events.sh >> >> 7/12 ndctl:cxl / cxl-sanitize.sh OK 14.77s >> 8/12 ndctl:cxl / cxl-destroy-region.sh OK 13.69s >> 9/12 ndctl:cxl / cxl-qos-class.sh OK 14.31s >> 10/12 ndctl:cxl / cxl-poison.sh FAIL 3.46s exit status 1 >>>>> LD_LIBRARY_PATH=/root/ndctl/build/daxctl/lib:/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MALLOC_PERTURB_=80 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 TEST_PATH=/root/ndctl/build/test MESON_TEST_ITERATION=1 DAXCTL=/root/ndctl/build/daxctl/daxctl NDCTL=/root/ndctl/build/ndctl/ndctl ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 DATA_PATH=/root/ndctl/test /bin/bash /root/ndctl/test/cxl-poison.sh >> >> 11/12 ndctl:cxl / cxl-update-firmware.sh OK 66.23s >> 12/12 ndctl:cxl / 
cxl-security.sh SKIP 0.34s exit status 77 >> >> Ok: 9 >> Expected Fail: 0 >> Fail: 2 >> Unexpected Pass: 0 >> Skipped: 1 >> Timeout: 0 >> >> My understanding is that these CXL tests are using mock CFMWs, not the actual physical memory regions at their fixed locations. So I wonder executing these set of test on a “sane" CXL emulation setup (run_qemu.sh creates) that the Intel folk is using does matter or not. > > Right - these test run on the mock CFMW's that the cxl-test module > creates. As far as running on a 'sane' CXL emulation setup, like > run_qemu.sh, I may not be understanding the question, but I'll take > a shot. The qemu defined CXL devices do not matter at all for the cxl > unit test run. The unit tests only uses the mock cxl/test environment > provided by the cxl-test module. The qemu CXL devices are irrelevant. Ah, I see thanks for the clarification. That’s what I needed to know. > > Let me know if I missed the point of you were making. > > I noticed your test output FAIL cases, probably for CONFIG_TRACING not > enabled, and posted a patch to turn those into SKIPs. Indeed, by looking at the test logs I figured that. Now like Jonathan confirmed I just seen the same results: 1/12 ndctl:cxl / cxl-topology.sh OK 106.48s 2/12 ndctl:cxl / cxl-region-sysfs.sh OK 55.90s 3/12 ndctl:cxl / cxl-labels.sh OK 54.95s 4/12 ndctl:cxl / cxl-create-region.sh OK 141.98s 5/12 ndctl:cxl / cxl-xor-region.sh OK 66.00s 6/12 ndctl:cxl / cxl-events.sh OK 33.82s 7/12 ndctl:cxl / cxl-sanitize.sh OK 34.92s 8/12 ndctl:cxl / cxl-destroy-region.sh OK 41.08s 9/12 ndctl:cxl / cxl-qos-class.sh OK 40.55s 10/12 ndctl:cxl / cxl-poison.sh OK 82.08s 11/12 ndctl:cxl / cxl-update-firmware.sh OK 99.39s 12/12 ndctl:cxl / cxl-security.sh SKIP 1.03s exit status 77 Thanks again for your comments. Itaru. > > --Alison > >> >> Itaru. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Internal error: Oops: 0000000096000044 [#11] SMP 2025-05-22 13:56 ` Jonathan Cameron 2025-05-22 18:19 ` Dan Williams 2025-05-22 21:46 ` Itaru Kitayama @ 2025-05-23 5:52 ` Marc Herbert 2 siblings, 0 replies; 15+ messages in thread From: Marc Herbert @ 2025-05-23 5:52 UTC (permalink / raw) To: Jonathan Cameron, Dan Williams; +Cc: Itaru Kitayama, Dave Jiang, linux-cxl On 2025-05-22 06:56, Jonathan Cameron wrote: > > With the below - all 11 tests in ndctl cxl test suite pass for me. > > From b287ff2c5ee7fbe507ef8cb61df3e4e156a9773f Mon Sep 17 00:00:00 2001 > From: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Date: Thu, 22 May 2025 14:20:42 +0100 > Subject: [PATCH] cxl_test: Limit location for fake CFMWS to mappable range > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > tools/testing/cxl/test/cxl.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c > index 8a5815ca870d..b4e6c7659ac4 100644 > --- a/tools/testing/cxl/test/cxl.c > +++ b/tools/testing/cxl/test/cxl.c > @@ -1328,6 +1328,7 @@ static int cxl_mem_init(void) > static __init int cxl_test_init(void) > { > int rc, i; > + struct range mappable; > > cxl_acpi_test(); > cxl_core_test(); > @@ -1342,8 +1343,11 @@ static __init int cxl_test_init(void) > rc = -ENOMEM; > goto err_gen_pool_create; > } > + mappable = mhp_get_pluggable_range(true); > > - rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, > + rc = gen_pool_add(cxl_mock_pool, > + min(iomem_resource.end + 1 - SZ_64G, > + mappable.end + 1 - SZ_64G), > SZ_64G, NUMA_NO_NODE); > if (rc) > goto err_gen_pool_add; Tested-by: Marc Herbert <marc.herbert@linux.intel.com> cxl-security.sh aside, this patch turns all CXL test results from red to green and finally fixes the 3 months old https://github.com/pmem/ndctl/issues/278 which has a ton of relevant context. Without Johanthan's fix, Itaru's and everyone else's run should be all red too. 
But they are not all red because the current test suite does not care about kernel errors (!) which causes a lot of false negatives ("green failures"). These serious false negatives are addressed by my patch v2 https://lore.kernel.org/linux-cxl/20250515021730.1201996-3-marc.herbert@linux.intel.com/T/#u Please help with review and testing it, thanks! This is the perfect timing to test it. 
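The "green failure" problem Marc describes, tests passing while the kernel oopses underneath, comes down to scanning the kernel log for splat markers after each run. A minimal sketch of such a scan (the marker list here is illustrative and not taken from Marc's patch, which should be consulted for the actual mechanism):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Illustrative splat markers; a real harness would match whatever its
 * kernel actually emits, e.g. by capturing dmesg before and after each
 * test and diffing the two.
 */
static const char *splat_markers[] = {
	"Internal error:", "Oops:", "BUG:", "WARNING:", "Call trace:",
};

/* Return true if a captured kernel-log snippet contains any marker. */
static bool log_has_splat(const char *log)
{
	for (size_t i = 0; i < sizeof(splat_markers) / sizeof(splat_markers[0]); i++)
		if (strstr(log, splat_markers[i]))
			return true;
	return false;
}
```

A harness that fails a test whenever `log_has_splat()` fires on the log delta would have turned the "all green" runs in this thread red.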
* Re: Internal error: Oops: 0000000096000044 [#11] SMP 2025-05-21 8:39 Internal error: Oops: 0000000096000044 [#11] SMP Itaru Kitayama 2025-05-21 15:31 ` Dave Jiang @ 2025-05-21 15:33 ` Alison Schofield 2025-05-21 15:36 ` Jonathan Cameron 2025-05-21 15:41 ` Alison Schofield 3 siblings, 0 replies; 15+ messages in thread From: Alison Schofield @ 2025-05-21 15:33 UTC (permalink / raw) To: Itaru Kitayama; +Cc: linux-cxl On Wed, May 21, 2025 at 05:39:05PM +0900, Itaru Kitayama wrote: > Hi, > On arm64/virt QEMU, the cxl/next (as of today) kernel prints out Internal errors: How do you repro this? Is it repeating on each run of cxl-events.sh for you? Any repro info is useful. I've seen once but have not been able to repro myself. Since I saw in March I'm thinking not new to cxl/next. > > [ 80.968299] [ T48] Internal error: Oops: 0000000096000044 [#11] > SMP > [ 80.989250] [ T48] Modules linked in: cxl_mock_mem(O) cfg80211 > rfkill cxl_test(O) cxl_mem(O) cxl_pmem(O) cxl_acpi(O) cxl_port(O) > cxl_mock(O) libnvdimm encrypted_keys trusted caam_jr caam asn1_encoder > caamhash_desc caamalg_desc error crypto_engine authenc libdes fuse drm > backlight ip_tables x_tables sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 > cxl_core(O) fwctl btrfs blake2b_generic xor xor_neon raid6_pq > zstd_compress ipv6 > [ 80.992210] [ T48] CPU: 1 UID: 0 PID: 48 Comm: kworker/u8:2 > Tainted: G D O 6.15.0-rc4-00040-g128ad8fa385b #40 PREEMPT > [ 80.992791] [ T48] Tainted: [D]=DIE, [O]=OOT_MODULE > [ 80.993039] [ T48] Hardware name: QEMU QEMU Virtual Machine, BIOS > 2025.02-3ubuntu2 04/04/2025 > [ 80.993400] [ T48] Workqueue: async async_run_entry_fn > [ 80.994718] [ T48] pstate: 61402005 (nZCv daif +PAN -UAO -TCO > +DIT -SSBS BTYPE=--) > [ 80.995329] [ T48] pc : cxl_mock_mbox_send+0xec/0x12c0 > [cxl_mock_mem] > [ 80.995691] [ T48] lr : cxl_internal_send_cmd+0x40/0x104 > [cxl_core] > [ 80.996189] [ T48] sp : ffff800080d0b9f0 > [ 80.996380] [ T48] x29: ffff800080d0ba70 x28: fff0000008dd2410 > x27: 
fff00000088fb390 > [ 80.996714] [ T48] x26: ffff800080d0bb07 x25: 0000000000000100 > x24: 0000000000000003 > [ 80.997135] [ T48] x23: 0000000000000020 x22: fff0000008dd2410 > x21: 0000000000000002 > [ 80.998119] [ T48] x20: fff00000088fb080 x19: ffff800080d0bb08 > x18: 00000000ffffffff > [ 80.998419] [ T48] x17: 0000000000000000 x16: ffffa8d169128748 > x15: ffff800080d0b5ad > [ 80.999243] [ T48] x14: ffff800080d0b400 x13: ffff800080d0b5b8 > x12: fff000006f7a0000 > [ 81.000519] [ T48] x11: 0000000000000058 x10: 0000000000000018 x9 > : fff000006f7a0000 > [ 81.001337] [ T48] x8 : ffff800080d0bb48 x7 : fff0000074fa0000 x6 > : fff0000074fa0000 > [ 81.002497] [ T48] x5 : fff000007f937508 x4 : 0000000000000001 x3 > : 0000000000001000 > [ 81.003993] [ T48] x2 : 0000000000001000 x1 : 0000000000000000 x0 > : 0000000000000088 > [ 81.004223] [ T48] Call trace: > [ 81.004795] [ T48] cxl_mock_mbox_send+0xec/0x12c0 [cxl_mock_mem] > (P) > [ 81.005136] [ T48] cxl_internal_send_cmd+0x40/0x104 [cxl_core] > [ 81.005520] [ T48] cxl_mem_get_records_log+0xbc/0x198 [cxl_core] > [ 81.006042] [ T48] cxl_mem_get_event_records+0xb0/0xc0 > [cxl_core] > [ 81.006246] [ T48] cxl_mock_mem_probe+0x568/0x6f0 [cxl_mock_mem] > [ 81.006417] [ T48] platform_probe+0x68/0xd8 > [ 81.008340] [ T48] really_probe+0xc0/0x39c > [ 81.008885] [ T48] __driver_probe_device+0xd0/0x14c > [ 81.009539] [ T48] driver_probe_device+0x3c/0x120 > [ 81.010239] [ T48] __driver_attach_async_helper+0x50/0xec > [ 81.011130] [ T48] async_run_entry_fn+0x34/0x14c > [ 81.011276] [ T48] process_one_work+0x148/0x284 > [ 81.011420] [ T48] worker_thread+0x2c4/0x3e0 > [ 81.011552] [ T48] kthread+0x12c/0x204 > [ 81.011693] [ T48] ret_from_fork+0x10/0x20 > [ 81.011840] [ T48] Code: 54001b28 a90c6bf9 52801100 f9400a61 > (a9007c3f) > [ 81.013772] [ T48] ---[ end trace 0000000000000000 ]--- > > How serious is this? > > Thanks, > Itaru. > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Internal error: Oops: 0000000096000044 [#11] SMP 2025-05-21 8:39 Internal error: Oops: 0000000096000044 [#11] SMP Itaru Kitayama 2025-05-21 15:31 ` Dave Jiang 2025-05-21 15:33 ` Alison Schofield @ 2025-05-21 15:36 ` Jonathan Cameron 2025-05-21 15:41 ` Alison Schofield 3 siblings, 0 replies; 15+ messages in thread From: Jonathan Cameron @ 2025-05-21 15:36 UTC (permalink / raw) To: Itaru Kitayama; +Cc: linux-cxl On Wed, 21 May 2025 17:39:05 +0900 Itaru Kitayama <itaru.kitayama@linux.dev> wrote: > Hi, > On arm64/virt QEMU, the cxl/next (as of today) kernel prints out Internal errors: > > [ 80.968299] [ T48] Internal error: Oops: 0000000096000044 [#11] > SMP > [ 80.989250] [ T48] Modules linked in: cxl_mock_mem(O) cfg80211 > rfkill cxl_test(O) cxl_mem(O) cxl_pmem(O) cxl_acpi(O) cxl_port(O) > cxl_mock(O) libnvdimm encrypted_keys trusted caam_jr caam asn1_encoder > caamhash_desc caamalg_desc error crypto_engine authenc libdes fuse drm > backlight ip_tables x_tables sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 > cxl_core(O) fwctl btrfs blake2b_generic xor xor_neon raid6_pq > zstd_compress ipv6 > [ 80.992210] [ T48] CPU: 1 UID: 0 PID: 48 Comm: kworker/u8:2 > Tainted: G D O 6.15.0-rc4-00040-g128ad8fa385b #40 PREEMPT > [ 80.992791] [ T48] Tainted: [D]=DIE, [O]=OOT_MODULE > [ 80.993039] [ T48] Hardware name: QEMU QEMU Virtual Machine, BIOS > 2025.02-3ubuntu2 04/04/2025 > [ 80.993400] [ T48] Workqueue: async async_run_entry_fn > [ 80.994718] [ T48] pstate: 61402005 (nZCv daif +PAN -UAO -TCO > +DIT -SSBS BTYPE=--) > [ 80.995329] [ T48] pc : cxl_mock_mbox_send+0xec/0x12c0 > [cxl_mock_mem] > [ 80.995691] [ T48] lr : cxl_internal_send_cmd+0x40/0x104 > [cxl_core] > [ 80.996189] [ T48] sp : ffff800080d0b9f0 > [ 80.996380] [ T48] x29: ffff800080d0ba70 x28: fff0000008dd2410 > x27: fff00000088fb390 > [ 80.996714] [ T48] x26: ffff800080d0bb07 x25: 0000000000000100 > x24: 0000000000000003 > [ 80.997135] [ T48] x23: 0000000000000020 x22: fff0000008dd2410 > x21: 0000000000000002 > [ 
80.998119] [ T48] x20: fff00000088fb080 x19: ffff800080d0bb08 > x18: 00000000ffffffff > [ 80.998419] [ T48] x17: 0000000000000000 x16: ffffa8d169128748 > x15: ffff800080d0b5ad > [ 80.999243] [ T48] x14: ffff800080d0b400 x13: ffff800080d0b5b8 > x12: fff000006f7a0000 > [ 81.000519] [ T48] x11: 0000000000000058 x10: 0000000000000018 x9 > : fff000006f7a0000 > [ 81.001337] [ T48] x8 : ffff800080d0bb48 x7 : fff0000074fa0000 x6 > : fff0000074fa0000 > [ 81.002497] [ T48] x5 : fff000007f937508 x4 : 0000000000000001 x3 > : 0000000000001000 > [ 81.003993] [ T48] x2 : 0000000000001000 x1 : 0000000000000000 x0 > : 0000000000000088 > [ 81.004223] [ T48] Call trace: > [ 81.004795] [ T48] cxl_mock_mbox_send+0xec/0x12c0 [cxl_mock_mem] > (P) > [ 81.005136] [ T48] cxl_internal_send_cmd+0x40/0x104 [cxl_core] > [ 81.005520] [ T48] cxl_mem_get_records_log+0xbc/0x198 [cxl_core] > [ 81.006042] [ T48] cxl_mem_get_event_records+0xb0/0xc0 > [cxl_core] > [ 81.006246] [ T48] cxl_mock_mem_probe+0x568/0x6f0 [cxl_mock_mem] > [ 81.006417] [ T48] platform_probe+0x68/0xd8 > [ 81.008340] [ T48] really_probe+0xc0/0x39c > [ 81.008885] [ T48] __driver_probe_device+0xd0/0x14c > [ 81.009539] [ T48] driver_probe_device+0x3c/0x120 > [ 81.010239] [ T48] __driver_attach_async_helper+0x50/0xec > [ 81.011130] [ T48] async_run_entry_fn+0x34/0x14c > [ 81.011276] [ T48] process_one_work+0x148/0x284 > [ 81.011420] [ T48] worker_thread+0x2c4/0x3e0 > [ 81.011552] [ T48] kthread+0x12c/0x204 > [ 81.011693] [ T48] ret_from_fork+0x10/0x20 > [ 81.011840] [ T48] Code: 54001b28 a90c6bf9 52801100 f9400a61 > (a9007c3f) > [ 81.013772] [ T48] ---[ end trace 0000000000000000 ]--- > > How serious is this? > Definitely not encouraging. I'll see if I can replicate. I had a suitable x86 config ready and not seeing it there. Jonathan > Thanks, > Itaru. > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Internal error: Oops: 0000000096000044 [#11] SMP 2025-05-21 8:39 Internal error: Oops: 0000000096000044 [#11] SMP Itaru Kitayama ` (2 preceding siblings ...) 2025-05-21 15:36 ` Jonathan Cameron @ 2025-05-21 15:41 ` Alison Schofield 3 siblings, 0 replies; 15+ messages in thread From: Alison Schofield @ 2025-05-21 15:41 UTC (permalink / raw) To: Itaru Kitayama; +Cc: linux-cxl On Wed, May 21, 2025 at 05:39:05PM +0900, Itaru Kitayama wrote: > Hi, > On arm64/virt QEMU, the cxl/next (as of today) kernel prints out Internal errors: Now I'm connecting more dots. You posted this previously here: https://lore.kernel.org/linux-cxl/49A4B521-AB66-4037-A23D-1D0D7AF0F42F@linux.dev/ > > [ 80.968299] [ T48] Internal error: Oops: 0000000096000044 [#11] > SMP > [ 80.989250] [ T48] Modules linked in: cxl_mock_mem(O) cfg80211 > rfkill cxl_test(O) cxl_mem(O) cxl_pmem(O) cxl_acpi(O) cxl_port(O) > cxl_mock(O) libnvdimm encrypted_keys trusted caam_jr caam asn1_encoder > caamhash_desc caamalg_desc error crypto_engine authenc libdes fuse drm > backlight ip_tables x_tables sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 > cxl_core(O) fwctl btrfs blake2b_generic xor xor_neon raid6_pq > zstd_compress ipv6 > [ 80.992210] [ T48] CPU: 1 UID: 0 PID: 48 Comm: kworker/u8:2 > Tainted: G D O 6.15.0-rc4-00040-g128ad8fa385b #40 PREEMPT > [ 80.992791] [ T48] Tainted: [D]=DIE, [O]=OOT_MODULE > [ 80.993039] [ T48] Hardware name: QEMU QEMU Virtual Machine, BIOS > 2025.02-3ubuntu2 04/04/2025 > [ 80.993400] [ T48] Workqueue: async async_run_entry_fn > [ 80.994718] [ T48] pstate: 61402005 (nZCv daif +PAN -UAO -TCO > +DIT -SSBS BTYPE=--) > [ 80.995329] [ T48] pc : cxl_mock_mbox_send+0xec/0x12c0 > [cxl_mock_mem] > [ 80.995691] [ T48] lr : cxl_internal_send_cmd+0x40/0x104 > [cxl_core] > [ 80.996189] [ T48] sp : ffff800080d0b9f0 > [ 80.996380] [ T48] x29: ffff800080d0ba70 x28: fff0000008dd2410 > x27: fff00000088fb390 > [ 80.996714] [ T48] x26: ffff800080d0bb07 x25: 0000000000000100 > x24: 0000000000000003 > [ 
80.997135] [ T48] x23: 0000000000000020 x22: fff0000008dd2410 > x21: 0000000000000002 > [ 80.998119] [ T48] x20: fff00000088fb080 x19: ffff800080d0bb08 > x18: 00000000ffffffff > [ 80.998419] [ T48] x17: 0000000000000000 x16: ffffa8d169128748 > x15: ffff800080d0b5ad > [ 80.999243] [ T48] x14: ffff800080d0b400 x13: ffff800080d0b5b8 > x12: fff000006f7a0000 > [ 81.000519] [ T48] x11: 0000000000000058 x10: 0000000000000018 x9 > : fff000006f7a0000 > [ 81.001337] [ T48] x8 : ffff800080d0bb48 x7 : fff0000074fa0000 x6 > : fff0000074fa0000 > [ 81.002497] [ T48] x5 : fff000007f937508 x4 : 0000000000000001 x3 > : 0000000000001000 > [ 81.003993] [ T48] x2 : 0000000000001000 x1 : 0000000000000000 x0 > : 0000000000000088 > [ 81.004223] [ T48] Call trace: > [ 81.004795] [ T48] cxl_mock_mbox_send+0xec/0x12c0 [cxl_mock_mem] > (P) > [ 81.005136] [ T48] cxl_internal_send_cmd+0x40/0x104 [cxl_core] > [ 81.005520] [ T48] cxl_mem_get_records_log+0xbc/0x198 [cxl_core] > [ 81.006042] [ T48] cxl_mem_get_event_records+0xb0/0xc0 > [cxl_core] > [ 81.006246] [ T48] cxl_mock_mem_probe+0x568/0x6f0 [cxl_mock_mem] > [ 81.006417] [ T48] platform_probe+0x68/0xd8 > [ 81.008340] [ T48] really_probe+0xc0/0x39c > [ 81.008885] [ T48] __driver_probe_device+0xd0/0x14c > [ 81.009539] [ T48] driver_probe_device+0x3c/0x120 > [ 81.010239] [ T48] __driver_attach_async_helper+0x50/0xec > [ 81.011130] [ T48] async_run_entry_fn+0x34/0x14c > [ 81.011276] [ T48] process_one_work+0x148/0x284 > [ 81.011420] [ T48] worker_thread+0x2c4/0x3e0 > [ 81.011552] [ T48] kthread+0x12c/0x204 > [ 81.011693] [ T48] ret_from_fork+0x10/0x20 > [ 81.011840] [ T48] Code: 54001b28 a90c6bf9 52801100 f9400a61 > (a9007c3f) > [ 81.013772] [ T48] ---[ end trace 0000000000000000 ]--- > > How serious is this? > > Thanks, > Itaru. > ^ permalink raw reply [flat|nested] 15+ messages in thread
Thread overview: 15+ messages
2025-05-21  8:39 Internal error: Oops: 0000000096000044 [#11] SMP Itaru Kitayama
2025-05-21 15:31 ` Dave Jiang
2025-05-21 20:38   ` Itaru Kitayama
2025-05-21 20:46     ` Dave Jiang
2025-05-21 23:28       ` Itaru Kitayama
2025-05-21 23:34         ` Dan Williams
2025-05-22 13:56           ` Jonathan Cameron
2025-05-22 18:19             ` Dan Williams
2025-05-22 21:46             ` Itaru Kitayama
2025-05-23  3:28               ` Alison Schofield
2025-05-23  4:56                 ` Itaru Kitayama
2025-05-23  5:52             ` Marc Herbert
2025-05-21 15:33 ` Alison Schofield
2025-05-21 15:36 ` Jonathan Cameron
2025-05-21 15:41 ` Alison Schofield