From: "Li, Ming4" <ming4.li@intel.com>
To: Pengfei Xu <pengfei.xu@intel.com>, <rrichter@amd.com>
Cc: <linux-cxl@vger.kernel.org>, <dave.jiang@intel.com>,
<Jonathan.Cameron@huawei.com>, <dan.j.williams@intel.com>
Subject: Re: [CXL] There is BUG: slab-out-of-bounds in cxl_setup_parent_dport in v6.10
Date: Tue, 6 Aug 2024 13:19:13 +0800
Message-ID: <7d1a47c8-4de5-44a9-b992-7f86d76366eb@intel.com>
In-Reply-To: <ZrGFWYKwxa1ld9Iz@xpf.sh.intel.com>

On 8/6/2024 10:07 AM, Pengfei Xu wrote:
> Hi Robert Richter and CXL experts,
>
> There is a BUG: slab-out-of-bounds in cxl_setup_parent_dport in v6.10 when
> booting a CXL QEMU environment.
>
> This is kernel tools/testing/cxl testing in a QEMU-simulated CXL environment.
>
> The related kconfig and dmesg are attached at this link:
> https://bugzilla.kernel.org/show_bug.cgi?id=219127
>
> It seems to be related to commit:
> f05fd10d138d cxl/pci: Add RCH downstream port AER register discovery
>
> The following KASAN and CXL kconfig options can trigger this problem:
> "
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> CONFIG_KASAN_INLINE=y
> CONFIG_KASAN_STACK=y
>
> CONFIG_CXL_BUS=m
> CONFIG_CXL_PCI=m
> CONFIG_CXL_MEM_RAW_COMMANDS=y
> CONFIG_CXL_ACPI=m
> CONFIG_CXL_PMEM=m
> CONFIG_CXL_MEM=m
> CONFIG_CXL_PORT=y
> CONFIG_CXL_SUSPEND=y
> CONFIG_CXL_REGION_INVALIDATION_TEST=y
> CONFIG_NVDIMM_SECURITY_TEST=y
> "
>
> Dmesg info:
> "
> [ 24.413405] ==================================================================
> [ 24.416332] BUG: KASAN: slab-out-of-bounds in cxl_setup_parent_dport+0x480/0x530 [cxl_core]
> [ 24.419291] Read of size 1 at addr ff110000676014f8 by task (udev-worker)/676
> [ 24.424403] CPU: 2 PID: 676 Comm: (udev-worker) Tainted: G O N 6.10.0-qemucxl #1
> [ 24.427232] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20240214-2.el9 02/14/2024
> [ 24.430089] Call Trace:
> [ 24.432534] <TASK>
> [ 24.434891] dump_stack_lvl+0xea/0x150
> [ 24.438131] print_report+0xce/0x610
> [ 24.440498] ? cxl_setup_parent_dport+0x480/0x530 [cxl_core]
> [ 24.443129] ? kasan_complete_mode_report_info+0x40/0x200
> [ 24.445602] ? cxl_setup_parent_dport+0x480/0x530 [cxl_core]
> [ 24.448221] kasan_report+0xcc/0x110
> [ 24.450527] ? cxl_setup_parent_dport+0x480/0x530 [cxl_core]
> [ 24.453140] __asan_report_load1_noabort+0x18/0x20
> [ 24.455455] cxl_setup_parent_dport+0x480/0x530 [cxl_core]
> [ 24.457986] cxl_mem_probe+0x49b/0xaa0 [cxl_mem]
> [ 24.460285] ? __pfx_cxl_mem_probe+0x10/0x10 [cxl_mem]
> [ 24.462592] ? sysfs_create_link+0x75/0xd0
> [ 24.464775] cxl_bus_probe+0x5e/0xc0 [cxl_core]
> [ 24.467153] ? __pfx_cxl_bus_probe+0x10/0x10 [cxl_core]
> [ 24.469632] really_probe+0x27c/0xac0
> [ 24.471750] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
> [ 24.474087] __driver_probe_device+0x1f3/0x460
> [ 24.476288] ? parse_option_str+0x149/0x190
> [ 24.478435] driver_probe_device+0x56/0x1b0
> [ 24.480543] __device_attach_driver+0x1e7/0x300
> [ 24.482682] bus_for_each_drv+0x159/0x1e0
> [ 24.484818] ? __pfx___device_attach_driver+0x10/0x10
> [ 24.486935] ? __pfx_bus_for_each_drv+0x10/0x10
> [ 24.489037] ? _raw_spin_unlock_irqrestore+0x45/0x70
> [ 24.491097] __device_attach+0x215/0x4f0
> [ 24.493055] ? __pfx___device_attach+0x10/0x10
> [ 24.495032] ? do_raw_spin_unlock+0x15c/0x210
> [ 24.497020] device_initial_probe+0x24/0x30
> [ 24.498922] bus_probe_device+0x18e/0x1d0
> [ 24.500732] device_add+0x11b6/0x1b60
> [ 24.502485] ? __pfx_device_add+0x10/0x10
> [ 24.504275] ? __pfx_exact_lock+0x10/0x10
> [ 24.506063] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [ 24.507931] ? kobject_get+0xc5/0x160
> [ 24.509619] cdev_device_add+0x13c/0x280
> [ 24.511319] devm_cxl_add_memdev+0x547/0x6f0 [cxl_core]
> [ 24.513287] cxl_mock_mem_probe+0xf1d/0x1d30 [cxl_mock_mem]
> [ 24.515133] ? __pfx_cxl_mock_mem_probe+0x10/0x10 [cxl_mock_mem]
> [ 24.516998] platform_probe+0x10a/0x200
> [ 24.518813] ? __pfx_platform_probe+0x10/0x10
> [ 24.520638] really_probe+0x27c/0xac0
> [ 24.522340] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
> [ 24.524194] __driver_probe_device+0x1f3/0x460
> [ 24.525972] ? parse_option_str+0x149/0x190
> [ 24.527763] driver_probe_device+0x56/0x1b0
> [ 24.529555] __driver_attach+0x277/0x570
> [ 24.531278] ? __pfx___driver_attach+0x10/0x10
> [ 24.532912] bus_for_each_dev+0x142/0x1e0
> [ 24.534474] ? __pfx_bus_for_each_dev+0x10/0x10
> [ 24.536094] ? _raw_spin_unlock+0x31/0x60
> [ 24.537676] driver_attach+0x49/0x60
> [ 24.539220] bus_add_driver+0x2f3/0x6b0
> [ 24.540781] driver_register+0x170/0x4b0
> [ 24.542334] ? __pfx_cxl_mock_mem_driver_init+0x10/0x10 [cxl_mock_mem]
> [ 24.544104] __platform_driver_register+0x66/0x80
> [ 24.545782] ? __pfx_cxl_mock_mem_driver_init+0x10/0x10 [cxl_mock_mem]
> [ 24.547579] cxl_mock_mem_driver_init+0x25/0xff0 [cxl_mock_mem]
> [ 24.549362] do_one_initcall+0x114/0x5d0
> [ 24.550991] ? __pfx_do_one_initcall+0x10/0x10
> [ 24.552593] ? __kasan_kmalloc+0x88/0xa0
> [ 24.554089] ? kasan_poison+0x3e/0x60
> [ 24.555511] ? kasan_unpoison+0x2c/0x60
> [ 24.557076] ? kasan_poison+0x3e/0x60
> [ 24.558573] ? __asan_register_globals+0x62/0x80
> [ 24.560188] ? __pfx_cxl_mock_mem_driver_init+0x10/0x10 [cxl_mock_mem]
> [ 24.562019] do_init_module+0x277/0x750
> [ 24.563556] load_module+0x5d1d/0x74f0
> [ 24.565124] ? __pfx_load_module+0x10/0x10
> [ 24.566656] ? __pfx_ima_post_read_file+0x10/0x10
> [ 24.568235] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [ 24.569875] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [ 24.571521] ? security_kernel_post_read_file+0xa2/0xd0
> [ 24.573189] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [ 24.574851] ? kernel_read_file+0x503/0x820
> [ 24.576441] ? __pfx_kernel_read_file+0x10/0x10
> [ 24.577887] ? __pfx___lock_acquire+0x10/0x10
> [ 24.579390] init_module_from_file+0x12c/0x1a0
> [ 24.580988] ? init_module_from_file+0x12c/0x1a0
> [ 24.582575] ? __pfx_init_module_from_file+0x10/0x10
> [ 24.584234] ? __this_cpu_preempt_check+0x21/0x30
> [ 24.585831] ? do_raw_spin_unlock+0x15c/0x210
> [ 24.587460] idempotent_init_module+0x3f1/0x690
> [ 24.589126] ? __pfx_idempotent_init_module+0x10/0x10
> [ 24.590808] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [ 24.592513] ? __fget_light+0x17d/0x210
> [ 24.594058] __x64_sys_finit_module+0x10e/0x1a0
> [ 24.595643] x64_sys_call+0x137a/0x20d0
> [ 24.597160] do_syscall_64+0x6d/0x140
> [ 24.598688] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 24.600350] RIP: 0033:0x7fbac6f3185d
> [ 24.601923] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
> [ 24.606145] RSP: 002b:00007ffd13414db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 24.608143] RAX: ffffffffffffffda RBX: 0000564200192ca0 RCX: 00007fbac6f3185d
> [ 24.610130] RDX: 0000000000000000 RSI: 00007fbac758707d RDI: 0000000000000006
> [ 24.612106] RBP: 00007ffd13414e70 R08: 0000000000000000 R09: 00007ffd13414e00
> [ 24.614112] R10: 0000000000000006 R11: 0000000000000246 R12: 00007fbac758707d
> [ 24.616113] R13: 0000000000020000 R14: 0000564200159890 R15: 0000564200195a20
> [ 24.618182] </TASK>
> [ 24.621370] Allocated by task 615:
> [ 24.623062] kasan_save_stack+0x2c/0x60
> [ 24.624851] kasan_save_track+0x18/0x40
> [ 24.626603] kasan_save_alloc_info+0x3c/0x50
> [ 24.628411] __kasan_kmalloc+0x88/0xa0
> [ 24.630155] __kmalloc_noprof+0x1dd/0x4a0
> [ 24.631899] platform_device_alloc+0x3a/0x230
> [ 24.633595] fq_codel_reset+0x6c/0x370 [sch_fq_codel]
> [ 24.635477] do_one_initcall+0x114/0x5d0
> [ 24.637164] do_init_module+0x277/0x750
> [ 24.638818] load_module+0x5d1d/0x74f0
> [ 24.640483] init_module_from_file+0x12c/0x1a0
> [ 24.642214] idempotent_init_module+0x3f1/0x690
> [ 24.644013] __x64_sys_finit_module+0x10e/0x1a0
> [ 24.645772] x64_sys_call+0x137a/0x20d0
> [ 24.647459] do_syscall_64+0x6d/0x140
> [ 24.649117] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 24.652407] The buggy address belongs to the object at ff11000067601000
> which belongs to the cache kmalloc-2k of size 2048
> [ 24.656105] The buggy address is located 23 bytes to the right of
> allocated 1249-byte region [ff11000067601000, ff110000676014e1)
> [ 24.661503] The buggy address belongs to the physical page:
> [ 24.663371] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x67600
> [ 24.665505] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [ 24.667605] flags: 0xfffffc0000040(head|node=0|zone=1|lastcpupid=0x1fffff)
> [ 24.669656] page_type: 0xffffefff(slab)
> [ 24.671471] raw: 000fffffc0000040 ff1100000d83d200 dead000000000122 0000000000000000
> [ 24.673568] raw: 0000000000000000 0000000000080008 00000001ffffefff 0000000000000000
> [ 24.675767] head: 000fffffc0000040 ff1100000d83d200 dead000000000122 0000000000000000
> [ 24.678014] head: 0000000000000000 0000000000080008 00000001ffffefff 0000000000000000
> [ 24.680121] head: 000fffffc0000003 ffd40000019d8001 ffffffffffffffff 0000000000000000
> [ 24.682313] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> [ 24.688127] Memory state around the buggy address:
> [ 24.690046] ff11000067601380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 24.692171] ff11000067601400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 24.694384] >ff11000067601480: 00 00 00 00 00 00 00 00 00 00 00 00 01 fc fc fc
> [ 24.696415] ^
> [ 24.698485] ff11000067601500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 24.700609] ff11000067601580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 24.702598] ==================================================================
> "
>
> I hope it's helpful.
>
> Best Regards,
> Thanks!
>
Hi Pengfei,

I can reproduce it in my environment with your configuration. I can confirm it is the same bug I hit recently, and I have sent out a patch for review:
https://lore.kernel.org/linux-cxl/20240806041547.1958787-1-ming4.li@intel.com/T/#u

The root cause is that the cxl-test module creates an RCH topology and uses a platform_device to create the RCH downstream port in that topology. Because that dport device is not embedded in a PCI host bridge, to_pci_host_bridge(dport->dport_dev) in cxl_setup_parent_dport() returns a wrong pci_host_bridge.
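
To illustrate the root cause, here is a minimal userspace sketch of what a container_of()-style cast does when the embedded struct device actually lives inside a different parent type. The mock_* structures, their field sizes, and both helper functions are hypothetical stand-ins, not the real kernel layouts or the code from the patch. The sketch only computes where a read of native_aer would land relative to the platform-device allocation, so it never performs the out-of-bounds read itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device; the real one is much larger. */
struct mock_device {
	char opaque[64];
};

/* Hypothetical platform device: the object that was really allocated. */
struct mock_platform_device {
	const char *name;
	int id;
	struct mock_device dev;	/* embedded device handed around as dport_dev */
	char tail[16];
};

/* Hypothetical PCI host bridge: the type the cast wrongly assumes. */
struct mock_pci_host_bridge {
	struct mock_device dev;	/* embedded device the cast is based on */
	char other[128];
	unsigned char native_aer;	/* field read after the cast */
};

/*
 * A container_of()-style cast computes
 *     bridge = (char *)dev - offsetof(struct mock_pci_host_bridge, dev)
 * and a field read then adds offsetof(..., native_aer) to that base.
 * Return the offset of that read from the start of the platform-device
 * allocation that actually contains dev.
 */
ptrdiff_t misread_offset(void)
{
	return (ptrdiff_t)offsetof(struct mock_platform_device, dev)
	     - (ptrdiff_t)offsetof(struct mock_pci_host_bridge, dev)
	     + (ptrdiff_t)offsetof(struct mock_pci_host_bridge, native_aer);
}

/* Non-zero when the read would fall outside the platform-device object. */
int misread_is_out_of_bounds(void)
{
	ptrdiff_t off = misread_offset();

	return off < 0 || (size_t)off >= sizeof(struct mock_platform_device);
}
```

With these assumed layouts, misread_is_out_of_bounds() reports that the read lands past the end of the platform-device object. That is the same class of access KASAN flagged here as a 1-byte read to the right of the region allocated by platform_device_alloc().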