public inbox for linux-kernel@vger.kernel.org
* [PATCH] fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal
@ 2026-04-09  5:19 Richard Cheng
  2026-04-09 15:09 ` Dave Jiang
  0 siblings, 1 reply; 2+ messages in thread
From: Richard Cheng @ 2026-04-09  5:19 UTC
  To: dave.jiang, jgg, saeedm
  Cc: Jonathan.Cameron, linux-kernel, jan, newtonl, kristinc, sreddym,
	skomatineni, vidyas, kaihengf, mochs, Richard Cheng

CXL is linked before fwctl in drivers/Makefile. Both use `module_init()`,
so `cxl_pci_driver_init()` runs first. When `cxl_pci_probe()` calls
`fwctl_register()` and then `device_add()`, `fwctl_class` is not yet
registered because `fwctl_init()` hasn't run, causing `class_to_subsys()`
to return NULL and skip knode_class initialization.
By the time the device is removed, `fwctl_init()` has run, so
`class_to_subsys()` returns non-NULL and `device_del()` calls
`klist_del()` on the uninitialized knode, triggering a NULL pointer
dereference [1].

Fixes: 858ce2f56b52 ("cxl: Add FWCTL support to CXL")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
Reviewed-by: Kai-Heng Feng <kaihengf@nvidia.com>
---
[1]:
The error is triggered on a 7.0.0-rc6 kernel with a CXL device.
The PCI topology is as follows:
```
$ sudo lspci -tv
-[0001:00]---00.0-[01]--+-00.0  Mellanox Technologies CX8 Family [ConnectX-8]
                        +-00.1  Mellanox Technologies CX8 Family [ConnectX-8]
                        +-00.2  Mellanox Technologies CX8 Family [ConnectX-8]
                        \-00.3  Mellanox Technologies CX8 Family [ConnectX-8]
-[0002:00]---00.0-[01]--
-+-[0003:00]---00.0-[01]----00.0  Montage Technology Co., Ltd. Device c002
 \-[0003:80]---00.0-[81]----00.0  Montage Technology Co., Ltd. Device c002
-[0004:00]---00.0-[01]--
-+-[0005:00]---00.0-[01]----00.0  Samsung Electronics Co Ltd Device a810
 +-[0005:40]---00.0-[41]----00.0  Samsung Electronics Co Ltd Device a810
 +-[0005:c0]---00.0-[c1]----00.0  Intel Corporation I210 Gigabit Network Connection
 \-[0005:e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0  ASPEED Technology, Inc. ASPEED Graphics Family
                                          \-02.0  ASPEED Technology, Inc. Device 2603
-+-[0006:00]---00.0-[01]--
 \-[0006:80]---00.0-[81]--
```
The CXL device is on 0003:01:00.0 CXL [0502]: Montage Technology Co.,
Ltd. Device [1b00:c002] (rev 03) and another one is 0003:81:00.0 CXL
[0502]: Montage Technology Co., Ltd. Device [1b00:c002] (rev 03). The
one we are targeting is 0003:01:00.0.

The kernel should be built with CONFIG_FWCTL=y and
CONFIG_CXL_FEATURES=y, otherwise the bug won't be triggered.

Running `sudo setpci -v -s 0003:00:00.0 CAP_EXP+0x10.b=0x10:0x10` brings
its root port link down, and dmesg then shows the following error log:
```
[  890.137377] pcieport 0003:00:00.0: pciehp: Slot(0-2): Link Down
[  890.145201] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[  890.145203] Mem abort info:
[  890.145205]   ESR = 0x0000000096000006
[  890.145207]   EC = 0x25: DABT (current EL), IL = 32 bits
[  890.145208]   SET = 0, FnV = 0
[  890.145209]   EA = 0, S1PTW = 0
[  890.145211]   FSC = 0x06: level 2 translation fault
[  890.145212] Data abort info:
[  890.145213]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[  890.145214]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  890.145215]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  890.145216] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001d1937000
[  890.145218] [0000000000000020] pgd=08000001d193e403, p4d=08000001d193e403, pud=08000001d193d403, pmd=0000000000000000
[  890.145223] Internal error: Oops: 0000000096000006 [#1]  SMP
[  890.214749] Modules linked in: nft_masq nft_ct nft_reject_ipv4 nf_reject_ipv4 nft_reject act_csum cls_u32 sch_htb nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc qrtr cfg80211 binfmt_misc nls_iso8859_1 ast cxl_pmu dax_hmem nvidia_cspmu acpi_power_meter coresight_trbe sbsa_gwdt ipmi_ssif acpi_ipmi cxl_acpi arm_smmuv3_pmu coresight arm_cspmu_module arm_spe_pmu ipmi_devintf ipmi_msghandler cxl_pmem cppc_cpufreq sch_fq_codel dm_multipath nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs libblake2b raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 linear mlx5_ib macsec ib_uverbs ib_core mlx5_dpll ghash_ce sm4_ce_gcm nvme sm4_ce_ccm mlx5_core nvme_core sm4_ce mlxfw nvme_keyring sm4_ce_cipher tls sm4 igb nvme_auth sm3_ce arm_smccc_trng i2c_algo_bit hkdf psample aes_neon_bs aes_neon_blk aes_ce_blk
[  890.294573] CPU: 28 UID: 0 PID: 1350 Comm: irq/156-pciehp Not tainted 7.0.0-rc6-richard+ #3 PREEMPT(full)
[  890.304108] Hardware name:  , BIOS buildbrain-gcid-sbios-44706962 Mon Mar 30 03:04:08 PM UTC 2026
[  890.312959] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  890.319660] pc : klist_put+0x2c/0x140
[  890.323441] lr : klist_del+0x18/0x38
[  890.327050] sp : ffff80008cfd39e0
[  890.330229] x29: ffff80008cfd39e0 x28: 0000000000000018 x27: ffffd9f0a1a9deb0
[  890.337275] x26: ffffd9f0a1a9a2b8 x25: ffff00009c3ebe78 x24: ffff00009c3ebe00
[  890.344321] x23: ffff00008d1cc800 x22: 0000000000000001 x21: ffff00009c3ebe68
[  890.351367] x20: 0000000000000000 x19: ffff00009afc7188 x18: ffff80008cfb50a8
[  890.358413] x17: 0000000000000000 x16: 0000000000000000 x15: 633d4d4554535953
[  890.365459] x14: 42555300302e306d x13: 5300302e306d656d x12: 5f756d702f302e30
[  890.372504] x11: 0035333033343d4d x10: 0000000000000000 x9 : ffffd9f0a31b9a5c
[  890.379551] x8 : 0101010101010101 x7 : 7fff7f7f7f7f7f7f x6 : 339eff3033746f62
[  890.386597] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[  890.393642] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000
[  890.400689] Call trace:
[  890.402923]  klist_put+0x2c/0x140 (P)
[  890.406703]  klist_del+0x18/0x38
[  890.409968]  device_del+0x120/0x3c8
[  890.413148]  cdev_device_del+0x2c/0xa8
[  890.416928]  fwctl_unregister+0x11c/0x128
[  890.420967]  free_memdev_fwctl+0x24/0x50
[  890.424919]  devm_action_release+0x20/0x48
[  890.428786]  release_nodes+0x68/0xc8
[  890.432481]  devres_release_all+0x9c/0x130
[  890.436348]  device_unbind_cleanup+0x24/0xb0
[  890.440730]  device_release_driver_internal+0x234/0x2e0
[  890.445627]  device_release_driver+0x24/0x50
[  890.450096]  pci_stop_bus_device+0x88/0x100
[  890.453962]  pci_stop_and_remove_bus_device+0x24/0x58
[  890.459118]  pciehp_unconfigure_device+0xb4/0x1e0
[  890.463758]  pciehp_disable_slot+0x7c/0x190
[  890.467710]  pciehp_handle_presence_or_link_change+0x94/0x518
[  890.473554]  pciehp_ist+0x1c8/0x310
[  890.476819]  irq_thread_fn+0x38/0xd0
[  890.480599]  irq_thread+0x1ac/0x450
[  890.483779]  kthread+0x13c/0x150
[  890.487215]  ret_from_fork+0x10/0x20
[  890.490739] Code: 12001c36 f9400014 927ffa94 aa1403e0 (f9401295)
[  890.496668] ---[ end trace 0000000000000000 ]---
[  890.552081] genirq: exiting task "irq/156-pciehp" (1350) is an active IRQ thread (irq 156)
```


Best regards,
Richard Cheng.
---
 drivers/fwctl/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index bc6378506296..098c3824ad75 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -415,7 +415,7 @@ static void __exit fwctl_exit(void)
 	unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
 }
 
-module_init(fwctl_init);
+subsys_initcall(fwctl_init);
 module_exit(fwctl_exit);
 MODULE_DESCRIPTION("fwctl device firmware access framework");
 MODULE_LICENSE("GPL");
-- 
2.43.0



* Re: [PATCH] fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal
  2026-04-09  5:19 [PATCH] fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal Richard Cheng
@ 2026-04-09 15:09 ` Dave Jiang
  0 siblings, 0 replies; 2+ messages in thread
From: Dave Jiang @ 2026-04-09 15:09 UTC
  To: Richard Cheng, jgg, saeedm
  Cc: Jonathan.Cameron, linux-kernel, jan, newtonl, kristinc, sreddym,
	skomatineni, vidyas, kaihengf, mochs



On 4/8/26 10:19 PM, Richard Cheng wrote:
> CXL is linked before fwctl in drivers/Makefile. Both use `module_init()`,
> so `cxl_pci_driver_init()` runs first. When `cxl_pci_probe()` calls
> `fwctl_register()` and then `device_add()`, `fwctl_class` is not yet
> registered because `fwctl_init()` hasn't run, causing `class_to_subsys()`
> to return NULL and skip knode_class initialization.
> By the time the device is removed, `fwctl_init()` has run, so
> `class_to_subsys()` returns non-NULL and `device_del()` calls
> `klist_del()` on the uninitialized knode, triggering a NULL pointer
> dereference [1].
> 
> Fixes: 858ce2f56b52 ("cxl: Add FWCTL support to CXL")
> Signed-off-by: Richard Cheng <icheng@nvidia.com>
> Reviewed-by: Kai-Heng Feng <kaihengf@nvidia.com>

I have no objections. It makes sense to have fwctl initialize at subsys initcall time, ahead of other drivers. That said, do we also need to handle the scenario where this still happens for whatever reason, so that device removal copes with it gracefully?

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
> [1]:
> The error is triggered on a 7.0.0-rc6 kernel with a CXL device.
> The PCI topology is as follows:
> ```
> $ sudo lspci -tv
> -[0001:00]---00.0-[01]--+-00.0  Mellanox Technologies CX8 Family [ConnectX-8]
>                         +-00.1  Mellanox Technologies CX8 Family [ConnectX-8]
>                         +-00.2  Mellanox Technologies CX8 Family [ConnectX-8]
>                         \-00.3  Mellanox Technologies CX8 Family [ConnectX-8]
> -[0002:00]---00.0-[01]--
> -+-[0003:00]---00.0-[01]----00.0  Montage Technology Co., Ltd. Device c002
>  \-[0003:80]---00.0-[81]----00.0  Montage Technology Co., Ltd. Device c002
> -[0004:00]---00.0-[01]--
> -+-[0005:00]---00.0-[01]----00.0  Samsung Electronics Co Ltd Device a810
>  +-[0005:40]---00.0-[41]----00.0  Samsung Electronics Co Ltd Device a810
>  +-[0005:c0]---00.0-[c1]----00.0  Intel Corporation I210 Gigabit Network Connection
>  \-[0005:e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0  ASPEED Technology, Inc. ASPEED Graphics Family
>                                           \-02.0  ASPEED Technology, Inc. Device 2603
> -+-[0006:00]---00.0-[01]--
>  \-[0006:80]---00.0-[81]--
> ```
> The CXL device is on 0003:01:00.0 CXL [0502]: Montage Technology Co.,
> Ltd. Device [1b00:c002] (rev 03) and another one is 0003:81:00.0 CXL
> [0502]: Montage Technology Co., Ltd. Device [1b00:c002] (rev 03). The
> one we are targeting is 0003:01:00.0.
> 
> The kernel should be built with CONFIG_FWCTL=y and
> CONFIG_CXL_FEATURES=y, otherwise the bug won't be triggered.
> 
> Running `sudo setpci -v -s 0003:00:00.0 CAP_EXP+0x10.b=0x10:0x10` brings
> its root port link down, and dmesg then shows the following error log:
> ```
> [  890.137377] pcieport 0003:00:00.0: pciehp: Slot(0-2): Link Down
> [  890.145201] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
> [  890.145203] Mem abort info:
> [  890.145205]   ESR = 0x0000000096000006
> [  890.145207]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  890.145208]   SET = 0, FnV = 0
> [  890.145209]   EA = 0, S1PTW = 0
> [  890.145211]   FSC = 0x06: level 2 translation fault
> [  890.145212] Data abort info:
> [  890.145213]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
> [  890.145214]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [  890.145215]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  890.145216] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001d1937000
> [  890.145218] [0000000000000020] pgd=08000001d193e403, p4d=08000001d193e403, pud=08000001d193d403, pmd=0000000000000000
> [  890.145223] Internal error: Oops: 0000000096000006 [#1]  SMP
> [  890.214749] Modules linked in: nft_masq nft_ct nft_reject_ipv4 nf_reject_ipv4 nft_reject act_csum cls_u32 sch_htb nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc qrtr cfg80211 binfmt_misc nls_iso8859_1 ast cxl_pmu dax_hmem nvidia_cspmu acpi_power_meter coresight_trbe sbsa_gwdt ipmi_ssif acpi_ipmi cxl_acpi arm_smmuv3_pmu coresight arm_cspmu_module arm_spe_pmu ipmi_devintf ipmi_msghandler cxl_pmem cppc_cpufreq sch_fq_codel dm_multipath nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs libblake2b raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 linear mlx5_ib macsec ib_uverbs ib_core mlx5_dpll ghash_ce sm4_ce_gcm nvme sm4_ce_ccm mlx5_core nvme_core sm4_ce mlxfw nvme_keyring sm4_ce_cipher tls sm4 igb nvme_auth sm3_ce arm_smccc_trng i2c_algo_bit hkdf psample aes_neon_bs aes_neon_blk aes_ce_blk
> [  890.294573] CPU: 28 UID: 0 PID: 1350 Comm: irq/156-pciehp Not tainted 7.0.0-rc6-richard+ #3 PREEMPT(full)
> [  890.304108] Hardware name:  , BIOS buildbrain-gcid-sbios-44706962 Mon Mar 30 03:04:08 PM UTC 2026
> [  890.312959] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [  890.319660] pc : klist_put+0x2c/0x140
> [  890.323441] lr : klist_del+0x18/0x38
> [  890.327050] sp : ffff80008cfd39e0
> [  890.330229] x29: ffff80008cfd39e0 x28: 0000000000000018 x27: ffffd9f0a1a9deb0
> [  890.337275] x26: ffffd9f0a1a9a2b8 x25: ffff00009c3ebe78 x24: ffff00009c3ebe00
> [  890.344321] x23: ffff00008d1cc800 x22: 0000000000000001 x21: ffff00009c3ebe68
> [  890.351367] x20: 0000000000000000 x19: ffff00009afc7188 x18: ffff80008cfb50a8
> [  890.358413] x17: 0000000000000000 x16: 0000000000000000 x15: 633d4d4554535953
> [  890.365459] x14: 42555300302e306d x13: 5300302e306d656d x12: 5f756d702f302e30
> [  890.372504] x11: 0035333033343d4d x10: 0000000000000000 x9 : ffffd9f0a31b9a5c
> [  890.379551] x8 : 0101010101010101 x7 : 7fff7f7f7f7f7f7f x6 : 339eff3033746f62
> [  890.386597] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
> [  890.393642] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000
> [  890.400689] Call trace:
> [  890.402923]  klist_put+0x2c/0x140 (P)
> [  890.406703]  klist_del+0x18/0x38
> [  890.409968]  device_del+0x120/0x3c8
> [  890.413148]  cdev_device_del+0x2c/0xa8
> [  890.416928]  fwctl_unregister+0x11c/0x128
> [  890.420967]  free_memdev_fwctl+0x24/0x50
> [  890.424919]  devm_action_release+0x20/0x48
> [  890.428786]  release_nodes+0x68/0xc8
> [  890.432481]  devres_release_all+0x9c/0x130
> [  890.436348]  device_unbind_cleanup+0x24/0xb0
> [  890.440730]  device_release_driver_internal+0x234/0x2e0
> [  890.445627]  device_release_driver+0x24/0x50
> [  890.450096]  pci_stop_bus_device+0x88/0x100
> [  890.453962]  pci_stop_and_remove_bus_device+0x24/0x58
> [  890.459118]  pciehp_unconfigure_device+0xb4/0x1e0
> [  890.463758]  pciehp_disable_slot+0x7c/0x190
> [  890.467710]  pciehp_handle_presence_or_link_change+0x94/0x518
> [  890.473554]  pciehp_ist+0x1c8/0x310
> [  890.476819]  irq_thread_fn+0x38/0xd0
> [  890.480599]  irq_thread+0x1ac/0x450
> [  890.483779]  kthread+0x13c/0x150
> [  890.487215]  ret_from_fork+0x10/0x20
> [  890.490739] Code: 12001c36 f9400014 927ffa94 aa1403e0 (f9401295)
> [  890.496668] ---[ end trace 0000000000000000 ]---
> [  890.552081] genirq: exiting task "irq/156-pciehp" (1350) is an active IRQ thread (irq 156)
> ```
> 
> 
> Best regards,
> Richard Cheng.
> ---
>  drivers/fwctl/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index bc6378506296..098c3824ad75 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -415,7 +415,7 @@ static void __exit fwctl_exit(void)
>  	unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
>  }
>  
> -module_init(fwctl_init);
> +subsys_initcall(fwctl_init);
>  module_exit(fwctl_exit);
>  MODULE_DESCRIPTION("fwctl device firmware access framework");
>  MODULE_LICENSE("GPL");

