From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 145493126DD for ; Fri, 10 Apr 2026 17:16:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775841383; cv=none; b=T7Cg735aAE49lfSZZkYb+nwkKk9fPCzzAhJrlG2ZoVZCNHezOYbeoQ9oxTyIyTUXnZt88OW9wFR68NgawrfzWiybmEILigmf5u17YTq9sLLnTm0U43YqD31g5xK/pAXj9PeXYunFVRA7EKr1OGJvJZLLw3CIS8ZEbY53LKSilAw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775841383; c=relaxed/simple; bh=aMvJA6/x1SVKg2QgTLThhOQRAKZphVT54hR7FyiJ+ds=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=CGs4Zcsq6qE2hfhV7j2LPnVk94Zgcq8oBHbjJv0gpfbpO+wW9o146p0gMZgCvr9Re10jMCrxAHT7JeWu21+Gvxu4KtzoDbKGKZHhiWjg1mB43mjFn4l6fXYLtvoVisfLRAYJtQ6bklx/1aQJqWxtm+4V8+NKdhQHdn3hCyWP39k= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=chbav92G; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="chbav92G" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1775841382; x=1807377382; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=aMvJA6/x1SVKg2QgTLThhOQRAKZphVT54hR7FyiJ+ds=; b=chbav92GAKoFK7f8WMehDhZGMqIq74xhJ5/ZW6Im2ICzH+dNrGFbUPpR fv9ZgiMiS06D86qh6nxUul/1LvRXUKR3vGOiSvNxHDPousu4TMp0McA2S bBpI+y57ZyRwGZl8XzqP0sCG51qeSBzEXZB4V888D1/UA9qz0HUjzIzqw y4Xk2eILuyootQWU+rO/KvPhRito06q4xSHoYx77jHtGgomumU82pxDWE Lb99FmwTh1MWueQ/4YdO2JiyBKPiOeN3jFzdnoMpXDTA+oWDO4KPg/WbW AWaS2D3o3YWML0fq0hGcGqx5sDBY+fLule00mtKddrJ13XeVD41Jf9V2g w==; X-CSE-ConnectionGUID: Rcqtvvw0QNScJXTOX9JO0A== X-CSE-MsgGUID: xCEotowQTpmSRKqJEm0E4w== X-IronPort-AV: E=McAfee;i="6800,10657,11755"; a="87565375" X-IronPort-AV: E=Sophos;i="6.23,171,1770624000"; d="scan'208";a="87565375" Received: from fmviesa008.fm.intel.com ([10.60.135.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Apr 2026 10:16:21 -0700 X-CSE-ConnectionGUID: DrW3pDgXTu6XBX8tkEk1AQ== X-CSE-MsgGUID: svfER3LyQMesIxr7PpocKw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,171,1770624000"; d="scan'208";a="226408480" Received: from dnelso2-mobl.amr.corp.intel.com (HELO [10.125.109.54]) ([10.125.109.54]) by fmviesa008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Apr 2026 10:16:20 -0700 Message-ID: <24bb834c-fb53-4b8a-b053-a18af2bb91cc@intel.com> Date: Fri, 10 Apr 2026 10:16:19 -0700 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal To: Richard Cheng Cc: jgg@ziepe.ca, saeedm@nvidia.com, Jonathan.Cameron@huawei.com, linux-kernel@vger.kernel.org, jan@nvidia.com, newtonl@nvidia.com, kristinc@nvidia.com, sreddym@nvidia.com, skomatineni@nvidia.com, vidyas@nvidia.com, kaihengf@nvidia.com, mochs@nvidia.com References: <20260409051902.40218-1-icheng@nvidia.com> <37f2f122-ec03-462c-82f4-e893f5260274@intel.com> Content-Language: en-US From: Dave Jiang In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 4/9/26 8:34 PM, Richard Cheng wrote: > On Thu, Apr 09, 2026 at 08:09:30AM +0800, Dave Jiang wrote: >> >> >> On 4/8/26 10:19 PM, Richard Cheng wrote: >>> CXL is linked before fwctl in drivers/Makefile. Both use `module_init, >>> so `cxl_pci_driver_init()` runs first. When `cxl_pci_probe()` calls >>> `fwctl_register()` and then `device_add()`, fwctl_class is not yet >>> registered because fwctl_init() hasn't run, causing `class_to_subsys()` >>> to return NULL and skip knode_class initialization. >>> On device removal, `class_to_subsys()` returns non-NULL, and >>> `device_del()` calls `klist_del()` on the uninitialized knode, >>> triggering a NULL pointer dereference [1]. >>> >>> Fixes: 858ce2f56b52 ("cxl: Add FWCTL support to CXL") >>> Signed-off-by: Richard Cheng >>> Reviewed-by: Kai-Heng Feng >> >> I've no objections. It makes sense to have fwctl init during subsys ahead of other drivers. Although do we need to handle the scenario where this happens for whatever reasons and it should be handled gracefully on device removal? >> >> Reviewed-by: Dave Jiang >> > > Hello Dave, > > Thanks for your review. > Regarding to your question, I'm not sure if I understand it correctly, but are you talking about handling the device removal better in device_del() ? > > IMHO, the root cause of the scenario is the wrong ordering of subsystem/device registration, the device does belongs to some subsystem but in device_add() it detects nothing first, > so klist_add_tail() wasn't executed, this is the root cause that causing the NULL dereference. > > Maybe something like the following inside device_add() would be better ? > ``` > sp = class_to_subsys(dev->class); > if (dev->class && !sp) { > pr_warn("device %s has an unregistered class %s\n", dev_name(dev), dev->class->name); > error = -ENODEV; > goto done; > } > ``` > > However, that's a driver-core change with broader discussion, I think we should make it in other pathes and discuss there, I would love to do the job if that's going to help. Yeah, I'm just not coming up with a good solution for the driver-core. I presume that the way it's setup to allow 'sp' to be NULL because there are scenarios where it is NULL? Not sure if just allowing the kernel to OOPs on uninitialized knode is fine because it's a programming error or if that should be handled more gracefully. DJ > This patch aims for the case of fwctl/CXL. > > Best regards > Richard Cheng. > >> >>> --- >>> [1]: >>> The error is triggered on with 7.0.0-rc6 kernel with CXL >>> device. >>> The PCI topology is as below >>> ``` >>> $ sudo lspci -tv >>> -[0001:00]---00.0-[01]--+-00.0 Mellanox Technologies CX8 Family [ConnectX-8] >>> +-00.1 Mellanox Technologies CX8 Family [ConnectX-8] >>> +-00.2 Mellanox Technologies CX8 Family [ConnectX-8] >>> \-00.3 Mellanox Technologies CX8 Family [ConnectX-8] >>> -[0002:00]---00.0-[01]-- >>> -+-[0003:00]---00.0-[01]----00.0 Montage Technology Co., Ltd. Device c002 >>> \-[0003:80]---00.0-[81]----00.0 Montage Technology Co., Ltd. Device c002 >>> -[0004:00]---00.0-[01]-- >>> -+-[0005:00]---00.0-[01]----00.0 Samsung Electronics Co Ltd Device a810 >>> +-[0005:40]---00.0-[41]----00.0 Samsung Electronics Co Ltd Device a810 >>> +-[0005:c0]---00.0-[c1]----00.0 Intel Corporation I210 Gigabit Network Connection >>> \-[0005:e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0 ASPEED Technology, Inc. ASPEED Graphics Family >>> \-02.0 ASPEED Technology, Inc. Device 2603 >>> -+-[0006:00]---00.0-[01]-- >>> \-[0006:80]---00.0-[81]-- >>> ``` >>> The CXL device is on 0003:01:00.0 CXL [0502]: Montage Technology Co., >>> Ltd. Device [1b00:c002] (rev 03) and another one is 0003:81:00.0 CXL >>> [0502]: Montage Technology Co., Ltd. Device [1b00:c002] (rev 03). The >>> one we are targeting is 0003:01:00.0. >>> >>> The kernel should be built with CONFIG_FWCTL=y and >>> CONFIG_CXL_FEATURES=y, otherwise the bug won't be triggered. >>> >>> With `sudo setpci -v -s 0003:00:00.0 CAP_EXP+0x10.b=0x10:0x10` to bring >>> its root port link down and the error log in dmesg is like the following >>> ``` >>> [ 890.137377] pcieport 0003:00:00.0: pciehp: Slot(0-2): Link Down >>> [ 890.145201] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 >>> [ 890.145203] Mem abort info: >>> [ 890.145205] ESR = 0x0000000096000006 >>> [ 890.145207] EC = 0x25: DABT (current EL), IL = 32 bits >>> [ 890.145208] SET = 0, FnV = 0 >>> [ 890.145209] EA = 0, S1PTW = 0 >>> [ 890.145211] FSC = 0x06: level 2 translation fault >>> [ 890.145212] Data abort info: >>> [ 890.145213] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 >>> [ 890.145214] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 >>> [ 890.145215] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 >>> [ 890.145216] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001d1937000 >>> [ 890.145218] [0000000000000020] pgd=08000001d193e403, p4d=08000001d193e403, pud=08000001d193d403, pmd=0000000000000000 >>> [ 890.145223] Internal error: Oops: 0000000096000006 [#1] SMP >>> [ 890.214749] Modules linked in: nft_masq nft_ct nft_reject_ipv4 nf_reject_ipv4 nft_reject act_csum cls_u32 sch_htb nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc qrtr cfg80211 binfmt_misc nls_iso8859_1 ast cxl_pmu dax_hmem nvidia_cspmu acpi_power_meter coresight_trbe sbsa_gwdt ipmi_ssif acpi_ipmi cxl_acpi arm_smmuv3_pmu coresight arm_cspmu_module arm_spe_pmu ipmi_devintf ipmi_msghandler cxl_pmem cppc_cpufreq sch_fq_codel dm_multipath nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs libblake2b raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 linear mlx5_ib macsec ib_uverbs ib_core mlx5_dpll ghash_ce sm4_ce_gcm nvme sm4_ce_ccm mlx5_core nvme_core sm4_ce mlxfw nvme_keyring sm4_ce_cipher tls sm4 igb nvme_auth sm3_ce arm_smccc_trng i2c_algo_bit hkdf psample aes_neon_bs aes_neon_blk aes_ce_blk >>> [ 890.294573] CPU: 28 UID: 0 PID: 1350 Comm: irq/156-pciehp Not tainted 7.0.0-rc6-richard+ #3 PREEMPT(full) >>> [ 890.304108] Hardware name: , BIOS buildbrain-gcid-sbios-44706962 Mon Mar 30 03:04:08 PM UTC 2026 >>> [ 890.312959] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) >>> [ 890.319660] pc : klist_put+0x2c/0x140 >>> [ 890.323441] lr : klist_del+0x18/0x38 >>> [ 890.327050] sp : ffff80008cfd39e0 >>> [ 890.330229] x29: ffff80008cfd39e0 x28: 0000000000000018 x27: ffffd9f0a1a9deb0 >>> [ 890.337275] x26: ffffd9f0a1a9a2b8 x25: ffff00009c3ebe78 x24: ffff00009c3ebe00 >>> [ 890.344321] x23: ffff00008d1cc800 x22: 0000000000000001 x21: ffff00009c3ebe68 >>> [ 890.351367] x20: 0000000000000000 x19: ffff00009afc7188 x18: ffff80008cfb50a8 >>> [ 890.358413] x17: 0000000000000000 x16: 0000000000000000 x15: 633d4d4554535953 >>> [ 890.365459] x14: 42555300302e306d x13: 5300302e306d656d x12: 5f756d702f302e30 >>> [ 890.372504] x11: 0035333033343d4d x10: 0000000000000000 x9 : ffffd9f0a31b9a5c >>> [ 890.379551] x8 : 0101010101010101 x7 : 7fff7f7f7f7f7f7f x6 : 339eff3033746f62 >>> [ 890.386597] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 >>> [ 890.393642] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000 >>> [ 890.400689] Call trace: >>> [ 890.402923] klist_put+0x2c/0x140 (P) >>> [ 890.406703] klist_del+0x18/0x38 >>> [ 890.409968] device_del+0x120/0x3c8 >>> [ 890.413148] cdev_device_del+0x2c/0xa8 >>> [ 890.416928] fwctl_unregister+0x11c/0x128 >>> [ 890.420967] free_memdev_fwctl+0x24/0x50 >>> [ 890.424919] devm_action_release+0x20/0x48 >>> [ 890.428786] release_nodes+0x68/0xc8 >>> [ 890.432481] devres_release_all+0x9c/0x130 >>> [ 890.436348] device_unbind_cleanup+0x24/0xb0 >>> [ 890.440730] device_release_driver_internal+0x234/0x2e0 >>> [ 890.445627] device_release_driver+0x24/0x50 >>> [ 890.450096] pci_stop_bus_device+0x88/0x100 >>> [ 890.453962] pci_stop_and_remove_bus_device+0x24/0x58 >>> [ 890.459118] pciehp_unconfigure_device+0xb4/0x1e0 >>> [ 890.463758] pciehp_disable_slot+0x7c/0x190 >>> [ 890.467710] pciehp_handle_presence_or_link_change+0x94/0x518 >>> [ 890.473554] pciehp_ist+0x1c8/0x310 >>> [ 890.476819] irq_thread_fn+0x38/0xd0 >>> [ 890.480599] irq_thread+0x1ac/0x450 >>> [ 890.483779] kthread+0x13c/0x150 >>> [ 890.487215] ret_from_fork+0x10/0x20 >>> [ 890.490739] Code: 12001c36 f9400014 927ffa94 aa1403e0 (f9401295) >>> [ 890.496668] ---[ end trace 0000000000000000 ]--- >>> [ 890.552081] genirq: exiting task "irq/156-pciehp" (1350) is an active IRQ thread (irq 156) >>> ``` >>> >>> >>> Best regards, >>> Richard Cheng. >>> --- >>> drivers/fwctl/main.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c >>> index bc6378506296..098c3824ad75 100644 >>> --- a/drivers/fwctl/main.c >>> +++ b/drivers/fwctl/main.c >>> @@ -415,7 +415,7 @@ static void __exit fwctl_exit(void) >>> unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES); >>> } >>> >>> -module_init(fwctl_init); >>> +subsys_initcall(fwctl_init); >>> module_exit(fwctl_exit); >>> MODULE_DESCRIPTION("fwctl device firmware access framework"); >>> MODULE_LICENSE("GPL"); >>