* NULL pointer dereference in igen6_probe - 6.16-rc2
From: Marek Marczykowski-Górecki @ 2025-06-17 11:13 UTC
To: Tony Luck, Borislav Petkov, Qiuxu Zhuo
Cc: open list:EDAC-IGEN6, open list

Hi,

Environment:
- Novacustom V540TU laptop with Intel Core Ultra 5 125H
- Dasharo firmware (coreboot+EDK2)
- Linux running as Xen PV dom0

I hit the following crash on boot:

[ 13.562085] intel_pmc_core INT33A1:00: Assuming a default substate order for this platform
[ 13.562682] intel_pmc_core INT33A1:00: initialized
[ 13.565035] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
[ 13.565746] EDAC igen6: Expected 2 mcs, but only 1 detected.
[ 13.565859] BUG: unable to handle page fault for address: 000000000000d570
[ 13.566623] #PF: supervisor read access in kernel mode
[ 13.566956] #PF: error_code(0x0000) - not-present page
[ 13.567276] PGD 0 P4D 0
[ 13.567460] Oops: Oops: 0000 [#1] SMP NOPTI
[ 13.567742] CPU: 8 UID: 0 PID: 1090 Comm: (udev-worker) Not tainted 6.16.0-0.rc2.1.qubes.1.fc41.x86_64 #1 PREEMPT(full)
[ 13.568432] Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
[ 13.569049] RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
[ 13.569440] Code: 66 4d 63 ee 48 8b 15 21 c7 01 00 49 83 fd 03 73 6b 4d 69 ed 50 03 00 00 41 8b 47 1c 41 03 47 18 4c 01 ea 48 03 82 08 03 00 00 <48> 8b 30 4a 8d 04 26 48 39 c5 72 ba 48 8b 0d f7 c6 01 00 8b 41 1c
[ 13.570602] RSP: e02b:ffffc900428979c8 EFLAGS: 00010202
[ 13.570951] RAX: 000000000000d570 RBX: 0000000000000000 RCX: 00000000000000ca
[ 13.571403] RDX: ffff888101dcab50 RSI: ffffffffffffffff RDI: ffffffff83484238
[ 13.571895] RBP: bffffffffffffffe R08: 0000000000000002 R09: 00000000000000c0
[ 13.572358] R10: 0000000000000000 R11: ffffffff81612e60 R12: c000000000000000
[ 13.572820] R13: 0000000000000350 R14: 0000000000000001 R15: ffffffffc11b9c00
[ 13.573302] FS: 0000706cbfc6fbc0(0000) GS:ffff8882133db000(0000) knlGS:0000000000000000
[ 13.573812] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.574199] CR2: 000000000000d570 CR3: 0000000104a0a000 CR4: 0000000000050660
[ 13.574658] Call Trace:
[ 13.574836] <TASK>
[ 13.574985] igen6_probe+0x2a0/0x343 [igen6_edac]
[ 13.575332] local_pci_probe+0x42/0x90
[ 13.575599] pci_call_probe+0x5b/0x180
[ 13.575863] pci_device_probe+0x95/0x140
[ 13.576133] ? driver_sysfs_add+0x57/0xc0
[ 13.576415] really_probe+0xdb/0x340
[ 13.576664] ? pm_runtime_barrier+0x54/0x90
[ 13.576940] ? __pfx___driver_attach+0x10/0x10
[ 13.577234] __driver_probe_device+0x78/0x110
[ 13.577569] driver_probe_device+0x1f/0xa0
[ 13.577833] __driver_attach+0xba/0x1c0
[ 13.578080] bus_for_each_dev+0x8b/0xe0
[ 13.578328] bus_add_driver+0x142/0x220
[ 13.578571] driver_register+0x72/0xd0
[ 13.578823] igen6_init+0xc5/0xff0 [igen6_edac]
[ 13.579122] ? __pfx_igen6_init+0x10/0x10 [igen6_edac]
[ 13.579479] do_one_initcall+0x57/0x310
[ 13.579503] do_init_module+0x90/0x250
[ 13.579969] init_module_from_file+0x88/0xd0
[ 13.579991] idempotent_init_module+0x114/0x310
[ 13.579997] __x64_sys_finit_module+0x6d/0xd0
[ 13.580773] do_syscall_64+0x84/0x2c0
[ 13.581011] ? count_memcg_events+0x167/0x1d0
[ 13.581314] ? handle_mm_fault+0x220/0x340
[ 13.581576] ? do_user_addr_fault+0x2c3/0x7f0
[ 13.581876] ? clear_bhb_loop+0x50/0xa0
[ 13.582125] ? clear_bhb_loop+0x50/0xa0
[ 13.582377] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 13.582724] RIP: 0033:0x706cc04ffd9d
[ 13.582967] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 43 60 0f 00 f7 d8 64 89 01 48
[ 13.584097] RSP: 002b:00007ffceaf1b958 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 13.584595] RAX: ffffffffffffffda RBX: 00005aaee2a15d20 RCX: 0000706cc04ffd9d
[ 13.585029] RDX: 0000000000000000 RSI: 0000706cbeff93bd RDI: 000000000000002b
[ 13.585458] RBP: 00007ffceaf1ba10 R08: 0000000000000001 R09: 00007ffceaf1b9c0
[ 13.585885] R10: 0000000000000040 R11: 0000000000000246 R12: 0000706cbeff93bd
[ 13.586316] R13: 0000000000020000 R14: 00005aaee2a85900 R15: 00005aaee2a84c30
[ 13.586753] </TASK>
[ 13.586899] Modules linked in: processor_thermal_power_floor processor_thermal_mbox int340x_thermal_zone intel_pmc_core igen6_edac(+) fjes(-) pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_hid intel_scu_pltdrv sparse_keymap joydev fuse loop xenfs nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock zram vmw_vmci lz4hc_compress lz4_compress dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm i915 i2c_algo_bit sdhci_pci drm_buddy nvme sdhci_uhs2 ttm polyval_clmulni intel_pmc_mux nvme_core sdhci ghash_clmulni_intel drm_display_helper typec sha512_ssse3 cqhci nvme_keyring xhci_pci hid_multitouch sha1_ssse3 mmc_core xhci_hcd intel_vpu nvme_auth intel_vsec cec i2c_hid_acpi i2c_hid video thunderbolt wmi pinctrl_meteorlake serio_raw xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput
[ 13.589346] Adding 3986428k swap on /dev/zram0. Priority:100 extents:1 across:3986428k SSDsc
[ 13.592314] CR2: 000000000000d570
[ 13.592473] ---[ end trace 0000000000000000 ]---
[ 13.593400] RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
[ 13.593831] Code: 66 4d 63 ee 48 8b 15 21 c7 01 00 49 83 fd 03 73 6b 4d 69 ed 50 03 00 00 41 8b 47 1c 41 03 47 18 4c 01 ea 48 03 82 08 03 00 00 <48> 8b 30 4a 8d 04 26 48 39 c5 72 ba 48 8b 0d f7 c6 01 00 8b 41 1c
[ 13.595067] RSP: e02b:ffffc900428979c8 EFLAGS: 00010202
[ 13.595077] RAX: 000000000000d570 RBX: 0000000000000000 RCX: 00000000000000ca
[ 13.595078] RDX: ffff888101dcab50 RSI: ffffffffffffffff RDI: ffffffff83484238
[ 13.595080] RBP: bffffffffffffffe R08: 0000000000000002 R09: 00000000000000c0
[ 13.595083] R10: 0000000000000000 R11: ffffffff81612e60 R12: c000000000000000
[ 13.595084] R13: 0000000000000350 R14: 0000000000000001 R15: ffffffffc11b9c00
[ 13.595100] FS: 0000706cbfc6fbc0(0000) GS:ffff8882133db000(0000) knlGS:0000000000000000
[ 13.598301] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.598308] CR2: 000000000000d570 CR3: 0000000104a0a000 CR4: 0000000000050660
[ 13.598319] Kernel panic - not syncing: Fatal exception
[ 13.598384] Kernel Offset: disabled

Full console log: https://openqa.qubes-os.org/tests/143433/logfile?filename=serial0.txt

Other observations:
- Linux 6.15 works fine
- the same Linux 6.16-rc2 boots fine on several other systems, for example
  on Intel i5 14600K (also with Dasharo firmware)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
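
The subject says NULL pointer dereference, yet CR2 in the log is 000000000000d570 rather than 0. One possible reading, assuming the driver read a register at a fixed offset from an MMIO base pointer that was NULL for the absent controller (the patch description later in this thread attributes the oops to a null MMIO access), is that the faulting address is simply that offset. The sketch below is only an illustration of that arithmetic: `mmio_reg_addr`, `looks_like_null_base_fault`, and `ASSUMED_REG_OFFSET` are hypothetical names, and the constant is taken from CR2, not from the driver source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: register offset inferred from CR2 in the report,
 * under the assumption that the MMIO base pointer was NULL. */
#define ASSUMED_REG_OFFSET 0xd570u

/* Address a driver would touch when reading a register located
 * `offset` bytes into an MMIO window mapped at `window_base`. */
uintptr_t mmio_reg_addr(uintptr_t window_base, uint32_t offset)
{
	return window_base + offset;
}

/* A controller that was never mapped has base 0, so the access
 * lands at a small non-zero address instead of exactly at 0. */
int looks_like_null_base_fault(uintptr_t fault_addr)
{
	return fault_addr == mmio_reg_addr(0, ASSUMED_REG_OFFSET);
}

/* Usage: the address reported in CR2 matches a NULL base plus offset. */
void check_reported_fault(void)
{
	assert(looks_like_null_base_fault((uintptr_t)0xd570));
}
```

Under that assumption the crash is still a NULL-base dereference, which would explain why the fault address is small but non-zero.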
* Re: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Borislav Petkov @ 2025-06-17 11:57 UTC
To: Tony Luck, Qiuxu Zhuo
Cc: Marek Marczykowski-Górecki, open list:EDAC-IGEN6, open list

On Tue, Jun 17, 2025 at 01:13:49PM +0200, Marek Marczykowski-Górecki wrote:
> [ 13.562085] intel_pmc_core INT33A1:00: Assuming a default substate order for this platform
> [ 13.562682] intel_pmc_core INT33A1:00: initialized
> [ 13.565035] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
> [ 13.565746] EDAC igen6: Expected 2 mcs, but only 1 detected.

Well, folks, if you've detected only one memory controller, then work with only
one and do not kill the machine:

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1930dc00c791..23e26ba2d49b 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
 		return -ENODEV;
 	}
 
-	if (lmc < res_cfg->num_imc)
+	if (lmc < res_cfg->num_imc) {
 		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
 			     res_cfg->num_imc, lmc);
+		res_cfg->num_imc = lmc;
+	}
 
 	return 0;
 
---

but then that cfg struct is const :-\

drivers/edac/igen6_edac.c: In function ‘igen6_register_mcis’:
drivers/edac/igen6_edac.c:1356:34: error: assignment of member ‘num_imc’ in read-only object
 1356 |                 res_cfg->num_imc = lmc;
      |                                  ^

Unless it is some gunky crap this coreboot does - then we will have to have
a longer talk. :-P

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
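
The compiler error in the message above follows from `res_cfg` referring to a `const` object. Here is a minimal user-space sketch of the idea behind the hunk, assuming a cut-down `struct res_config` that keeps only the `num_imc` field (the real structure has more members) and a hypothetical helper named `clamp_num_imc`:

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the driver's configuration structure. */
struct res_config {
	int num_imc;	/* number of memory controllers expected */
};

/* Hypothetical helper mirroring the proposed hunk: if fewer
 * controllers were detected than expected, remember the real count
 * so that later loops do not touch absent ones.
 *
 * If `cfg` were declared `const struct res_config *`, the assignment
 * below would be rejected at compile time with "assignment of member
 * 'num_imc' in read-only object", as in the error quoted above. */
int clamp_num_imc(struct res_config *cfg, int detected)
{
	assert(cfg != NULL);

	if (detected < cfg->num_imc)
		cfg->num_imc = detected;

	return cfg->num_imc;
}
```

Calling `clamp_num_imc(&cfg, 1)` on a configuration that expects two controllers leaves `cfg.num_imc` at 1; a detected count equal to or above the expected one changes nothing.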
* RE: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Zhuo, Qiuxu @ 2025-06-17 14:09 UTC
To: Borislav Petkov, Luck, Tony
Cc: Marek Marczykowski-Górecki, open list:EDAC-IGEN6, open list

Hi Boris,

> From: Borislav Petkov <bp@alien8.de>
> [...]
> > [ 13.565035] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
> > [ 13.565746] EDAC igen6: Expected 2 mcs, but only 1 detected.
>
> Well, folks, if you've detected only one memory controller, then work with
> only one and do not kill the machine:
>

Yes.

> diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
> index 1930dc00c791..23e26ba2d49b 100644
> --- a/drivers/edac/igen6_edac.c
> +++ b/drivers/edac/igen6_edac.c
> @@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
>  		return -ENODEV;
>  	}
>  
> -	if (lmc < res_cfg->num_imc)
> +	if (lmc < res_cfg->num_imc) {
>  		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
>  			     res_cfg->num_imc, lmc);
> +		res_cfg->num_imc = lmc;
> +	}
>  
>  	return 0;
>
> ---
>
> but then that cfg struct is const :-\
>
> drivers/edac/igen6_edac.c: In function ‘igen6_register_mcis’:
> drivers/edac/igen6_edac.c:1356:34: error: assignment of member ‘num_imc’ in read-only object
>  1356 |                 res_cfg->num_imc = lmc;
>       |                                  ^
>
>
> Unless it is some gunky crap this coreboot does - then we will have to have a
> longer talk.
>
> 😝

In the 10nm_edac driver for Intel Xeon server, 'cfg' is non-const, and the field
'cfg->ddr_imc_num' [1] is overwritten with the number of detected DDR memory
controllers at runtime.

Reverting 'cfg' in this igen6_edac driver to non-const, allowing it to be set
with the actual number of detected memory controllers seems reasonable.
After that then applying Boris' fix above is the simplest way to resolve the
issue. 😊

[1] https://github.com/torvalds/linux/blob/master/drivers/edac/i10nm_base.c#L479

Thanks.
-Qiuxu
* Re: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Borislav Petkov @ 2025-06-17 14:51 UTC
To: Zhuo, Qiuxu
Cc: Luck, Tony, Marek Marczykowski-Górecki, open list:EDAC-IGEN6, open list

On Tue, Jun 17, 2025 at 02:09:42PM +0000, Zhuo, Qiuxu wrote:
> In the 10nm_edac driver for Intel Xeon server, 'cfg' is non-const, and the field
> 'cfg->ddr_imc_num' [1] is overwritten with the number of detected DDR memory
> controllers at runtime.
>
> Reverting 'cfg' in this igen6_edac driver to non-const, allowing it to be set
> with the actual number of detected memory controllers seems reasonable.

Question is: is that something the driver should allow? Detecting more memory
controllers but enabling less. How can that even happen?

> After that then applying Boris' fix above is the simplest way to resolve the
> issue. 😊

Right, just prepare a proper patch, please, so that Marek can test and confirm.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* RE: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Zhuo, Qiuxu @ 2025-06-17 16:16 UTC
To: Borislav Petkov
Cc: Luck, Tony, Marek Marczykowski-Górecki, open list:EDAC-IGEN6, open list

> From: Borislav Petkov <bp@alien8.de>
> [...]
> > Reverting 'cfg' in this igen6_edac driver to non-const, allowing it to
> > be set with the actual number of detected memory controllers seems
> > reasonable.
>
> Question is: is that something the driver should allow? Detecting more

When notified of memory errors, the igen6_edac driver checks all the memory
controllers specified by 'cfg->num_imc' to identify the source of the error.

Either checking if imc->window == NULL (indicating null MMIO for absent memory
controllers) before each usage point, or updating 'cfg->num_imc' to reflect
the number of memory controllers actually present, should fix the issue.
The latter is simpler.

> memory controllers but enabling less. How can that even happen?
>

The maximum number of detected memory controllers is bounded
by the macro NUM_IMC [1]. This value is what we know as the maximum
possible value now.

[1] https://github.com/torvalds/linux/blob/master/drivers/edac/igen6_edac.c#L1324

> > After that then applying Boris' fix above is the simplest way to
> > resolve the issue. 😊
>
> Right, just prepare a proper patch, please, so that Marek can test and confirm.
>

OK.

I'll make a patch for Marek to test first.
Thanks Boris.

- Qiuxu
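
The two options named in the message above can be contrasted in a small user-space model. This is a sketch under stated assumptions, not the driver's code: `struct imc`, `count_errors_checked`, and `count_errors_clamped` are hypothetical names, `window` stands in for the MMIO mapping (NULL for an absent controller), and the second variant assumes detected controllers occupy the first slots of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in: `window` models the MMIO mapping and is NULL
 * for a controller that was not detected. The value it points to
 * models an error-log register (non-zero means an error is logged). */
struct imc {
	const unsigned long long *window;
};

/* Option 1: keep iterating over the expected count, but skip any
 * controller without a mapping before every access. */
int count_errors_checked(const struct imc *imcs, int expected)
{
	int i, hits = 0;

	for (i = 0; i < expected; i++) {
		if (imcs[i].window == NULL)
			continue;	/* absent controller, nothing to read */
		if (*imcs[i].window != 0)
			hits++;
	}
	return hits;
}

/* Option 2: iterate only over the detected count. Assumes detected
 * controllers are stored in slots 0..detected-1, so no per-access
 * NULL check is needed. */
int count_errors_clamped(const struct imc *imcs, int detected)
{
	int i, hits = 0;

	for (i = 0; i < detected; i++) {
		assert(imcs[i].window != NULL);
		if (*imcs[i].window != 0)
			hits++;
	}
	return hits;
}
```

Option 1 needs the check repeated at every place that touches `window`, whereas option 2 changes a single count, which is presumably why the thread calls it the simpler one.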
* Re: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Borislav Petkov @ 2025-06-17 18:20 UTC
To: Zhuo, Qiuxu
Cc: Luck, Tony, Marek Marczykowski-Górecki, open list:EDAC-IGEN6, open list

On Tue, Jun 17, 2025 at 04:16:41PM +0000, Zhuo, Qiuxu wrote:
> The maximum number of detected memory controllers is bounded
> by the macro NUM_IMC [1]. This value is what we know as the maximum
> possible value now.

I don't think you're answering my question:

Can this happen in real life and why was it added?

	if (lmc < res_cfg->num_imc)
		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
			     res_cfg->num_imc, lmc);

/me does git archeology...

Aha, I guess it can even with Intel and official fw and so on - not only
coreboot:

  20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")

> I'll make a patch for Marek to test first.

Yes, that would be the right thing to do.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Qiuxu Zhuo @ 2025-06-18 3:18 UTC
To: Tony Luck, Borislav Petkov, marmarek
Cc: Qiuxu Zhuo, James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac, linux-kernel

A kernel panic was reported with the following kernel log:

  EDAC igen6: Expected 2 mcs, but only 1 detected.
  BUG: unable to handle page fault for address: 000000000000d570
  ...
  Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
  RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
  ...
  igen6_probe+0x2a0/0x343 [igen6_edac]
  ...
  igen6_init+0xc5/0xff0 [igen6_edac]
  ...

This issue occurred because one memory controller was fused off by
the BIOS but the igen6_edac driver still checked all the memory
controllers, including this absent one, to identify the source of
the error. Accessing the null MMIO for the absent memory controller
resulted in the oops above.

Fix this issue by reverting the configuration structure to non-const
and updating the field 'res_cfg->num_imc' to reflect the number of
detected memory controllers.

Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 drivers/edac/igen6_edac.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1930dc00c791..1cb5c67e78ae 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -125,7 +125,7 @@
 #define MEM_SLICE_HASH_MASK(v)		(GET_BITFIELD(v, 6, 19) << 6)
 #define MEM_SLICE_HASH_LSB_MASK_BIT(v)	GET_BITFIELD(v, 24, 26)
 
-static const struct res_config {
+static struct res_config {
 	bool machine_check;
 	/* The number of present memory controllers. */
 	int num_imc;
@@ -479,7 +479,7 @@ static u64 rpl_p_err_addr(u64 ecclog)
 	return ECC_ERROR_LOG_ADDR45(ecclog);
 }
 
-static const struct res_config ehl_cfg = {
+static struct res_config ehl_cfg = {
 	.num_imc		= 1,
 	.imc_base		= 0x5000,
 	.ibecc_base		= 0xdc00,
@@ -489,7 +489,7 @@ static const struct res_config ehl_cfg = {
 	.err_addr_to_imc_addr	= ehl_err_addr_to_imc_addr,
 };
 
-static const struct res_config icl_cfg = {
+static struct res_config icl_cfg = {
 	.num_imc		= 1,
 	.imc_base		= 0x5000,
 	.ibecc_base		= 0xd800,
@@ -499,7 +499,7 @@ static const struct res_config icl_cfg = {
 	.err_addr_to_imc_addr	= ehl_err_addr_to_imc_addr,
 };
 
-static const struct res_config tgl_cfg = {
+static struct res_config tgl_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0x5000,
@@ -513,7 +513,7 @@ static const struct res_config tgl_cfg = {
 	.err_addr_to_imc_addr	= tgl_err_addr_to_imc_addr,
 };
 
-static const struct res_config adl_cfg = {
+static struct res_config adl_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -524,7 +524,7 @@ static const struct res_config adl_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config adl_n_cfg = {
+static struct res_config adl_n_cfg = {
 	.machine_check		= true,
 	.num_imc		= 1,
 	.imc_base		= 0xd800,
@@ -535,7 +535,7 @@ static const struct res_config adl_n_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config rpl_p_cfg = {
+static struct res_config rpl_p_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -547,7 +547,7 @@ static const struct res_config rpl_p_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config mtl_ps_cfg = {
+static struct res_config mtl_ps_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -558,7 +558,7 @@ static const struct res_config mtl_ps_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config mtl_p_cfg = {
+static struct res_config mtl_p_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -569,7 +569,7 @@ static const struct res_config mtl_p_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct pci_device_id igen6_pci_tbl[] = {
+static struct pci_device_id igen6_pci_tbl[] = {
 	{ PCI_VDEVICE(INTEL, DID_EHL_SKU5), (kernel_ulong_t)&ehl_cfg },
 	{ PCI_VDEVICE(INTEL, DID_EHL_SKU6), (kernel_ulong_t)&ehl_cfg },
 	{ PCI_VDEVICE(INTEL, DID_EHL_SKU7), (kernel_ulong_t)&ehl_cfg },
@@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
 		return -ENODEV;
 	}
 
-	if (lmc < res_cfg->num_imc)
+	if (lmc < res_cfg->num_imc) {
 		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
 			     res_cfg->num_imc, lmc);
+		res_cfg->num_imc = lmc;
+	}
 
 	return 0;
 

base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
-- 
2.43.0
* RE: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Zhuo, Qiuxu @ 2025-06-18 3:26 UTC
To: marmarek@invisiblethingslab.com
Cc: Borislav Petkov, Luck, Tony, James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Marek,

> From: Zhuo, Qiuxu <qiuxu.zhuo@intel.com>
> [...]
> Subject: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference

Thank you for reporting this issue.
Could you please test this patch on your machine to verify if it fixes the issue?

Thanks!
-Qiuxu
* Re: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: marmarek @ 2025-06-18 13:23 UTC
To: Zhuo, Qiuxu
Cc: Borislav Petkov, Luck, Tony, James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org

On Wed, Jun 18, 2025 at 03:26:43AM +0000, Zhuo, Qiuxu wrote:
> Hi Marek,
>
> > From: Zhuo, Qiuxu <qiuxu.zhuo@intel.com>
> > [...]
> > Subject: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
>
> Thank you for reporting this issue.
> Could you please test this patch on your machine to verify if it fixes the issue?

I can confirm it works now, I have the "EDAC igen6: Expected 2 mcs, but
only 1 detected" message and it doesn't crash anymore. Thanks!

Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* RE: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Zhuo, Qiuxu @ 2025-06-18 13:39 UTC
To: marmarek@invisiblethingslab.com
Cc: Borislav Petkov, Luck, Tony, James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org

> From: marmarek@invisiblethingslab.com
> [...]
> > Could you please test this patch on your machine to verify if it fixes the
> > issue?
>
> I can confirm it works now, I have the "EDAC igen6: Expected 2 mcs, but only
> 1 detected" message and it doesn't crash anymore. Thanks!
>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Thank you!

-Qiuxu
* Re: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Luck, Tony @ 2025-06-18 15:06 UTC
To: Qiuxu Zhuo
Cc: Borislav Petkov, marmarek, James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac, linux-kernel

On Wed, Jun 18, 2025 at 11:18:55AM +0800, Qiuxu Zhuo wrote:
> A kernel panic was reported with the following kernel log:
>
>   EDAC igen6: Expected 2 mcs, but only 1 detected.
>   BUG: unable to handle page fault for address: 000000000000d570
>   ...
>   Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
>   RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
>   ...
>   igen6_probe+0x2a0/0x343 [igen6_edac]
>   ...
>   igen6_init+0xc5/0xff0 [igen6_edac]
>   ...
>
> This issue occurred because one memory controller was fused off by

Maybe "disabled by BIOS" rather than "fused off by BIOS".

> the BIOS but the igen6_edac driver still checked all the memory
> controllers, including this absent one, to identify the source of
> the error. Accessing the null MMIO for the absent memory controller
> resulted in the oops above.
>
> Fix this issue by reverting the configuration structure to non-const
> and updating the field 'res_cfg->num_imc' to reflect the number of
> detected memory controllers.
>
> Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

[snip]

> @@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
>  		return -ENODEV;
>  	}
>  
> -	if (lmc < res_cfg->num_imc)
> +	if (lmc < res_cfg->num_imc) {
>  		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
>  			     res_cfg->num_imc, lmc);

KERN_WARNING seems overly dramatic. BIOS likely had good reasons to
disable the memory controller (e.g. it isn't connected to any DIMM slots
on the motherboard for this system). So there's nothing actually wrong
that needs to be fixed.

KERN_INFO is enough. Perhaps KERN_DEBUG?

-Tony
* RE: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Zhuo, Qiuxu @ 2025-06-18 15:42 UTC
To: Luck, Tony
Cc: Borislav Petkov, marmarek@invisiblethingslab.com, James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Tony,

> From: Luck, Tony <tony.luck@intel.com>
> [...]
> >
> > This issue occurred because one memory controller was fused off by
>
> Maybe "disabled by BIOS" rather than "fused off by BIOS".

The phrase "disabled by BIOS" should be more appropriate.
Will update it in v2. Thanks.

> > the BIOS but the igen6_edac driver still checked all the memory
> > controllers, including this absent one, to identify the source of the
> [...]
> >
> > -	if (lmc < res_cfg->num_imc)
> > +	if (lmc < res_cfg->num_imc) {
> >  		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
> >  			     res_cfg->num_imc, lmc);
>
> KERN_WARNING seems overly dramatic. BIOS likely had good reasons to
> disable the memory controller (e.g. it isn't connected to any DIMM slots on
> the motherboard for this system). So there's nothing actually wrong that
> needs to be fixed.

Yes. That's true.

> KERN_INFO is enough. Perhaps KERN_DEBUG?

Will change the log level to "KERN_DEBUG" in v2 to reduce noise.

- Qiuxu
* [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference
From: Qiuxu Zhuo @ 2025-06-18 16:23 UTC
To: tony.luck, bp
Cc: james.morse, linux-edac, linux-kernel, marmarek, mchehab, qiuxu.zhuo, rric

A kernel panic was reported with the following kernel log:

  EDAC igen6: Expected 2 mcs, but only 1 detected.
  BUG: unable to handle page fault for address: 000000000000d570
  ...
  Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
  RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
  ...
  igen6_probe+0x2a0/0x343 [igen6_edac]
  ...
  igen6_init+0xc5/0xff0 [igen6_edac]
  ...

This issue occurred because one memory controller was disabled by
the BIOS but the igen6_edac driver still checked all the memory
controllers, including this absent one, to identify the source of
the error. Accessing the null MMIO for the absent memory controller
resulted in the oops above.

Fix this issue by reverting the configuration structure to non-const
and updating the field 'res_cfg->num_imc' to reflect the number of
detected memory controllers.

Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
Suggested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
v1->v2:
  - Add "Tested-by" tag from Marek.
  - s/fused off/disabled/ in the commit message, as suggested by Tony.

 drivers/edac/igen6_edac.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1930dc00c791..1cb5c67e78ae 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -125,7 +125,7 @@
 #define MEM_SLICE_HASH_MASK(v)		(GET_BITFIELD(v, 6, 19) << 6)
 #define MEM_SLICE_HASH_LSB_MASK_BIT(v)	GET_BITFIELD(v, 24, 26)
 
-static const struct res_config {
+static struct res_config {
 	bool machine_check;
 	/* The number of present memory controllers. */
 	int num_imc;
@@ -479,7 +479,7 @@ static u64 rpl_p_err_addr(u64 ecclog)
 	return ECC_ERROR_LOG_ADDR45(ecclog);
 }
 
-static const struct res_config ehl_cfg = {
+static struct res_config ehl_cfg = {
 	.num_imc		= 1,
 	.imc_base		= 0x5000,
 	.ibecc_base		= 0xdc00,
@@ -489,7 +489,7 @@ static const struct res_config ehl_cfg = {
 	.err_addr_to_imc_addr	= ehl_err_addr_to_imc_addr,
 };
 
-static const struct res_config icl_cfg = {
+static struct res_config icl_cfg = {
 	.num_imc		= 1,
 	.imc_base		= 0x5000,
 	.ibecc_base		= 0xd800,
@@ -499,7 +499,7 @@ static const struct res_config icl_cfg = {
 	.err_addr_to_imc_addr	= ehl_err_addr_to_imc_addr,
 };
 
-static const struct res_config tgl_cfg = {
+static struct res_config tgl_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0x5000,
@@ -513,7 +513,7 @@ static const struct res_config tgl_cfg = {
 	.err_addr_to_imc_addr	= tgl_err_addr_to_imc_addr,
 };
 
-static const struct res_config adl_cfg = {
+static struct res_config adl_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -524,7 +524,7 @@ static const struct res_config adl_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config adl_n_cfg = {
+static struct res_config adl_n_cfg = {
 	.machine_check		= true,
 	.num_imc		= 1,
 	.imc_base		= 0xd800,
@@ -535,7 +535,7 @@ static const struct res_config adl_n_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config rpl_p_cfg = {
+static struct res_config rpl_p_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -547,7 +547,7 @@ static const struct res_config rpl_p_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config mtl_ps_cfg = {
+static struct res_config mtl_ps_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -558,7 +558,7 @@ static const struct res_config mtl_ps_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct res_config mtl_p_cfg = {
+static struct res_config mtl_p_cfg = {
 	.machine_check		= true,
 	.num_imc		= 2,
 	.imc_base		= 0xd800,
@@ -569,7 +569,7 @@ static const struct res_config mtl_p_cfg = {
 	.err_addr_to_imc_addr	= adl_err_addr_to_imc_addr,
 };
 
-static const struct pci_device_id igen6_pci_tbl[] = {
+static struct pci_device_id igen6_pci_tbl[] = {
 	{ PCI_VDEVICE(INTEL, DID_EHL_SKU5), (kernel_ulong_t)&ehl_cfg },
 	{ PCI_VDEVICE(INTEL, DID_EHL_SKU6), (kernel_ulong_t)&ehl_cfg },
 	{ PCI_VDEVICE(INTEL, DID_EHL_SKU7), (kernel_ulong_t)&ehl_cfg },
@@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
 		return -ENODEV;
 	}
 
-	if (lmc < res_cfg->num_imc)
+	if (lmc < res_cfg->num_imc) {
 		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
 			     res_cfg->num_imc, lmc);
+		res_cfg->num_imc = lmc;
+	}
 
 	return 0;
 

base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
-- 
2.43.0
* [PATCH v2 2/2] EDAC/igen6: Reduce log level to debug for absent memory controllers
  2025-06-18 16:23 ` [PATCH v2 1/2] " Qiuxu Zhuo
@ 2025-06-18 16:23 ` Qiuxu Zhuo
  2025-06-18 17:46 ` [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference Luck, Tony
  1 sibling, 0 replies; 15+ messages in thread
From: Qiuxu Zhuo @ 2025-06-18 16:23 UTC
To: tony.luck, bp
Cc: james.morse, linux-edac, linux-kernel, marmarek, mchehab, qiuxu.zhuo, rric

The current KERN_WARNING level message for detecting absent memory
controllers is overly dramatic. The BIOS likely had valid reasons to
disable the memory controller (e.g. it isn't connected to any DIMM
slots on the motherboard for this system). So there's nothing actually
wrong that needs to be fixed.

Reduce the log level to KERN_DEBUG to eliminate the false warning.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 drivers/edac/igen6_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1cb5c67e78ae..5ffe9579959f 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1351,7 +1351,7 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
 	}
 
 	if (lmc < res_cfg->num_imc) {
-		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
+		igen6_printk(KERN_DEBUG, "Expected %d mcs, but only %d detected.",
 			     res_cfg->num_imc, lmc);
 		res_cfg->num_imc = lmc;
 	}
-- 
2.43.0

* RE: [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference
  2025-06-18 16:23 ` [PATCH v2 1/2] " Qiuxu Zhuo
  2025-06-18 16:23 ` [PATCH v2 2/2] EDAC/igen6: Reduce log level to debug for absent memory controllers Qiuxu Zhuo
@ 2025-06-18 17:46 ` Luck, Tony
  1 sibling, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2025-06-18 17:46 UTC
To: Zhuo, Qiuxu, bp@alien8.de
Cc: james.morse@arm.com, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, marmarek@invisiblethingslab.com, mchehab@kernel.org, rric@kernel.org

> This issue occurred because one memory controller was disabled by
> the BIOS but the igen6_edac driver still checked all the memory
> controllers, including this absent one, to identify the source of
> the error. Accessing the null MMIO for the absent memory controller
> resulted in the oops above.
>
> Fix this issue by reverting the configuration structure to non-const
> and updating the field 'res_cfg->num_imc' to reflect the number of
> detected memory controllers.
>
> Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Applied (both this and patch 2/2) to RAS edac-drivers branch.

Thanks

-Tony

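For readers following the thread, here is a minimal userspace sketch of the
failure mode described in the quoted commit message. It is not the driver
code: imc_sim, cfg_sim, fake_regs, register_present() and scan_logs() are
hypothetical stand-ins, and a NULL 'window' pointer is only an assumed model
of the missing MMIO mapping for a controller the BIOS disabled. The 'clamp'
flag stands in for the 'res_cfg->num_imc = lmc' assignment added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IMC 2

/* Hypothetical per-controller state; NULL window models an absent controller. */
struct imc_sim {
	const unsigned long *window;
};

/* Hypothetical config; non-const so the expected count can be corrected. */
struct cfg_sim {
	int num_imc;
};

static const unsigned long fake_regs[MAX_IMC] = { 0x1234, 0x5678 };
static struct imc_sim imcs[MAX_IMC];
static struct cfg_sim cfg = { .num_imc = MAX_IMC };

/*
 * Register only the controllers marked present, packing them into
 * imcs[0..lmc-1]. With clamp set, shrink cfg.num_imc to the detected
 * count, which is the step the fix introduces.
 */
static int register_present(const int *present, int n, int clamp)
{
	int lmc = 0;

	assert(n <= MAX_IMC);
	memset(imcs, 0, sizeof(imcs));

	for (int pmc = 0; pmc < n; pmc++) {
		if (!present[pmc])
			continue;
		imcs[lmc].window = &fake_regs[pmc];
		lmc++;
	}

	if (clamp && lmc < cfg.num_imc)
		cfg.num_imc = lmc;

	return lmc;
}

/*
 * Walk cfg.num_imc controllers the way an error-log scan would.
 * Returns the number scanned, or -1 where the kernel would have
 * dereferenced the unmapped window and oopsed.
 */
static int scan_logs(void)
{
	int scanned = 0;

	for (int i = 0; i < cfg.num_imc; i++) {
		if (!imcs[i].window)
			return -1;
		(void)*imcs[i].window;
		scanned++;
	}

	return scanned;
}
```

With present = {1, 0} and cfg.num_imc = 2, calling register_present() with
clamp = 0 leaves the count at 2 and scan_logs() hits the absent slot. With
clamp = 1 the count drops to 1 and the scan stops before it. The sketch also
shows why the config cannot stay const if the count is to be corrected in
place.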
end of thread

Thread overview: 15+ messages
2025-06-17 11:13 NULL pointer dereference in igen6_probe - 6.16-rc2 Marek Marczykowski-Górecki
2025-06-17 11:57 ` Borislav Petkov
2025-06-17 14:09 ` Zhuo, Qiuxu
2025-06-17 14:51 ` Borislav Petkov
2025-06-17 16:16 ` Zhuo, Qiuxu
2025-06-17 18:20 ` Borislav Petkov
2025-06-18  3:18 ` [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference Qiuxu Zhuo
2025-06-18  3:26 ` Zhuo, Qiuxu
2025-06-18 13:23 ` marmarek
2025-06-18 13:39 ` Zhuo, Qiuxu
2025-06-18 15:06 ` Luck, Tony
2025-06-18 15:42 ` Zhuo, Qiuxu
2025-06-18 16:23 ` [PATCH v2 1/2] " Qiuxu Zhuo
2025-06-18 16:23 ` [PATCH v2 2/2] EDAC/igen6: Reduce log level to debug for absent memory controllers Qiuxu Zhuo
2025-06-18 17:46 ` [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference Luck, Tony