* NULL pointer dereference in igen6_probe - 6.16-rc2
From: Marek Marczykowski-Górecki @ 2025-06-17 11:13 UTC
To: Tony Luck, Borislav Petkov, Qiuxu Zhuo; +Cc: open list:EDAC-IGEN6, open list
Hi,
Environment:
- Novacustom V540TU laptop with Intel Core Ultra 5 125H
- Dasharo firmware (coreboot+EDK2)
- Linux running as Xen PV dom0
I hit the following crash on boot:
[ 13.562085] intel_pmc_core INT33A1:00: Assuming a default substate order for this platform
[ 13.562682] intel_pmc_core INT33A1:00: initialized
[ 13.565035] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
[ 13.565746] EDAC igen6: Expected 2 mcs, but only 1 detected.
[ 13.565859] BUG: unable to handle page fault for address: 000000000000d570
[ 13.566623] #PF: supervisor read access in kernel mode
[ 13.566956] #PF: error_code(0x0000) - not-present page
[ 13.567276] PGD 0 P4D 0
[ 13.567460] Oops: Oops: 0000 [#1] SMP NOPTI
[ 13.567742] CPU: 8 UID: 0 PID: 1090 Comm: (udev-worker) Not tainted 6.16.0-0.rc2.1.qubes.1.fc41.x86_64 #1 PREEMPT(full)
[ 13.568432] Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
[ 13.569049] RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
[ 13.569440] Code: 66 4d 63 ee 48 8b 15 21 c7 01 00 49 83 fd 03 73 6b 4d 69 ed 50 03 00 00 41 8b 47 1c 41 03 47 18 4c 01 ea 48 03 82 08 03 00 00 <48> 8b 30 4a 8d 04 26 48 39 c5 72 ba 48 8b 0d f7 c6 01 00 8b 41 1c
[ 13.570602] RSP: e02b:ffffc900428979c8 EFLAGS: 00010202
[ 13.570951] RAX: 000000000000d570 RBX: 0000000000000000 RCX: 00000000000000ca
[ 13.571403] RDX: ffff888101dcab50 RSI: ffffffffffffffff RDI: ffffffff83484238
[ 13.571895] RBP: bffffffffffffffe R08: 0000000000000002 R09: 00000000000000c0
[ 13.572358] R10: 0000000000000000 R11: ffffffff81612e60 R12: c000000000000000
[ 13.572820] R13: 0000000000000350 R14: 0000000000000001 R15: ffffffffc11b9c00
[ 13.573302] FS: 0000706cbfc6fbc0(0000) GS:ffff8882133db000(0000) knlGS:0000000000000000
[ 13.573812] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.574199] CR2: 000000000000d570 CR3: 0000000104a0a000 CR4: 0000000000050660
[ 13.574658] Call Trace:
[ 13.574836] <TASK>
[ 13.574985] igen6_probe+0x2a0/0x343 [igen6_edac]
[ 13.575332] local_pci_probe+0x42/0x90
[ 13.575599] pci_call_probe+0x5b/0x180
[ 13.575863] pci_device_probe+0x95/0x140
[ 13.576133] ? driver_sysfs_add+0x57/0xc0
[ 13.576415] really_probe+0xdb/0x340
[ 13.576664] ? pm_runtime_barrier+0x54/0x90
[ 13.576940] ? __pfx___driver_attach+0x10/0x10
[ 13.577234] __driver_probe_device+0x78/0x110
[ 13.577569] driver_probe_device+0x1f/0xa0
[ 13.577833] __driver_attach+0xba/0x1c0
[ 13.578080] bus_for_each_dev+0x8b/0xe0
[ 13.578328] bus_add_driver+0x142/0x220
[ 13.578571] driver_register+0x72/0xd0
[ 13.578823] igen6_init+0xc5/0xff0 [igen6_edac]
[ 13.579122] ? __pfx_igen6_init+0x10/0x10 [igen6_edac]
[ 13.579479] do_one_initcall+0x57/0x310
[ 13.579503] do_init_module+0x90/0x250
[ 13.579969] init_module_from_file+0x88/0xd0
[ 13.579991] idempotent_init_module+0x114/0x310
[ 13.579997] __x64_sys_finit_module+0x6d/0xd0
[ 13.580773] do_syscall_64+0x84/0x2c0
[ 13.581011] ? count_memcg_events+0x167/0x1d0
[ 13.581314] ? handle_mm_fault+0x220/0x340
[ 13.581576] ? do_user_addr_fault+0x2c3/0x7f0
[ 13.581876] ? clear_bhb_loop+0x50/0xa0
[ 13.582125] ? clear_bhb_loop+0x50/0xa0
[ 13.582377] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 13.582724] RIP: 0033:0x706cc04ffd9d
[ 13.582967] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 43 60 0f 00 f7 d8 64 89 01 48
[ 13.584097] RSP: 002b:00007ffceaf1b958 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 13.584595] RAX: ffffffffffffffda RBX: 00005aaee2a15d20 RCX: 0000706cc04ffd9d
[ 13.585029] RDX: 0000000000000000 RSI: 0000706cbeff93bd RDI: 000000000000002b
[ 13.585458] RBP: 00007ffceaf1ba10 R08: 0000000000000001 R09: 00007ffceaf1b9c0
[ 13.585885] R10: 0000000000000040 R11: 0000000000000246 R12: 0000706cbeff93bd
[ 13.586316] R13: 0000000000020000 R14: 00005aaee2a85900 R15: 00005aaee2a84c30
[ 13.586753] </TASK>
[ 13.586899] Modules linked in: processor_thermal_power_floor processor_thermal_mbox int340x_thermal_zone intel_pmc_core igen6_edac(+) fjes(-) pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_hid intel_scu_pltdrv sparse_keymap joydev fuse loop xenfs nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock zram vmw_vmci lz4hc_compress lz4_compress dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm i915 i2c_algo_bit sdhci_pci drm_buddy nvme sdhci_uhs2 ttm polyval_clmulni intel_pmc_mux nvme_core sdhci ghash_clmulni_intel drm_display_helper typec sha512_ssse3 cqhci nvme_keyring xhci_pci hid_multitouch sha1_ssse3 mmc_core xhci_hcd intel_vpu nvme_auth intel_vsec cec i2c_hid_acpi i2c_hid video thunderbolt wmi pinctrl_meteorlake serio_raw xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput
[ 13.589346] Adding 3986428k swap on /dev/zram0. Priority:100 extents:1 across:3986428k SSDsc
[ 13.592314] CR2: 000000000000d570
[ 13.592473] ---[ end trace 0000000000000000 ]---
[ 13.593400] RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
[ 13.593831] Code: 66 4d 63 ee 48 8b 15 21 c7 01 00 49 83 fd 03 73 6b 4d 69 ed 50 03 00 00 41 8b 47 1c 41 03 47 18 4c 01 ea 48 03 82 08 03 00 00 <48> 8b 30 4a 8d 04 26 48 39 c5 72 ba 48 8b 0d f7 c6 01 00 8b 41 1c
[ 13.595067] RSP: e02b:ffffc900428979c8 EFLAGS: 00010202
[ 13.595077] RAX: 000000000000d570 RBX: 0000000000000000 RCX: 00000000000000ca
[ 13.595078] RDX: ffff888101dcab50 RSI: ffffffffffffffff RDI: ffffffff83484238
[ 13.595080] RBP: bffffffffffffffe R08: 0000000000000002 R09: 00000000000000c0
[ 13.595083] R10: 0000000000000000 R11: ffffffff81612e60 R12: c000000000000000
[ 13.595084] R13: 0000000000000350 R14: 0000000000000001 R15: ffffffffc11b9c00
[ 13.595100] FS: 0000706cbfc6fbc0(0000) GS:ffff8882133db000(0000) knlGS:0000000000000000
[ 13.598301] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.598308] CR2: 000000000000d570 CR3: 0000000104a0a000 CR4: 0000000000050660
[ 13.598319] Kernel panic - not syncing: Fatal exception
[ 13.598384] Kernel Offset: disabled
Full console log: https://openqa.qubes-os.org/tests/143433/logfile?filename=serial0.txt
Other observations:
- Linux 6.15 works fine
- the same Linux 6.16-rc2 boots fine on several other systems, for
example on Intel i5 14600K (also with Dasharo firmware)
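The registers and `Code:` bytes in the oops show which access failed. Below is a sketch of that decoding. The disassembly in the comment was worked out by hand from the bytes, not taken from the driver source, so treat it as an inference. R14 = 1 means the handler was looking at the second memory controller. R13 = 0x350 is that index scaled by a 0x350-byte stride. The faulting load adds a pointer fetched from that controller's entry to a register offset. Because RAX, and therefore CR2, is only 0xd570, the result is consistent with that pointer (the MMIO window) being NULL. `IMC_STRIDE`, `imc_addr()` and `reg_addr()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical model of the faulting sequence in ecclog_handler(),
 * decoded by hand from the "Code:" bytes of the oops (a bounds check
 * on the index is omitted here):
 *
 *   movsxd r13, r14d          ; r13 = controller index (R14 = 1)
 *   mov    rdx, [rip+...]     ; rdx = base pointer loaded from a global
 *   imul   r13, r13, 0x350    ; scale the index by the per-entry stride
 *   mov    eax, [r15+0x1c]    ; register offset, summed from two
 *   add    eax, [r15+0x18]    ;   fields of the platform config
 *   add    rdx, r13           ; rdx = base + index * 0x350
 *   add    rax, [rdx+0x308]   ; rax = offset + window pointer
 *   mov    rsi, [rax]         ; faults at CR2 = 0xd570
 */
#define IMC_STRIDE 0x350u	/* inferred from the imul, not from the source */

/* Address of controller entry @idx in an array starting at @base. */
static uint64_t imc_addr(uint64_t base, unsigned int idx)
{
	return base + (uint64_t)idx * IMC_STRIDE;
}

/* Address the load dereferences: MMIO window plus register offset. */
static uint64_t reg_addr(uint64_t window, uint32_t reg_offset)
{
	return window + reg_offset;
}
```

With RDX = 0xffff888101dcab50, the entry for index 1 sits 0x350 bytes past a base of 0xffff888101dca800. `reg_addr(0, 0xd570)` reproduces CR2: a zero window plus the register offset.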
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Borislav Petkov @ 2025-06-17 11:57 UTC
To: Tony Luck, Qiuxu Zhuo
Cc: Marek Marczykowski-Górecki, open list:EDAC-IGEN6, open list
On Tue, Jun 17, 2025 at 01:13:49PM +0200, Marek Marczykowski-Górecki wrote:
> [ 13.562085] intel_pmc_core INT33A1:00: Assuming a default substate order for this platform
> [ 13.562682] intel_pmc_core INT33A1:00: initialized
> [ 13.565035] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
> [ 13.565746] EDAC igen6: Expected 2 mcs, but only 1 detected.
Well, folks, if you've detected only one memory controller, then work with
only one and do not kill the machine:
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1930dc00c791..23e26ba2d49b 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
return -ENODEV;
}
- if (lmc < res_cfg->num_imc)
+ if (lmc < res_cfg->num_imc) {
igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
res_cfg->num_imc, lmc);
+ res_cfg->num_imc = lmc;
+ }
return 0;
---
but then that cfg struct is const :-\
drivers/edac/igen6_edac.c: In function ‘igen6_register_mcis’:
drivers/edac/igen6_edac.c:1356:34: error: assignment of member ‘num_imc’ in read-only object
1356 | res_cfg->num_imc = lmc;
| ^
Unless it is some gunky crap this coreboot does - then we will have to have
a longer talk.
:-P
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* RE: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Zhuo, Qiuxu @ 2025-06-17 14:09 UTC
To: Borislav Petkov, Luck, Tony
Cc: Marek Marczykowski-Górecki, open list:EDAC-IGEN6, open list
Hi Boris,
> From: Borislav Petkov <bp@alien8.de>
> [...]
> > [ 13.565035] EDAC MC0: Giving out device to module igen6_edac controller
> Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
> > [ 13.565746] EDAC igen6: Expected 2 mcs, but only 1 detected.
>
> Well, folks, if you've detected only one memory controller, then work with
> only one and do not kill the machine:
>
Yes.
> diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c index
> 1930dc00c791..23e26ba2d49b 100644
> --- a/drivers/edac/igen6_edac.c
> +++ b/drivers/edac/igen6_edac.c
> @@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev,
> u64 mchbar)
> return -ENODEV;
> }
>
> - if (lmc < res_cfg->num_imc)
> + if (lmc < res_cfg->num_imc) {
> igen6_printk(KERN_WARNING, "Expected %d mcs, but
> only %d detected.",
> res_cfg->num_imc, lmc);
> + res_cfg->num_imc = lmc;
> + }
>
> return 0;
>
> ---
>
> but then that cfg struct is const :-\
>
> drivers/edac/igen6_edac.c: In function ‘igen6_register_mcis’:
> drivers/edac/igen6_edac.c:1356:34: error: assignment of member ‘num_imc’
> in read-only object
> 1356 | res_cfg->num_imc = lmc;
> | ^
>
>
> Unless it is some gunky crap this coreboot does - then we will have to have a
> longer talk.
>
> 😝
In the i10nm_edac driver for Intel Xeon servers, 'cfg' is non-const, and the field
'cfg->ddr_imc_num' [1] is overwritten at runtime with the number of detected DDR
memory controllers.
Reverting 'cfg' in the igen6_edac driver to non-const, so that it can be set to
the actual number of detected memory controllers, seems reasonable.
After that, applying Boris' fix above is the simplest way to resolve the
issue. 😊
[1] https://github.com/torvalds/linux/blob/master/drivers/edac/i10nm_base.c#L479
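Below is a reduced, standalone sketch of why Boris' one-liner does not compile as-is, and what dropping `const` allows. It assumes a trimmed-down `struct res_config` that keeps only `machine_check` and `num_imc`. `clamp_num_imc()` is a hypothetical helper, not a driver function. In the driver, `res_cfg` points to one of the `static const` per-platform objects, so writing through it is rejected. Once the objects are writable, probe can overwrite the expected count at runtime, in the same way the i10nm driver treats `ddr_imc_num`.

```c
#include <stdbool.h>

/* Trimmed-down per-platform config; only num_imc matters here. */
struct res_config {
	bool machine_check;
	int num_imc;	/* expected controllers, lowered at probe time */
};

/*
 * Not const: the object lives in writable data. Had it been declared
 * "static const", writing num_imc through a pointer to it would fail
 * with "assignment of member 'num_imc' in read-only object".
 */
static struct res_config mtl_p_cfg = {
	.machine_check = true,
	.num_imc = 2,
};

/* Lower the expected count to what was actually detected (@lmc). */
static void clamp_num_imc(struct res_config *cfg, int lmc)
{
	if (lmc < cfg->num_imc)
		cfg->num_imc = lmc;
}
```

Because the config objects are shared static data, the clamped value persists for the lifetime of the module. The sketch only ever lowers the count, which matches the `if (lmc < res_cfg->num_imc)` guard in the proposed fix.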
Thanks.
-Qiuxu
* Re: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Borislav Petkov @ 2025-06-17 14:51 UTC
To: Zhuo, Qiuxu
Cc: Luck, Tony, Marek Marczykowski-Górecki, open list:EDAC-IGEN6,
open list
On Tue, Jun 17, 2025 at 02:09:42PM +0000, Zhuo, Qiuxu wrote:
> In the i10nm_edac driver for Intel Xeon servers, 'cfg' is non-const, and the field
> 'cfg->ddr_imc_num' [1] is overwritten with the number of detected DDR memory
> controllers at runtime.
>
> Reverting 'cfg' in this igen6_edac driver to non-const, allowing it to be set
> with the actual number of detected memory controllers seems reasonable.
Question is: is that something the driver should allow? Detecting more memory
controllers but enabling fewer. How can that even happen?
> After that then applying Boris' fix above is the simplest way to resolve the
> issue. 😊
Right, just prepare a proper patch, please, so that Marek can test and
confirm.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* RE: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Zhuo, Qiuxu @ 2025-06-17 16:16 UTC
To: Borislav Petkov
Cc: Luck, Tony, Marek Marczykowski-Górecki, open list:EDAC-IGEN6,
open list
> From: Borislav Petkov <bp@alien8.de>
> [...]
> > Reverting 'cfg' in this igen6_edac driver to non-const, allowing it to
> > be set with the actual number of detected memory controllers seems
> reasonable.
>
> Question is: is that something the driver should allow? Detecting more
When the igen6_edac driver is notified of a memory error, it checks all
the memory controllers counted by 'cfg->num_imc' to identify the source
of the error.
Either checking whether imc->window == NULL (i.e., no MMIO mapping for an
absent memory controller) before each use, or updating 'cfg->num_imc' to
reflect the memory controllers that are actually present, should fix the
issue. The latter is simpler.
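Below is a simplified model of the two options. It assumes a per-controller `struct imc` whose `window` pointer stays NULL for an absent controller. The field name follows the discussion, but the types and the `scan_*()` helpers are hypothetical, and a volatile read stands in for reading the ECC error log.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-controller state: window is NULL when the controller is absent. */
struct imc {
	volatile uint64_t *window;
};

/*
 * Option 1: iterate over the expected count, but skip any controller
 * whose MMIO window was never mapped. Every access site needs this
 * guard. Returns how many windows were actually read.
 */
static int scan_with_null_check(struct imc *imcs, int num_imc)
{
	int visited = 0;

	for (int i = 0; i < num_imc; i++) {
		if (!imcs[i].window)
			continue;		/* absent controller: nothing to read */
		(void)*imcs[i].window;		/* stands in for reading the ECC log */
		visited++;
	}
	return visited;
}

/*
 * Option 2 (the approach chosen in this thread): trust num_imc, already
 * clamped to the detected count. Safe only if the present controllers
 * occupy indices 0..num_imc-1.
 */
static int scan_clamped(struct imc *imcs, int num_imc)
{
	int visited = 0;

	for (int i = 0; i < num_imc; i++) {
		(void)*imcs[i].window;
		visited++;
	}
	return visited;
}
```

Option 2 keeps the hot path free of per-access checks. It does rely on the assumption that detected controllers are numbered contiguously from index 0.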
> memory controllers but enabling less. How can that even happen?
>
The number of detected memory controllers is bounded by the macro
NUM_IMC [1], which is the maximum number of memory controllers we
currently know to be possible.
[1] https://github.com/torvalds/linux/blob/master/drivers/edac/igen6_edac.c#L1324
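Below is a hypothetical model of the detection side. It assumes the probe walks at most NUM_IMC controller slots, skips those that read back as absent, and numbers the present ones consecutively as logical controllers (`lmc`). `detect_mcs()`, `present[]` and `slot_of[]` are illustrative names. NUM_IMC is set to 2 only for this example; the driver's real value is whatever the linked definition says.

```c
#include <stdbool.h>

#define NUM_IMC 2	/* assumed value for this sketch only */

/*
 * Walk every possible controller slot, skip absent ones, and record the
 * present ones contiguously: logical index lmc maps to physical slot i.
 * Returns the number of detected controllers, never more than NUM_IMC.
 */
static int detect_mcs(const bool present[NUM_IMC], int slot_of[NUM_IMC])
{
	int lmc = 0;

	for (int i = 0; i < NUM_IMC; i++) {
		if (!present[i])
			continue;	/* e.g. disabled by the BIOS */
		slot_of[lmc++] = i;
	}
	return lmc;
}
```

When the returned count is smaller than the platform's expected `num_imc`, the driver prints the "Expected %d mcs, but only %d detected." message.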
> > After that then applying Boris' fix above is the simplest way to
> > resolve the issue. 😊
>
> Right, just prepare a proper patch, please, so that Marek can test and confirm.
>
OK.
I'll make a patch for Marek to test first.
Thanks Boris.
- Qiuxu
* Re: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Borislav Petkov @ 2025-06-17 18:20 UTC
To: Zhuo, Qiuxu
Cc: Luck, Tony, Marek Marczykowski-Górecki, open list:EDAC-IGEN6,
open list
On Tue, Jun 17, 2025 at 04:16:41PM +0000, Zhuo, Qiuxu wrote:
> The maximum number of detected memory controllers is bounded
> by the macro NUM_IMC [1]. This value is what we know as the maximum
> possible value now.
I don't think you're answering my question:
Can this happen in real life and why was it added?
if (lmc < res_cfg->num_imc)
igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
res_cfg->num_imc, lmc);
/me does git archeology...
Aha, I guess it can, even with Intel and official fw and so on - not only
coreboot:
20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
> I'll make a patch for Marek to test first.
Yes, that would be the right thing to do.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Qiuxu Zhuo @ 2025-06-18 3:18 UTC
To: Tony Luck, Borislav Petkov, marmarek
Cc: Qiuxu Zhuo, James Morse, Mauro Carvalho Chehab, Robert Richter,
linux-edac, linux-kernel
A kernel panic was reported with the following kernel log:
EDAC igen6: Expected 2 mcs, but only 1 detected.
BUG: unable to handle page fault for address: 000000000000d570
...
Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
...
igen6_probe+0x2a0/0x343 [igen6_edac]
...
igen6_init+0xc5/0xff0 [igen6_edac]
...
This issue occurred because one memory controller was fused off by
the BIOS but the igen6_edac driver still checked all the memory
controllers, including this absent one, to identify the source of
the error. Accessing the null MMIO for the absent memory controller
resulted in the oops above.
Fix this issue by reverting the configuration structure to non-const
and updating the field 'res_cfg->num_imc' to reflect the number of
detected memory controllers.
Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
drivers/edac/igen6_edac.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1930dc00c791..1cb5c67e78ae 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -125,7 +125,7 @@
#define MEM_SLICE_HASH_MASK(v) (GET_BITFIELD(v, 6, 19) << 6)
#define MEM_SLICE_HASH_LSB_MASK_BIT(v) GET_BITFIELD(v, 24, 26)
-static const struct res_config {
+static struct res_config {
bool machine_check;
/* The number of present memory controllers. */
int num_imc;
@@ -479,7 +479,7 @@ static u64 rpl_p_err_addr(u64 ecclog)
return ECC_ERROR_LOG_ADDR45(ecclog);
}
-static const struct res_config ehl_cfg = {
+static struct res_config ehl_cfg = {
.num_imc = 1,
.imc_base = 0x5000,
.ibecc_base = 0xdc00,
@@ -489,7 +489,7 @@ static const struct res_config ehl_cfg = {
.err_addr_to_imc_addr = ehl_err_addr_to_imc_addr,
};
-static const struct res_config icl_cfg = {
+static struct res_config icl_cfg = {
.num_imc = 1,
.imc_base = 0x5000,
.ibecc_base = 0xd800,
@@ -499,7 +499,7 @@ static const struct res_config icl_cfg = {
.err_addr_to_imc_addr = ehl_err_addr_to_imc_addr,
};
-static const struct res_config tgl_cfg = {
+static struct res_config tgl_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0x5000,
@@ -513,7 +513,7 @@ static const struct res_config tgl_cfg = {
.err_addr_to_imc_addr = tgl_err_addr_to_imc_addr,
};
-static const struct res_config adl_cfg = {
+static struct res_config adl_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -524,7 +524,7 @@ static const struct res_config adl_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config adl_n_cfg = {
+static struct res_config adl_n_cfg = {
.machine_check = true,
.num_imc = 1,
.imc_base = 0xd800,
@@ -535,7 +535,7 @@ static const struct res_config adl_n_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config rpl_p_cfg = {
+static struct res_config rpl_p_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -547,7 +547,7 @@ static const struct res_config rpl_p_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config mtl_ps_cfg = {
+static struct res_config mtl_ps_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -558,7 +558,7 @@ static const struct res_config mtl_ps_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config mtl_p_cfg = {
+static struct res_config mtl_p_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -569,7 +569,7 @@ static const struct res_config mtl_p_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct pci_device_id igen6_pci_tbl[] = {
+static struct pci_device_id igen6_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, DID_EHL_SKU5), (kernel_ulong_t)&ehl_cfg },
{ PCI_VDEVICE(INTEL, DID_EHL_SKU6), (kernel_ulong_t)&ehl_cfg },
{ PCI_VDEVICE(INTEL, DID_EHL_SKU7), (kernel_ulong_t)&ehl_cfg },
@@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
return -ENODEV;
}
- if (lmc < res_cfg->num_imc)
+ if (lmc < res_cfg->num_imc) {
igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
res_cfg->num_imc, lmc);
+ res_cfg->num_imc = lmc;
+ }
return 0;
base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
--
2.43.0
* RE: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Zhuo, Qiuxu @ 2025-06-18 3:26 UTC
To: marmarek@invisiblethingslab.com
Cc: Borislav Petkov, Luck, Tony, James Morse, Mauro Carvalho Chehab,
Robert Richter, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org
Hi Marek,
> From: Zhuo, Qiuxu <qiuxu.zhuo@intel.com>
> [...]
> Subject: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
Thank you for reporting this issue.
Could you please test this patch on your machine to verify if it fixes the issue?
Thanks!
-Qiuxu
* Re: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: marmarek @ 2025-06-18 13:23 UTC
To: Zhuo, Qiuxu
Cc: Borislav Petkov, Luck, Tony, James Morse, Mauro Carvalho Chehab,
Robert Richter, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org
On Wed, Jun 18, 2025 at 03:26:43AM +0000, Zhuo, Qiuxu wrote:
> Hi Marek,
>
> > From: Zhuo, Qiuxu <qiuxu.zhuo@intel.com>
> > [...]
> > Subject: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
>
> Thank you for reporting this issue.
> Could you please test this patch on your machine to verify if it fixes the issue?
I can confirm it works now: I get the "EDAC igen6: Expected 2 mcs, but only
1 detected" message and it no longer crashes. Thanks!
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* RE: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Zhuo, Qiuxu @ 2025-06-18 13:39 UTC
To: marmarek@invisiblethingslab.com
Cc: Borislav Petkov, Luck, Tony, James Morse, Mauro Carvalho Chehab,
Robert Richter, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org
> From: marmarek@invisiblethingslab.com
> [...]
> > Could you please test this patch on your machine to verify if it fixes the
> issue?
>
> I can confirm it works now, I have the "EDAC igen6: Expected 2 mcs, but only
> 1 detected" message and it doesn't crash anymore. Thanks!
>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Thank you!
-Qiuxu
* Re: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Luck, Tony @ 2025-06-18 15:06 UTC
To: Qiuxu Zhuo
Cc: Borislav Petkov, marmarek, James Morse, Mauro Carvalho Chehab,
Robert Richter, linux-edac, linux-kernel
On Wed, Jun 18, 2025 at 11:18:55AM +0800, Qiuxu Zhuo wrote:
> A kernel panic was reported with the following kernel log:
>
> EDAC igen6: Expected 2 mcs, but only 1 detected.
> BUG: unable to handle page fault for address: 000000000000d570
> ...
> Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
> RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
> ...
> igen6_probe+0x2a0/0x343 [igen6_edac]
> ...
> igen6_init+0xc5/0xff0 [igen6_edac]
> ...
>
> This issue occurred because one memory controller was fused off by
Maybe "disabled by BIOS" rather than "fused off by BIOS".
> the BIOS but the igen6_edac driver still checked all the memory
> controllers, including this absent one, to identify the source of
> the error. Accessing the null MMIO for the absent memory controller
> resulted in the oops above.
>
> Fix this issue by reverting the configuration structure to non-const
> and updating the field 'res_cfg->num_imc' to reflect the number of
> detected memory controllers.
>
> Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[snip]
> @@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
> return -ENODEV;
> }
>
> - if (lmc < res_cfg->num_imc)
> + if (lmc < res_cfg->num_imc) {
> igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
> res_cfg->num_imc, lmc);
KERN_WARNING seems overly dramatic. BIOS likely had good reasons to
disable the memory controller (e.g. it isn't connected to any DIMM
slots on the motherboard for this system). So there's nothing actually
wrong that needs to be fixed.
KERN_INFO is enough. Perhaps KERN_DEBUG?
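Below is a simplified model of what the level change means in practice. It assumes the usual console filtering rule: a message reaches the console only when its level is numerically lower than `console_loglevel`. It also assumes a `console_loglevel` of 7, a common default. The numeric values are the ones behind the `KERN_*` prefixes, and `printed_on_console()` is a hypothetical helper. Assuming igen6_printk() is a thin printk() wrapper, a KERN_DEBUG message is still stored in the kernel log buffer and stays visible with `dmesg`. It just stops appearing on the console.

```c
#include <stdbool.h>

/* printk level numbers behind the KERN_* prefixes (lower = more severe). */
enum {
	LOGLEVEL_WARNING = 4,	/* KERN_WARNING */
	LOGLEVEL_INFO    = 6,	/* KERN_INFO */
	LOGLEVEL_DEBUG   = 7,	/* KERN_DEBUG */
};

/*
 * Simplified console filter: a message is printed on the console only
 * if its level is numerically below console_loglevel. Filtered messages
 * still go to the kernel log buffer.
 */
static bool printed_on_console(int level, int console_loglevel)
{
	return level < console_loglevel;
}
```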
-Tony
* RE: [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference
From: Zhuo, Qiuxu @ 2025-06-18 15:42 UTC
To: Luck, Tony
Cc: Borislav Petkov, marmarek@invisiblethingslab.com, James Morse,
Mauro Carvalho Chehab, Robert Richter, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org
Hi Tony,
> From: Luck, Tony <tony.luck@intel.com>
> [...]
> >
> > This issue occurred because one memory controller was fused off by
>
> Maybe "disabled by BIOS" rather than "fused off by BIOS".
The phrase "disabled by BIOS" is more appropriate.
Will update it in v2. Thanks.
> > the BIOS but the igen6_edac driver still checked all the memory
> > controllers, including this absent one, to identify the source of the
> [...]
> >
> > - if (lmc < res_cfg->num_imc)
> > + if (lmc < res_cfg->num_imc) {
> > igen6_printk(KERN_WARNING, "Expected %d mcs, but
> only %d detected.",
> > res_cfg->num_imc, lmc);
>
> KERN_WARNING seems overly dramatic. BIOS likely had good reasons to
> disable the memory controller (e.g. it isn't connected to any DIMM slots on
> the motherboard for this system). So there's nothing actually wrong that
> needs to be fixed.
Yes. That's true.
> KERN_INFO is enough. Perhaps KERN_DEBUG?
Will change the log level to "KERN_DEBUG" in v2 to reduce noise.
- Qiuxu
* [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference
From: Qiuxu Zhuo @ 2025-06-18 16:23 UTC
To: tony.luck, bp
Cc: james.morse, linux-edac, linux-kernel, marmarek, mchehab,
qiuxu.zhuo, rric
A kernel panic was reported with the following kernel log:
EDAC igen6: Expected 2 mcs, but only 1 detected.
BUG: unable to handle page fault for address: 000000000000d570
...
Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
...
igen6_probe+0x2a0/0x343 [igen6_edac]
...
igen6_init+0xc5/0xff0 [igen6_edac]
...
This issue occurred because one memory controller was disabled by
the BIOS but the igen6_edac driver still checked all the memory
controllers, including this absent one, to identify the source of
the error. Accessing the NULL MMIO window of the absent memory controller
resulted in the oops above.
Fix this issue by reverting the configuration structure to non-const
and updating the field 'res_cfg->num_imc' to reflect the number of
detected memory controllers.
Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
Suggested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
v1->v2:
- Add "Tested-by" tag from Marek.
- s/fused off/disabled/ in the commit message, as suggested by Tony.
drivers/edac/igen6_edac.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1930dc00c791..1cb5c67e78ae 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -125,7 +125,7 @@
#define MEM_SLICE_HASH_MASK(v) (GET_BITFIELD(v, 6, 19) << 6)
#define MEM_SLICE_HASH_LSB_MASK_BIT(v) GET_BITFIELD(v, 24, 26)
-static const struct res_config {
+static struct res_config {
bool machine_check;
/* The number of present memory controllers. */
int num_imc;
@@ -479,7 +479,7 @@ static u64 rpl_p_err_addr(u64 ecclog)
return ECC_ERROR_LOG_ADDR45(ecclog);
}
-static const struct res_config ehl_cfg = {
+static struct res_config ehl_cfg = {
.num_imc = 1,
.imc_base = 0x5000,
.ibecc_base = 0xdc00,
@@ -489,7 +489,7 @@ static const struct res_config ehl_cfg = {
.err_addr_to_imc_addr = ehl_err_addr_to_imc_addr,
};
-static const struct res_config icl_cfg = {
+static struct res_config icl_cfg = {
.num_imc = 1,
.imc_base = 0x5000,
.ibecc_base = 0xd800,
@@ -499,7 +499,7 @@ static const struct res_config icl_cfg = {
.err_addr_to_imc_addr = ehl_err_addr_to_imc_addr,
};
-static const struct res_config tgl_cfg = {
+static struct res_config tgl_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0x5000,
@@ -513,7 +513,7 @@ static const struct res_config tgl_cfg = {
.err_addr_to_imc_addr = tgl_err_addr_to_imc_addr,
};
-static const struct res_config adl_cfg = {
+static struct res_config adl_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -524,7 +524,7 @@ static const struct res_config adl_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config adl_n_cfg = {
+static struct res_config adl_n_cfg = {
.machine_check = true,
.num_imc = 1,
.imc_base = 0xd800,
@@ -535,7 +535,7 @@ static const struct res_config adl_n_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config rpl_p_cfg = {
+static struct res_config rpl_p_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -547,7 +547,7 @@ static const struct res_config rpl_p_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config mtl_ps_cfg = {
+static struct res_config mtl_ps_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -558,7 +558,7 @@ static const struct res_config mtl_ps_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct res_config mtl_p_cfg = {
+static struct res_config mtl_p_cfg = {
.machine_check = true,
.num_imc = 2,
.imc_base = 0xd800,
@@ -569,7 +569,7 @@ static const struct res_config mtl_p_cfg = {
.err_addr_to_imc_addr = adl_err_addr_to_imc_addr,
};
-static const struct pci_device_id igen6_pci_tbl[] = {
+static struct pci_device_id igen6_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, DID_EHL_SKU5), (kernel_ulong_t)&ehl_cfg },
{ PCI_VDEVICE(INTEL, DID_EHL_SKU6), (kernel_ulong_t)&ehl_cfg },
{ PCI_VDEVICE(INTEL, DID_EHL_SKU7), (kernel_ulong_t)&ehl_cfg },
@@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
return -ENODEV;
}
- if (lmc < res_cfg->num_imc)
+ if (lmc < res_cfg->num_imc) {
igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
res_cfg->num_imc, lmc);
+ res_cfg->num_imc = lmc;
+ }
return 0;
base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
--
2.43.0
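The failure mode and the fix can be modelled in a small userspace sketch. It assumes a simplified driver: `struct imc`, its `window` field, `present[]`, `register_mcis()` and `scan_imcs()` are hypothetical stand-ins, not the real igen6_edac definitions. It also assumes that detected controllers are packed into logical slots `0..lmc-1`, which is what the `lmc` counter in the hunk suggests. An absent controller never gets an MMIO window. Any loop bounded by the configured `num_imc` therefore reads a NULL window, while capping `num_imc` to the detected count keeps later loops in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_IMC 2

/* Hypothetical model of a memory controller: window stands in for its
 * MMIO mapping and stays NULL if the controller was never registered. */
struct imc {
	const uint64_t *window;
};

/* Hypothetical model of the per-SoC config; only num_imc matters here. */
struct res_config {
	int num_imc;
};

static struct imc imcs[MAX_IMC];
static uint64_t fake_regs[MAX_IMC] = { 0x1, 0x1 };

/* Assumed BIOS state: physical controller 1 is disabled. */
static const int present[MAX_IMC] = { 1, 0 };

/*
 * Register controllers, packing the present ones into logical slots
 * 0..lmc-1. With apply_fix set, num_imc is capped to the detected
 * count, mirroring the "res_cfg->num_imc = lmc;" line in the patch.
 */
static int register_mcis(struct res_config *cfg, int apply_fix)
{
	int lmc = 0;

	assert(cfg->num_imc <= MAX_IMC);
	for (int i = 0; i < cfg->num_imc; i++) {
		if (!present[i])
			continue; /* skip the absent controller */
		imcs[lmc].window = &fake_regs[lmc];
		lmc++;
	}

	if (lmc < cfg->num_imc) {
		printf("Expected %d mcs, but only %d detected.\n",
		       cfg->num_imc, lmc);
		if (apply_fix)
			cfg->num_imc = lmc;
	}
	return lmc;
}

/*
 * Model of the error-log scan: read one register from every controller
 * the config claims exists. Returns how many were read, or -1 at the
 * point where the kernel would have dereferenced a NULL window.
 */
static int scan_imcs(const struct res_config *cfg)
{
	int n = 0;

	for (int i = 0; i < cfg->num_imc; i++) {
		if (!imcs[i].window)
			return -1;
		(void)*imcs[i].window;
		n++;
	}
	return n;
}
```

Without the cap, `scan_imcs()` reaches the unregistered slot 1 and reports -1. With the cap, it stops after the single detected controller. Overwriting the count in one place is a small change. The alternative, checking for a NULL window in every loop, would touch many more call sites.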
* [PATCH v2 2/2] EDAC/igen6: Reduce log level to debug for absent memory controllers
2025-06-18 16:23 ` [PATCH v2 1/2] " Qiuxu Zhuo
@ 2025-06-18 16:23 ` Qiuxu Zhuo
2025-06-18 17:46 ` [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference Luck, Tony
1 sibling, 0 replies; 15+ messages in thread
From: Qiuxu Zhuo @ 2025-06-18 16:23 UTC (permalink / raw)
To: tony.luck, bp
Cc: james.morse, linux-edac, linux-kernel, marmarek, mchehab,
qiuxu.zhuo, rric
The current KERN_WARNING level message for detecting absent memory
controllers is overly dramatic. The BIOS likely had valid reasons to
disable the memory controller (e.g. it isn't connected to any DIMM
slots on the motherboard for this system). So there's nothing actually
wrong that needs to be fixed.
Reduce the log level to KERN_DEBUG to eliminate the false warning.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
drivers/edac/igen6_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 1cb5c67e78ae..5ffe9579959f 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1351,7 +1351,7 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
}
if (lmc < res_cfg->num_imc) {
- igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
+ igen6_printk(KERN_DEBUG, "Expected %d mcs, but only %d detected.",
res_cfg->num_imc, lmc);
res_cfg->num_imc = lmc;
}
--
2.43.0
* RE: [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference
2025-06-18 16:23 ` [PATCH v2 1/2] " Qiuxu Zhuo
2025-06-18 16:23 ` [PATCH v2 2/2] EDAC/igen6: Reduce log level to debug for absent memory controllers Qiuxu Zhuo
@ 2025-06-18 17:46 ` Luck, Tony
1 sibling, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2025-06-18 17:46 UTC (permalink / raw)
To: Zhuo, Qiuxu, bp@alien8.de
Cc: james.morse@arm.com, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, marmarek@invisiblethingslab.com,
mchehab@kernel.org, rric@kernel.org
> This issue occurred because one memory controller was disabled by
> the BIOS but the igen6_edac driver still checked all the memory
> controllers, including this absent one, to identify the source of
> the error. Accessing the NULL MMIO base of the absent memory controller
> resulted in the oops above.
>
> Fix this issue by reverting the configuration structure to non-const
> and updating the field 'res_cfg->num_imc' to reflect the number of
> detected memory controllers.
>
> Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Applied (both this and patch 2/2) to RAS edac-drivers branch.
Thanks
-Tony
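The following sketch illustrates why the configuration structures had to become non-const again. Each PCI table entry stores the config's address in `driver_data`, which is a `kernel_ulong_t`, and the probe path casts it back to a pointer. Writing `num_imc` through that pointer is only valid if the object itself is not defined `const`. Modifying a const-defined object is undefined behavior in C, and in the kernel such objects typically live in read-only data. `struct fake_pci_device_id`, `probe_cap()` and the device value `0x0001` are hypothetical placeholders, not the real PCI API or Intel IDs.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's kernel_ulong_t (an unsigned long). */
typedef unsigned long kernel_ulong_t;

struct res_config {
	int num_imc;
};

/* Non-const, as after the fix, so the probe path may write num_imc. */
static struct res_config mtl_p_cfg = { .num_imc = 2 };

/* Hypothetical stand-in for the relevant fields of struct pci_device_id. */
struct fake_pci_device_id {
	unsigned int device;
	kernel_ulong_t driver_data;
};

/* Placeholder device value; not a real Intel device ID. */
static struct fake_pci_device_id tbl[] = {
	{ 0x0001, (kernel_ulong_t)&mtl_p_cfg },
};

/*
 * Recover the config pointer from driver_data and cap num_imc to the
 * number of detected controllers (hypothetical helper).
 */
static struct res_config *probe_cap(const struct fake_pci_device_id *id,
				    int detected)
{
	struct res_config *cfg = (struct res_config *)id->driver_data;

	if (detected < cfg->num_imc)
		cfg->num_imc = detected;
	return cfg;
}
```

The config is a shared static object, so the capped value persists for any later probe that matches the same table entry. Presumably this is harmless when a system has a single such device, but that is an assumption of this sketch.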
Thread overview: 15+ messages
2025-06-17 11:13 NULL pointer dereference in igen6_probe - 6.16-rc2 Marek Marczykowski-Górecki
2025-06-17 11:57 ` Borislav Petkov
2025-06-17 14:09 ` Zhuo, Qiuxu
2025-06-17 14:51 ` Borislav Petkov
2025-06-17 16:16 ` Zhuo, Qiuxu
2025-06-17 18:20 ` Borislav Petkov
2025-06-18 3:18 ` [PATCH 1/1] EDAC/igen6: Fix NULL pointer dereference Qiuxu Zhuo
2025-06-18 3:26 ` Zhuo, Qiuxu
2025-06-18 13:23 ` marmarek
2025-06-18 13:39 ` Zhuo, Qiuxu
2025-06-18 15:06 ` Luck, Tony
2025-06-18 15:42 ` Zhuo, Qiuxu
2025-06-18 16:23 ` [PATCH v2 1/2] " Qiuxu Zhuo
2025-06-18 16:23 ` [PATCH v2 2/2] EDAC/igen6: Reduce log level to debug for absent memory controllers Qiuxu Zhuo
2025-06-18 17:46 ` [PATCH v2 1/2] EDAC/igen6: Fix NULL pointer dereference Luck, Tony