Linux EDAC development
* Flood of edac-related errors since 6.13
@ 2025-02-06 22:28 Ramses
  2025-02-07  3:18 ` Zhuo, Qiuxu
  0 siblings, 1 reply; 9+ messages in thread
From: Ramses @ 2025-02-06 22:28 UTC (permalink / raw)
  To: Linux Edac

Hi

Since 6.13, I get a flood of the following messages in the logs of my Intel N100 machine, always with the exact same memory address:

jan 20 12:40:57 beelink kernel: EDAC MC: Ver: 3.0.0
jan 20 12:40:57 beelink kernel: caller igen6_probe+0x1b2/0x83b [igen6_edac] mapping multiple BARs
jan 20 12:40:57 beelink kernel: EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (POLLED)
jan 20 12:40:57 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 12:40:57 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
jan 20 12:40:57 beelink kernel: EDAC igen6: v2.5.1
jan 20 12:40:59 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 12:40:59 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
jan 20 12:41:00 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 12:41:00 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
jan 20 12:41:01 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 12:41:01 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
jan 20 12:41:02 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 12:41:02 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
jan 20 12:41:03 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 12:41:04 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
jan 20 12:41:04 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 12:41:05 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
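
To check that every report really carries the same address, the log lines can be tallied with a short script. This is only a minimal sketch: the function name count_ibecc_addresses is hypothetical, and it assumes the kernel log is available as plain-text lines in the format shown above.

```python
import re
from collections import Counter

# Matches the address line printed in the log above, e.g.
# "EDAC igen6 MC0: ADDR 0x7fffffffe0"
ADDR_RE = re.compile(r"EDAC igen6 MC\d+: ADDR (0x[0-9a-fA-F]+)")


def count_ibecc_addresses(lines):
    """Return a Counter mapping each reported address to its occurrence count."""
    counts = Counter()
    for line in lines:
        match = ADDR_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Sample lines copied from the log excerpt above.
sample = [
    "jan 20 12:40:57 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR",
    "jan 20 12:40:57 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0",
    "jan 20 12:40:57 beelink kernel: EDAC igen6: v2.5.1",
    "jan 20 12:40:59 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0",
]

for addr, n in count_ibecc_addresses(sample).most_common():
    print(f"{addr}\t{n}")
```

For a real log, the list could be replaced with the lines of a saved `journalctl -k` dump; a single entry in the result would confirm that only one address is ever reported.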

On previous kernels, I only got this single message right after booting:

jan 20 03:17:45 beelink kernel: EDAC MC: Ver: 3.0.0
jan 20 03:17:45 beelink kernel: caller igen6_probe+0x191/0x810 [igen6_edac] mapping multiple BARs
jan 20 03:17:45 beelink kernel: EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
jan 20 03:17:45 beelink kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
jan 20 03:17:45 beelink kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
jan 20 03:17:45 beelink kernel: EDAC igen6: v2.5.1
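
One visible difference between the two boot logs is the mode in parentheses on the "Giving out device" line: (POLLED) on 6.13 versus (INTERRUPT) on the earlier kernel. A minimal sketch for pulling that field out of a log line follows; the helper name edac_report_mode is hypothetical, and the pattern assumes the line format shown above.

```python
import re

# Matches e.g. "EDAC MC0: Giving out device to module igen6_edac controller
# Intel_client_SoC MC#0: DEV 0000:00:00.0 (POLLED)"
MODE_RE = re.compile(
    r"EDAC (MC\d+): Giving out device to module (\S+) .*\((POLLED|INTERRUPT)\)"
)


def edac_report_mode(line):
    """Return (controller, module, mode) for a matching line, else None."""
    match = MODE_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


# The two registration lines from the logs above.
new_kernel = (
    "jan 20 12:40:57 beelink kernel: EDAC MC0: Giving out device to module "
    "igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (POLLED)"
)
old_kernel = (
    "jan 20 03:17:45 beelink kernel: EDAC MC0: Giving out device to module "
    "igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)"
)

print(edac_report_mode(new_kernel))
print(edac_report_mode(old_kernel))
```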

I assume this change is because of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.13.1&id=e14232afa94445e03fc3a0291b07a68f3408c120

The system works perfectly fine otherwise, and I've had the message at boot for as long as the logs go back (mid-October 2024). I didn't actually notice it before, so I'm only now questioning its significance.

Is there a possibility that this is a false positive, or is this definitely an indication that something is wrong with the system's memory and I should replace the DIMM?
I ran memtest for about two hours earlier and it didn't report any failure.

I'd like to hear your thoughts on this.

Cheers,
Ramses


* Re: Flood of edac-related errors since 6.13
@ 2025-02-09 11:50 John
  0 siblings, 0 replies; 9+ messages in thread
From: John @ 2025-02-09 11:50 UTC (permalink / raw)
  To: linux-edac@vger.kernel.org; +Cc: qiuxu.zhuo@intel.com, amses@well-founded.dev

In-Reply-To: <CY8PR11MB7134594FDBF7AED80E415AEC89F12@CY8PR11MB7134.namprd11.prod.outlook.com>

I have been following this thread, as I am seeing similar behavior on a Beelink N100-based system as well.

For me, booting into the 6.13.1 kernel gives the same flood of errors but also has functional consequences: I get core dumps of /usr/lib/kodi/kodi.bin in libcurl. I do not know if this is related specifically to the EDAC changes or to something else. If I boot into the 6.12.12 kernel instead of the 6.13.1 kernel, I get neither the flood of errors nor the core dumps. Please let me know if I can provide any logs to help diagnose this.

In the meantime, I will compile 6.13.1 with the following reverted and see if both improve: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.13.1&id=e14232afa94445e03fc3a0291b07a68f3408c120

Example entries (full journalctl output available upon request):
...
en6 MC0: HANDLING IBECC MEMORY ERROR
en6 MC0: ADDR 0x7fffffffe0 
en6 MC0: HANDLING IBECC MEMORY ERROR
en6 MC0: ADDR 0x7fffffffe0 
JobWorker[17237] general protection fault ip:76a01e02ffe6 sp:769f823fb948 error:0 in libcurl.so.4.8.0[71fe6,76a01dfc9000+9f000]
p[17254]: Process 1757 (kodi.bin) of user 989 terminated abnormally with signal 11/SEGV, processing...
rted Process Core Dump (PID 17254/UID 0).
en6 MC0: HANDLING IBECC MEMORY ERROR
p[17255]: [🡕] Process 1757 (kodi.bin) of user 989 dumped core.
          
          Module [dso] without build-id.
          Module [dso] without build-id.
          Module [dso] without build-id.
          Stack trace of thread 17237:
          #0  0x000076a01e02ffe6 n/a (n/a + 0x0)
          #1  0x000076a01e05320f n/a (n/a + 0x0)
          #2  0x000076a01e04f2a4 n/a (n/a + 0x0)
          #3  0x000076a01e046c7a n/a (n/a + 0x0)
          #4  0x000076a01dfdf080 n/a (n/a + 0x0)
          #5  0x000076a01dfcb7c9 n/a (n/a + 0x0)
          #6  0x000076a01dfd4fee n/a (n/a + 0x0)
          #7  0x000076a01e01f640 n/a (n/a + 0x0)
          #8  0x000076a01e022275 n/a (n/a + 0x0)
          #9  0x000060550d929f1a n/a (n/a + 0x0)
          #10 0x000060550d92abfd n/a (n/a + 0x0)
          #11 0x000060550d94528e n/a (n/a + 0x0)
          #12 0x000060550d9779a4 n/a (n/a + 0x0)
          #13 0x000060550d978aea n/a (n/a + 0x0)
          #14 0x000060550d978d67 n/a (n/a + 0x0)
          #15 0x000060550d0017eb n/a (n/a + 0x0)
          #16 0x000060550d001ed9 n/a (n/a + 0x0)
          #17 0x000060550d304f3c n/a (n/a + 0x0)
          #18 0x000060550d307998 n/a (n/a + 0x0)
          #19 0x000060550d2d8bbc n/a (n/a + 0x0)
          #20 0x000060550d28890d n/a (n/a + 0x0)
          #21 0x000060550cdbaaa9 n/a (n/a + 0x0)
          #22 0x000060550ce7779f n/a (n/a + 0x0)
          #23 0x000060550eaa6b4b n/a (n/a + 0x0)
          #24 0x000060550ce66143 n/a (n/a + 0x0)
          #25 0x000076a01c0e1f74 n/a (n/a + 0x0)
          #26 0x000076a01bea370a n/a (n/a + 0x0)
          #27 0x000076a01bf27aac n/a (n/a + 0x0)
          ELF object binary architecture: AMD x86-64



Thread overview: 9+ messages
2025-02-06 22:28 Flood of edac-related errors since 6.13 Ramses
2025-02-07  3:18 ` Zhuo, Qiuxu
2025-02-07  9:25   ` Ramses
2025-02-10  8:04     ` Zhuo, Qiuxu
2025-02-10  8:49       ` Ramses
2025-02-11  1:55         ` Zhuo, Qiuxu
2025-02-11 19:47           ` John
2025-02-12  8:35             ` Zhuo, Qiuxu
  -- strict thread matches above, loose matches on Subject: below --
2025-02-09 11:50 John
