* RE: Flood of edac-related errors since 6.13
[not found] ` <OIUifRt--F-9@well-founded.dev>
@ 2025-02-10 8:04 ` Zhuo, Qiuxu
2025-02-10 8:49 ` Ramses
0 siblings, 1 reply; 5+ messages in thread
From: Zhuo, Qiuxu @ 2025-02-10 8:04 UTC (permalink / raw)
To: Ramses, John; +Cc: Linux Edac, linux-kernel@vger.kernel.org, Luck, Tony
[-- Attachment #1: Type: text/plain, Size: 611 bytes --]
Hi Ramses,
> From: Ramses <ramses@well-founded.dev>
> [...]
>
> Thanks for your reply!
>
> I recompiled the kernel with that option enabled, and attached the dmesg
> output to this email. Let me know if I can do anything else to help debug this.
Thanks for helping debug the issue and taking the useful dmesg log.
From the dmesg log, the ECC error log register of this SoC contained the
invalid error value ~0, resulting in a flood of invalid error reports in polling mode.
@Ramses & @John,
Can you please apply the attached fix patch and see whether it fixes the issue?
Thanks!
-Qiuxu
[-- Attachment #2: 0001-EDAC-igen6-Fix-the-flood-of-invalid-error-reports.patch --]
[-- Type: application/octet-stream, Size: 2007 bytes --]
From fa843550da0589a1fae2fa7713767bbc98b3a02c Mon Sep 17 00:00:00 2001
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Date: Mon, 10 Feb 2025 15:17:37 +0800
Subject: [PATCH 1/1] EDAC/igen6: Fix the flood of invalid error reports
The ECC_ERROR_LOG register of certain SoCs may contain the invalid value
~0, which results in a flood of invalid error reports in polling mode.
Fix the flood of invalid error reports by skipping the invalid ECC error
log value ~0.
Fixes: e14232afa944 ("EDAC/igen6: Add polling support")
Reported-by: Ramses <ramses@well-founded.dev>
Closes: https://lore.kernel.org/all/OISL8Rv--F-9@well-founded.dev/
Reported-by: John <therealgraysky@proton.me>
Closes: https://lore.kernel.org/all/p5YcxOE6M3Ncxpn2-Ia_wCt61EM4LwIiN3LroQvT_-G2jMrFDSOW5k2A9D8UUzD2toGpQBN1eI0sL5dSKnkO8iteZegLoQEj-DwQaMhGx4A=@proton.me/
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
drivers/edac/igen6_edac.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index fdf3a84fe698..595908af9e5c 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -785,13 +785,22 @@ static u64 ecclog_read_and_clear(struct igen6_imc *imc)
{
u64 ecclog = readq(imc->window + ECC_ERROR_LOG_OFFSET);
- if (ecclog & (ECC_ERROR_LOG_CE | ECC_ERROR_LOG_UE)) {
- /* Clear CE/UE bits by writing 1s */
- writeq(ecclog, imc->window + ECC_ERROR_LOG_OFFSET);
- return ecclog;
- }
+ /*
+ * Quirk: The ECC_ERROR_LOG register of certain SoCs may contain
+ * the invalid value ~0. This will result in a flood of invalid
+ * error reports in polling mode. Skip it.
+ */
+ if (ecclog == ~0)
+ return 0;
- return 0;
+ /* Neither a CE nor a UE. Skip it.*/
+ if (!(ecclog & (ECC_ERROR_LOG_CE | ECC_ERROR_LOG_UE)))
+ return 0;
+
+ /* Clear CE/UE bits by writing 1s */
+ writeq(ecclog, imc->window + ECC_ERROR_LOG_OFFSET);
+
+ return ecclog;
}
static void errsts_clear(struct igen6_imc *imc)
--
2.17.1
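For readers following the thread, the control flow of the patched function can be sketched in userspace as below. This is an illustrative model only, not the driver code: `fake_reg` stands in for the memory-mapped ECC_ERROR_LOG register, the CE/UE bit positions are assumed for the example, and `ecclog_read_and_clear_sketch()` and `count_reports()` are hypothetical names that do not exist in the kernel.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define ECC_ERROR_LOG_CE (1ULL << 62)
#define ECC_ERROR_LOG_UE (1ULL << 63)

/* Hypothetical stand-in for the memory-mapped ECC_ERROR_LOG register. */
static uint64_t fake_reg;

/* Mirrors the order of checks in the patch: invalid value, then CE/UE. */
static uint64_t ecclog_read_and_clear_sketch(void)
{
	uint64_t ecclog = fake_reg;

	/* An all-ones read is treated as invalid, not as an error record. */
	if (ecclog == ~0ULL)
		return 0;

	/* Neither a CE nor a UE: nothing to report. */
	if (!(ecclog & (ECC_ERROR_LOG_CE | ECC_ERROR_LOG_UE)))
		return 0;

	/* Model write-1-to-clear semantics for the CE/UE bits. */
	fake_reg &= ~(ecclog & (ECC_ERROR_LOG_CE | ECC_ERROR_LOG_UE));

	return ecclog;
}

/* Poll the register 'polls' times and count how many reads get reported. */
static int count_reports(int polls)
{
	int reports = 0;

	for (int i = 0; i < polls; i++)
		if (ecclog_read_and_clear_sketch())
			reports++;

	return reports;
}
```

In this model, a register stuck at ~0 produces no reports however often it is polled, while a single genuine CE is reported once and then cleared. Without the first check, every poll of an all-ones register would pass the CE/UE test, which matches the flood described above.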
* RE: Flood of edac-related errors since 6.13
2025-02-10 8:04 ` Flood of edac-related errors since 6.13 Zhuo, Qiuxu
@ 2025-02-10 8:49 ` Ramses
2025-02-11 1:55 ` Zhuo, Qiuxu
0 siblings, 1 reply; 5+ messages in thread
From: Ramses @ 2025-02-10 8:49 UTC (permalink / raw)
To: Zhuo, Qiuxu; +Cc: John, Linux Edac, linux-kernel@vger.kernel.org, Luck, Tony
Feb 10, 2025, 09:05 by qiuxu.zhuo@intel.com:
> Hi Ramses,
>
>> From: Ramses <ramses@well-founded.dev>
>> [...]
>>
>> Thanks for your reply!
>>
>> I recompiled the kernel with that option enabled, and attached the dmesg
>> output to this email. Let me know if I can do anything else to help debug this.
>>
>
> Thanks for helping debug the issue and taking the useful dmesg log.
> From the dmesg log, the ECC error log register of this SoC contained the
> invalid error value ~0, resulting in a flood of invalid error reports in polling mode.
>
> @Ramses & @John,
> Can you please apply the attached fix patch and see whether it fixes the issue?
> Thanks!
>
> -Qiuxu
>
I just booted into a kernel with that patch applied and I'm not getting the errors anymore, so that seems to fix the issue for me indeed!
Thanks a bunch!
Ramses
* RE: Flood of edac-related errors since 6.13
2025-02-10 8:49 ` Ramses
@ 2025-02-11 1:55 ` Zhuo, Qiuxu
2025-02-11 19:47 ` John
0 siblings, 1 reply; 5+ messages in thread
From: Zhuo, Qiuxu @ 2025-02-11 1:55 UTC (permalink / raw)
To: Ramses; +Cc: John, Linux Edac, linux-kernel@vger.kernel.org, Luck, Tony
Hi Ramses,
> From: Ramses <ramses@well-founded.dev>
> [...]
>
> I just booted into a kernel with that patch applied and I'm not getting the
> errors anymore, so that seems to fix the issue for me indeed!
Thank you for your testing.
I'm waiting a bit for John's test result (if he has the chance to test it).
After that, I'll post the fix patch to the EDAC mailing list.
-Qiuxu
* RE: Flood of edac-related errors since 6.13
2025-02-11 1:55 ` Zhuo, Qiuxu
@ 2025-02-11 19:47 ` John
2025-02-12 8:35 ` Zhuo, Qiuxu
0 siblings, 1 reply; 5+ messages in thread
From: John @ 2025-02-11 19:47 UTC (permalink / raw)
To: Zhuo, Qiuxu; +Cc: Ramses, Linux Edac, linux-kernel@vger.kernel.org, Luck, Tony
On Monday, February 10th, 2025 at 8:55 PM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote:
> I'm waiting a bit for John's test result (if he has the chance to test it).
> After that, I'll post the fix patch to the EDAC mailing list.
Confirmed, your patch fixed the flood experienced with 6.13.2-arch1-1 on my Beelink EQ12 (N100). Many thanks.
* RE: Flood of edac-related errors since 6.13
2025-02-11 19:47 ` John
@ 2025-02-12 8:35 ` Zhuo, Qiuxu
0 siblings, 0 replies; 5+ messages in thread
From: Zhuo, Qiuxu @ 2025-02-12 8:35 UTC (permalink / raw)
To: John; +Cc: Ramses, Linux Edac, linux-kernel@vger.kernel.org, Luck, Tony
> From: John <therealgraysky@proton.me>
> [...]
> Confirmed, your patch fixed the flood experienced with 6.13.2-arch1-1 on my
> Beelink EQ12 (N100). Many thanks.
Hi John,
Thanks so much for your verification.
-Qiuxu
end of thread, other threads:[~2025-02-12 8:36 UTC | newest]
Thread overview: 5+ messages
[not found] <OISL8Rv--F-9@well-founded.dev>
[not found] ` <CY8PR11MB7134594FDBF7AED80E415AEC89F12@CY8PR11MB7134.namprd11.prod.outlook.com>
[not found] ` <OIUifRt--F-9@well-founded.dev>
2025-02-10 8:04 ` Flood of edac-related errors since 6.13 Zhuo, Qiuxu
2025-02-10 8:49 ` Ramses
2025-02-11 1:55 ` Zhuo, Qiuxu
2025-02-11 19:47 ` John
2025-02-12 8:35 ` Zhuo, Qiuxu