From: "Li,Rongqing" <lirongqing@baidu.com>
To: Borislav Petkov <bp@alien8.de>, "Luck, Tony" <tony.luck@intel.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"H . Peter Anvin" <hpa@zytor.com>,
"Yazen Ghannam" <yazen.ghannam@amd.com>,
"Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>,
Avadhut Naik <avadhut.naik@amd.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Reply: [External Mail] Re: [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event
Date: Mon, 2 Feb 2026 23:49:40 +0000
Message-ID: <8c967791348a4f4d815c7612a15eee15@baidu.com>
In-Reply-To: <20260202151828.GAaYDARDsP21UVEPTb@fat_crate.local>
> On Wed, Jan 14, 2026 at 03:48:13PM +0100, Borislav Petkov wrote:
> > Now on to find what causes this. Even if we can't find the proper
> > commit, I guess testing 6.18 and 6.12 - the LTS kernels - should be
> > good enough as to backport a fix there.
>
> Ok, finally back to staring at this.
>
> Looks like adding this:
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 34440021e8cf..b94efe5950c4 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -154,6 +154,8 @@ void mce_log(struct mce_hw_err *err)
>  {
>  	if (mce_gen_pool_add(err))
>  		irq_work_queue(&mce_irq_work);
> +
> +	set_bit(0, &mce_need_notify);
>  }
>  EXPORT_SYMBOL_GPL(mce_log);
>
> makes the interval halve again, see below for the timestamps.
>
> I guess I'll do a proper patch from the hunk here:
>
> https://lore.kernel.org/r/20260113224152.GVaWbKMMzManQ5WwlT@fat_crate.local
>
Is it possible that CPU0 sets mce_need_notify, but CPU1 concurrently calls
mce_notify_irq() in mce_timer_fn(), and then CPU1 halves its own timer
interval instead of CPU0's?
[Li,Rongqing]
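
To make the scenario concrete, here is a minimal userspace sketch, not kernel
code. It assumes that the notify flag is a single global bit, that the timer
function test-and-clears it, and that each CPU keeps its own polling interval.
All sim_* names and the MIN_INTERVAL/MAX_INTERVAL bounds are hypothetical, and
the interleaving is replayed sequentially rather than with real concurrency.

```c
#include <stdbool.h>

#define NR_CPUS      2
#define MIN_INTERVAL 1UL    /* hypothetical lower bound, in seconds */
#define MAX_INTERVAL 300UL  /* hypothetical upper bound, in seconds */

/* One global flag shared by all CPUs, standing in for bit 0 of mce_need_notify. */
static bool need_notify;

/* Per-CPU polling interval, standing in for the per-CPU timer state. */
static unsigned long next_interval[NR_CPUS] = { MAX_INTERVAL, MAX_INTERVAL };

/* A CPU logs an error: only the global flag is set; the CPU is not recorded. */
static void sim_mce_log(int cpu)
{
	(void)cpu;
	need_notify = true;
}

/* Stand-in for mce_notify_irq(): test and clear the global flag. */
static bool sim_notify(void)
{
	bool was_set = need_notify;

	need_notify = false;
	return was_set;
}

/* Stand-in for mce_timer_fn() on @cpu: halve on notify, otherwise double. */
static void sim_timer_fn(int cpu)
{
	unsigned long iv = next_interval[cpu];

	if (sim_notify())
		iv = iv / 2 > MIN_INTERVAL ? iv / 2 : MIN_INTERVAL;
	else
		iv = iv * 2 < MAX_INTERVAL ? iv * 2 : MAX_INTERVAL;

	next_interval[cpu] = iv;
}
```

Calling sim_mce_log(0), then sim_timer_fn(1), then sim_timer_fn(0) leaves
next_interval[1] at 150 and next_interval[0] at 300 in this model: the CPU
whose timer fires first consumes the flag, regardless of which CPU logged the
error.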
> along with 6.12 and 6.18 backports and see whether that's good enough as a
> stable fix too.
>
> Thx.
>
> [ 316.795248] mce: [Hardware Error]: Machine check events logged
> [ 316.795262] mce: [Hardware Error]: Machine check events logged
> [ 316.798331] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 316.800104] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 316.801442] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770040950 SOCKET 0 APIC 0 microcode 800820d
> [ 628.091492] mce: [Hardware Error]: Machine check events logged
> [ 628.091515] mce: [Hardware Error]: Machine check events logged
> [ 628.097216] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 628.101393] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 628.103992] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041262 SOCKET 0 APIC 0 microcode 800820d
>
> <--- it starts decreasing the interval here.
>
> [ 825.581354] hrtimer: interrupt took 18820 ns
> [ 939.387367] mce: [Hardware Error]: Machine check events logged
> [ 939.390185] mce: [Hardware Error]: Machine check events logged
> [ 939.392936] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 939.396465] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 939.399042] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041573 SOCKET 0 APIC 0 microcode 800820d
> [ 1103.227402] mce: [Hardware Error]: Machine check events logged
> [ 1103.230267] mce: [Hardware Error]: Machine check events logged
> [ 1103.233018] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1103.236565] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1103.239146] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041737 SOCKET 0 APIC 0 microcode 800820d
> [ 1179.003479] mce: [Hardware Error]: Machine check events logged
> [ 1179.006452] mce: [Hardware Error]: Machine check events logged
> [ 1179.009144] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1179.012757] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1179.015338] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041813 SOCKET 0 APIC 0 microcode 800820d
> [ 1217.915386] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1217.919088] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1217.921662] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041852 SOCKET 0 APIC 0 microcode 800820d
> [ 1238.395440] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1238.399041] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1238.401619] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041872 SOCKET 0 APIC 0 microcode 800820d
> [ 1269.115368] mce_notify_irq: 4 callbacks suppressed
> [ 1269.117829] mce: [Hardware Error]: Machine check events logged
> [ 1269.120586] mce: [Hardware Error]: Machine check events logged
> [ 1269.123412] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1269.126950] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1269.129511] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041903 SOCKET 0 APIC 0 microcode 800820d
>
> and then it started enlarging it again when I changed the injection interval to
> 300s.
>
> [ 1578.363408] mce: [Hardware Error]: Machine check events logged
> [ 1578.366346] mce: [Hardware Error]: Machine check events logged
> [ 1578.369174] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1578.372742] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1578.375226] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770042212 SOCKET 0 APIC 0 microcode 800820d
> [ 2119.035460] mce: [Hardware Error]: Machine check events logged
> [ 2119.038432] mce: [Hardware Error]: Machine check events logged
> [ 2119.041236] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 2119.044846] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 2119.047340] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770042753 SOCKET 0 APIC 0 microcode 800820d
> [ 2282.875491] mce: [Hardware Error]: Machine check events logged
> [ 2282.878409] mce: [Hardware Error]: Machine check events logged
> [ 2282.881277] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 2282.884978] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 2282.887482] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770042917 SOCKET 0 APIC 0 microcode 800820d
> [ 2512.251516] mce: [Hardware Error]: Machine check events logged
> [ 2512.254371] mce: [Hardware Error]: Machine check events logged
> [ 2512.257261] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 2512.260841] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 2512.263406] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770043146 SOCKET 0 APIC 0 microcode 800820d
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette