public inbox for linux-kernel@vger.kernel.org
From: "Li,Rongqing" <lirongqing@baidu.com>
To: Borislav Petkov <bp@alien8.de>, "Luck, Tony" <tony.luck@intel.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>,
	Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"H . Peter Anvin" <hpa@zytor.com>,
	"Yazen Ghannam" <yazen.ghannam@amd.com>,
	"Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>,
	Avadhut Naik <avadhut.naik@amd.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Reply: [External Mail] Re: [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event
Date: Mon, 2 Feb 2026 23:49:40 +0000
Message-ID: <8c967791348a4f4d815c7612a15eee15@baidu.com>
In-Reply-To: <20260202151828.GAaYDARDsP21UVEPTb@fat_crate.local>

> On Wed, Jan 14, 2026 at 03:48:13PM +0100, Borislav Petkov wrote:
> > Now on to find what causes this. Even if we can't find the proper
> > commit, I guess testing 6.18 and 6.12 - the LTS kernels - should be
> > good enough as to backport a fix there.
> 
> Ok, finally back to staring at this.
> 
> Looks like adding this:
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 34440021e8cf..b94efe5950c4 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -154,6 +154,8 @@ void mce_log(struct mce_hw_err *err)
>  {
>         if (mce_gen_pool_add(err))
>                 irq_work_queue(&mce_irq_work);
> +
> +       set_bit(0, &mce_need_notify);
>  }
>  EXPORT_SYMBOL_GPL(mce_log);
> 
> makes the interval halve again, see below for the timestamps.
> 
> I guess I'll do a proper patch from the hunk here:
> 
> https://lore.kernel.org/r/20260113224152.GVaWbKMMzManQ5WwlT@fat_crate.local
> 

Is it possible that CPU0 sets mce_need_notify while CPU1 concurrently calls mce_notify_irq() from mce_timer_fn(), so that CPU1 halves its own timer interval instead of CPU0's?


> along with 6.12 and 6.18 backports and see whether that's a good enough as a
> stable fix too.
> 
> Thx.
> 
> [  316.795248] mce: [Hardware Error]: Machine check events logged
> [  316.795262] mce: [Hardware Error]: Machine check events logged
> [  316.798331] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [  316.800104] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [  316.801442] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770040950 SOCKET 0 APIC 0 microcode 800820d
> [  628.091492] mce: [Hardware Error]: Machine check events logged
> [  628.091515] mce: [Hardware Error]: Machine check events logged
> [  628.097216] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [  628.101393] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [  628.103992] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041262 SOCKET 0 APIC 0 microcode 800820d
> 
> <--- it starts decreasing the interval here.
> 
> [  825.581354] hrtimer: interrupt took 18820 ns
> [  939.387367] mce: [Hardware Error]: Machine check events logged
> [  939.390185] mce: [Hardware Error]: Machine check events logged
> [  939.392936] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [  939.396465] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [  939.399042] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041573 SOCKET 0 APIC 0 microcode 800820d
> [ 1103.227402] mce: [Hardware Error]: Machine check events logged
> [ 1103.230267] mce: [Hardware Error]: Machine check events logged
> [ 1103.233018] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1103.236565] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1103.239146] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041737 SOCKET 0 APIC 0 microcode 800820d
> [ 1179.003479] mce: [Hardware Error]: Machine check events logged
> [ 1179.006452] mce: [Hardware Error]: Machine check events logged
> [ 1179.009144] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1179.012757] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1179.015338] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041813 SOCKET 0 APIC 0 microcode 800820d
> [ 1217.915386] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1217.919088] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1217.921662] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041852 SOCKET 0 APIC 0 microcode 800820d
> [ 1238.395440] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1238.399041] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1238.401619] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041872 SOCKET 0 APIC 0 microcode 800820d
> [ 1269.115368] mce_notify_irq: 4 callbacks suppressed
> [ 1269.117829] mce: [Hardware Error]: Machine check events logged
> [ 1269.120586] mce: [Hardware Error]: Machine check events logged
> [ 1269.123412] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1269.126950] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1269.129511] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770041903 SOCKET 0 APIC 0 microcode 800820d
> 
> and then it started enlarging it again when I changed the injection interval to
> 300s.
> 
> [ 1578.363408] mce: [Hardware Error]: Machine check events logged
> [ 1578.366346] mce: [Hardware Error]: Machine check events logged
> [ 1578.369174] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 1578.372742] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 1578.375226] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770042212 SOCKET 0 APIC 0 microcode 800820d
> [ 2119.035460] mce: [Hardware Error]: Machine check events logged
> [ 2119.038432] mce: [Hardware Error]: Machine check events logged
> [ 2119.041236] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 2119.044846] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 2119.047340] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770042753 SOCKET 0 APIC 0 microcode 800820d
> [ 2282.875491] mce: [Hardware Error]: Machine check events logged
> [ 2282.878409] mce: [Hardware Error]: Machine check events logged
> [ 2282.881277] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 2282.884978] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 2282.887482] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770042917 SOCKET 0 APIC 0 microcode 800820d
> [ 2512.251516] mce: [Hardware Error]: Machine check events logged
> [ 2512.254371] mce: [Hardware Error]: Machine check events logged
> [ 2512.257261] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9c2041000000011b
> [ 2512.260841] mce: [Hardware Error]: TSC 0 ADDR 6d3d483b
> [ 2512.263406] mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1770043146 SOCKET 0 APIC 0 microcode 800820d
> 
> --
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette


Thread overview: 55+ messages
2026-01-12  8:27 [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event lirongqing
2026-01-12  8:56 ` Nikolay Borisov
2026-01-12  9:36   ` Reply: [External Mail] " Li,Rongqing
2026-01-12  9:51     ` Borislav Petkov
2026-01-12 10:24       ` Reply: " Li,Rongqing
2026-01-13  9:51         ` Borislav Petkov
     [not found]           ` <39cfb093256f4da78fe0bc9e814ce5d0@baidu.com>
2026-01-13 12:48             ` Reply: " Borislav Petkov
2026-01-13 18:53               ` Luck, Tony
2026-01-13 18:55                 ` Nikolay Borisov
2026-01-13 19:13                   ` Borislav Petkov
2026-01-13 19:25                     ` Nikolay Borisov
2026-01-13 19:33                       ` Borislav Petkov
2026-01-13 19:37                         ` Nikolay Borisov
2026-01-13 19:44                           ` Borislav Petkov
2026-01-13 19:51                             ` Nikolay Borisov
2026-01-13 20:33                               ` Borislav Petkov
2026-01-13 19:10                 ` Borislav Petkov
2026-01-13 19:31                 ` Nikolay Borisov
2026-01-13 20:30                 ` Thomas Gleixner
2026-01-13 20:56                 ` Borislav Petkov
2026-01-13 21:05                   ` Luck, Tony
2026-01-13 21:31                     ` Borislav Petkov
2026-01-13 22:41                       ` Borislav Petkov
2026-01-14  0:30                         ` Luck, Tony
2026-01-14 13:50                           ` Borislav Petkov
2026-01-14 14:48                             ` Borislav Petkov
2026-02-02 15:18                               ` Borislav Petkov
2026-02-02 23:49                                 ` Li,Rongqing [this message]
2026-02-06 22:03                                   ` Reply: [External Mail] " Borislav Petkov
2026-02-07 11:51                             ` Borislav Petkov
2026-02-09 17:37                               ` Luck, Tony
2026-02-10 15:01                                 ` Borislav Petkov
2026-03-06  7:37                                   ` Reply: [External Mail] " Li,Rongqing(ACG CCN)
2026-03-06 14:00                                     ` Borislav Petkov
2026-03-06 14:38                                       ` Reply: " Li,Rongqing(ACG CCN)
2026-03-06 15:29                                         ` Borislav Petkov
2026-03-07  1:18                                           ` Reply: " Li,Rongqing(ACG CCN)
2026-03-16 13:44                                             ` Borislav Petkov
2026-04-06 22:49                                               ` [PATCH] x86/mce: Restore MCA polling interval halving Borislav Petkov
2026-04-07 12:51                                                 ` Nikolay Borisov
2026-04-07 15:04                                                 ` Zhuo, Qiuxu
2026-04-14 21:18                                                   ` Borislav Petkov
2026-04-14 22:22                                                     ` Luck, Tony
2026-04-15 19:27                                                       ` Borislav Petkov
2026-04-15 19:53                                                         ` Luck, Tony
2026-04-15 20:02                                                           ` Borislav Petkov
2026-04-17 11:50                                                             ` Borislav Petkov
2026-04-20 14:14                                                               ` Zhuo, Qiuxu
2026-04-21 12:05                                                                 ` Borislav Petkov
2026-04-21 15:49                                                                   ` Zhuo, Qiuxu
2026-04-23 12:49                                                                     ` Borislav Petkov
2026-04-15 13:39                                                     ` Zhuo, Qiuxu
2026-04-15 19:35                                                       ` Borislav Petkov
2026-01-14  6:17                         ` [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event Nikolay Borisov
2026-01-14 13:52                           ` Borislav Petkov
