From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>,
"Li,Rongqing(ACG CCN)" <lirongqing@baidu.com>,
Nikolay Borisov <nik.borisov@suse.com>,
"Thomas Gleixner" <tglx@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"H . Peter Anvin" <hpa@zytor.com>,
Yazen Ghannam <yazen.ghannam@amd.com>,
Avadhut Naik <avadhut.naik@amd.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [PATCH] x86/mce: Restore MCA polling interval halving
Date: Tue, 14 Apr 2026 15:22:23 -0700
Message-ID: <ad6-H5eihazhqsmC@agluck-desk3>
In-Reply-To: <20260414211803.GFad6vC9LSYxGScTNH@fat_crate.local>
On Tue, Apr 14, 2026 at 11:18:03PM +0200, Borislav Petkov wrote:
> On Tue, Apr 07, 2026 at 03:04:04PM +0000, Zhuo, Qiuxu wrote:
> > I injected a correctable error with the CMCI interrupt enabled on an Intel testing machine,
> > and this mce_early_notifier() was invoked. But the following code in mce_notify_irq() is now
> > never executed, and I didn't see the error log message "Machine check events logged".
>
> You did disable the CEC, right?
>
> In any case, let's have a look:
>
> When we log an MCE, we do:
>
> mce_log # add it to the genpool and run the works
> -> mce_irq_work
> -> mce_schedule_work
> -> ..
> -> mce_gen_pool_process # this'll send it down the notifier chain
> -> x86_mce_decoder_chain
> -> mce_early_notifier # that guy sees it here and issues the trace record
>
> Now, mce_notify_irq() would do mce_work_trigger() and issue the printk
> - dunno, I guess we still want our printk and probably should add it back
> - but the first one - the work triggering - that's mcelog. It is using that
> usermode helper gunk, dunno if you guys still need it.
>
> Because mcelog does register to the decoder chain so it'll get to see the MCE
> eventually. So that part is fine.
>
> The only question is the usermode helper gunk...
>
> Tony?
Ran my own test. RAS_CEC disabled. Booted with mce=no_cmci and injected a
corrected error every twenty seconds. Added pr_info() to mce_timer_fn()
to say which CPUs were doubling or halving interval.
Results:
I did see some "Machine check events logged" console messages.
The debug messages are "interesting". Polling timers on CPUs aren't
synchronized, so I got random bursts of debug messages where some
CPUs found an error and halved their interval, while others didn't
see an error and doubled their interval. The machine check banks for
memory corrected errors are socket scoped, so when an error is logged,
whichever CPU on the socket polls next will find it.
Both mcelog and EDAC were invoked on the mce decode chain and logged
errors OK.
When I stopped injecting, all the CPUs doubled back up to maximum
polling interval.
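For anyone following along, the halve/double behaviour described above can be sketched in user-space C. This is only a rough model under stated assumptions, not the kernel code: the constants SKETCH_HZ, SKETCH_MIN_IV and SKETCH_MAX_IV, the struct shared_bank type and both function names are made up for illustration, and "the first poller consumes the error" is a simplification of how a socket-scoped bank behaves.

```c
#include <stdbool.h>

/* Hypothetical constants for illustration only; not taken from this thread. */
#define SKETCH_HZ      1000UL                 /* assumed ticks per second */
#define SKETCH_MIN_IV  (SKETCH_HZ / 100)      /* assumed floor for the interval */
#define SKETCH_MAX_IV  (300UL * SKETCH_HZ)    /* assumed ceiling for the interval */

/* Next polling interval for one CPU, in ticks: halve on error, else double. */
static unsigned long next_poll_interval(unsigned long iv, bool found_error)
{
	if (found_error) {
		iv /= 2;
		return iv < SKETCH_MIN_IV ? SKETCH_MIN_IV : iv;
	}
	iv *= 2;
	return iv > SKETCH_MAX_IV ? SKETCH_MAX_IV : iv;
}

/* Hypothetical model of one bank shared by every CPU on a socket. */
struct shared_bank {
	bool error_pending;
};

/*
 * One CPU's timer fires: whoever polls first sees the pending error and
 * clears it, so CPUs polling afterwards find nothing and back off.
 */
static unsigned long poll_cpu(struct shared_bank *bank, unsigned long iv)
{
	bool found = bank->error_pending;

	bank->error_pending = false;
	return next_poll_interval(iv, found);
}
```

With one pending error and two CPUs both at 8000 ticks, the first poller drops to 4000 and the second climbs to 16000, which is the kind of mixed burst the debug messages showed.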
Summary: This is working as well as can be expected given the shared
scope of the machine check banks. If Linux were to understand the
scope of machine check banks it might designate a single CPU in
that scope to do the polling. But Intel doesn't make it easy to derive
the scope. In any case, the common case is CMCI enabled.
-Tony