From: Thomas Gleixner <tglx@linutronix.de>
To: Chen Gong <gong.chen@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
tony.luck@intel.com, bp@amd64.org, x86@kernel.org,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [patch 2/2] x86: mce: Implement cmci poll mode for intel machines
Date: Tue, 5 Jun 2012 15:35:21 +0200 (CEST) [thread overview]
Message-ID: <alpine.LFD.2.02.1206051503560.3086@ionos> (raw)
In-Reply-To: <4FCDF1C8.9020007@linux.intel.com>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 4000 bytes --]
On Tue, 5 Jun 2012, Chen Gong wrote:
> 于 2012/6/5 4:01, Thomas Gleixner 写道:
> > On Mon, 4 Jun 2012, Chen Gong wrote:
> > static void mce_timer_fn(unsigned long data)
> > {
> > ...
> > + /* Might have become 0 after CMCI storm subsided */
> > + if (iv) {
> > + t->expires = jiffies + iv;
> > + add_timer_on(t, smp_processor_id());
> > + }
> > +}
>
> I've found under some conditions, *t* is pending on the timer tree, so
> add_timer_on will crash the whole system. Furthermore, if this timer
How should that happen? This is the timer callback function, which is
called from the timer code when the timer expired. It CANNOT be
pending at that point. The add_timer_on() in mce_timer_kick() might
cause that BUG to trigger, but definitely not the one here in the
timer callback.
> function triggers "WARN_ON(smp_processor_id() != data);", this timer
What causes the timer to be added on the wrong CPU in the first place?
The WARN_ON meriliy detects it after the fact.
> will be added on the other CPU, which means it loses the chance to
> decrement *cmci_storm_on_cpus* to zero to reenable the CMCI. Maybe
> this situation happens seldomly, but once it happens, CMCI will never
> be actived again after it is disabled.
>
> > +void mce_timer_kick(unsigned long interval)
> > +{
> > + struct timer_list *t = &__get_cpu_var(mce_timer);
> > + unsigned long when = jiffies + interval;
> > + unsigned long iv = __this_cpu_read(mce_next_interval);
> > +
> > + if (time_before(when, t->expires) && timer_pending(t)) {
> > + mod_timer(t, when);
> > + } else if (!mce_next_interval) {
> > + t->expires = round_jiffies(jiffies + iv);
> > + add_timer_on(t, smp_processor_id());
>
> I've changed "else if (!mce_next_interval)" to "else if (iv)", but
else if (!iv) perhaps ?
> I still think it is not right. Imaging *when* is after t->expires and
> this timer is pending on the timer tree, so it will hit *else if*
> if iv is not zero(common situations), again, add_timer_on will trigger
> BUG_ON because this timer is pending.
See above. Aside of that I rewrote the code to be more clear. See
delta patch below.
> > static void intel_threshold_interrupt(void)
> > {
> > + if (cmci_storm_detect())
> > + return;
> > machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
> > mce_notify_irq();
> > }
>
> I think cmci_storm_detect should be placed in the machine_check_poll,
> not out of it. Because machine_check_poll it the core execution logic
> for CMCI handling, in the meanwhile, poll timer and mce-inject module
> call machine_check_poll at any time. If poll timer or mce-inject run
> too quickly, the CMCI handler has trouble. Whereas, if
> cmci_storm_detect is in the machine_check_poll, this kind of possibility
> can be avoid.
No, it's wrong to put it into machine_check_poll().
machine_check_poll() is a generic functionality which has nothing to
do with CMCI storms. That CMCI storm/poll stuff is intel specific and
therefor the detection logic is in the intel specific interrupt
handler and nowhere else.
Thanks,
tglx
---
arch/x86/kernel/cpu/mcheck/mce.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1305,13 +1305,14 @@ void mce_timer_kick(unsigned long interv
unsigned long when = jiffies + interval;
unsigned long iv = __this_cpu_read(mce_next_interval);
- if (time_before(when, t->expires) && timer_pending(t)) {
- mod_timer(t, when);
- } else if (!mce_next_interval) {
- t->expires = round_jiffies(jiffies + iv);
+ if (timer_pending(t)) {
+ if (time_before(when, t->expires))
+ mod_timer(t, when);
+ } else {
+ t->expires = round_jiffies(jiffies + when);
add_timer_on(t, smp_processor_id());
}
- if (interval < iv)
+ if (interval > iv)
__this_cpu_write(mce_next_interval, iv);
}
next prev parent reply other threads:[~2012-06-05 13:35 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-24 17:54 [patch 0/2] x86: mce: Implement poll mode for CMCI Thomas Gleixner
2012-05-24 17:54 ` [patch 1/2] x86: mce Cleanup timer mess Thomas Gleixner
2012-05-25 6:16 ` Borislav Petkov
2012-06-04 2:22 ` Chen Gong
2012-06-04 18:14 ` Luck, Tony
2012-06-04 19:57 ` Thomas Gleixner
2012-06-04 22:18 ` Borislav Petkov
2012-05-24 17:54 ` [patch 2/2] x86: mce: Implement cmci poll mode for intel machines Thomas Gleixner
2012-05-25 6:24 ` Borislav Petkov
2012-05-25 7:31 ` Chen Gong
2012-05-25 9:20 ` Thomas Gleixner
2012-05-25 12:17 ` Thomas Gleixner
2012-05-28 9:47 ` Chen Gong
2012-05-28 9:52 ` Chen Gong
2012-06-04 2:37 ` Chen Gong
2012-06-04 20:01 ` Thomas Gleixner
2012-06-05 11:47 ` Chen Gong
2012-06-05 12:57 ` Borislav Petkov
2012-06-06 1:36 ` Chen Gong
2012-06-06 9:04 ` Borislav Petkov
2012-06-05 13:35 ` Thomas Gleixner [this message]
2012-06-06 7:21 ` Chen Gong
2012-06-06 9:18 ` Thomas Gleixner
2012-06-06 10:23 ` Thomas Gleixner
2012-06-06 12:24 ` Chen Gong
2012-06-06 12:27 ` Peter Zijlstra
2012-06-06 14:15 ` Thomas Gleixner
2012-06-06 14:46 ` Thomas Gleixner
2012-06-07 3:32 ` Chen Gong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.02.1206051503560.3086@ionos \
--to=tglx@linutronix.de \
--cc=bp@amd64.org \
--cc=gong.chen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox