From: Chen Gong <gong.chen@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
tony.luck@intel.com, bp@amd64.org, x86@kernel.org,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [patch 2/2] x86: mce: Implement cmci poll mode for intel machines
Date: Tue, 05 Jun 2012 19:47:20 +0800 [thread overview]
Message-ID: <4FCDF1C8.9020007@linux.intel.com> (raw)
In-Reply-To: <alpine.LFD.2.02.1206042158200.3086@ionos>
于 2012/6/5 4:01, Thomas Gleixner 写道:
> On Mon, 4 Jun 2012, Chen Gong wrote:
>>> +/*
>>> + * Ensure that the timer is firing in @interval from now.
>>> + */
>>> +void mce_timer_kick(unsigned long interval)
>>> +{
>>> + struct timer_list *t = &__get_cpu_var(mce_timer);
>>> + unsigned long when = jiffies + interval;
>>> + unsigned long iv = __this_cpu_read(mce_next_interval);
>>> +
>>> + if (time_before(when, t->expires) && timer_pending(t)) {
>>> + mod_timer(t, when);
>>> + } else if (!mce_next_interval) {
>>
>> Why using mce_next_interval, it is a per_cpu var, should be non-NULL
>> definitely, right? How about using *iv* here?
>
> iv is the thing to use. No idea why I typoed mce_next_interval into
> that.
>
>>> + t->expires = round_jiffies(jiffies + iv);
>>> + add_timer_on(t, smp_processor_id());
>>> + }
>>> + if (interval < iv)
>>> + __this_cpu_write(mce_next_interval, iv);
>>> }
>>
>> This code should be __this_cpu_write(mce_next_interval, interval);?
>
> Duh, yes.
>
> Thanks,
>
> tglx
>
Hi, Thomas
Besides above issues, I still have some other questions as below:
> static void mce_timer_fn(unsigned long data)
> {
> ...
> + /* Might have become 0 after CMCI storm subsided */
> + if (iv) {
> + t->expires = jiffies + iv;
> + add_timer_on(t, smp_processor_id());
> + }
> +}
I've found under some conditions, *t* is pending on the timer tree, so
add_timer_on will crash the whole system. Furthermore, if this timer
function triggers "WARN_ON(smp_processor_id() != data);", this timer
will be added on the other CPU, which means it loses the chance to
decrement *cmci_storm_on_cpus* to zero to reenable the CMCI. Maybe
this situation happens seldomly, but once it happens, CMCI will never
be actived again after it is disabled.
> +void mce_timer_kick(unsigned long interval)
> +{
> + struct timer_list *t = &__get_cpu_var(mce_timer);
> + unsigned long when = jiffies + interval;
> + unsigned long iv = __this_cpu_read(mce_next_interval);
> +
> + if (time_before(when, t->expires) && timer_pending(t)) {
> + mod_timer(t, when);
> + } else if (!mce_next_interval) {
> + t->expires = round_jiffies(jiffies + iv);
> + add_timer_on(t, smp_processor_id());
I've changed "else if (!mce_next_interval)" to "else if (iv)", but
I still think it is not right. Imaging *when* is after t->expires and
this timer is pending on the timer tree, so it will hit *else if*
if iv is not zero(common situations), again, add_timer_on will trigger
BUG_ON because this timer is pending.
> static void intel_threshold_interrupt(void)
> {
> + if (cmci_storm_detect())
> + return;
> machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
> mce_notify_irq();
> }
I think cmci_storm_detect should be placed in the machine_check_poll,
not out of it. Because machine_check_poll it the core execution logic
for CMCI handling, in the meanwhile, poll timer and mce-inject module
call machine_check_poll at any time. If poll timer or mce-inject run
too quickly, the CMCI handler has trouble. Whereas, if
cmci_storm_detect is in the machine_check_poll, this kind of possibility
can be avoid.
next prev parent reply other threads:[~2012-06-05 11:47 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-24 17:54 [patch 0/2] x86: mce: Implement poll mode for CMCI Thomas Gleixner
2012-05-24 17:54 ` [patch 1/2] x86: mce Cleanup timer mess Thomas Gleixner
2012-05-25 6:16 ` Borislav Petkov
2012-06-04 2:22 ` Chen Gong
2012-06-04 18:14 ` Luck, Tony
2012-06-04 19:57 ` Thomas Gleixner
2012-06-04 22:18 ` Borislav Petkov
2012-05-24 17:54 ` [patch 2/2] x86: mce: Implement cmci poll mode for intel machines Thomas Gleixner
2012-05-25 6:24 ` Borislav Petkov
2012-05-25 7:31 ` Chen Gong
2012-05-25 9:20 ` Thomas Gleixner
2012-05-25 12:17 ` Thomas Gleixner
2012-05-28 9:47 ` Chen Gong
2012-05-28 9:52 ` Chen Gong
2012-06-04 2:37 ` Chen Gong
2012-06-04 20:01 ` Thomas Gleixner
2012-06-05 11:47 ` Chen Gong [this message]
2012-06-05 12:57 ` Borislav Petkov
2012-06-06 1:36 ` Chen Gong
2012-06-06 9:04 ` Borislav Petkov
2012-06-05 13:35 ` Thomas Gleixner
2012-06-06 7:21 ` Chen Gong
2012-06-06 9:18 ` Thomas Gleixner
2012-06-06 10:23 ` Thomas Gleixner
2012-06-06 12:24 ` Chen Gong
2012-06-06 12:27 ` Peter Zijlstra
2012-06-06 14:15 ` Thomas Gleixner
2012-06-06 14:46 ` Thomas Gleixner
2012-06-07 3:32 ` Chen Gong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FCDF1C8.9020007@linux.intel.com \
--to=gong.chen@linux.intel.com \
--cc=bp@amd64.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.