From: Borislav Petkov <bp@alien8.de>
To: Tony Luck <tony.luck@gmail.com>
Cc: Havard Skinnemoen <hskinnemoen@google.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Ewout van Bekkum <ewout@google.com>
Subject: Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.
Date: Fri, 18 Jul 2014 23:31:57 +0200
Message-ID: <20140718213157.GB29366@pd.tnic>
In-Reply-To: <CA+8MBbLdD=MHVjQpUAsVhMJEfa3nxsnja6hDAytP_esRotF=Vg@mail.gmail.com>
On Fri, Jul 18, 2014 at 02:23:04PM -0700, Tony Luck wrote:
> On Thu, Jul 17, 2014 at 3:50 AM, Borislav Petkov <bp@alien8.de> wrote:
> > Well, maybe it is about time we tracked shared banks.
>
> For cpus that support CMCI and the MCi_CTL2 registers we do track
> sharing. Only one cpu gets to be the "owner" of a bank that supports
> CMCI (the first one to find it and set bit 30 in the CTL2 register).
>
> The test_bit() at the top of the loop in machine_check_poll() makes
> sure only the owner of a bank actually looks at it.
>
>	for (i = 0; i < mca_cfg.banks; i++) {
>		if (!mce_banks[i].ctl || !test_bit(i, *b))
>			continue;
>
> If we don't have CMCI, then we don't have the CTL2 registers, and
> so have no way to find out which banks are shared.
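
For illustration only, the ownership scheme described above can be sketched in userspace C roughly as below. This is not the kernel code: the ctl2[] array stands in for the shared MCi_CTL2 registers (real code would go through rdmsr/wrmsr), and NBANKS, NCPUS, owned[], claim_bank() and banks_polled() are names made up for the sketch. It also assumes every bank is shared by every CPU, which is a simplification.

```c
#include <stdint.h>

#define NBANKS  4		/* hypothetical number of banks */
#define NCPUS   2		/* hypothetical number of CPUs sharing them */
#define CMCI_EN (1ULL << 30)	/* bit 30 of CTL2, as described in the mail */

/* Simulated shared CTL2 registers, one per bank (assumption: all shared). */
static uint64_t ctl2[NBANKS];

/* Per-CPU bitmap of owned banks, one bit per bank. */
static unsigned long owned[NCPUS];

/* Try to claim a bank; returns 1 if this cpu became the owner, else 0. */
static int claim_bank(int cpu, int bank)
{
	if (ctl2[bank] & CMCI_EN)
		return 0;		/* another cpu set bit 30 first */

	ctl2[bank] |= CMCI_EN;		/* first one to find it sets bit 30 */
	owned[cpu] |= 1UL << bank;
	return 1;
}

/* Count the banks a cpu would look at in a poll: only the ones it owns. */
static int banks_polled(int cpu)
{
	int i, n = 0;

	for (i = 0; i < NBANKS; i++) {
		if (!(owned[cpu] & (1UL << i)))
			continue;	/* not the owner, skip this bank */
		n++;
	}
	return n;
}
```

If cpu 0 claims all banks first, a later claim_bank(1, i) returns 0 for every i, so cpu 1 polls nothing and each error is reported once. Without the CTL2 registers there is no such shared bit to test, which is the no-CMCI case above.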
Ah, so Havard's corrected explanation was this:
"I don't think we got the description right here. I think the real issue
here was machine check polls happening on multiple CPUs with shared
banks, all reporting the same MCEs. This is very reproducible when
booting with mce=no_cmci, since all CPUs will handle all banks, and
there's AFAICT no good way to identify shared banks without enabling
CMCI."
Remind me, why would one boot with mce=no_cmci at all, on a CMCI
machine?
> I'd be surprised if it was a problem in practice. If we have CMCI,
> then we limit the banks that we look at (and if we see a high rate
> of interrupts, then we turn off interrupts and poll).
>
> If we don't have CMCI, then we are polling at a pretty low rate
> (current code adjusts the rate higher if we are finding errors to
> log, but we don't let that rate rise forever ... cap is ~ 1HZ).
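
A rough sketch of that kind of adaptive polling, for illustration only. The halve/double steps, the bounds and the name next_interval() are assumptions for the sketch, not taken from the kernel source. The lower bound is just one reading of the "~ 1HZ" cap mentioned above.

```c
/* Hypothetical bounds in milliseconds; assumptions, not the kernel's values. */
#define IV_MIN_MS   1000UL	/* never poll more often than about once a second */
#define IV_MAX_MS 300000UL	/* slowest polling interval when nothing is found */

/*
 * Compute the next polling interval from the current one.
 * Errors found: poll more often (shorter interval), down to IV_MIN_MS.
 * Nothing found: back off again, up to IV_MAX_MS.
 */
static unsigned long next_interval(unsigned long iv_ms, int found_errors)
{
	if (found_errors)
		iv_ms /= 2;
	else
		iv_ms *= 2;

	if (iv_ms < IV_MIN_MS)
		iv_ms = IV_MIN_MS;
	if (iv_ms > IV_MAX_MS)
		iv_ms = IV_MAX_MS;

	return iv_ms;
}
```

With these made-up numbers, a CPU that keeps finding errors settles at one poll per second and stays there, so the polling load stays bounded no matter how many errors come in.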
Right, it would be interesting to see how a huuge machine (4 sockets
with lotsa memory) behaves under a CMCI storm...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.