public inbox for linux-kernel@vger.kernel.org
From: Borislav Petkov <bp@alien8.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>, LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 09/13] x86/mce: Reenable CMCI banks when switching back to interrupt mode
Date: Wed, 12 Aug 2015 18:29:41 +0200
Message-ID: <1439396985-12812-10-git-send-email-bp@alien8.de>
In-Reply-To: <1439396985-12812-1-git-send-email-bp@alien8.de>

From: Xie XiuQi <xiexiuqi@huawei.com>

Zhang Liguang reported the following issue:

1) System detects a CMCI storm on the current CPU.

2) Kernel disables the CMCI interrupt on banks owned by the current CPU and
   switches to poll mode.

3) After the CMCI storm subsides, kernel switches back to interrupt mode.

4) We expect the system to reenable the CMCI interrupt on banks owned by
   the current CPU, but that does not happen:

   cmci_intel_adjust_timer
   |-> cmci_reenable
       |-> cmci_discover     # owned banks are ignored here

static void cmci_discover(int banks)
{
	...
	for (i = 0; i < banks; i++) {
		...
		if (test_bit(i, owned))	/* owned banks are skipped here */
			continue;
		...
	}
	...
}
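
The failure can be reproduced in a small user-space model. This is only
a sketch: struct cpu_model, NBANKS and the model_*() helpers are
hypothetical names, the array stands in for the per-bank MCi_CTL2 MSRs,
and CMCI_EN is assumed to sit at bit 30 as MCI_CTL2_CMCI_EN does. The
real cmci_discover() does more work (for example, detecting banks owned
by other CPUs); only the skip of owned banks is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBANKS   8            /* hypothetical bank count for the model */
#define CMCI_EN  (1ULL << 30) /* assumed to match MCI_CTL2_CMCI_EN */

struct cpu_model {
	uint64_t ctl2[NBANKS]; /* stand-in for the per-bank MCi_CTL2 MSRs */
	uint32_t owned;        /* stand-in for the mce_banks_owned bitmap */
};

/* Storm detected: mask CMCI on every owned bank, keep ownership. */
static void model_storm_disable(struct cpu_model *c)
{
	for (int i = 0; i < NBANKS; i++)
		if (c->owned & (1u << i))
			c->ctl2[i] &= ~CMCI_EN;
}

/* Mirrors the loop quoted above: owned banks are skipped entirely. */
static void model_discover(struct cpu_model *c)
{
	for (int i = 0; i < NBANKS; i++) {
		if (c->owned & (1u << i))
			continue; /* never reaches the enable below */
		c->ctl2[i] |= CMCI_EN;
		c->owned |= 1u << i;
	}
}

/* Report whether the modelled CMCI enable bit is set for one bank. */
static bool model_cmci_enabled(const struct cpu_model *c, int bank)
{
	assert(bank >= 0 && bank < NBANKS);
	return (c->ctl2[bank] & CMCI_EN) != 0;
}
```

Calling model_discover(), then model_storm_disable(), then
model_discover() again leaves CMCI_EN clear on every owned bank, which
is the behaviour reported above.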

So convert cmci_storm_disable_banks() to cmci_toggle_interrupt_mode(),
which enables or disables CMCI interrupts on the owned banks according
to its argument, and call it instead of cmci_reenable() when the storm
ends.
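
The effect of the toggle on the owned banks can be sketched outside the
kernel as follows. This is an illustration under assumptions, not the
patch itself: struct bank_state, NBANKS and model_toggle_interrupt_mode()
are hypothetical, the array replaces the rdmsrl()/wrmsrl() accesses, and
no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBANKS   8            /* hypothetical bank count for the model */
#define CMCI_EN  (1ULL << 30) /* assumed to match MCI_CTL2_CMCI_EN */

struct bank_state {
	uint64_t ctl2[NBANKS]; /* stand-in for the per-bank MCi_CTL2 MSRs */
	uint32_t owned;        /* stand-in for the mce_banks_owned bitmap */
};

/*
 * Set or clear the CMCI enable bit on owned banks only. Other bits in
 * each register and the ownership bitmap are left untouched.
 */
static void model_toggle_interrupt_mode(struct bank_state *s, bool on)
{
	assert(s != NULL);
	for (int bank = 0; bank < NBANKS; bank++) {
		if (!(s->owned & (1u << bank)))
			continue; /* not ours, do not touch */
		if (on)
			s->ctl2[bank] |= CMCI_EN;
		else
			s->ctl2[bank] &= ~CMCI_EN;
	}
}
```

Because ownership is never modified, passing false and later true
restores the enable bit on exactly the banks that had it masked.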

NB: We cannot clear the owned bit instead, because the banks would then
no longer be polled. See

  27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")

for more info.
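
A rough model of why the owned bit has to stay set, assuming (as the NB
states) that the poll pass visits only banks marked as owned. NBANKS and
model_poll_owned() are hypothetical names, and the pending[] array
stands in for whatever error state the real poll code reads from each
bank.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBANKS 8 /* hypothetical bank count for the model */

/*
 * Count the pending errors a poll pass would notice. Only banks whose
 * bit is set in 'owned' are visited; errors on other banks go unseen.
 */
static unsigned int model_poll_owned(uint32_t owned, const bool pending[NBANKS])
{
	unsigned int seen = 0;

	assert(pending != NULL);
	for (int bank = 0; bank < NBANKS; bank++) {
		if (!(owned & (1u << bank)))
			continue; /* unowned banks are not polled */
		if (pending[bank])
			seen++;
	}
	return seen;
}
```

With an empty ownership bitmap the function sees nothing, however many
banks have errors pending, which is what clearing the owned bit during
a storm would cause.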

Reported-by: Zhang Liguang <zhangliguang@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Cc: <stable@vger.kernel.org>  # v3.15+
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: huawei.libin@huawei.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: rui.xiang@huawei.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1439347871-2702-1-git-send-email-xiexiuqi@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 41 +++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index c5c003291861..1e8bb6c94f14 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -146,6 +146,27 @@ void mce_intel_hcpu_update(unsigned long cpu)
 	per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
 }
 
+static void cmci_toggle_interrupt_mode(bool on)
+{
+	unsigned long flags, *owned;
+	int bank;
+	u64 val;
+
+	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+	owned = this_cpu_ptr(mce_banks_owned);
+	for_each_set_bit(bank, owned, MAX_NR_BANKS) {
+		rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+
+		if (on)
+			val |= MCI_CTL2_CMCI_EN;
+		else
+			val &= ~MCI_CTL2_CMCI_EN;
+
+		wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	}
+	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
+
 unsigned long cmci_intel_adjust_timer(unsigned long interval)
 {
 	if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
@@ -175,7 +196,7 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
 		 */
 		if (!atomic_read(&cmci_storm_on_cpus)) {
 			__this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
-			cmci_reenable();
+			cmci_toggle_interrupt_mode(true);
 			cmci_recheck();
 		}
 		return CMCI_POLL_INTERVAL;
@@ -186,22 +207,6 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
 	}
 }
 
-static void cmci_storm_disable_banks(void)
-{
-	unsigned long flags, *owned;
-	int bank;
-	u64 val;
-
-	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
-	owned = this_cpu_ptr(mce_banks_owned);
-	for_each_set_bit(bank, owned, MAX_NR_BANKS) {
-		rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
-		val &= ~MCI_CTL2_CMCI_EN;
-		wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
-	}
-	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
-}
-
 static bool cmci_storm_detect(void)
 {
 	unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
@@ -223,7 +228,7 @@ static bool cmci_storm_detect(void)
 	if (cnt <= CMCI_STORM_THRESHOLD)
 		return false;
 
-	cmci_storm_disable_banks();
+	cmci_toggle_interrupt_mode(false);
 	__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
 	r = atomic_add_return(1, &cmci_storm_on_cpus);
 	mce_timer_kick(CMCI_STORM_INTERVAL);
-- 
2.5.0.rc2.28.g6003e7f

