Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Tony Luck <tony.luck@intel.com>, Sasha Levin <sashal@kernel.org>,
	bp@alien8.de, mchehab@kernel.org, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.14 10/28] EDAC/igen6: Skip absent memory controllers
Date: Fri, 30 May 2025 08:39:16 -0400	[thread overview]
Message-ID: <20250530123934.2574748-10-sashal@kernel.org> (raw)
In-Reply-To: <20250530123934.2574748-1-sashal@kernel.org>

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

[ Upstream commit 20e190b1c1fd88b21cc5106c12cfe6def5ab849d ]

Some BIOS versions may fuse off certain memory controllers and set the
registers of these absent memory controllers to ~0. The current igen6_edac
mistakenly enumerates these absent memory controllers and registers them
with the EDAC core.

Skip the absent memory controllers to avoid mistakenly enumerating them.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250408132455.489046-2-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

**YES** This commit should be backported to stable kernel trees. Here's
my extensive analysis: ## Bug Impact and User Experience This commit
fixes a real problem where **some BIOS versions fuse off certain memory
controllers** and set their registers to `~0` (all 1s). The current
igen6_edac driver mistakenly enumerates these absent memory controllers
and registers them with the EDAC core, which can lead to: 1. **Incorrect
memory controller enumeration** - showing non-existent memory
controllers to userspace 2. **Potential false error reporting** - trying
to read error logs from non-existent hardware 3. **Resource waste** -
allocating structures and registering devices for hardware that doesn't
exist ## Code Analysis - Minimal and Safe Changes Looking at the
specific code changes: ### 1. New Detection Function ```c +/* Check
whether the memory controller is absent. model/ prompt/ src/ target/
+static bool igen6_imc_absent(void __iomem *window) +{ + return
readl(window + MAD_INTER_CHANNEL_OFFSET) == ~0; +} ``` This is a
**simple, conservative check** that reads a known register
(`MAD_INTER_CHANNEL_OFFSET`) and checks if it returns all 1s (`~0`),
which is the standard pattern for absent/fused-off hardware. ### 2.
Modified Registration Logic The key change in `igen6_register_mcis()`:
```c + if (igen6_imc_absent(window)) { + iounmap(window); + edac_dbg(2,
"Skip absent mc%d\n", pmc); + continue; + } ``` This **safely skips**
absent controllers rather than trying to register them. The logic is: -
Map the memory controller's MMIO space - Check if it's absent (registers
return ~0) - If absent: unmap and continue to next controller - If
present: proceed with normal registration ### 3. Robust Error Handling
The code maintains proper resource cleanup and adds logging for when
fewer controllers are found than expected: ```c + if (lmc <
res_cfg->num_imc) + igen6_printk(KERN_WARNING, "Expected %d mcs, but
only %d detected.", + res_cfg->num_imc, lmc); ``` ## Comparison with
Similar Commits Looking at the historical examples: **Similar Commit #1
(YES)** - "EDAC/i10nm: Skip the absent memory controllers" has nearly
identical logic and was backported because it fixed absent memory
controller detection causing call traces. **Similar Commit #2 (YES)** -
A dependency fix that prevented build errors was backported. This commit
follows the same pattern as **Similar Commit #1** - it's a focused bug
fix for absent hardware detection with minimal risk. ## Why This Should
Be Backported ### 1. **Clear Bug Fix** This addresses a specific,
reproducible issue where BIOS configurations cause incorrect driver
behavior. ### 2. **Minimal Risk** - Small, contained change - Only
affects the memory controller enumeration path - Adds defensive checks
rather than changing core logic - Maintains backward compatibility ###
3. **Follows Established Patterns** The "check register for ~0" pattern
is well-established in kernel hardware detection and matches the
successful backport in Similar Commit #1. ### 4. **Hardware Support
Issue** This fixes support for legitimate hardware configurations where
BIOS has fused off memory controllers - these systems should work
correctly. ### 5. **Stable Tree Criteria Met** - ✅ Fixes important bug
affecting users - ✅ Small and contained change - ✅ No architectural
changes - ✅ Minimal regression risk - ✅ Confined to single driver
subsystem The commit represents exactly the type of targeted hardware
fix that stable trees are designed to include - it resolves a real-world
compatibility issue with a safe, minimal change.

 drivers/edac/igen6_edac.c | 78 +++++++++++++++++++++++++++++++--------
 1 file changed, 62 insertions(+), 16 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 595908af9e5c9..14692c2da6222 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -127,6 +127,7 @@
 
 static struct res_config {
 	bool machine_check;
+	/* The number of present memory controllers. */
 	int num_imc;
 	u32 imc_base;
 	u32 cmf_base;
@@ -1201,23 +1202,21 @@ static void igen6_check(struct mem_ctl_info *mci)
 		irq_work_queue(&ecclog_irq_work);
 }
 
-static int igen6_register_mci(int mc, u64 mchbar, struct pci_dev *pdev)
+/* Check whether the memory controller is absent. */
+static bool igen6_imc_absent(void __iomem *window)
+{
+	return readl(window + MAD_INTER_CHANNEL_OFFSET) == ~0;
+}
+
+static int igen6_register_mci(int mc, void __iomem *window, struct pci_dev *pdev)
 {
 	struct edac_mc_layer layers[2];
 	struct mem_ctl_info *mci;
 	struct igen6_imc *imc;
-	void __iomem *window;
 	int rc;
 
 	edac_dbg(2, "\n");
 
-	mchbar += mc * MCHBAR_SIZE;
-	window = ioremap(mchbar, MCHBAR_SIZE);
-	if (!window) {
-		igen6_printk(KERN_ERR, "Failed to ioremap 0x%llx\n", mchbar);
-		return -ENODEV;
-	}
-
 	layers[0].type = EDAC_MC_LAYER_CHANNEL;
 	layers[0].size = NUM_CHANNELS;
 	layers[0].is_virt_csrow = false;
@@ -1283,7 +1282,6 @@ static int igen6_register_mci(int mc, u64 mchbar, struct pci_dev *pdev)
 fail2:
 	edac_mc_free(mci);
 fail:
-	iounmap(window);
 	return rc;
 }
 
@@ -1309,6 +1307,56 @@ static void igen6_unregister_mcis(void)
 	}
 }
 
+static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
+{
+	void __iomem *window;
+	int lmc, pmc, rc;
+	u64 base;
+
+	for (lmc = 0, pmc = 0; pmc < NUM_IMC; pmc++) {
+		base   = mchbar + pmc * MCHBAR_SIZE;
+		window = ioremap(base, MCHBAR_SIZE);
+		if (!window) {
+			igen6_printk(KERN_ERR, "Failed to ioremap 0x%llx for mc%d\n", base, pmc);
+			rc = -ENOMEM;
+			goto out_unregister_mcis;
+		}
+
+		if (igen6_imc_absent(window)) {
+			iounmap(window);
+			edac_dbg(2, "Skip absent mc%d\n", pmc);
+			continue;
+		}
+
+		rc = igen6_register_mci(lmc, window, pdev);
+		if (rc)
+			goto out_iounmap;
+
+		/* Done, if all present MCs are detected and registered. */
+		if (++lmc >= res_cfg->num_imc)
+			break;
+	}
+
+	if (!lmc) {
+		igen6_printk(KERN_ERR, "No mc found.\n");
+		return -ENODEV;
+	}
+
+	if (lmc < res_cfg->num_imc)
+		igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
+			     res_cfg->num_imc, lmc);
+
+	return 0;
+
+out_iounmap:
+	iounmap(window);
+
+out_unregister_mcis:
+	igen6_unregister_mcis();
+
+	return rc;
+}
+
 static int igen6_mem_slice_setup(u64 mchbar)
 {
 	struct igen6_imc *imc = &igen6_pvt->imc[0];
@@ -1405,7 +1453,7 @@ static void opstate_set(struct res_config *cfg, const struct pci_device_id *ent)
 static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	u64 mchbar;
-	int i, rc;
+	int rc;
 
 	edac_dbg(2, "\n");
 
@@ -1421,11 +1469,9 @@ static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	opstate_set(res_cfg, ent);
 
-	for (i = 0; i < res_cfg->num_imc; i++) {
-		rc = igen6_register_mci(i, mchbar, pdev);
-		if (rc)
-			goto fail2;
-	}
+	rc = igen6_register_mcis(pdev, mchbar);
+	if (rc)
+		goto fail;
 
 	if (res_cfg->num_imc > 1) {
 		rc = igen6_mem_slice_setup(mchbar);
-- 
2.39.5


  parent reply	other threads:[~2025-05-30 12:39 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-30 12:39 [PATCH AUTOSEL 6.14 01/28] ACPICA: fix acpi operand cache leak in dswstate.c Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 02/28] ASoC: amd: yc: Add quirk for Lenovo Yoga Pro 7 14ASP9 Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 03/28] clocksource: Fix the CPUs' choice in the watchdog per CPU verification Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 04/28] power: supply: gpio-charger: Fix wakeup source leaks on device unbind Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 05/28] tools/nolibc: use intmax definitions from compiler Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 06/28] power: supply: collie: Fix wakeup source leaks on device unbind Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 07/28] mmc: Add quirk to disable DDR50 tuning Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 08/28] ACPICA: Avoid sequence overread in call to strncmp() Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 09/28] mmc: sdhci-esdhc-imx: Save tuning value when card stays powered in suspend Sasha Levin
2025-05-30 12:39 ` Sasha Levin [this message]
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 11/28] ASoC: tas2770: Power cycle amp on ISENSE/VSENSE change Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 12/28] ASoC: intel/sdw_utils: Assign initial value in asoc_sdw_rt_amp_spk_rtd_init() Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 13/28] ACPI: bus: Bail out if acpi_kobj registration fails Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 14/28] ACPI: Add missing prototype for non CONFIG_SUSPEND/CONFIG_X86 case Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 15/28] ACPICA: fix acpi parse and parseext cache leaks Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 16/28] ACPICA: Apply pack(1) to union aml_resource Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 17/28] ALSA: hda: cs35l41: Fix swapped l/r audio channels for Acer Helios laptops Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 18/28] power: supply: bq27xxx: Retrieve again when busy Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 19/28] pmdomain: core: Reset genpd->states to avoid freeing invalid data Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 20/28] ACPICA: utilities: Fix overflow check in vsnprintf() Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 21/28] platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 22/28] ASoC: tegra210_ahub: Add check to of_device_get_match_data() Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 23/28] Make 'cc-option' work correctly for the -Wno-xyzzy pattern Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 24/28] gpiolib: of: Add polarity quirk for s5m8767 Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 25/28] PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn() Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 26/28] tools/nolibc: use pselect6_time64 if available Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 27/28] power: supply: max17040: adjust thermal channel scaling Sasha Levin
2025-05-30 12:39 ` [PATCH AUTOSEL 6.14 28/28] ACPI: battery: negate current when discharging Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250530123934.2574748-10-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=bp@alien8.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=qiuxu.zhuo@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox