From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 31B6C3A3E66; Mon, 20 Apr 2026 13:16:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776690960; cv=none; b=P8FTK65RgCwK/ugoOJydDLa7yI+/ldXDgH5U7mPSAiGP+DaCUzdlXXUn7VwNXf4ZZJRPTf95emR/3jgyffdXwDZ3ShP3XxDlYkJn/n2Ry677+L4386EmuVUVUWvwQ/xVWZ0SDRBNNVJQt5BSg/gO0l4hKBeJsOP3/KmQW4VkgaE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776690960; c=relaxed/simple; bh=7XmZ6akHRE7O9IB/dFcnB8pxMxs3OlsefPwJMEwNtH4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=YEINcgtF7hZVHz+ftmeiS9bdKwZfQXVcqwNdVxCDCrasVxOxLPxOgjQTHEXrGFC9/PWw26T2lqmhkprMhtwuFxWHWTURJLOxP5xUHN2dCgfpQpeI6BdFgzVw2zwoN0XQHCU0xEg5+qFqLMwif6pEA4J2+Q8tQHw3XZNC8NNYrSo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mkZkOUG9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mkZkOUG9" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DC222C2BCB4; Mon, 20 Apr 2026 13:15:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1776690960; bh=7XmZ6akHRE7O9IB/dFcnB8pxMxs3OlsefPwJMEwNtH4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mkZkOUG9F+sSVWM5v+lFTDXn2qvJxF8OLYcVku+dCs3CuWuZ5NfIb+LBf9CRX+136 QcV26ybDXpjfAZoOxrZo24F4h94RTJbVPlAiDqTft4CkbUcTOZIJ8iJq8giWouJRQ4 CZzh9EO6smzG/5UnyDtY4XZx0XjBdw5R5xwziDWmcutTGwLLuTYxWiueonf2QALSBA ey3RD6qJIuzXIFAl8GXPxlKO1Z1ZPhPlY4m70t7rkA1JWfVVln9d05qpyIM4mrp/P+ oFnUAG9/9uL118b825Wjzy5omCwsAqbsFq8BRSNefZ35gpGEagTHN6UYWjdZxy3DWC NsUhnC38Z4LmA== From: Sasha Levin To: patches@lists.linux.dev, stable@vger.kernel.org Cc: Luke Wang , Bough Chen , Adrian Hunter , Ulf Hansson , Sasha Levin , shawnguo@kernel.org, linux-imx@nxp.com, linux-mmc@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH AUTOSEL 7.0-6.18] mmc: sdhci-esdhc-imx: wait for data transfer completion before reset Date: Mon, 20 Apr 2026 09:07:59 -0400 Message-ID: <20260420131539.986432-13-sashal@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260420131539.986432-1-sashal@kernel.org> References: <20260420131539.986432-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 7.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Luke Wang [ Upstream commit 8ceb70c9f970bfbdceb1e51578850a60b9de2236 ] On IMX7ULP platforms, certain SD cards (e.g. Kingston Canvas Go! Plus) cause system hangs and reboots during manual tuning. These cards exhibit large gaps (~16us) between tuning command response and data transmission. When cmd CRC errors occur during tuning, the code assumes data errors even tuning data hasn't been fully received and then reset host data circuit. Per IMX7ULP reference manual, reset operations (RESET_DATA/ALL) need to make sure no active data transfers. Previously, resetting while data was in-flight would clear data circuit, including ADMA/SDMA address, causing data to be transmitted to incorrect memory address. This patch adds polling for data transfer completion before executing resets. Signed-off-by: Luke Wang Reviewed-by: Bough Chen Acked-by: Adrian Hunter Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- LLM Generated explanations, may be completely bogus: Now I have all the information needed for a comprehensive analysis. Let me compile the final assessment. --- ## PHASE 1: COMMIT MESSAGE FORENSICS ### Step 1.1: Subject Line - **Subsystem**: `mmc: sdhci-esdhc-imx:` (MMC host controller driver for NXP i.MX ESDHC) - **Action verb**: "wait" (defensive/preventive action) - **Summary**: Wait for data transfer completion before resetting the host data circuit to avoid system hang. Record: [mmc: sdhci-esdhc-imx] [wait/prevent] [Prevent system hang by ensuring data transfer is complete before reset] ### Step 1.2: Tags - **Signed-off-by**: Luke Wang (author, NXP engineer) - **Reviewed-by**: Bough Chen (NXP subsystem maintainer for sdhci-esdhc-imx, prolific contributor) - **Acked-by**: Adrian Hunter (senior SDHCI/MMC subsystem maintainer) - **Signed-off-by**: Ulf Hansson (MMC subsystem maintainer who merged this) - No Fixes: tag (expected for autosel candidates) - No Cc: stable (expected) - No Link: tag Record: Reviewed by NXP subsystem expert (haibo.chen), ACK'd by sdhci co-maintainer (Adrian Hunter), merged by MMC maintainer (Ulf Hansson). Strong review chain. ### Step 1.3: Body Analysis - **Bug**: On IMX7ULP platforms, certain SD cards (Kingston Canvas Go! Plus) cause **system hangs and reboots** during manual tuning. - **Root cause**: Large gaps (~16us) between tuning command response and data transmission. When CRC errors occur during tuning, code resets the host data circuit while data is still in-flight. - **Failure mechanism**: Per IMX7ULP reference manual, RESET_DATA/RESET_ALL must not be issued during active data transfer. Resetting while data is in-flight clears the data circuit including ADMA/SDMA address, causing **data to be transmitted to incorrect memory address**. - **Symptom**: System hang and reboot. Record: Critical bug - system hang/reboot. DMA address corruption from reset during active transfer. Hardware-documented requirement violated. Specific SD card makes the timing gap visible. ### Step 1.4: Hidden Bug Fix Detection This is NOT hidden - it's an explicit fix for system hangs. The commit clearly describes a hardware requirement (per reference manual) that was being violated, leading to DMA address corruption and system hangs. Record: Explicit bug fix - system hang prevention. Not a disguised fix. ## PHASE 2: DIFF ANALYSIS ### Step 2.1: Inventory - **Files changed**: 1 file (`drivers/mmc/host/sdhci-esdhc-imx.c`) - **Lines added**: ~18 (2 for the define, 16 for the polling logic) - **Lines removed**: 0 - **Functions modified**: `esdhc_reset()` - the core reset callback for this driver - **Scope**: Single-file, single-function, surgical fix Record: [sdhci-esdhc-imx.c +18/-0] [esdhc_reset() modified] [Single-file surgical fix] ### Step 2.2: Code Flow Change 1. **New define**: `ESDHC_DATA_INHIBIT_WAIT_US 100000` (100ms timeout) 2. **Before**: `esdhc_reset()` directly called `sdhci_and_cqhci_reset()` without checking data transfer state 3. **After**: Before reset, if the reset mask includes `SDHCI_RESET_DATA` or `SDHCI_RESET_ALL`, poll `ESDHC_PRSSTAT` register waiting for `SDHCI_DATA_INHIBIT` to clear (indicating no active data transfer). Timeout at 100ms with a warning. Then proceed to reset. Record: Added defensive wait-for-idle before data/full reset. 100ms timeout with warning on failure. Non-blocking (proceeds even on timeout). ### Step 2.3: Bug Mechanism Category: **Hardware workaround / DMA corruption fix** - The bug is a violation of hardware specification requirements (IMX7ULP reference manual) - Resetting while `SDHCI_DATA_INHIBIT` is set clears ADMA/SDMA addresses mid-transfer - Data goes to wrong memory address → system hang/reboot (effectively memory corruption) - The fix polls the Present State register bit 1 (DATA_INHIBIT) before issuing reset - Uses `readl_poll_timeout_atomic` with 2us polling interval and 100ms max wait Record: [HW requirement violation → DMA address corruption → system hang] [Fix: poll for data idle before reset] ### Step 2.4: Fix Quality - **Obviously correct**: Yes. The reference manual explicitly requires waiting. The pattern of polling ESDHC_PRSSTAT is already used twice in this driver (lines 471, 1028). - **Minimal/surgical**: Yes. Only adds the required wait before existing reset call. - **Regression risk**: Very low. On timeout, it warns but still proceeds with reset (graceful degradation). The 100ms timeout is generous. Using `readl_poll_timeout_atomic` is appropriate since reset can be called from interrupt context. - **Red flags**: None. Well-contained, uses established patterns from the same driver. Record: High quality fix. Uses existing driver patterns. Graceful timeout handling. Minimal regression risk. ## PHASE 3: GIT HISTORY INVESTIGATION ### Step 3.1: Blame - `esdhc_reset()` introduced in commit `0718e59ae259f7` by Russell King (2014-04-25), present since ~v3.16 - Modified by `fb1dec44c6750b` (Brian Norris, 2022-10-26) to use `sdhci_and_cqhci_reset`, present since v6.2 - The function has been stable in its current form since v6.2 Record: esdhc_reset() has existed since v3.16 (2014). Current form since v6.2. Bug has been present since the function was introduced - the hardware requirement was never respected. ### Step 3.2: No Fixes: tag present (expected). ### Step 3.3: File History Recent changes to the file are mostly tuning-related fixes (manual tuning, clock loopback, PM refactoring). The `esdhc_reset()` function itself hasn't been touched recently (last change was the cqhci fix in 2022). Record: No prerequisites identified. The fix is standalone. ### Step 3.4: Author Luke Wang (ziniu.wang_1@nxp.com) is a regular NXP contributor with 14+ commits in the MMC subsystem and sdhci-esdhc-imx driver specifically. He's contributed tuning improvements, PM refactoring, and other driver fixes. Record: Regular subsystem contributor from the hardware vendor (NXP). ### Step 3.5: Dependencies - Uses `readl_poll_timeout_atomic` from `` - already included in all stable versions - Uses `ESDHC_PRSSTAT` and `SDHCI_DATA_INHIBIT` - both already defined - Uses `SDHCI_RESET_DATA` and `SDHCI_RESET_ALL` - standard SDHCI defines - Only dependency: `sdhci_and_cqhci_reset` (present since v6.2). For v5.15, the function uses `sdhci_reset` instead - minor backport adjustment needed. Record: Fully standalone for v6.1+. Minor adjustment needed for v5.15 (different reset function name). All APIs/macros already available. ## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH ### Step 4.1-4.5 I was unable to find the specific mailing list thread for this patch via b4 dig (commit not in tree) or web searches. The patch was found indirectly via the "1-bit bus width" series which built on top of the file state after this patch was applied (blob `97461e20425d`). The commit has strong review signals: - **Reviewed-by** from Bough Chen (NXP maintainer of this driver, 30+ commits) - **Acked-by** from Adrian Hunter (SDHCI co-maintainer, 100+ SDHCI commits) - **Signed-off-by** from Ulf Hansson (MMC subsystem maintainer who merged it) Record: Could not find lore thread directly (commit not yet in tree). But review chain is complete: hardware vendor reviewer + SDHCI maintainer ACK + subsystem maintainer merge. ## PHASE 5: CODE SEMANTIC ANALYSIS ### Step 5.1: Functions Modified - `esdhc_reset()` - the `.reset` callback in `sdhci_esdhc_ops` ### Step 5.2: Callers `esdhc_reset` is called via `sdhci_do_reset()` (line 247 of sdhci.c) through the ops->reset function pointer. `sdhci_do_reset` is called from: - `sdhci_reset_for_all()` - init, suspend/resume paths (SDHCI_RESET_ALL) - `sdhci_reset_for_reason()` - error recovery, tuning abort, card removal, CQE recovery (SDHCI_RESET_CMD, SDHCI_RESET_DATA) - These are called from tuning abort, data error paths, card removal, CQE recovery, and initialization The fix specifically triggers on `SDHCI_RESET_DATA | SDHCI_RESET_ALL`, which covers error recovery (data errors, request errors) and full initialization. Record: Called from multiple critical paths - error recovery, tuning abort, card removal, init. High-traffic code path. ### Step 5.3-5.4: The affected code path is triggered during normal card operations (tuning, error recovery). Any user of an i.MX SDHCI host controller can trigger this. ### Step 5.5: Similar Patterns The Freescale ESDHC of-driver (`sdhci-of-esdhc.c`) has a separate `quirk_ignore_data_inhibit` for unreliable DATA_INHIBIT bits on some controllers. The `readl_poll_timeout` pattern is already used twice in this same driver for similar hardware waits. Record: Pattern is consistent with existing driver practices. ## PHASE 6: STABLE TREE ANALYSIS ### Step 6.1: Buggy Code in Stable Trees - `esdhc_reset()` exists in **all stable trees** (v5.15, v6.1, v6.6, v6.12, v6.19) - The bug has been present since the function was introduced in v3.16 (2014) - IMX7ULP support was added before v5.15 Record: Bug exists in ALL active stable trees. ### Step 6.2: Backport Complications - For v6.1, v6.6, v6.12, v6.19: Patch applies cleanly. `esdhc_reset()` is identical. - For v5.15: Minor adjustment needed - function calls `sdhci_reset()` instead of `sdhci_and_cqhci_reset()`, but the added code goes BEFORE that call, so it's unaffected. Record: Clean apply for v6.1+. Trivial adjustment for v5.15. ### Step 6.3: No related fixes already in stable for this issue. ## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT ### Step 7.1 - **Subsystem**: drivers/mmc/host - MMC host controller drivers - **Criticality**: IMPORTANT - MMC/SD cards are used for storage on embedded platforms, IoT devices, and Android devices running i.MX SoCs. System hangs on these platforms = production device failure. ### Step 7.2 The sdhci-esdhc-imx driver is actively maintained by NXP engineers. 28 changes between v6.6 and v6.19. ## PHASE 8: IMPACT AND RISK ASSESSMENT ### Step 8.1: Affected Population - Users of NXP i.MX SoCs with SDHCI host controllers (IMX7ULP specifically named, but the fix applies to all i.MX ESDHC variants) - Embedded/IoT devices, industrial controllers, automotive platforms using NXP i.MX chips - The bug is triggered with specific SD cards (Kingston Canvas Go! Plus mentioned) during tuning ### Step 8.2: Trigger Conditions - Occurs during SD card tuning (happens on card initialization/re- initialization) - Triggered when CRC errors occur during tuning while data has gaps in transmission - Not every card triggers it - depends on card timing characteristics - Can happen on any boot/card insertion with affected cards ### Step 8.3: Failure Mode Severity - **System hang and reboot** = CRITICAL - DMA writes to incorrect memory address = potential **memory corruption** - The reset clears ADMA/SDMA addresses, so DMA writes to address 0 or stale address - This is a hardware-documented requirement violation Record: CRITICAL severity. System hang, reboot, potential memory corruption. ### Step 8.4: Risk-Benefit Ratio - **BENEFIT**: HIGH - prevents system hangs/reboots on NXP i.MX platforms with certain SD cards - **RISK**: VERY LOW - ~18 lines added, single function, single file - Uses existing patterns from the same driver - Graceful timeout (warning + proceed) prevents any new hangs from the fix itself - `readl_poll_timeout_atomic` is safe for all calling contexts - Only adds a wait before an existing operation Record: HIGH benefit / VERY LOW risk. Strongly favorable ratio. ## PHASE 9: FINAL SYNTHESIS ### Step 9.1: Evidence Summary **FOR backporting:** 1. Fixes **system hangs and reboots** (CRITICAL severity) 2. Fixes **DMA address corruption** from violating hardware specification 3. Small, surgical fix (~18 lines in one function, one file) 4. Uses existing patterns from the same driver (`readl_poll_timeout`) 5. Reviewed by NXP driver maintainer (haibo.chen), ACK'd by SDHCI co- maintainer (Adrian Hunter), merged by MMC maintainer (Ulf Hansson) 6. Author is NXP engineer with deep knowledge of the hardware 7. Bug exists in ALL active stable trees (code unchanged since v6.2) 8. Patch applies cleanly to v6.1+ with no modifications needed 9. Graceful degradation on timeout (warn + continue) 10. References hardware reference manual as justification **AGAINST backporting:** - No concrete signals against. The fix is well-contained and low-risk. **UNRESOLVED:** - Could not find the original lore thread (commit appears very recent/not yet merged) - No Fixes: tag identifying original buggy commit (but bug has existed since 2014) ### Step 9.2: Stable Rules Checklist 1. **Obviously correct and tested?** YES - follows hardware manual requirement, reviewed by 3 maintainers, uses established driver patterns 2. **Fixes a real bug?** YES - system hangs and reboots on real hardware with real SD cards 3. **Important issue?** YES - system hang, reboot, DMA corruption = CRITICAL 4. **Small and contained?** YES - ~18 lines, single function, single file 5. **No new features/APIs?** CORRECT - no new features, just defensive hardware wait 6. **Can apply to stable?** YES - applies cleanly to v6.1+, minor adjustment for v5.15 ### Step 9.3: Exception Categories Not needed - this meets standard stable criteria as a critical bug fix. ### Step 9.4: Decision This is a clear YES. It fixes a **critical** system hang/reboot caused by violating a hardware-documented requirement, with a small, surgical, well-reviewed patch that carries minimal regression risk. ## Verification - [Phase 1] Parsed tags: Reviewed-by: haibo.chen (NXP), Acked-by: Adrian Hunter (SDHCI maintainer), SOB: Ulf Hansson (MMC maintainer) - [Phase 2] Diff analysis: ~18 lines added to `esdhc_reset()`, adds `readl_poll_timeout_atomic` for DATA_INHIBIT before reset - [Phase 2] Verified `readl_poll_timeout_atomic` is defined in `include/linux/iopoll.h` (line 230) - [Phase 2] Verified `ESDHC_PRSSTAT` defined at offset 0x24 in `sdhci- esdhc.h` (line 34), same as `SDHCI_PRESENT_STATE` - [Phase 2] Verified `SDHCI_DATA_INHIBIT` is 0x00000002 at `sdhci.h:82` - [Phase 3] git blame: `esdhc_reset()` introduced in `0718e59ae259f7` (2014), modified by `fb1dec44c6750b` (2022) - [Phase 3] Confirmed `sdhci_and_cqhci_reset` present in v6.1 and v6.6 via `git merge-base --is-ancestor` - [Phase 3] Confirmed `esdhc_reset()` in v6.1 and v6.6 is identical to current mainline (before this patch) - [Phase 3] Author Luke Wang has 14+ commits in MMC subsystem, regular NXP contributor - [Phase 3] No prerequisites identified - patch is standalone - [Phase 4] b4 dig could not find thread (commit not in tree). Web searches confirmed author's other patches and expertise. - [Phase 5] `esdhc_reset` called via `sdhci_do_reset()` from error recovery, tuning abort, init, CQE recovery paths - [Phase 5] `readl_poll_timeout` pattern already used twice in this driver (lines 471, 1028) - [Phase 6] Confirmed `esdhc_reset()` exists identically in v6.1 and v6.6 stable trees - [Phase 6] Confirmed `` is included in v6.1 (verified directly) - [Phase 6] For v5.15, `esdhc_reset` uses `sdhci_reset()` instead - minor backport adjustment needed - [Phase 7] IMX7ULP support (`usdhc_imx7ulp_data`) present in driver, confirmed at line 324 - [Phase 8] Failure mode: system hang + reboot + DMA address corruption → CRITICAL severity - UNVERIFIED: Could not access the original lore thread for this specific patch **YES** drivers/mmc/host/sdhci-esdhc-imx.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c index a7a5df673b0f6..97461e20425d8 100644 --- a/drivers/mmc/host/sdhci-esdhc-imx.c +++ b/drivers/mmc/host/sdhci-esdhc-imx.c @@ -216,6 +216,8 @@ #define ESDHC_FLAG_DUMMY_PAD BIT(19) #define ESDHC_AUTO_TUNING_WINDOW 3 +/* 100ms timeout for data inhibit */ +#define ESDHC_DATA_INHIBIT_WAIT_US 100000 enum wp_types { ESDHC_WP_NONE, /* no WP, neither controller nor gpio */ @@ -1453,6 +1455,22 @@ static void esdhc_set_uhs_signaling(struct sdhci_host *host, unsigned timing) static void esdhc_reset(struct sdhci_host *host, u8 mask) { + u32 present_state; + int ret; + + /* + * For data or full reset, ensure any active data transfer completes + * before resetting to avoid system hang. + */ + if (mask & (SDHCI_RESET_DATA | SDHCI_RESET_ALL)) { + ret = readl_poll_timeout_atomic(host->ioaddr + ESDHC_PRSSTAT, present_state, + !(present_state & SDHCI_DATA_INHIBIT), 2, + ESDHC_DATA_INHIBIT_WAIT_US); + if (ret == -ETIMEDOUT) + dev_warn(mmc_dev(host->mmc), + "timeout waiting for data transfer completion\n"); + } + sdhci_and_cqhci_reset(host, mask); sdhci_writel(host, host->ier, SDHCI_INT_ENABLE); -- 2.53.0