public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Brian Kao <powenkao@google.com>,
	Bart Van Assche <bvanassche@acm.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 28/72] scsi: ufs: core: Fix EH failure after W-LUN resume error
Date: Thu, 15 Jan 2026 17:48:38 +0100	[thread overview]
Message-ID: <20260115164144.515755898@linuxfoundation.org> (raw)
In-Reply-To: <20260115164143.482647486@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Kao <powenkao@google.com>

[ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ]

When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
including the scsi_target, may be runtime suspended. Subsequently, the
error handler in ufshcd_recover_pm_error() fails to set the W-LUN device
back to active because the parent target is not active.  This results in
the following errors:

  google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
  ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
  ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
  ...
  ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active

Address this by:

 1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
    attempting to set the W-LUN to active within
    ufshcd_recover_pm_error().

 2. Explicitly checking for power.runtime_error on the HBA and W-LUN
    devices before calling pm_runtime_set_active() to clear the error
    state.

 3. Adding pm_runtime_get_sync(hba->dev) in
    ufshcd_err_handling_prepare() to ensure the HBA itself is active
    during error recovery, even if a child device resume failed.

These changes ensure the device power states are managed correctly
during error recovery.

Signed-off-by: Brian Kao <powenkao@google.com>
Tested-by: Brian Kao <powenkao@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/ufs/core/ufshcd.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 2435ea7ec089b..d553169b4d9ad 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6154,6 +6154,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
 
 static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 {
+	/*
+	 * A WLUN resume failure could potentially lead to the HBA being
+	 * runtime suspended, so take an extra reference on hba->dev.
+	 */
+	pm_runtime_get_sync(hba->dev);
 	ufshcd_rpm_get_sync(hba);
 	if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) ||
 	    hba->is_sys_suspended) {
@@ -6194,6 +6199,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
 	if (ufshcd_is_clkscaling_supported(hba))
 		ufshcd_clk_scaling_suspend(hba, false);
 	ufshcd_rpm_put(hba);
+	pm_runtime_put(hba->dev);
 }
 
 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
@@ -6208,28 +6214,42 @@ static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
 #ifdef CONFIG_PM
 static void ufshcd_recover_pm_error(struct ufs_hba *hba)
 {
+	struct scsi_target *starget = hba->ufs_device_wlun->sdev_target;
 	struct Scsi_Host *shost = hba->host;
 	struct scsi_device *sdev;
 	struct request_queue *q;
-	int ret;
+	bool resume_sdev_queues = false;
 
 	hba->is_sys_suspended = false;
+
 	/*
-	 * Set RPM status of wlun device to RPM_ACTIVE,
-	 * this also clears its runtime error.
+	 * Ensure the parent's error status is cleared before proceeding
+	 * to the child, as the parent must be active to activate the child.
 	 */
-	ret = pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+	if (hba->dev->power.runtime_error) {
+		/* hba->dev has no functional parent thus simplily set RPM_ACTIVE */
+		pm_runtime_set_active(hba->dev);
+		resume_sdev_queues = true;
+	}
+
+	if (hba->ufs_device_wlun->sdev_gendev.power.runtime_error) {
+		/*
+		 * starget, parent of wlun, might be suspended if wlun resume failed.
+		 * Make sure parent is resumed before set child (wlun) active.
+		 */
+		pm_runtime_get_sync(&starget->dev);
+		pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+		pm_runtime_put_sync(&starget->dev);
+		resume_sdev_queues = true;
+	}
 
-	/* hba device might have a runtime error otherwise */
-	if (ret)
-		ret = pm_runtime_set_active(hba->dev);
 	/*
 	 * If wlun device had runtime error, we also need to resume those
 	 * consumer scsi devices in case any of them has failed to be
 	 * resumed due to supplier runtime resume failure. This is to unblock
 	 * blk_queue_enter in case there are bios waiting inside it.
 	 */
-	if (!ret) {
+	if (resume_sdev_queues) {
 		shost_for_each_device(sdev, shost) {
 			q = sdev->request_queue;
 			if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
-- 
2.51.0




  parent reply	other threads:[~2026-01-15 17:11 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-15 16:48 [PATCH 6.1 00/72] 6.1.161-rc1 review Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 01/72] atm: Fix dma_free_coherent() size Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 02/72] net: 3com: 3c59x: fix possible null dereference in vortex_probe1() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 03/72] btrfs: always detect conflicting inodes when logging inode refs Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 04/72] mei: me: add nova lake point S DID Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 05/72] lib/crypto: aes: Fix missing MMU protection for AES S-box Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 06/72] counter: interrupt-cnt: Drop IRQF_NO_THREAD flag Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 07/72] drm/pl111: Fix error handling in pl111_amba_probe Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 08/72] gpio: rockchip: mark the GPIO controller as sleeping Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 09/72] wifi: avoid kernel-infoleak from struct iw_point Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 10/72] libceph: prevent potential out-of-bounds reads in handle_auth_done() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 11/72] libceph: replace overzealous BUG_ON in osdmap_apply_incremental() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 12/72] libceph: make free_choose_arg_map() resilient to partial allocation Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 13/72] libceph: return the handler error from mon_handle_auth_done() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 14/72] libceph: make calc_target() set t->paused, not just clear it Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 15/72] ext4: introduce ITAIL helper Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 16/72] ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 17/72] net: Add locking to protect skb->dev access in ip_output Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 18/72] tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 19/72] csky: fix csky_cmpxchg_fixup not working Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 20/72] ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 21/72] alpha: dont reference obsolete termio struct for TC* constants Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 22/72] NFSv4: ensure the open stateid seqid doesnt go backwards Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 23/72] NFS: Fix up the automount fs_context to use the correct cred Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 24/72] smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 25/72] smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 26/72] smb/client: fix NT_STATUS_NO_DATA_DETECTED value Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 27/72] scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset Greg Kroah-Hartman
2026-01-15 16:48 ` Greg Kroah-Hartman [this message]
2026-01-15 16:48 ` [PATCH 6.1 29/72] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 30/72] arm64: dts: add off-on-delay-us for usdhc2 regulator Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 31/72] ARM: dts: imx6q-ba16: fix RTC interrupt level Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 32/72] arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 33/72] netfilter: nft_synproxy: avoid possible data-race on update operation Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 34/72] netfilter: nf_tables: fix memory leak in nf_tables_newrule() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 35/72] netfilter: nf_conncount: update last_gc only when GC has been performed Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 36/72] net: marvell: prestera: fix NULL dereference on devlink_alloc() failure Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 37/72] bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 38/72] net: mscc: ocelot: Fix crash when adding interface under a lag Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 39/72] inet: ping: Fix icmp out counting Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 40/72] net: sock: fix hardened usercopy panic in sock_recv_errqueue Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 41/72] netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updates Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 42/72] net/mlx5e: Dont print error message due to invalid module Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 43/72] net: wwan: iosm: Fix memory leak in ipc_mux_deinit() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 44/72] eth: bnxt: move and rename reset helpers Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 45/72] bnxt_en: Fix potential data corruption with HW GRO/LRO Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 46/72] net: fix memory leak in skb_segment_list for GRO packets Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 47/72] HID: quirks: work around VID/PID conflict for appledisplay Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 48/72] net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 49/72] net: usb: pegasus: fix memory leak in update_eth_regs_async() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 50/72] net: enetc: fix build warning when PAGE_SIZE is greater than 128K Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 51/72] arp: do not assume dev_hard_header() does not change skb->head Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 52/72] pinctrl: qcom: lpass-lpi: mark the GPIO controller as sleeping Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 53/72] mm/pagewalk: add walk_page_range_vma() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 54/72] ksm: use range-walk function to jump over holes in scan_get_next_rmap_item Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 55/72] ALSA: ac97bus: Use guard() for mutex locks Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 56/72] ALSA: ac97: fix a double free in snd_ac97_controller_register() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 57/72] nfsd: provide locking for v4_end_grace Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 58/72] NFS: trace: show TIMEDOUT instead of 0x6e Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 59/72] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 60/72] NFSD: Remove NFSERR_EAGAIN Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 61/72] bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 62/72] bpf: Make variables in bpf_prog_test_run_xdp less confusing Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 63/72] bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 64/72] bpf, test_run: Subtract size of xdp_frame from allowed metadata size Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 65/72] bpf: Fix reference count leak in bpf_prog_test_run_xdp() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 66/72] powercap: fix race condition in register_control_type() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 67/72] powercap: fix sscanf() error return value handling Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 68/72] can: j1939: make j1939_session_activate() fail if device is no longer registered Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 69/72] ASoC: amd: yc: Add quirk for Honor MagicBook X16 2025 Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 70/72] ASoC: fsl_sai: Add missing registers to cache default Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 71/72] scsi: sg: Fix occasional bogus elapsed time that exceeds timeout Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 72/72] bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path Greg Kroah-Hartman
2026-01-15 19:15 ` [PATCH 6.1 00/72] 6.1.161-rc1 review Brett A C Sheffield
2026-01-15 19:36 ` Slade Watkins
2026-01-15 22:14 ` Florian Fainelli
2026-01-16  7:55 ` Francesco Dolcini
2026-01-16  9:14 ` Peter Schneider
2026-01-16 10:22 ` Ron Economos
2026-01-16 10:33 ` Jon Hunter
2026-01-16 15:39 ` Mark Brown
2026-01-16 17:44 ` Hardik Garg
2026-01-16 19:28 ` Shuah Khan
2026-01-17 14:35 ` Miguel Ojeda
2026-01-19 11:10 ` Jeffrin Thalakkottoor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260115164144.515755898@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bvanassche@acm.org \
    --cc=martin.petersen@oracle.com \
    --cc=patches@lists.linux.dev \
    --cc=powenkao@google.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox