Re: [PATCH] scsi: libsas: fix HA resume deadlock and hisi_sas disk-wake race

From: sashiko-bot@kernel.org
To: "Xingui Yang" <yangxingui@huawei.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH] scsi: libsas: fix HA resume deadlock and hisi_sas disk-wake race
Date: Tue, 30 Jun 2026 07:19:34 +0000
Message-ID: <20260630071935.A15C61F00A3A@smtp.kernel.org>
In-Reply-To: <20260630065157.421203-1-yangxingui@huawei.com>

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:

New issues:
- [High] The `flush_workqueue(hisi_hba->wq)` placement is ineffective, failing to prevent the autosuspend race for newly discovered PHYs.

Pre-existing issues:
- [High] PHY down events are silently dropped during resume because `HISI_SAS_RESETTING_BIT` remains set, permanently leaving libsas state out of sync.
--

commit ce8bc2a47295d4675bf4efa43c713662a8cd5696
Author: Xingui Yang <yangxingui@huawei.com>

scsi: libsas: fix HA resume deadlock and hisi_sas disk-wake race

This patch resolves a deadlock during HA resume by moving the
PHYE_RESUME_TIMEOUT notification to after sas_drain_work(). It restores
the draining sas_resume_ha() behavior in hisi_sas and introduces a
flush_workqueue() to prevent an autosuspend race.

> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> index 213d5b5dea94f..901f508e8be77 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> @@ -5262,14 +5262,8 @@ static int _resume_v3_hw(struct device *device)
>  	}
>  	phys_init_v3_hw(hisi_hba);
>  
> -	/*
> -	 * If a directly-attached disk is removed during suspend, a deadlock
> -	 * may occur, as the PHYE_RESUME_TIMEOUT processing will require the
> -	 * hisi_hba->device to be active, which can only happen when resume
> -	 * completes. So don't wait for the HA event workqueue to drain upon
> -	 * resume.
> -	 */
> -	sas_resume_ha_no_sync(sha);
> +	sas_resume_ha(sha);
> +	flush_workqueue(hisi_hba->wq);

[Severity: High]
Does this flush_workqueue() placement miss newly queued discovery events?

sas_resume_ha() drains the libsas event_q and disco_q. Because hisi_hba->wq is
flushed only after that drain, any hisi_sas_phyup_pm_work items still pending on
hisi_hba->wq queue their PORTE_BYTES_DMAED events to event_q too late for the
drain to see them.

hisi_sas_phyup_pm_work then calls pm_runtime_put_sync(), dropping its PM
reference. When _resume_v3_hw() returns, the system PM reference is dropped as
well, so the controller can autosuspend while the missed discovery work runs
asynchronously and tries to send IO to a suspended controller.

Should the flush be placed before sas_resume_ha() so the generated events are
caught and synchronously processed by sas_drain_work()?
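
To illustrate the ordering concern, here is a toy user-space C model. It is
not kernel code: struct model and the model_* helpers are hypothetical names
made up for this sketch, and it assumes the phyup work items are already
pending on the driver workqueue when resume reaches this point. It only counts
how many libsas events are still unprocessed when resume returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct model {
	int driver_wq_pending;	/* stands in for phyup work on hisi_hba->wq */
	int libsas_events;	/* stands in for events queued on event_q */
};

/* Running each pending driver work item queues one libsas event. */
static void model_flush_driver_wq(struct model *m)
{
	m->libsas_events += m->driver_wq_pending;
	m->driver_wq_pending = 0;
}

/* Draining processes every libsas event queued so far. */
static void model_drain_libsas(struct model *m)
{
	m->libsas_events = 0;
}

/* Returns the number of libsas events left unprocessed after "resume". */
static int model_resume(int pending, bool flush_first)
{
	struct model m = { .driver_wq_pending = pending, .libsas_events = 0 };

	if (flush_first) {
		model_flush_driver_wq(&m);
		model_drain_libsas(&m);
	} else {
		/* Order used by the patch: drain first, flush afterwards. */
		model_drain_libsas(&m);
		model_flush_driver_wq(&m);
	}
	return m.libsas_events;
}
```

With two pending work items, model_resume(2, false) leaves both events
unprocessed, while model_resume(2, true) leaves none. The model says nothing
about work that is queued after the flush, which the real driver would still
need to consider.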

>  	clear_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags);

[Severity: High]
This is a pre-existing issue, but are PHY down events silently dropped during
resume because HISI_SAS_RESETTING_BIT remains set?

In _resume_v3_hw(), HISI_SAS_RESETTING_BIT is kept active across the
sas_resume_ha() call. If a PHY loses its link during this time, the hardware
interrupt handler drops the event:

drivers/scsi/hisi_sas/hisi_sas_main.c:hisi_sas_phy_down() {
		if (test_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags) ||
		    phy->in_reset) {
			dev_info(dev, "ignore flutter phy%d down\n", phy_no);
			return;
		}
}

When HISI_SAS_RESETTING_BIT is finally cleared here, _resume_v3_hw() does not
call hisi_sas_rescan_topology() to recover the dropped events. Could this
permanently leave libsas state out of sync if a PHY disconnects during the
resume wait?
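
The concern can be sketched with a toy user-space C model. It is not kernel
code: struct phy_model and the model_* helpers are hypothetical names, and
model_rescan() only stands in for whatever recovery step (for example a
topology rescan) the driver would choose. The model assumes a single PHY that
loses its link while the resetting flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct phy_model {
	bool resetting;		/* stands in for HISI_SAS_RESETTING_BIT */
	bool hw_link_up;	/* what the hardware actually sees */
	bool libsas_link_up;	/* what libsas believes */
};

/* PHY down handler: the event is ignored while resetting is set. */
static void model_phy_down(struct phy_model *p)
{
	p->hw_link_up = false;
	if (p->resetting)
		return;		/* treated as flutter, libsas is not told */
	p->libsas_link_up = false;
}

/* Stand-in for a recovery step that re-reads the hardware state. */
static void model_rescan(struct phy_model *p)
{
	p->libsas_link_up = p->hw_link_up;
}

/* Returns true if libsas and hardware agree once resume has finished. */
static bool model_resume_in_sync(bool rescan_after_clear)
{
	struct phy_model p = {
		.resetting = true,
		.hw_link_up = true,
		.libsas_link_up = true,
	};

	model_phy_down(&p);	/* link lost during the resume wait */
	p.resetting = false;	/* mirrors the clear_bit() in resume */
	if (rescan_after_clear)
		model_rescan(&p);
	return p.hw_link_up == p.libsas_link_up;
}
```

In this model, model_resume_in_sync(false) reports a mismatch and
model_resume_in_sync(true) does not. Whether a rescan is the right recovery in
the real driver is left open here.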

>  
>  	dev_warn(dev, "end of resuming controller\n");

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260630065157.421203-1-yangxingui@huawei.com?part=1
