From: sashiko-bot@kernel.org
Subject: Re: [PATCH] scsi: libsas: fix HA resume deadlock and hisi_sas disk-wake race
Reply-To: sashiko-reviews@lists.linux.dev
To: "Xingui Yang"
Cc: linux-scsi@vger.kernel.org
In-Reply-To: <20260630065157.421203-1-yangxingui@huawei.com>
Date: Tue, 30 Jun 2026 07:19:34 +0000
Message-Id: <20260630071935.A15C61F00A3A@smtp.kernel.org>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] The `flush_workqueue(hisi_hba->wq)` placement is ineffective, failing to prevent the autosuspend race for newly discovered PHYs.

Pre-existing issues:
- [High] PHY down events are silently dropped during resume because `HISI_SAS_RESETTING_BIT` remains set, permanently leaving libsas state out of sync.

--

commit ce8bc2a47295d4675bf4efa43c713662a8cd5696
Author: Xingui Yang

scsi: libsas: fix HA resume deadlock and hisi_sas disk-wake race

This patch resolves a deadlock during HA resume by moving the PHYE_RESUME_TIMEOUT notification to after sas_drain_work(). It restores the draining sas_resume_ha() behavior in hisi_sas and introduces a flush_workqueue() to prevent an autosuspend race.
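The first finding turns on ordering: events that a flush generates after the libsas drain has finished are never drained. A minimal sketch of that ordering follows, assuming a single-threaded userspace model rather than real kernel code. All names (`model_queue`, `model_resume()` and the helpers) are hypothetical; `event_q` stands in for the libsas event_q/disco_q and `hba_wq` for hisi_hba->wq. The model ignores concurrency, PM reference counting, and whether flushing first is otherwise safe; it only counts the port events still queued when resume returns.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model, not kernel code. "event_q" stands
 * in for the libsas event_q/disco_q that sas_resume_ha() drains, and
 * "hba_wq" for hisi_hba->wq holding hisi_sas_phyup_pm_work items.
 */
struct model_queue {
	int pending;	/* items queued but not yet run */
};

/* Running one phy-up PM work item queues one PORTE_BYTES_DMAED event. */
static void model_run_phyup_work(struct model_queue *event_q)
{
	event_q->pending++;
}

/* Model of the libsas drain: run every event queued at this moment. */
static void model_drain_event_q(struct model_queue *event_q)
{
	event_q->pending = 0;
}

/* Model of flush_workqueue(hisi_hba->wq): run every pending phy-up item. */
static void model_flush_hba_wq(struct model_queue *hba_wq,
			       struct model_queue *event_q)
{
	while (hba_wq->pending > 0) {
		hba_wq->pending--;
		model_run_phyup_work(event_q);
	}
}

/*
 * Returns the number of port events still queued when the modelled
 * resume returns, i.e. discovery work left to run asynchronously after
 * the PM references would have been dropped.
 */
int model_resume(int phyup_items, bool flush_first)
{
	struct model_queue event_q = { 0 };
	struct model_queue hba_wq = { phyup_items };

	assert(phyup_items >= 0);

	if (flush_first) {
		/* Order suggested in the review: flush, then drain. */
		model_flush_hba_wq(&hba_wq, &event_q);
		model_drain_event_q(&event_q);
	} else {
		/* Order in the patch: sas_resume_ha() drain, then flush. */
		model_drain_event_q(&event_q);
		model_flush_hba_wq(&hba_wq, &event_q);
	}

	/* Either way every phy-up item has run; only event_q differs. */
	assert(hba_wq.pending == 0);
	return event_q.pending;
}
```

For example, model_resume(2, false), the order used in the patch, returns 2, while model_resume(2, true), the order suggested in the review, returns 0.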
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> index 213d5b5dea94f..901f508e8be77 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> @@ -5262,14 +5262,8 @@ static int _resume_v3_hw(struct device *device)
>  	}
>  	phys_init_v3_hw(hisi_hba);
>  
> -	/*
> -	 * If a directly-attached disk is removed during suspend, a deadlock
> -	 * may occur, as the PHYE_RESUME_TIMEOUT processing will require the
> -	 * hisi_hba->device to be active, which can only happen when resume
> -	 * completes. So don't wait for the HA event workqueue to drain upon
> -	 * resume.
> -	 */
> -	sas_resume_ha_no_sync(sha);
> +	sas_resume_ha(sha);
> +	flush_workqueue(hisi_hba->wq);

[Severity: High]
Does this flush_workqueue() placement miss newly queued discovery events?

sas_resume_ha() drains the libsas event_q and disco_q. Because hisi_hba->wq is flushed only after this drain, any hisi_sas_phyup_pm_work items still on hisi_hba->wq will queue PORTE_BYTES_DMAED events to event_q that bypass the drain.

Since hisi_sas_phyup_pm_work then calls pm_runtime_put_sync(), its PM reference is dropped. When _resume_v3_hw() returns, the system PM reference is also dropped, allowing the controller to autosuspend while the missed discovery work runs asynchronously and attempts to send I/O to a suspended controller.

Should the flush be placed before sas_resume_ha() so that the generated events are caught and processed synchronously by sas_drain_work()?

>  	clear_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags);

[Severity: High]
This is a pre-existing issue, but are PHY down events silently dropped during resume because HISI_SAS_RESETTING_BIT remains set?

In _resume_v3_hw(), HISI_SAS_RESETTING_BIT is kept set across the sas_resume_ha() call.
If a PHY loses its link during this time, the hardware interrupt handler drops the event:

drivers/scsi/hisi_sas/hisi_sas_main.c:hisi_sas_phy_down() {
	if (test_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags) ||
	    phy->in_reset) {
		dev_info(dev, "ignore flutter phy%d down\n", phy_no);
		return;
	}
}

When HISI_SAS_RESETTING_BIT is finally cleared here, the driver does not call hisi_sas_rescan_topology() to recover the dropped events.

Could this permanently leave libsas state out of sync if a PHY disconnects during the resume wait?

>  
>  	dev_warn(dev, "end of resuming controller\n");

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260630065157.421203-1-yangxingui@huawei.com?part=1