Date: Mon, 03 Aug 2020 20:04:39 +0800
From: Can Guo
To: Stanley Chu
Subject: Re: [PATCH v7] scsi: ufs: Quiesce all scsi devices before shutdown
References: <20200803100448.2738-1-stanley.chu@mediatek.com>
Cc: jiajie.hao@mediatek.com, linux-scsi@vger.kernel.org,
 martin.petersen@oracle.com, andy.teng@mediatek.com, jejb@linux.ibm.com,
 chun-hung.wu@mediatek.com, kuohong.wang@mediatek.com,
 linux-kernel@vger.kernel.org, asutoshd@codeaurora.org,
 avri.altman@wdc.com, linux-mediatek@lists.infradead.org,
 peter.wang@mediatek.com, alim.akhtar@samsung.com, matthias.bgg@gmail.com,
 beanhuo@micron.com, chaotian.jing@mediatek.com, cc.chou@mediatek.com,
 linux-arm-kernel@lists.infradead.org, bvanassche@acm.org

Slightly updated my comments

On 2020-08-03 19:50, Can Guo wrote:
> Hi Stanley,
>
> On 2020-08-03 18:04, Stanley Chu wrote:
>> Currently I/O request could be still submitted to UFS device while
>> UFS is working on shutdown flow. This may lead to racing as below
>> scenarios and finally system may crash due to unclocked register
>> accesses.
>>
>> To fix this kind of issues, in ufshcd_shutdown(),
>>
>> 1. Use pm_runtime_get_sync() instead of resuming UFS device by
>>    ufshcd_runtime_resume() "internally" to let runtime PM framework
>>    manage and prevent concurrent runtime operations by incoming I/O
>>    requests.
>>
>> 2. Specifically quiesce all SCSI devices to block all I/O requests
>>    after device is resumed.
>>
>> Example of racing scenario: While UFS device is runtime-suspended
>>
>> Thread #1: Executing UFS shutdown flow, e.g.,
>>            ufshcd_suspend(UFS_SHUTDOWN_PM)
>>
>> Thread #2: Executing runtime resume flow triggered by I/O request,
>>            e.g., ufshcd_resume(UFS_RUNTIME_PM)
>>
>> This breaks the assumption that UFS PM flows can not be running
>> concurrently and some unexpected racing behavior may happen.
>>
>> Signed-off-by: Stanley Chu
>> ---
>> Changes:
>>   - Since v6:
>>     - Do quiesce to all SCSI devices.
>>   - Since v4:
>>     - Use pm_runtime_get_sync() instead of resuming UFS device by
>>       ufshcd_runtime_resume() "internally".
>> ---
>>  drivers/scsi/ufs/ufshcd.c | 27 ++++++++++++++++++++++-----
>>  1 file changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 307622284239..7cb220b3fde0 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -8640,6 +8640,7 @@ EXPORT_SYMBOL(ufshcd_runtime_idle);
>>  int ufshcd_shutdown(struct ufs_hba *hba)
>>  {
>>  	int ret = 0;
>> +	struct scsi_target *starget;
>>
>>  	if (!hba->is_powered)
>>  		goto out;
>> @@ -8647,11 +8648,27 @@ int ufshcd_shutdown(struct ufs_hba *hba)
>>  	if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba))
>>  		goto out;
>>
>> -	if (pm_runtime_suspended(hba->dev)) {
>> -		ret = ufshcd_runtime_resume(hba);
>> -		if (ret)
>> -			goto out;
>> -	}
>> +	/*
>> +	 * Let runtime PM framework manage and prevent concurrent runtime
>> +	 * operations with shutdown flow.
>> +	 */
>> +	pm_runtime_get_sync(hba->dev);
>> +
>> +	/*
>> +	 * Quiesce all SCSI devices to prevent any non-PM requests sending
>> +	 * from block layer during and after shutdown.
>> +	 *
>> +	 * Here we can not use blk_cleanup_queue() since PM requests
>> +	 * (with BLK_MQ_REQ_PREEMPT flag) are still required to be sent
>> +	 * through block layer. Therefore SCSI command queued after the
>> +	 * scsi_target_quiesce() call returned will block until
>> +	 * blk_cleanup_queue() is called.
>> +	 *
>> +	 * Besides, scsi_target_"un"quiesce (e.g., scsi_target_resume) can
>> +	 * be ignored since shutdown is one-way flow.
>> +	 */
>> +	list_for_each_entry(starget, &hba->host->__targets, siblings)
>> +		scsi_target_quiesce(starget);
>>
>
> Sorry for misleading you to scsi_target_quiesce(); maybe the below is
> better.
>
>     shost_for_each_device(sdev, hba->host)
>         scsi_device_quiesce(sdev);
>
> We may need to discuss this quiesce part more since I missed
> something.
>
> After we quiesce the SCSI devices, only PM requests are allowed, but
> it is still not safe: [1] PM requests can still pass through, and
> [2] there can be tasks/reqs present in the doorbells from before the
> devices were quiesced. So the tasks/reqs in [1] and [2] can still be
> in flight in parallel while ufshcd_suspend is running.
>
> How about quiescing only the UFS device well-known SCSI device and
> using blk_mq_freeze_queue on the other SCSI devices?
> blk_mq_freeze_queue can eliminate the risks mentioned in [1] and [2].
>
>     shost_for_each_device(sdev, hba->host) {
>         if (sdev == hba->sdev_ufs_device)
>             scsi_device_quiesce(sdev);
>         else
>             blk_mq_freeze_queue(sdev->request_queue);
>     }
>
> If blk_mq_freeze_queue is not allowed to be used by an LLD (I think
> we can use it, as I recall Bart used it in one of his changes to UFS
> scaling), we can use scsi_remove_device instead. It changes the SCSI
> device's state to SDEV_DEL and calls blk_cleanup_queue.
>
> We can also make changes like the below. [1] makes sure no more PM
> requests are sent to the SCSI devices, and [2] makes sure the
> doorbells are cleared before invoking ufshcd_suspend.
>
>     shost_for_each_device(sdev, hba->host) {
>         scsi_autopm_get_device(sdev);    [1]
>         scsi_device_quiesce(sdev);
>     }
>
>     ufshcd_wait_for_doorbell_clr(hba, U64_MAX);    [2]
>
> Please let me know which one you prefer, or if you have a better
> idea, thanks!
>
> Regards,
>
> Can Guo.
>
>>  	ret = ufshcd_suspend(hba, UFS_SHUTDOWN_PM);
>>  out:
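
Editorial sketch (not part of the original message): the ordering in the last proposal above, that is, take a runtime-PM reference on every device, quiesce it, and only then wait for the doorbells to drain, can be modeled in plain userspace C. Everything here is a hypothetical stand-in: `struct model_dev`, `struct model_hba`, and the `model_*` functions are invented for illustration and are not the kernel's `scsi_device`, `ufs_hba`, `scsi_autopm_get_device()`, `scsi_device_quiesce()`, or `ufshcd_wait_for_doorbell_clr()`. The doorbell is assumed to be a bitmask with one bit per outstanding request, and each poll is assumed to complete exactly one request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; these are not kernel structures. */
struct model_dev {
	int pm_usage;   /* stands in for the runtime-PM usage count taken in [1] */
	bool quiesced;  /* stands in for the state set by quiescing the device */
};

struct model_hba {
	uint32_t doorbell;      /* assumed: one bit per outstanding request */
	struct model_dev *devs;
	size_t ndevs;
};

/* Assumed completion model: each poll retires the lowest outstanding request. */
static void model_complete_one(struct model_hba *hba)
{
	hba->doorbell &= hba->doorbell - 1; /* clear the lowest set bit */
}

/* Returns 0 once the doorbell is empty, -1 if it is still busy after max_polls. */
static int model_wait_for_doorbell_clr(struct model_hba *hba, unsigned int max_polls)
{
	while (hba->doorbell && max_polls--)
		model_complete_one(hba);
	return hba->doorbell ? -1 : 0;
}

/* Ordering from the proposal: [1] pin and quiesce every device, then [2] drain. */
static int model_shutdown_prepare(struct model_hba *hba, unsigned int max_polls)
{
	assert(hba != NULL && (hba->ndevs == 0 || hba->devs != NULL));

	for (size_t i = 0; i < hba->ndevs; i++) {
		hba->devs[i].pm_usage++;      /* [1] no further PM transitions */
		hba->devs[i].quiesced = true; /* block new non-PM requests */
	}
	return model_wait_for_doorbell_clr(hba, max_polls); /* [2] */
}
```

In this model a caller would proceed to the suspend step only when `model_shutdown_prepare()` returns 0. A nonzero return means requests were still outstanding, which corresponds to the risk labeled [2] in the message.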