From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stanley Chu Subject: Re: [PATCH v1 1/1] scsi: Synchronize request queue PM status only on successful resume Date: Thu, 3 Jan 2019 14:38:47 +0800 Message-ID: <1546497527.20657.12.camel@mtkswgap22> References: <1546410308-13486-1-git-send-email-stanley.chu@mediatek.com> <1546410308-13486-3-git-send-email-stanley.chu@mediatek.com> <1546445745.163063.4.camel@acm.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1546445745.163063.4.camel-HInyCGIudOg@public.gmane.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Linux-mediatek" Errors-To: linux-mediatek-bounces+glpam-linux-mediatek=m.gmane.org-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org To: Bart Van Assche Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, kuohong.wang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org, wsdupstream-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org, linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org, peter.wang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org, matthias.bgg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, mika.westerberg-VuQAYsv1563Yd54FQh9/CA@public.gmane.org List-Id: linux-scsi@vger.kernel.org Hi Bart, On Wed, 2019-01-02 at 08:15 -0800, Bart Van Assche wrote: > On Wed, 2019-01-02 at 14:25 +0800, stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org wrote: > > From: Stanley Chu > > > > The commit 356fd2663cff ("scsi: Set request queue runtime PM status > > back to active on resume") fixed up the inconsistent RPM status between > > request queue and device. However changing request queue RPM status > > shall be done only on successful resume, otherwise status may be still > > inconsistent as below, > > > > Request queue: RPM_ACTIVE > > Device: RPM_SUSPENDED > > > > This ends up soft lockup because requests can be submitted to > > underlying devices but those devices and their required resource > > are not resumed. > > > > Signed-off-by: Stanley Chu > > Please add "Fixes:" and "Cc: stable" tags and also Cc the author of commit > 356fd2663cff. Sure. Thanks for remind. > > > > --- > > drivers/scsi/scsi_pm.c | 24 ++++++++++++++---------- > > 1 file changed, 14 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c > > index a2b4179..eff3e59 100644 > > --- a/drivers/scsi/scsi_pm.c > > +++ b/drivers/scsi/scsi_pm.c > > @@ -82,6 +82,20 @@ static int scsi_dev_type_resume(struct device *dev, > > pm_runtime_disable(dev); > > pm_runtime_set_active(dev); > > pm_runtime_enable(dev); > > + > > + /* > > + * Forcibly set runtime PM status of request queue to "active" > > + * to make sure we can again get requests from the queue > > + * (see also blk_pm_peek_request()). > > + * > > + * The resume hook will correct runtime PM status of the disk. > > + */ > > + if (!err && scsi_is_sdev_device(dev)) { > > + struct scsi_device *sdev = to_scsi_device(dev); > > + > > + if (sdev->request_queue->dev) > > + blk_set_runtime_active(sdev->request_queue); > > + } > > What makes you think that the sdev->request_queue->dev test is necessary? The > scsi_dev_type_resume() function is only called after blk_pm_runtime_init() has > finished so I don't think that test is necessary. We found NULL sdev->request_queue->dev may be dereferenced during below system resume flow, scsi_bus_resume_common() => async_schedule_domain(async_sdev_resume) And then async_sdev_resume() is invoked asynchronously, async_sdev_resume() => scsi_dev_type_resume(dev, do_scsi_resume) => blk_set_runtime_active(sdev->request_queue) If a SCSI device does not have upper layer driver (like SCSI disk), it may not be applied blk_pm_runtime_init() invoked by sd_probe() while this SCSI device is added. For example, some SCSI devices (like UFS Boot W-LUN) are added explicitly in __scsi_add_device() by ufshcd_scsi_add_wlus() first and thus sd_probe() for them is skipped because they are already visible. For those SCSI devices, null sdev->request_queue->dev will be dereferenced in blk_set_runtime_active() during above system resume flow, therefore we add a null pointer checking for this case. The same issue also happens on those SCSI devices before this patch as below system resume flow while devices are already runtime-suspended. scsi_bus_resume_common() => blk_set_runtime_active(to_scsi_device(dev)->request_queue) > > Additionally, since the above code occurs inside a block controlled by an > "if (err == 0)" statement, I think the !err test is redundant and should be > left out. Sorry this is my code merge defect. "err" here shall be returned value from pm_runtime_set_active(). I will fix it in v2. > > Thanks, > > Bart. > > _______________________________________________ > Linux-mediatek mailing list > Linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org > http://lists.infradead.org/mailman/listinfo/linux-mediatek