linux-scsi.vger.kernel.org archive mirror
* Re: [PATCH v1 1/1] scsi: Synchronize request queue PM status only on successful resume
       [not found]   ` <1546410308-13486-3-git-send-email-stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
@ 2019-01-02 16:15     ` Bart Van Assche
       [not found]       ` <1546445745.163063.4.camel-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Bart Van Assche @ 2019-01-02 16:15 UTC
  To: stanley.chu-NuS5LvNUpcJWk0Htik3J/w,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA
  Cc: srv_wsdupstream-NuS5LvNUpcJWk0Htik3J/w,
	matthias.bgg-Re5JQEeQqe8AvxtiuMwx3w,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, 2019-01-02 at 14:25 +0800, stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org wrote:
> From: Stanley Chu <stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> 
> Commit 356fd2663cff ("scsi: Set request queue runtime PM status
> back to active on resume") fixed the inconsistent RPM status between
> the request queue and the device. However, the request queue RPM status
> should be changed only on successful resume; otherwise the status may
> still be inconsistent, as below:
> 
> Request queue: RPM_ACTIVE
> Device: RPM_SUSPENDED
> 
> This ends up in a soft lockup because requests can be submitted to
> the underlying devices while those devices and their required
> resources have not been resumed.
> 
> Signed-off-by: Stanley Chu <stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>

Please add "Fixes:" and "Cc: stable" tags and also Cc the author of commit
356fd2663cff.


> ---
>  drivers/scsi/scsi_pm.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
> index a2b4179..eff3e59 100644
> --- a/drivers/scsi/scsi_pm.c
> +++ b/drivers/scsi/scsi_pm.c
> @@ -82,6 +82,20 @@ static int scsi_dev_type_resume(struct device *dev,
>  		pm_runtime_disable(dev);
>  		pm_runtime_set_active(dev);
>  		pm_runtime_enable(dev);
> +
> +		/*
> +		 * Forcibly set runtime PM status of request queue to "active"
> +		 * to make sure we can again get requests from the queue
> +		 * (see also blk_pm_peek_request()).
> +		 *
> +		 * The resume hook will correct runtime PM status of the disk.
> +		 */
> +		if (!err && scsi_is_sdev_device(dev)) {
> +			struct scsi_device *sdev = to_scsi_device(dev);
> +
> +			if (sdev->request_queue->dev)
> +				blk_set_runtime_active(sdev->request_queue);
> +		}

What makes you think that the sdev->request_queue->dev test is necessary? The
scsi_dev_type_resume() function is only called after blk_pm_runtime_init() has
finished so I don't think that test is necessary.

Additionally, since the above code occurs inside a block controlled by an
"if (err == 0)" statement, I think the !err test is redundant and should be
left out.

Thanks,

Bart.

* Re: [PATCH v1 1/1] scsi: Synchronize request queue PM status only on successful resume
       [not found]       ` <1546445745.163063.4.camel-HInyCGIudOg@public.gmane.org>
@ 2019-01-03  6:38         ` Stanley Chu
  2019-01-03 23:15           ` Bart Van Assche
  0 siblings, 1 reply; 3+ messages in thread
From: Stanley Chu @ 2019-01-03  6:38 UTC
  To: Bart Van Assche
  Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	kuohong.wang-NuS5LvNUpcJWk0Htik3J/w,
	wsdupstream-NuS5LvNUpcJWk0Htik3J/w,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	peter.wang-NuS5LvNUpcJWk0Htik3J/w,
	matthias.bgg-Re5JQEeQqe8AvxtiuMwx3w,
	mika.westerberg-VuQAYsv1563Yd54FQh9/CA

Hi Bart,

On Wed, 2019-01-02 at 08:15 -0800, Bart Van Assche wrote:
> On Wed, 2019-01-02 at 14:25 +0800, stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org wrote:
> > From: Stanley Chu <stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> > 
> > Commit 356fd2663cff ("scsi: Set request queue runtime PM status
> > back to active on resume") fixed the inconsistent RPM status between
> > the request queue and the device. However, the request queue RPM status
> > should be changed only on successful resume; otherwise the status may
> > still be inconsistent, as below:
> > 
> > Request queue: RPM_ACTIVE
> > Device: RPM_SUSPENDED
> > 
> > This ends up in a soft lockup because requests can be submitted to
> > the underlying devices while those devices and their required
> > resources have not been resumed.
> > 
> > Signed-off-by: Stanley Chu <stanley.chu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> 
> Please add "Fixes:" and "Cc: stable" tags and also Cc the author of commit
> 356fd2663cff.

Sure. Thanks for the reminder.

> 
> 
> > ---
> >  drivers/scsi/scsi_pm.c | 24 ++++++++++++++----------
> >  1 file changed, 14 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
> > index a2b4179..eff3e59 100644
> > --- a/drivers/scsi/scsi_pm.c
> > +++ b/drivers/scsi/scsi_pm.c
> > @@ -82,6 +82,20 @@ static int scsi_dev_type_resume(struct device *dev,
> >  		pm_runtime_disable(dev);
> >  		pm_runtime_set_active(dev);
> >  		pm_runtime_enable(dev);
> > +
> > +		/*
> > +		 * Forcibly set runtime PM status of request queue to "active"
> > +		 * to make sure we can again get requests from the queue
> > +		 * (see also blk_pm_peek_request()).
> > +		 *
> > +		 * The resume hook will correct runtime PM status of the disk.
> > +		 */
> > +		if (!err && scsi_is_sdev_device(dev)) {
> > +			struct scsi_device *sdev = to_scsi_device(dev);
> > +
> > +			if (sdev->request_queue->dev)
> > +				blk_set_runtime_active(sdev->request_queue);
> > +		}
> 
> What makes you think that the sdev->request_queue->dev test is necessary? The
> scsi_dev_type_resume() function is only called after blk_pm_runtime_init() has
> finished so I don't think that test is necessary.

We found that a NULL sdev->request_queue->dev may be dereferenced during
the system resume flow below:

scsi_bus_resume_common()
 => async_schedule_domain(async_sdev_resume)

async_sdev_resume() is then invoked asynchronously:
 
async_sdev_resume()
 => scsi_dev_type_resume(dev, do_scsi_resume)
  => blk_set_runtime_active(sdev->request_queue)

If a SCSI device does not have an upper-layer driver (such as the SCSI
disk driver), blk_pm_runtime_init(), which is invoked by sd_probe(), may
never be applied to it when the device is added.

For example, some SCSI devices (such as the UFS Boot W-LUN) are first
added explicitly through __scsi_add_device() by ufshcd_scsi_add_wlus(),
so sd_probe() is skipped for them because they are already visible.

For those SCSI devices, a NULL sdev->request_queue->dev would be
dereferenced in blk_set_runtime_active() during the system resume flow
above, so we added a NULL pointer check for this case.
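
The guard described above can be modeled in plain userspace C. This is
only a sketch under stated assumptions, not kernel code: the structs are
stripped-down stand-ins for the kernel types, the model_*() names are
hypothetical, and the model simply assumes, as reported above, that
blk_set_runtime_active() dereferences q->dev.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

struct device {
	int last_busy;			/* placeholder for runtime PM bookkeeping */
};

struct request_queue {
	struct device *dev;		/* NULL if blk_pm_runtime_init() never ran */
	enum rpm_status rpm_status;
};

struct scsi_device {
	struct request_queue *request_queue;
};

/*
 * Models a helper that touches q->dev unconditionally. Calling it with
 * q->dev == NULL would be a NULL pointer dereference.
 */
void model_blk_set_runtime_active(struct request_queue *q)
{
	q->dev->last_busy = 1;
	q->rpm_status = RPM_ACTIVE;
}

/*
 * Models the guarded call site: the queue status is synchronized only
 * when runtime PM was set up for the queue (e.g. not for a W-LUN that
 * has no upper-layer driver). Returns 1 if the queue was updated and
 * 0 if the update was skipped.
 */
int model_sync_queue_rpm(struct scsi_device *sdev)
{
	if (!sdev->request_queue->dev)
		return 0;

	model_blk_set_runtime_active(sdev->request_queue);
	return 1;
}
```

In this model, a queue whose dev pointer was never initialized is left
untouched, while a queue with runtime PM set up is marked active.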

The same issue also exists on those SCSI devices without this patch, in
the system resume flow below, when the devices are already
runtime-suspended:

scsi_bus_resume_common()
 => blk_set_runtime_active(to_scsi_device(dev)->request_queue)

> 
> Additionally, since the above code occurs inside a block controlled by an
> "if (err == 0)" statement, I think the !err test is redundant and should be
> left out.

Sorry, this is a defect from my code merge.
"err" here should be the value returned by pm_runtime_set_active().

I will fix it in v2.
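
One possible reading of that statement, sketched as a userspace model
rather than as the actual v2 patch: the result of the modeled
pm_runtime_set_active() is stored in "err", and the queue status is
changed only if that call succeeded. All model_* names and types below
are hypothetical stand-ins, and the set_active_result field exists only
to let the model simulate a failure.

```c
/* Hypothetical userspace model; none of these are the kernel definitions. */
enum model_rpm { MODEL_RPM_SUSPENDED, MODEL_RPM_ACTIVE };

struct model_dev {
	enum model_rpm rpm_status;
	int set_active_result;	/* value the modeled set_active call returns */
};

struct model_queue {
	enum model_rpm rpm_status;
};

/* Models pm_runtime_set_active(): the status changes only on success. */
int model_pm_runtime_set_active(struct model_dev *dev)
{
	if (dev->set_active_result == 0)
		dev->rpm_status = MODEL_RPM_ACTIVE;
	return dev->set_active_result;
}

/*
 * Models the resume path. resume_err is the result of the resume
 * callback. Inside the "err == 0" block, err is reassigned from the
 * set_active call, so the following "!err" test is no longer redundant.
 */
int model_dev_type_resume(struct model_dev *dev, struct model_queue *q,
			  int resume_err)
{
	int err = resume_err;

	if (err == 0) {
		err = model_pm_runtime_set_active(dev);

		/* Synchronize the queue only if the device became active. */
		if (!err)
			q->rpm_status = MODEL_RPM_ACTIVE;
	}

	return err;
}
```

With this shape, a failure to mark the device active leaves both the
device and the queue suspended, which avoids the RPM_ACTIVE queue on an
RPM_SUSPENDED device described in the commit message.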

> 
> Thanks,
> 
> Bart.

* Re: [PATCH v1 1/1] scsi: Synchronize request queue PM status only on successful resume
  2019-01-03  6:38         ` Stanley Chu
@ 2019-01-03 23:15           ` Bart Van Assche
  0 siblings, 0 replies; 3+ messages in thread
From: Bart Van Assche @ 2019-01-03 23:15 UTC
  To: Stanley Chu
  Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	kuohong.wang-NuS5LvNUpcJWk0Htik3J/w,
	wsdupstream-NuS5LvNUpcJWk0Htik3J/w,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	peter.wang-NuS5LvNUpcJWk0Htik3J/w,
	matthias.bgg-Re5JQEeQqe8AvxtiuMwx3w,
	mika.westerberg-VuQAYsv1563Yd54FQh9/CA

On Thu, 2019-01-03 at 14:38 +0800, Stanley Chu wrote:
> We found that a NULL sdev->request_queue->dev may be dereferenced during
> the system resume flow below:
> 
> scsi_bus_resume_common()
>  => async_schedule_domain(async_sdev_resume)
> 
> async_sdev_resume() is then invoked asynchronously:
> 
> async_sdev_resume()
>  => scsi_dev_type_resume(dev, do_scsi_resume)
>   => blk_set_runtime_active(sdev->request_queue)
> 
> If a SCSI device does not have an upper-layer driver (such as the SCSI
> disk driver), blk_pm_runtime_init(), which is invoked by sd_probe(), may
> never be applied to it when the device is added.
> 
> For example, some SCSI devices (such as the UFS Boot W-LUN) are first
> added explicitly through __scsi_add_device() by ufshcd_scsi_add_wlus(),
> so sd_probe() is skipped for them because they are already visible.
> 
> For those SCSI devices, a NULL sdev->request_queue->dev would be
> dereferenced in blk_set_runtime_active() during the system resume flow
> above, so we added a NULL pointer check for this case.
> 
> The same issue also exists on those SCSI devices without this patch, in
> the system resume flow below, when the devices are already
> runtime-suspended:
> 
> scsi_bus_resume_common()
>  => blk_set_runtime_active(to_scsi_device(dev)->request_queue)

Hi Stanley,

Thanks, this is helpful information. If you have to repost your patch,
please add a comment that refers to the __scsi_add_device() calls in the
UFS driver.

Bart.
