All of lore.kernel.org
 help / color / mirror / Atom feed
From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, rnayak@codeaurora.org,
	stanley.chu@mediatek.com, alim.akhtar@samsung.com,
	beanhuo@micron.com, Avri.Altman@wdc.com,
	bjorn.andersson@linaro.org, linux-scsi@vger.kernel.org,
	kernel-team@android.com, saravanak@google.com,
	salyzyn@google.com, "James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume
Date: Fri, 01 May 2020 09:42:17 +0800	[thread overview]
Message-ID: <1e2a2e39dbb3a0f06fe95bbfd66e1648@codeaurora.org> (raw)
In-Reply-To: <ef23a815-118a-52fe-4880-19e7fc4fcd10@acm.org>

On 2020-05-01 04:32, Bart Van Assche wrote:
> On 2020-04-29 22:40, Can Guo wrote:
>> On 2020-04-30 13:08, Bart Van Assche wrote:
>>> On 2020-04-29 21:10, Can Guo wrote:
>>>> During system resume, scsi_resume_device() decreases a request 
>>>> queue's
>>>> pm_only counter if the scsi device was quiesced before. But after 
>>>> that,
>>>> if the scsi device's RPM status is RPM_SUSPENDED, the pm_only 
>>>> counter is
>>>> still held (non-zero). Current scsi resume hook only sets the RPM 
>>>> status
>>>> of the scsi device and its request queue to RPM_ACTIVE, but leaves 
>>>> the
>>>> pm_only counter unchanged. This may make the request queue's pm_only
>>>> counter remain non-zero after resume hook returns, hence those who 
>>>> are
>>>> waiting on the mq_freeze_wq would never be woken up. Fix this by 
>>>> calling
>>>> blk_post_runtime_resume() if pm_only is non-zero to balance the 
>>>> pm_only
>>>> counter which is held by the scsi device's RPM ops.
>>> 
>>> How was this issue discovered? How has this patch been tested?
>> 
>> As the issue was found after system resumes, so the issue was 
>> discovered
>> during system suspend/resume test, and it is very easy to be 
>> replicated.
>> After system resumes, if this issue hits some scsi devices, all bios 
>> sent
>> to their request queues are blocked, which may cause a system hang if 
>> the
>> scsi devices are vital to system functionality.
>> 
>> To make sure the patch work well, we have tested system suspend/resume
>> and made sure no system hang happen due to request queues got blocked
>> by imbalanced pm_only counter.
> 
> Thanks, that's very interesting information. My concern with this patch
> is that the power management code is not the only caller of
> blk_set_pm_only() / blk_clear_pm_only(). E.g. the SCSI SPI code also
> calls scsi_device_quiesce() and scsi_device_resume(). These last
> functions call blk_set_pm_only() and blk_clear_pm_only(). More calls of
> scsi_device_quiesce() and scsi_device_resume() might be added in the 
> future.
> 
> Has it been considered to test directly whether a SCSI device has been
> runtime suspended instead of relying on blk_queue_pm_only()? How about
> using pm_runtime_status_suspended() or adding a function in
> block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?
> 
> Thanks,
> 
> Bart.

Hi Bart,

Slightly revised my previous mail.

Please let me address your concern.

First of all, it is allowed to call scsi_device_quiesce() multiple 
times,
but one sdev's request queue's pm_only counter can only be increased 
once
by scsi_device_quiesce(), because if a sdev has already been quiesced,
in scsi_device_quiesce(), scsi_device_set_state(sdev, SDEV_QUIESCE) 
would
return -ENIVAL (illegal state transform), then blk_clear_pm_only() shall
be called to decrease pm_only once, so no matter how many times
scsi_device_quiesce() is called, it can only increase pm_only once.

scsi_device_resume() is same, it calls blk_clear_pm_only only once and
only if the sdev was quiesced().

So, in a word, after scsi_device_resume() returns in 
scsi_dev_type_resume(),
if a sdev has block layer runtime PM enabled (sdev->request_queue->dev 
is not
NULL), its queue's pm_only counter should be 1 (if the sdev's runtime 
power
status is RPM_SUSPENDED) or 0 (if the sdev's runtime power status is 
RPM_ACTIVE).
If a sdev has block layer runtime PM disabled (sdev->request_queue->dev 
is NULL),
its queue's pm_only counter should be 0.

Has it been considered to test directly whether a SCSI device has been
runtime suspended instead of relying on blk_queue_pm_only()? How about
using pm_runtime_status_suspended() or adding a function in
block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?

Yes, I used to make the patch like that way, and it also worked well, as
both ways are equal actually. I kinda like the current code because we
should be confident that after scsi_dev_type_resume() returns, pm_only
must be 0. Different reviewers may have different opinions, either way
works well anyways.

Thanks,

Can Guo.

  parent reply	other threads:[~2020-05-01  1:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-30  4:10 [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume Can Guo
2020-04-30  5:08 ` Bart Van Assche
2020-04-30  5:40   ` Can Guo
2020-04-30 20:32     ` Bart Van Assche
2020-05-01  1:19       ` Can Guo
2020-05-01  1:42       ` Can Guo [this message]
2020-05-01  1:50         ` Bart Van Assche
2020-05-01  5:12           ` Can Guo
2020-05-01 17:56             ` Bart Van Assche
2020-05-02  1:59               ` Can Guo
2020-04-30  9:11   ` Avri Altman
2020-04-30 12:38     ` Can Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1e2a2e39dbb3a0f06fe95bbfd66e1648@codeaurora.org \
    --to=cang@codeaurora.org \
    --cc=Avri.Altman@wdc.com \
    --cc=alim.akhtar@samsung.com \
    --cc=asutoshd@codeaurora.org \
    --cc=beanhuo@micron.com \
    --cc=bjorn.andersson@linaro.org \
    --cc=bvanassche@acm.org \
    --cc=hongwus@codeaurora.org \
    --cc=jejb@linux.ibm.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=nguyenb@codeaurora.org \
    --cc=rnayak@codeaurora.org \
    --cc=salyzyn@google.com \
    --cc=saravanak@google.com \
    --cc=stanley.chu@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.