From: bugzilla-daemon@kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 215880] Resume process hangs for 5-6 seconds starting sometime in 5.16
Date: Fri, 26 Aug 2022 07:00:38 +0000
Message-ID: <bug-215880-11613-KMetxdJz9f@https.bugzilla.kernel.org/>
In-Reply-To: <bug-215880-11613@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=215880
--- Comment #39 from damien.lemoal@opensource.wdc.com ---
On 8/26/22 07:15, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>
> --- Comment #38 from damien.lemoal@opensource.wdc.com ---
> On 8/26/22 05:31, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>
>> --- Comment #37 from Bart Van Assche (bvanassche@acm.org) ---
>> On 8/25/22 13:01, bugzilla-daemon@kernel.org wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>>
>>> --- Comment #36 from jason600 (jason600.groome@gmail.com) ---
>>> (In reply to Bart Van Assche from comment #27)
>>>> Thanks for testing! The patches from the sd-resume branch have been posted
>>>> on the linux-scsi mailing list. See also
>>>> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/T/#t
>>>
>>> Hi Bart, just an update for you. I noticed this had been removed from
>>> 6.0-rc1 because of freezing after suspend.
>>>
>>> I've been compiling my kernel with this fix on various 5.18 kernels (with
>>> openSUSE Tumbleweed), and it has worked fine: no freezing on resume as
>>> others have mentioned.
>>>
>>> Yesterday, I updated to the 5.19.2 kernel, applied the fix, recompiled, and
>>> it froze after the first suspend. I rebooted and the same thing happened
>>> again. I recompiled the kernel with the fix, just to make sure I didn't mess
>>> it up, and the same happened again.
>>>
>>> When you originally did this fix, you based it on 5.18, and indeed, it works
>>> fine on 5.18 for me. There were a lot of changes to the drivers/scsi/sd.c
>>> file for 5.19; presumably it was those changes that made this fix start
>>> freezing after suspend.
>>>
>>> Perhaps you could check if the other people that experienced freezing were
>>> using either 5.19 or 6.0-rc1.
>>
>> Multiple people reported issues with freezes during suspend with kernel
>> v6.0-rc1. Please take a look at the following report:
>>
>> https://lore.kernel.org/all/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/.
>> It shows that if zoned ATA disks are present, blk_mq_freeze_queue() may be
>> called from inside ata_scsi_dev_rescan() in the context of a work queue.
>> ATA rescanning happens from inside the SCSI error handler, so there is
>> potential for a lockup because of the following:
>> * Execution of the START command being postponed because the SCSI error
>> handler is active.
>> * blk_mq_freeze_queue() waiting for the START command to finish.
>> * The START completion handler not being executed because it got queued
>> on the same work queue as the ATA rescan work.
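
To make the lockup pattern described above concrete, here is a minimal user-space sketch (pthreads, not kernel code). A single worker thread stands in for the shared, ordered work queue; rescan_work() stands in for the rescan path that ends up waiting in blk_mq_freeze_queue(), and start_completion_work() stands in for the START completion handler. All of these names are hypothetical. The rescan item waits with a 1 s timeout so the model reports the hang instead of blocking forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MAX_ITEMS 4

typedef void (*work_fn)(void);

/* Hypothetical model of an ordered work queue: one worker runs items in order. */
struct model_wq {
	work_fn items[MAX_ITEMS];
	int n;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_done_cv = PTHREAD_COND_INITIALIZER;
static bool start_done;       /* set once the START command "completed" */
static bool rescan_timed_out; /* set if the rescan item gave up waiting */

/* Stands in for the rescan path that waits in blk_mq_freeze_queue(). */
static void rescan_work(void)
{
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1; /* bounded wait so the model reports the hang */

	pthread_mutex_lock(&lock);
	while (!start_done) {
		if (pthread_cond_timedwait(&start_done_cv, &lock, &deadline) == ETIMEDOUT) {
			rescan_timed_out = true;
			break;
		}
	}
	pthread_mutex_unlock(&lock);
}

/* Stands in for the START completion handler. */
static void start_completion_work(void)
{
	pthread_mutex_lock(&lock);
	start_done = true;
	pthread_cond_signal(&start_done_cv);
	pthread_mutex_unlock(&lock);
}

/* One worker thread per queue: runs that queue's items strictly in order. */
static void *worker(void *arg)
{
	struct model_wq *wq = arg;

	assert(wq->n <= MAX_ITEMS);
	for (int i = 0; i < wq->n; i++)
		wq->items[i]();
	return NULL;
}

/* Runs all queues to completion; returns true if the rescan item stalled. */
static bool run_queues(struct model_wq *qs, int nq)
{
	pthread_t threads[2];

	assert(nq <= 2);
	start_done = false;
	rescan_timed_out = false;
	for (int i = 0; i < nq; i++)
		pthread_create(&threads[i], NULL, worker, &qs[i]);
	for (int i = 0; i < nq; i++)
		pthread_join(threads[i], NULL);
	return rescan_timed_out;
}

/* Both items on the SAME ordered queue: START completion sits behind the rescan. */
bool simulate_same_queue(void)
{
	struct model_wq q = { .items = { rescan_work, start_completion_work }, .n = 2 };

	return run_queues(&q, 1);
}

/* Items on DIFFERENT queues: the START completion can run concurrently. */
bool simulate_separate_queues(void)
{
	struct model_wq qs[2] = {
		{ .items = { rescan_work }, .n = 1 },
		{ .items = { start_completion_work }, .n = 1 },
	};

	return run_queues(qs, 2);
}
```

In this model simulate_same_queue() reports a stall (the waiter can never be woken by an item queued behind it), while simulate_separate_queues() does not. It only illustrates the ordering problem; it says nothing about which kernel work queues are actually shared.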

Checking the code, the device PM resume call chain for ATA looks like this:

ata_port_resume()
  -> ata_port_request_pm()
  -> ata_port_schedule_eh()
  -> scsi_error_handler() thread runs
  -> shost->transportt->eh_strategy_handler == ata_scsi_error()
  -> ata_scsi_port_error_handler()
  -> ata_eh_handle_port_resume()
  -> ap->ops->port_resume() == ahci_port_resume() for AHCI adapters

This chain issues no commands to the device; it only changes registers and
port settings. So it should always complete quickly and should not, by
itself, be the reason the START_STOP command issued by the sd driver gets
stuck.
After calling ata_eh_handle_port_resume(), ata_scsi_port_error_handler()
will trigger a reset through ata_std_error_handler() -> ata_do_eh(), and
that will cause a device rescan in EH context. The device rescan will
definitely spin up the device, so the START_STOP command that sd_resume()
issues seems rather useless...
As a hack, it would be good to try something like this:
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 29e2f55c6faa..1bc92c04f048 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1081,6 +1081,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 	} else {
 		sdev->sector_size = ata_id_logical_sector_size(dev->id);
 		sdev->manage_start_stop = 1;
+		sdev->no_start_on_resume = 1;
 	}
 
 	/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8f79fa6318fe..4c28ca4d038b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3761,7 +3761,7 @@ static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	int ret;
+	int ret = 0;
 
 	if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
@@ -3770,7 +3770,8 @@ static int sd_resume(struct device *dev)
 		return 0;
 
 	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-	ret = sd_start_stop_device(sdkp, 1);
+	if (!sdkp->device->no_start_on_resume)
+		ret = sd_start_stop_device(sdkp, 1);
 	if (!ret)
 		opal_unlock_from_suspend(sdkp->opal_dev);
 	return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 3113471ca375..92e141536c6c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -192,6 +192,7 @@ struct scsi_device {
 	unsigned no_start_on_add:1; /* do not issue start on add */
 	unsigned allow_restart:1; /* issue START_UNIT in error handler */
 	unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */
+	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
 	unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */
 	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
 	unsigned select_no_atn:1;
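
For reasoning about the control flow of this hack without building a kernel, here is a user-space model of the patched sd_resume(). The struct and function names are hypothetical stand-ins: model_start_stop_device() only counts START commands instead of sending one, and setting opal_unlocked stands in for opal_unlock_from_suspend(). The model also shows why the patch changes "int ret;" to "int ret = 0;": when the START is skipped, ret would otherwise be read uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two scsi_device bits the patch relies on. */
struct model_sdev {
	unsigned manage_start_stop:1;
	unsigned no_start_on_resume:1;
};

/* Hypothetical stand-in for struct scsi_disk, plus counters for checking. */
struct model_disk {
	struct model_sdev *device;
	int start_cmds_issued; /* how many START commands the model "sent" */
	bool opal_unlocked;    /* whether the OPAL unlock step ran */
};

/* Stub for sd_start_stop_device(): records a START and reports success. */
static int model_start_stop_device(struct model_disk *sdkp, int start)
{
	if (start)
		sdkp->start_cmds_issued++;
	return 0;
}

/* Mirrors the control flow of sd_resume() with the patch applied. */
int model_sd_resume(struct model_disk *sdkp)
{
	int ret = 0; /* must be initialized: the START below may be skipped */

	if (!sdkp)
		return 0;
	assert(sdkp->device != NULL);
	if (!sdkp->device->manage_start_stop)
		return 0;

	if (!sdkp->device->no_start_on_resume)
		ret = model_start_stop_device(sdkp, 1);
	if (!ret)
		sdkp->opal_unlocked = true; /* stands in for opal_unlock_from_suspend() */
	return ret;
}
```

With no_start_on_resume set (the ATA case in the hack), the model issues no START but still runs the OPAL unlock step; with the bit clear, behavior is unchanged from before the patch.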
I am not at all sure this is correct, though. It may actually break other
suspend/resume flavors. If we could somehow synchronize the SCSI PM resume
to run *after* the ATA PM resume, I think all these problems should go away.
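
The ordering suggested above could look roughly like the following user-space sketch: the SCSI-side resume blocks on a one-shot completion that the ATA-side resume signals when it is done, so no START is issued before the port is back. This is pthreads, not kernel code; the nearest kernel primitive would be struct completion (complete()/wait_for_completion()), and whether that is the right mechanism for PM ordering here is an open question. All names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* User-space analogue of a kernel completion: a one-shot "done" flag. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool done;
};

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cv);
	pthread_mutex_unlock(&c->lock);
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cv, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static struct model_completion ata_resumed = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};

/* Event log: 'A' = ATA port resume done, 'S' = sd issued START. */
static char events[8];
static int n_events;
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

static void log_event(char e)
{
	pthread_mutex_lock(&log_lock);
	assert(n_events < (int)sizeof(events) - 1);
	events[n_events++] = e;
	pthread_mutex_unlock(&log_lock);
}

static void *ata_resume_thread(void *arg)
{
	struct timespec delay = { 0, 50 * 1000 * 1000 }; /* pretend port resume + EH take 50 ms */

	(void)arg;
	nanosleep(&delay, NULL);
	log_event('A');
	model_complete(&ata_resumed);
	return NULL;
}

static void *scsi_resume_thread(void *arg)
{
	(void)arg;
	model_wait_for_completion(&ata_resumed); /* hold off until ATA resume is done */
	log_event('S');
	return NULL;
}

/* Starts the SCSI resume first, then the ATA resume; returns the event order. */
const char *run_ordered_resume(void)
{
	pthread_t scsi, ata;

	memset(events, 0, sizeof(events));
	n_events = 0;
	ata_resumed.done = false; /* analogous to reinit_completion() */
	pthread_create(&scsi, NULL, scsi_resume_thread, NULL);
	pthread_create(&ata, NULL, ata_resume_thread, NULL);
	pthread_join(scsi, NULL);
	pthread_join(ata, NULL);
	return events;
}
```

Even though the SCSI side is started first, the recorded order is always "AS": the START step cannot run until the ATA side has signalled completion.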
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.