From: Damien Le Moal <dlemoal@kernel.org>
To: Thorsten Leemhuis <regressions@leemhuis.info>, TW <dalzot@gmail.com>
Cc: regressions@lists.linux.dev,
	Mario Limonciello <mario.limonciello@amd.com>,
	Bart Van Assche <bvanassche@acm.org>,
	LKML <linux-kernel@vger.kernel.org>,
	stable@vger.kernel.org
Subject: Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
Date: Thu, 27 Jul 2023 08:39:10 +0900
Message-ID: <c70caa9e-164c-fee5-8f85-67f6d02373ab@kernel.org>
In-Reply-To: <6b66dd9a-8bd5-2882-9168-8e6e0848c454@leemhuis.info>

On 7/26/23 22:47, Thorsten Leemhuis wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
> 
> On 26.07.23 13:54, TW wrote:
>> I have been having issues with the 6.x series of kernels resuming from
>> suspend with one of my drives. Far as I can tell it has trouble with the
>> cache on the drive when coming out of s3 sleep. Tried a few different
>> distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same
>> error message. It appears to work fine on the 5.15 kernel just fine
>> however.
>>
>> This is the error or errors that I have been getting and assume has been
>> holding up the system from resuming from suspend.
>>
>> Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command

This sense is garbage. This issue was reported already, but it is hard
to deal with as it seems to be due to drives/adapters not correctly
reporting status bits. So for now, let's ignore these sense codes.

The start/stop unit failure is weird. In another case, I suspect that
this command is causing a delay on resume, but not an error like this.

>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5 
> 
> Thx for your report. I CCed a few people, with a bit of luck they have
> an idea. But I doubt it. If no one replies you likely will need a
> bisection to find the root of the problem. But before going down that
> route you want to check if latest mainline kernel (vanilla!) works better.
> 
> FWIW, this is not my area of expertise, so the following might be a
> misleading comment, but the problem looks somewhat similar to this one
> that iirc was never solved:
> https://bugzilla.kernel.org/show_bug.cgi?id=216087
> 
>> Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully.
>> Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions.
>> Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend... 
> 
> That sounds like you are using out-of tree drivers which can cause all
> sorts of issues. Please recheck if the problem happens without those as
> well and do not use them in all further tests to debug the issue.

Yes. Please retest with the latest 6.5-rc3.

And can you try this patch to see if it solves your issue?

commit 29e81d11812ee924d19425343ec69acd34af9d35
Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Jul 24 13:23:14 2023 +0900

    ata,scsi: do not issue START STOP UNIT on resume

    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 370d18aca71e..6184c7bcc16c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1100,7 +1100,13 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 		}
 	} else {
 		sdev->sector_size = ata_id_logical_sector_size(dev->id);
+		/*
+		 * Stop the drive on suspend but do not issue START STOP UNIT
+		 * on resume as this is not necessary: the port is reset on
+		 * resume, which wakes up the drive.
+		 */
 		sdev->manage_start_stop = 1;
+		sdev->no_start_on_resume = 1;
 	}

 	/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 68b12afa0721..b8584fe3123e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	int ret;
+	int ret = 0;

 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
@@ -3885,7 +3885,8 @@ static int sd_resume(struct device *dev)
 		return 0;

 	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-	ret = sd_start_stop_device(sdkp, 1);
+	if (!sdkp->device->no_start_on_resume)
+		ret = sd_start_stop_device(sdkp, 1);
 	if (!ret)
 		opal_unlock_from_suspend(sdkp->opal_dev);
 	return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 75b2235b99e2..b9230b6add04 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -194,6 +194,7 @@ struct scsi_device {
 	unsigned no_start_on_add:1;	/* do not issue start on add */
 	unsigned allow_restart:1; /* issue START_UNIT in error handler */
 	unsigned manage_start_stop:1;	/* Let HLD (sd) manage start/stop */
+	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
 	unsigned start_stop_pwr_cond:1;	/* Set power cond. in START_STOP_UNIT */
 	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
 	unsigned select_no_atn:1;


-- 
Damien Le Moal
Western Digital Research

