From: Niklas Cassel <Niklas.Cassel@wdc.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: "linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
John Garry <john.g.garry@oracle.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Paul Ausbeck <paula@soe.ucsc.edu>,
Kai-Heng Feng <kai.heng.feng@canonical.com>,
Joe Breuer <linux-kernel@jmbreuer.net>,
Geert Uytterhoeven <geert@linux-m68k.org>,
Chia-Lin Kao <acelan.kao@canonical.com>
Subject: Re: [PATCH v3 07/23] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
Date: Tue, 19 Sep 2023 14:00:09 +0000
Message-ID: <ZQmpZfJgGPpxKonH@x1-carbon>
In-Reply-To: <20230915081507.761711-8-dlemoal@kernel.org>

On Fri, Sep 15, 2023 at 05:14:51PM +0900, Damien Le Moal wrote:
> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
> device resume") modified ata_scsi_dev_rescan() to check the scsi device
> "is_suspended" power field to ensure that the scsi device associated
> with an ATA device is fully resumed when scsi_rescan_device() is
> executed. However, this fix is problematic as:
> 1) It relies on a PM internal field that should not be used without PM
> device locking protection.
> 2) The check for is_suspended and the call to scsi_rescan_device() are
> not atomic and a suspend PM event may be triggered between them,
> causing scsi_rescan_device() to be called on a suspended device and
> to block in that function while holding the scsi device lock. This
> would deadlock a following resume operation.
> These problems can trigger PM deadlocks on resume, especially with
> resume operations triggered quickly after or during suspend operations.
> E.g., a simple bash script like:
>
> for (( i=0; i<10; i++ )); do
> 	echo "+2" > /sys/class/rtc/rtc0/wakealarm
> 	echo mem > /sys/power/state
> done
>
> that triggers a resume 2 seconds after the system starts suspending can
> quickly lead to a PM deadlock that prevents the system from resuming
> correctly.
>
> Fix this by replacing the check on is_suspended with a check on the
> return value given by scsi_rescan_device() as that function will fail if
> called against a suspended device. Also make sure rescan tasks already
> scheduled are first cancelled before suspending an ata port.
>
> Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> drivers/ata/libata-core.c | 16 ++++++++++++++++
> drivers/ata/libata-scsi.c | 33 +++++++++++++++------------------
> 2 files changed, 31 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 0cf0caf77907..0479493e54bd 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5172,11 +5172,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
>
>  static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
>  {
> +	/*
> +	 * We are about to suspend the port, so we do not care about
> +	 * scsi_rescan_device() calls scheduled by previous resume operations.
> +	 * The next resume will schedule the rescan again. So cancel any rescan
> +	 * that is not done yet.
> +	 */
> +	cancel_delayed_work_sync(&ap->scsi_rescan_task);
> +
>  	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
>  }
>
>  static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
>  {
> +	/*
> +	 * We are about to suspend the port, so we do not care about
> +	 * scsi_rescan_device() calls scheduled by previous resume operations.
> +	 * The next resume will schedule the rescan again. So cancel any rescan
> +	 * that is not done yet.
> +	 */
> +	cancel_delayed_work_sync(&ap->scsi_rescan_task);
> +
>  	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
>  }
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index ac2d332b4963..6297f8c16a13 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4760,7 +4760,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
>  	struct ata_link *link;
>  	struct ata_device *dev;
>  	unsigned long flags;
> -	bool delay_rescan = false;
> +	int ret = 0;
>
>  	mutex_lock(&ap->scsi_scan_mutex);
>  	spin_lock_irqsave(ap->lock, flags);
> @@ -4769,37 +4769,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
>  		ata_for_each_dev(dev, link, ENABLED) {
>  			struct scsi_device *sdev = dev->sdev;
>
> +			/*
> +			 * If the port was suspended before this was scheduled,
> +			 * bail out.
> +			 */
> +			if (ap->pflags & ATA_PFLAG_SUSPENDED)
> +				goto unlock;
> +
>  			if (!sdev)
>  				continue;
>  			if (scsi_device_get(sdev))
>  				continue;
>
> -			/*
> -			 * If the rescan work was scheduled because of a resume
> -			 * event, the port is already fully resumed, but the
> -			 * SCSI device may not yet be fully resumed. In such
> -			 * case, executing scsi_rescan_device() may cause a
> -			 * deadlock with the PM code on device_lock(). Prevent
> -			 * this by giving up and retrying rescan after a short
> -			 * delay.
> -			 */
> -			delay_rescan = sdev->sdev_gendev.power.is_suspended;
> -			if (delay_rescan) {
> -				scsi_device_put(sdev);
> -				break;
> -			}
> -
>  			spin_unlock_irqrestore(ap->lock, flags);
> -			scsi_rescan_device(sdev);
> +			ret = scsi_rescan_device(sdev);
>  			scsi_device_put(sdev);
>  			spin_lock_irqsave(ap->lock, flags);
> +
> +			if (ret)
> +				goto unlock;
>  		}
>  	}
>
> +unlock:
>  	spin_unlock_irqrestore(ap->lock, flags);
>  	mutex_unlock(&ap->scsi_scan_mutex);
>
> -	if (delay_rescan)
> +	/* Reschedule with a delay if scsi_rescan_device() returned an error */
> +	if (ret)
>  		schedule_delayed_work(&ap->scsi_rescan_task,
>  				      msecs_to_jiffies(5));
>  }
> --
> 2.41.0
>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
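
For readers following the control flow of the patched
ata_scsi_dev_rescan(), here is a minimal userspace sketch of it. This is
not kernel code: struct model_dev, struct model_port,
model_rescan_device() and model_dev_rescan() are hypothetical stand-ins
invented for illustration, locking is left out entirely, and the
-EWOULDBLOCK error value is an assumption about what a rescan of a
suspended device returns. The only point it tries to show is that the
work function acts on the return value of the rescan call instead of
checking a "suspended" field beforehand.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scsi device; not a kernel structure. */
struct model_dev {
	bool suspended;	/* models a scsi device that is not yet resumed */
	int rescans;	/* number of rescans that completed */
};

/* Hypothetical stand-in for an ATA port; not a kernel structure. */
struct model_port {
	bool port_suspended;	/* models ATA_PFLAG_SUSPENDED being set */
	struct model_dev *devs;
	size_t ndevs;
};

/*
 * Models a rescan call that fails instead of blocking when the device
 * is suspended. The error value is an assumption.
 */
static int model_rescan_device(struct model_dev *dev)
{
	if (dev->suspended)
		return -EWOULDBLOCK;
	dev->rescans++;
	return 0;
}

/*
 * Models the patched work function. Returns true when the caller
 * should schedule the work again after a short delay.
 */
static bool model_dev_rescan(struct model_port *ap)
{
	int ret = 0;
	size_t i;

	assert(ap != NULL);

	for (i = 0; i < ap->ndevs; i++) {
		/* Port suspended after the work was scheduled: bail out. */
		if (ap->port_suspended)
			break;

		/* Stop at the first device that cannot be rescanned yet. */
		ret = model_rescan_device(&ap->devs[i]);
		if (ret)
			break;
	}

	return ret != 0;
}
```

With this shape there is no window between a check and the rescan call:
either the rescan succeeds, or it fails and the whole work item is
retried later. A suspended port produces no retry at all, which matches
the commit message's statement that the next resume schedules the rescan
again.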