Linux SCSI subsystem development
From: Damien Le Moal <dlemoal@kernel.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Joe Breuer <linux-kernel@jmbreuer.net>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Hannes Reinecke <hare@suse.de>,
	"Rafael J . Wysocki" <rafael@kernel.org>
Subject: Re: [PATCH] ata: libata-scsi: Avoid deadlock on rescan after device resume
Date: Fri, 16 Jun 2023 14:25:38 +0900
Message-ID: <766f0f86-7b2e-51a7-6cc1-b670631105a4@kernel.org>
In-Reply-To: <b35c2137-6469-4d30-a25c-096e4932fe1b@rowland.harvard.edu>

On 6/15/23 23:50, Alan Stern wrote:
> On Thu, Jun 15, 2023 at 05:33:26PM +0900, Damien Le Moal wrote:
>> When an ATA port is resumed from sleep, the port is reset and a power
>> management request issued to libata EH to reset the port and rescanning
>> the device(s) attached to the port. Device rescanning is done by
>> scheduling an ata_scsi_dev_rescan() work, which will execute
>> scsi_rescan_device().
>>
>> However, scsi_rescan_device() takes the generic device lock, which is
>> also taken by dpm_resume() when the SCSI device is resumed as well. If
>> a device rescan execution starts before the completion of the SCSI
>> device resume, the rcu locking used to refresh the cached VPD pages of
>> the device, combined with the generic device locking from
>> scsi_rescan_device() and from dpm_resume() can cause a deadlock.
>>
>> Avoid this situation by changing struct ata_port scsi_rescan_task to be
>> a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is
>> modified to check if the SCSI device associated with the ATA device that
>> must be rescanned is not suspended. If the SCSI device is still
>> suspended, ata_scsi_dev_rescan() returns early and reschedule itself for
>> execution after an arbitrary delay of 5ms.
> 
> I don't understand the nature of the relationship between the ATA port
> and the corresponding SCSI device.  Maybe you could explain it more
> fully, if you have time.

For ata devices, the parent -> child relationship is:

ata_host (the adapter) -> ata_port -> ata_link -> ata_device (HDD, SSD or ATAPI
CD/DVD)

For scsi devices representing ATA devices, it is:

ata_port -> scsi_host -> scsi_target -> scsi_device -> scsi_disk (or gendisk for
a CD/DVD)

When devices are scanned, libata will create ports and create a scsi_host for
each port, and a scsi device for each ata_device found on the link(s) for the
port. There is no direct relationship between an ata_device (the HDD or SSD) and
its scsi_device/scsi_disk (the device used to issue commands). The PM operations
we have are for ata_port and scsi_device. For the scsi device, the operations
are actually defined per device type, so in the scsi_disk driver (sd) for HDDs
and SSDs.

> But in any case, this approach seems like a layering violation.  Why not 

The layering violation is, I think, only in the direct reference to the scsi
device power.is_suspended field, which is definitely not pretty. But there are
some other drivers doing something similar:

$ git grep "power\.is_suspended" | grep -v drivers/base/power/main.c
drivers/gpu/drm/i915/display/intel_display_power_well.c:	if (!dev_priv->drm.dev->power.is_suspended)
drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c:	if (!dwmac->dev->power.is_suspended) {
drivers/platform/surface/surface_acpi_notify.c:	if (d->dev->power.is_suspended) {

All the other code (ata calling scsi) is normal given the SCSI-to-ATA
translation needed: all ata devices are represented as scsi devices in Linux,
following the SAT (SCSI/ATA Translation) specification.
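
For illustration only, the check that the patch adds to ata_scsi_dev_rescan()
boils down to the logic below. This is a user-space C model, not the kernel
code: all the model_* names and structures are made up, and the is_suspended
member only stands in for the scsi device power.is_suspended field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_scsi_dev {
	bool is_suspended;	/* stands in for the device power.is_suspended */
	int rescan_count;	/* number of rescans actually executed */
};

struct model_rescan_work {
	struct model_scsi_dev *sdev;
	bool pending;		/* work is queued for a later run */
	unsigned int delay_ms;	/* delay requested for the next run */
};

#define MODEL_RESCAN_RETRY_MS	5	/* the arbitrary 5ms delay of the patch */

/*
 * Model of the rescan work function: if the SCSI device is still suspended,
 * do not rescan it and requeue the work with a short delay instead.
 */
static void model_dev_rescan(struct model_rescan_work *work)
{
	assert(work != NULL && work->sdev != NULL);

	work->pending = false;
	if (work->sdev->is_suspended) {
		work->pending = true;
		work->delay_ms = MODEL_RESCAN_RETRY_MS;
		return;
	}

	work->sdev->rescan_count++;
}
```

Calling model_dev_rescan() while is_suspended is set leaves the work pending
with a 5ms delay and no rescan done. Calling it again once the flag is cleared
executes the rescan.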

> instead call a SCSI utility routine to set a "needs_rescan" flag in the 
> scsi_device structure?  Then scsi_device_resume() could automatically 
> call scsi_rescan_device() -- or rather an internal version that assumes 
> the device lock is already held -- if the flag is set.  Or it could 

Yes, ideally, that is what we should do. Such a fix is however more involved,
so I prefer not to push for this right now as a fix for the regression at hand.
But I will definitely look into this.
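
To be sure that I understand the suggestion, here is a rough user-space C model
of it. Nothing below exists in the kernel: the needs_rescan flag is the
hypothetical one you describe, the model_* functions are made up, and rescanning
immediately when the device is not suspended is my own assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scsi device; not the kernel structure. */
struct model_sdev {
	bool suspended;
	bool needs_rescan;	/* hypothetical flag, as suggested */
	int rescan_count;	/* number of rescans actually executed */
};

/*
 * Called from the ATA side. A suspended device is never rescanned here:
 * the request is only recorded for the resume path to handle.
 */
static void model_request_rescan(struct model_sdev *sdev)
{
	assert(sdev != NULL);

	if (sdev->suspended) {
		sdev->needs_rescan = true;
		return;
	}

	sdev->rescan_count++;
}

/* Resume path: execute the deferred rescan, if one was requested. */
static void model_sdev_resume(struct model_sdev *sdev)
{
	assert(sdev != NULL);

	sdev->suspended = false;
	if (sdev->needs_rescan) {
		sdev->needs_rescan = false;
		sdev->rescan_count++;
	}
}
```

With this, a rescan requested while the device is suspended is executed exactly
once, from the resume path, and there is nothing left to poll or retry.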

> queue a non-delayed work routine to do this.  (Is it important to have 
> the rescan finish before userspace starts up and tries to access the ATA 
> device again?)
> 
> That, combined with a guaranteed order of resuming, would do what you 
> want, right?

Yes. But again, more fixes are needed:
1) libata uses its error handling path to reset a port on resume and probe the
links again. The devices found are then revalidated (IDENTIFY command, reading
log pages, etc). This call to EH is triggered in the pm->resume operation, but
EH runs as an asynchronous task. So the port ->resume may complete before EH is
done. We need to change the EH call to be synchronous in ->resume.
2) We need to stop triggering the task that does scsi_rescan_device() from EH
and move that trigger to scsi_device ->resume.
3) Modify scsi_device ->resume() to call scsi_rescan_device().

Safely doing (3) requires synchronization between ata_port->resume and
scsi_device->resume. We can do that by adding a device link between ata_device
and scsi_device. Doing so, the scsi device becomes the grandchild of the
ata_port and we are guaranteed that its ->resume will happen only once the ata
port ->resume is done.
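
The ordering I am after can be modeled in user-space C as below. This is only a
sketch with made-up model_* names. It treats both the parent -> child
relationship and the device link as a single "must be resumed first" pointer,
which is a simplification of what the PM core does. In the kernel, the link
itself would presumably be created with device_link_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel struct device. */
struct model_dev {
	const char *name;
	struct model_dev *supplier;	/* resumed first; NULL if none */
	bool resumed;
	int resume_seq;			/* 1-based order in which resume ran */
};

/*
 * Resume a device after resuming everything it depends on.
 * *seq counts the resume operations executed so far.
 */
static void model_resume(struct model_dev *dev, int *seq)
{
	assert(dev != NULL && seq != NULL);

	if (dev->resumed)
		return;

	if (dev->supplier)
		model_resume(dev->supplier, seq);

	dev->resumed = true;
	dev->resume_seq = ++(*seq);
}
```

With ata_port as the supplier of ata_device, and ata_device as the supplier of
scsi_device, resuming the scsi device first resumes the port, then the ata
device, and only then the scsi device itself.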

That will also improve things as we will be able to rescan the scsi device (and
thus catch any change to the device) *before* the device ->resume operation
re-enables issuing user commands by un-quiescing the device request queue.

As you can see, that is all beyond a quick fix for a regression... Will work on
this.

Cheers.

> 
> Alan Stern

-- 
Damien Le Moal
Western Digital Research

