From: Damien Le Moal <dlemoal@kernel.org>
To: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Hannes Reinecke <hare@suse.de>,
Bart Van Assche <bvanassche@acm.org>,
Bagas Sanjaya <bagasdotme@gmail.com>, Pavel Machek <pavel@ucw.cz>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <len.brown@intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Kees Cook <keescook@chromium.org>,
Tony Luck <tony.luck@intel.com>,
"Guilherme G. Piccoli" <gpiccoli@igalia.com>,
Thorsten Leemhuis <linux@leemhuis.info>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Phillip Potter <phil@philpotter.co.uk>,
Joe Breuer <linux-kernel@jmbreuer.net>,
Linux Power Management <linux-pm@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Hardening <linux-hardening@vger.kernel.org>,
Linux Regressions <regressions@lists.linux.dev>,
Linux SCSI <linux-scsi@vger.kernel.org>,
Alan Stern <stern@rowland.harvard.edu>,
Dan Williams <dan.j.williams@intel.com>,
Hannes Reinecke <hare@suse.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Martin Kepplinger <martin.kepplinger@puri.sm>
Subject: Re: Fwd: Waking up from resume locks up on sr device
Date: Mon, 12 Jun 2023 16:47:09 +0900
Message-ID: <433015f6-9ca6-e4ce-e070-a75378419564@kernel.org>
In-Reply-To: <CAAd53p5CsAAX5G1J2WH5N5JT5dZB_BD2AW8WL-S=pHZtGXr1sw@mail.gmail.com>
On 6/12/23 16:36, Kai-Heng Feng wrote:
> On Mon, Jun 12, 2023 at 3:22 PM Damien Le Moal <dlemoal@kernel.org> wrote:
>>
>> On 6/12/23 15:09, Hannes Reinecke wrote:
>>> On 6/12/23 05:09, Damien Le Moal wrote:
>>>> On 6/11/23 00:03, Bart Van Assche wrote:
>>>>> On 6/10/23 06:27, Bagas Sanjaya wrote:
>>>>>> On 6/10/23 15:55, Pavel Machek wrote:
>>>>>>>>> #regzbot introduced: v5.0..v6.4-rc5 https://bugzilla.kernel.org/show_bug.cgi?id=217530
>>>>>>>>> #regzbot title: Waking up from resume locks up on SCSI CD/DVD drive
>>>>>>>>>
>>>>>>>> The reporter had found the culprit (via bisection), so:
>>>>>>>>
>>>>>>>> #regzbot introduced: a19a93e4c6a98c
>>>>>>> Maybe cc the authors of that commit?
>>>>>>
>>>>>> Ah! I forgot to do that! Thanks anyway.
>>>>>
>>>>> Hi Damien,
>>>>>
>>>>> Why does the ATA code call scsi_rescan_device() before system resume has
>>>>> finished? Would ATA devices still work with the patch below applied?
>>>>
>>>> I do not know the PM code well at all, need to dig into it. But your patch
>>>> worries me as it seems it would prevent rescan of the device on a resume, which
>>>> can be an issue if the device has changed.
>>>>
>>>> I am not yet 100% clear on the root cause for this, but I think it comes from
>>>> the fact that ata_port_pm_resume() runs before the scsi device resume is done, so
>>>> with scsi_dev->power.is_suspended still true. And ata_port_pm_resume() calls
>>>> ata_port_resume_async() which triggers EH (which will do reset + rescan)
>>>> asynchronously. So it looks like we have scsi device resume and libata EH for
>>>> rescan fighting each other for the scan mutex and device lock, leading to deadlock.
>>>>
>>>> Trying to recreate this issue now to confirm and debug further. But I suspect
>>>> the solution to this may be best implemented in libata, not in scsi.
>>>> This definitely looks related to this thread:
>>>>
>>>> https://lore.kernel.org/linux-scsi/7b553268-69d3-913a-f9de-28f8d45bdb1e@acm.org/
>>>>
>>>> Similarly to your comment on that thread, having to look at
>>>> dev->power.is_suspended is not ideal I think. What we need is to have ata and
>>>> scsi pm resume be synchronized, but I am not yet 100% clear on the scsi layer side.
>>>>
>>> Which is my feeling, too.
>>> libata runs rescan as part of the device discovery, so really it will
>>> run after resume. And consequently resume really cannot wait for rescan
>>> to finish.
>>>
>>> What I would be looking at is to decouple resume from libata device
>>> rescan, and have resume complete before libata EH runs.
>>
>> That is the case now, for the ata port at least, even though that is not super
>> explicit, and not reliable. See ata_port_pm_resume(): I think that the call to
>> EH in ata_port_pm_resume() -> ata_port_resume_async() -> ata_port_request_pm()
>> -> ata_port_schedule_eh() should instead use a sync resume, leading to a sync EH
>> call.
>>
>> That EH execution essentially does ata_eh_handle_port_resume(), which calls into
>> the adapter resume operation. That in itself does not do much besides some
>> register accesses to wake up the port. There should be no issues doing that
>> synchronously.
>>
>> The problem is that after that is done, ata EH calls ata_std_error_handler() ->
>> ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() ->
>> schedule_work(&(ap->scsi_rescan_task)). And the rescan work calls
>> scsi_rescan_device() (in yet another context than EH), which causes the problem
>> when the scsi disk device has not been resumed yet (dev->power.is_suspended
>> still true).
>>
>> So it really looks like the solution should be to have ata_scsi_dev_rescan()
>> wait for the scsi device to resume first, but not sure how to do that with the
>> pm API. Digging...
>
> Probably use dpm_wait_for_children()? Right now it's an internal PM API.

But I am not sure if there is a parent-child relationship between an ata_device
and its scsi_device (dev->sdev)... I need to clarify that.

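To make the ordering being discussed concrete, here is a minimal userspace
sketch in plain pthreads. It is not kernel or libata code and it does not use
the PM API: struct fake_sdev, resume_thread(), rescan_thread() and run_demo()
are all hypothetical names made up for the illustration. It only models the
idea that the rescan work blocks until the device is no longer marked
suspended, and it says nothing about how (or whether) that can be done safely
with the real PM core.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scsi device's PM state (not a kernel struct). */
struct fake_sdev {
	pthread_mutex_t lock;
	pthread_cond_t resumed;
	bool is_suspended;         /* models dev->power.is_suspended */
	bool rescanned;            /* set once the rescan has run */
	bool rescan_saw_suspended; /* true if rescan ran on a suspended device */
};

/* Models the scsi device resume finishing: clear the flag, wake waiters. */
static void *resume_thread(void *arg)
{
	struct fake_sdev *sdev = arg;

	pthread_mutex_lock(&sdev->lock);
	sdev->is_suspended = false;
	pthread_cond_broadcast(&sdev->resumed);
	pthread_mutex_unlock(&sdev->lock);
	return NULL;
}

/* Models the rescan work: wait for resume to complete, then rescan. */
static void *rescan_thread(void *arg)
{
	struct fake_sdev *sdev = arg;

	pthread_mutex_lock(&sdev->lock);
	while (sdev->is_suspended)
		pthread_cond_wait(&sdev->resumed, &sdev->lock);
	/* The "rescan" itself: just record the state it observed. */
	sdev->rescan_saw_suspended = sdev->is_suspended;
	sdev->rescanned = true;
	pthread_mutex_unlock(&sdev->lock);
	return NULL;
}

/* Returns 1 if the rescan ran and only ran after resume, -1 on setup error. */
int run_demo(void)
{
	struct fake_sdev sdev;
	pthread_t rescan, resume;

	pthread_mutex_init(&sdev.lock, NULL);
	pthread_cond_init(&sdev.resumed, NULL);
	sdev.is_suspended = true;
	sdev.rescanned = false;
	sdev.rescan_saw_suspended = false;

	/* Start the rescan first, mimicking EH scheduling it before resume. */
	if (pthread_create(&rescan, NULL, rescan_thread, &sdev) != 0)
		return -1;
	if (pthread_create(&resume, NULL, resume_thread, &sdev) != 0)
		resume_thread(&sdev); /* fall back to resuming inline */
	else
		pthread_join(resume, NULL);
	pthread_join(rescan, NULL);

	pthread_cond_destroy(&sdev.resumed);
	pthread_mutex_destroy(&sdev.lock);
	return (sdev.rescanned && !sdev.rescan_saw_suspended) ? 1 : 0;
}
```

Calling run_demo() starts the rescan before the resume on purpose, so the
rescan has to wait on the condition variable rather than run against a device
that is still marked suspended. Whether an equivalent wait is possible in
ata_scsi_dev_rescan() is exactly the open question above.
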
>
> Rafael,
> What do you think?
>
> Kai-Heng
>
>>
>>>
>>> Cheers,
>>>
>>> Hannes
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>>
--
Damien Le Moal
Western Digital Research