public inbox for stable@vger.kernel.org
From: Johan Hovold <johan@kernel.org>
To: manivannan.sadhasivam@linaro.org
Cc: mhi@lists.linux.dev, Loic Poulain <loic.poulain@linaro.org>,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/2] bus: mhi: host: pci_generic: Recover the device synchronously from mhi_pci_runtime_resume()
Date: Wed, 22 Jan 2025 16:24:27 +0100
Message-ID: <Z5ENq9EMPlNvxNOF@hovoldconsulting.com>
In-Reply-To: <20250108-mhi_recovery_fix-v1-2-a0a00a17da46@linaro.org>

On Wed, Jan 08, 2025 at 07:09:28PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> 
> Currently, in mhi_pci_runtime_resume(), if the resume fails, recovery_work
> is started asynchronously and success is returned. But this doesn't align
> with what PM core expects as documented in
> Documentation/power/runtime_pm.rst:
> 
> "Once the subsystem-level resume callback (or the driver resume callback,
> if invoked directly) has completed successfully, the PM core regards the
> device as fully operational, which means that the device _must_ be able to
> complete I/O operations as needed.  The runtime PM status of the device is
> then 'active'."
> 
> So the PM core ends up marking the runtime PM status of the device as
> 'active', even though the device is not able to handle the I/O operations.
> This same condition more or less applies to system resume as well.
> 
> So to avoid this ambiguity, try to recover the device synchronously from
> mhi_pci_runtime_resume() and return the actual error code in the case of
> recovery failure.
> 
> To do so, move the recovery code into a __mhi_pci_recovery_work() helper
> and call it from both mhi_pci_recovery_work() and
> mhi_pci_runtime_resume(). The former still ignores the return value, while
> the latter passes it to the PM core.
> 
> Cc: stable@vger.kernel.org # 5.13
> Reported-by: Johan Hovold <johan@kernel.org>
> Closes: https://lore.kernel.org/mhi/Z2PbEPYpqFfrLSJi@hovoldconsulting.com
> Fixes: d3800c1dce24 ("bus: mhi: pci_generic: Add support for runtime PM")
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

The reasoning above makes sense, and I do indeed see resume taking five
seconds longer with this patch, as Loic suggested it would.
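As a rough illustration of the control flow the quoted patch describes, here is a small userspace C model. It is a sketch only: struct model_dev and the model_* functions are hypothetical stand-ins, not the driver's code, and the real recovery work is reduced to returning a preconfigured result. It shows the shared helper's result being dropped by the work item but propagated by the resume callback, so that a PM core which only treats the device as 'active' after a successful callback no longer sees a spurious success. It models neither locking nor the time spent recovering.

```c
/* Hypothetical stand-in for the driver's per-device state. */
struct model_dev {
	int recover_result;	/* result a recovery attempt yields: 0 or -errno */
	int recovery_attempts;	/* how many times recovery has run */
	int runtime_active;	/* models the PM core's 'active' status */
};

/* Stand-in for the shared __mhi_pci_recovery_work() helper:
 * performs "recovery" and reports whether it succeeded. */
static int model_recovery(struct model_dev *d)
{
	d->recovery_attempts++;
	return d->recover_result;
}

/* Stand-in for mhi_pci_recovery_work(): a work item has no caller
 * to report to, so the helper's result is ignored. */
static void model_recovery_work(struct model_dev *d)
{
	(void)model_recovery(d);
}

/* Stand-in for mhi_pci_runtime_resume() after the patch: if the
 * resume itself failed, recover synchronously and return the
 * recovery result instead of unconditionally returning 0. */
static int model_runtime_resume(struct model_dev *d, int resume_err)
{
	if (!resume_err)
		return 0;
	return model_recovery(d);
}

/* Models the PM core: the device is only considered active when
 * the resume callback returned success. */
static void model_pm_core_resume(struct model_dev *d, int resume_err)
{
	d->runtime_active = (model_runtime_resume(d, resume_err) == 0);
}
```

For example, with recover_result set to -EIO, a failed resume leaves runtime_active at 0 and the resume callback returns -EIO rather than 0.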

Unfortunately, something else is broken, as the recovery code now
deadlocks again when the modem fails to resume (with both patches
applied):

[  729.833701] PM: suspend entry (deep)
[  729.841377] Filesystems sync: 0.000 seconds
[  729.867672] Freezing user space processes
[  729.869494] Freezing user space processes completed (elapsed 0.001 seconds)
[  729.869499] OOM killer disabled.
[  729.869501] Freezing remaining freezable tasks
[  729.870882] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[  730.184254] mhi-pci-generic 0005:01:00.0: mhi_pci_runtime_resume
[  730.190643] mhi mhi0: Resuming from non M3 state (SYS ERROR)
[  730.196587] mhi-pci-generic 0005:01:00.0: failed to resume device: -22
[  730.203412] mhi-pci-generic 0005:01:00.0: device recovery started

I've reproduced this three times via three different paths (runtime
resume before suspend, runtime resume during suspend, and system
resume).

I didn't try to figure out what causes the deadlock this time (and lockdep
does not trigger), but you should be able to reproduce this by
instrumenting a resume failure.

Johan

Thread overview: 14+ messages
2025-01-08 13:39 [PATCH 0/2] bus: mhi: host: pci_generic: Couple of recovery fixes Manivannan Sadhasivam via B4 Relay
2025-01-08 13:39 ` [PATCH 1/2] bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock Manivannan Sadhasivam via B4 Relay
2025-01-08 14:46   ` Loic Poulain
2025-01-22 15:11   ` Johan Hovold
2025-02-19 13:13     ` Manivannan Sadhasivam
2025-02-19 13:52       ` Johan Hovold
2025-02-19 14:14         ` Manivannan Sadhasivam
2025-01-08 13:39 ` [PATCH 2/2] bus: mhi: host: pci_generic: Recover the device synchronously from mhi_pci_runtime_resume() Manivannan Sadhasivam via B4 Relay
2025-01-08 15:19   ` Loic Poulain
2025-01-08 16:02     ` Manivannan Sadhasivam
2025-01-09 20:50       ` Loic Poulain
2025-01-12  4:23         ` Manivannan Sadhasivam
2025-01-22 15:24   ` Johan Hovold [this message]
2025-01-22 17:25     ` Johan Hovold
