From: Mathieu Poirier <mathieu.poirier@linaro.org>
To: Tanmay Shah <tanmay.shah@amd.com>
Cc: andersson@kernel.org, linux-remoteproc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/2] remoteproc: xlnx: remote crash recovery
Date: Wed, 11 Mar 2026 11:21:00 -0600
Message-ID: <abGkfM9O7Qpd1N68@p14s>
In-Reply-To: <20260303233533.2310513-1-tanmay.shah@amd.com>

On Tue, Mar 03, 2026 at 03:35:31PM -0800, Tanmay Shah wrote:
> A remote processor can crash or hang during normal execution. The Linux
> remoteproc framework supports different mechanisms to recover the
> remote processor and re-establish RPMsg communication in such cases.
> 
> Crash reporting on AMD-Xilinx platforms:
> 
> 1) Using the debugfs node
> 
> A user can report the crash to the core framework via the debugfs node
> using the following command:
> 
> echo 1 > /sys/kernel/debug/remoteproc/remoteproc0/crash
> 
> 2) The remote processor notifies the host of the crash state and crash
> reason via the resource table
> 
> This is a platform-specific method in which the remote firmware contains
> a vendor-specific resource that it updates with the crash state and the
> crash reason. The remote then notifies the host of the crash via a
> mailbox notification. The host checks this resource on every mailbox
> notification and reports the crash to the core framework if needed.
> 
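As a rough illustration of the resource-table method described above, here is a minimal user-space sketch. It is not the code from patch 2/2: the struct layout, the names vendor_crash_rsc, host_ctx and handle_mbox_notification(), and the state values are all hypothetical stand-ins. The only behaviour taken from the cover letter is that the host checks the resource on every mailbox notification, reports a crash if one is flagged, and resets the crash state after reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical crash states; the real encoding is vendor specific. */
enum crash_state { CRASH_NONE = 0, CRASH_PENDING = 1 };

/* Hypothetical vendor-specific resource shared with the remote firmware. */
struct vendor_crash_rsc {
	uint32_t crash_state;	/* written by the remote firmware */
	uint32_t crash_reason;	/* vendor-defined reason code */
};

/* Stand-in for the rproc core: just counts the crashes reported to it. */
struct host_ctx {
	unsigned int crashes_reported;
	uint32_t last_reason;
};

/*
 * Model of the host-side check run on every mailbox notification.
 * Returns true when a crash was reported to the (modelled) core.
 */
static bool handle_mbox_notification(struct host_ctx *host,
				     struct vendor_crash_rsc *rsc)
{
	assert(host && rsc);

	if (rsc->crash_state != CRASH_PENDING)
		return false;	/* ordinary notification, nothing to report */

	host->last_reason = rsc->crash_reason;
	host->crashes_reported++;

	/* Reset the crash state so the same crash is not reported twice. */
	rsc->crash_state = CRASH_NONE;
	return true;
}
```

In a real driver the resource would live in memory shared with the remote, and the report would go through the rproc core rather than a counter.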
> Crash recovery mechanisms on AMD-Xilinx platforms:
> 
> Two mechanisms are available to recover the remote processor from a
> crash: 1) boot recovery, 2) attach on recovery.
> 
> The remoteproc core framework chooses the proper mechanism based on the
> rproc features set by the platform driver.
> 
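The feature-based selection described above can be sketched as follows. This is a simplified user-space model, assuming a plain bitmask for the feature set; the names model_rproc, MODEL_FEAT_ATTACH_ON_RECOVERY and choose_recovery() are invented for illustration and are not the kernel's identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bit standing in for the attach-on-recovery feature. */
enum model_feature { MODEL_FEAT_ATTACH_ON_RECOVERY = 1 << 0 };

enum recovery_method { RECOVERY_BOOT, RECOVERY_ATTACH };

/* Minimal stand-in for the per-remote state kept by the core. */
struct model_rproc {
	unsigned int features;	/* set by the platform driver */
};

static bool model_has_feature(const struct model_rproc *r, unsigned int feat)
{
	assert(r);
	return (r->features & feat) != 0;
}

/* Boot recovery is the default; attach recovery is opt-in via the feature. */
static enum recovery_method choose_recovery(const struct model_rproc *r)
{
	if (model_has_feature(r, MODEL_FEAT_ATTACH_ON_RECOVERY))
		return RECOVERY_ATTACH;
	return RECOVERY_BOOT;
}
```

The point of the model is only that the platform driver never picks the recovery path directly; it sets a feature and the core decides.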
> 1) Boot recovery
> 
> This is the default mechanism to recover the remote processor.
> In this method the core framework first stops the remote processor,
> loads the firmware again and then starts the remote processor.
> AMD-Xilinx platforms support this method, along with the default
> coredump method.
> 
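The ordering of the boot recovery steps described above can be modelled as below. This is only a sketch: the step names and the step_log helper are hypothetical, and placing the coredump between stop and firmware load is an assumption about the core's ordering, not something the cover letter states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step identifiers for the boot recovery sequence. */
enum boot_step { STEP_STOP, STEP_COREDUMP, STEP_LOAD_FW, STEP_START };

#define MAX_STEPS 8

/* Records the order in which the modelled steps run. */
struct step_log {
	enum boot_step steps[MAX_STEPS];
	size_t count;
};

static void record_step(struct step_log *log, enum boot_step step)
{
	assert(log && log->count < MAX_STEPS);
	log->steps[log->count++] = step;
}

/* Model of boot recovery: stop, dump, reload firmware, start again. */
static void model_boot_recovery(struct step_log *log)
{
	record_step(log, STEP_STOP);		/* halt the crashed remote */
	record_step(log, STEP_COREDUMP);	/* assumed position of the coredump */
	record_step(log, STEP_LOAD_FW);		/* load the firmware again */
	record_step(log, STEP_START);		/* start the remote processor */
}
```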
> 2) Attach on recovery
> 
> If the RPROC_ATTACH_ON_RECOVERY feature is enabled by the platform driver,
> then the core framework will choose this method for recovery.
> 
> On Versal and later platforms, the following sequence of events is
> expected during a remoteproc crash and attach on recovery:
> 
> a) Remoteproc attach/detach flow is working, and RPMsg communication is
>    established.
> b) Remote processor (RPU) crashes (crash not reported yet).
> c) Platform management controller is instructed to stop and reload the ELF
>    on the inactive remote processor before reboot (platform-specific method).
> d) Platform management controller reboots the remote processor.
> e) Remote processor boots again and detects the previous crash
>    (platform-specific mechanism to detect the crash).
> f) Remote processor reports the crash to Linux (host) and waits for
>    recovery.
> g) Linux performs a full detach and reattach to the remote processor.
> h) Normal RPMsg communication is established.
> 
> All RPMsg-related resources must be destroyed and recreated during
> recovery to establish successful RPMsg communication. Achieving this
> requires a complete rproc_detach() followed by an rproc_boot() call. That
> is what this patch series fixes, along with adding rproc recovery
> methods for AMD-Xilinx platforms.
> 
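The requirement above, a full detach followed by a boot so that RPMsg resources are destroyed and recreated, can be sketched with a small state model. Everything here is hypothetical (model_remote, the generation counter, the function names); it only illustrates why recovery must pass through the detached state instead of re-attaching in place.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_DETACHED, MODEL_ATTACHED, MODEL_CRASHED };

/* Hypothetical view of a remote and its RPMsg resources. */
struct model_remote {
	enum model_state state;
	bool rpmsg_ready;		/* RPMsg resources currently exist */
	unsigned int rpmsg_generation;	/* bumped each time they are recreated */
};

/* Full detach: tear down every RPMsg-related resource. */
static void model_detach(struct model_remote *r)
{
	assert(r);
	r->rpmsg_ready = false;
	r->state = MODEL_DETACHED;
}

/* Boot/attach: create fresh RPMsg resources for the running remote. */
static void model_boot(struct model_remote *r)
{
	assert(r && !r->rpmsg_ready);
	r->rpmsg_generation++;
	r->rpmsg_ready = true;
	r->state = MODEL_ATTACHED;
}

/* Attach recovery as described above: complete detach, then boot. */
static void model_attach_recovery(struct model_remote *r)
{
	model_detach(r);
	model_boot(r);
}
```

After recovery the generation counter has advanced, which is the model's way of showing that the old RPMsg resources were not reused.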
> Change log:
> 
> Changes in v3:
>   - both rproc_attach_recovery() and
>     rproc_boot_recovery() are called the same way.
>   - remove unrelated changes
>   - %s/kick/mailbox notification/
>   - %s/core framework/rproc core framework/
>   - fold simple function within zynqmp_r5_handle_rsc().
>   - remove spurious change
>   - reset crash state after reporting the crash
>   - document set and reset of ATTACH_ON_RECOVERY flag
>   - set recovery_disabled flag to false
>   - check condition rproc->crash_reason != NULL
>

For v3, Bjorn made several comments related to QCOM use cases.  As such, I
will let him continue with this patchset.

Thanks,
Mathieu

> Changes in v2:
>   - use rproc_boot instead of rproc_attach
>   - move debug message early in the function
>   - clear attach recovery boot flag during detach and stop ops
> 
> Tanmay Shah (2):
>   remoteproc: core: full attach detach during recovery
>   remoteproc: xlnx: add crash detection mechanism
> 
>  drivers/remoteproc/remoteproc_core.c    | 15 +++++-
>  drivers/remoteproc/xlnx_r5_remoteproc.c | 71 ++++++++++++++++++++++++-
>  2 files changed, 84 insertions(+), 2 deletions(-)
> 
> 
> base-commit: 098493c6dced7b02545e8bd0053ef4099a2b769e
> -- 
> 2.34.1
> 
