From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pg1-f182.google.com (mail-pg1-f182.google.com [209.85.215.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29F3C1DF75B for ; Mon, 22 Jun 2026 17:46:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782150390; cv=none; b=IH0wfA2VlSmMKo0ayF2a1lKQvhB2QXAXJhgWHTfAUOZjlYpF/pPoLmyNCrK5xkqYJIawF5sxZOJPDC76fud7COAEZhot2FVhVpMlK3DAteBa4oWTh8+yfQmtacw+T7geXJVwPDNHVkeoJEhZ2jYwYHVjS8hnBqo5KAENC//Fvc0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782150390; c=relaxed/simple; bh=Q7nfiHO1vNXX/vdbDv/cBxZ6UHfeLk6BayBUSyoftqM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ukLlQhuC9pKMxGOQ4rCD1bm87FCwfxogcoSuNKBfx8gzU7Pq8kZODYfcfj6dyP+cqYjCLM8zyFqxwgnH+Ws6kphn0OQxMSnNGXzSsSkxoo9xfa6M5YxBzJP2Uih9tupC5F6j7LAuXZ3cqREh45s9HuFfNabhAuLl9bLLgVLUW+A= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linaro.org; spf=pass smtp.mailfrom=linaro.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b=LweZ+0J9; arc=none smtp.client-ip=209.85.215.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linaro.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="LweZ+0J9" Received: by mail-pg1-f182.google.com with SMTP id 41be03b00d2f7-c88b8fe9059so1784065a12.3 for ; Mon, 22 Jun 2026 10:46:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1782150387; x=1782755187; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=lVXqXLALkaEuqZj/gEMaREo8AsSoet4mhKT25GIVD3U=; b=LweZ+0J9vKxOXUa0VCqc2RjLtkIxdO0NqOUi5UOTwjuWN+/d3YsLpBfbsUn6JyWi+D AYt6YSoYRd1ZgkHDLQwzVOUkHI9EZ8b0ThUWYZJzWPfpO9ak70RNItkbW8NUVWUU3pQh 9MLqrxCBV4uzacoigJ8H9V1zs8CHD3RLBMp3J7z4XS52ZAI449XWR4hNZFUF3E+gP0tz pLOM5ZJFx1qxainQRjWkyLG9fRTixCQMQaWLdJo4r3MpHoxocsr5kQn5+NXrUdAhOwc0 mSCKVLGwVchIL75GQaxC/mY0iCKqD5/uTbDX73LXyg3U0o+pFnYOBtAIu0sbQg67o91i 7PVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1782150387; x=1782755187; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lVXqXLALkaEuqZj/gEMaREo8AsSoet4mhKT25GIVD3U=; b=masTHpJMvg7L/g4InL7JsCU59oLnHNz25XY2Vchui8av6b0PzYl7HEjzYOaRaTdgW6 uKCWts/wC0X7t4E9ZeE225LHLfJJg9jHrzB0mTOIVRUC9KrNBcCvzUSmjPHPL+H2RgIb xbJMahLwVERHdD//VQ2Oev17uMsoimt8HveT51HkDDpYkMMu2Hbr05sZDU7+xQhARaAj 5XG6pG1KujKCbr118x9oGYap5DP7jd3lbm6ZW9Llm+v5C3UrPlkhg67vb/hwxjcc8Jrq 6/Nomm7cP+UvfAaNh0Pr9oS5a+BFaxn3ku2h+NN+PBfJbyptnokbxVX+uh6YtXgk9IpM IQ0g== X-Forwarded-Encrypted: i=1; AFNElJ9tessV1sAjUs9Eb8A546IroNKN4a8dv+trQz6u/SA3hWG3i20ozOiHWN4q1Gdqv4l0KbnRG/EIXH6vzCw=@vger.kernel.org X-Gm-Message-State: AOJu0YxPxm8oQ6m6E+l6zSuQKSZd1MqnfoAT/WBseUPq9bD5N/6Ezmgw EbhSi1MmZhQLTmOU5epUeY95UyeLvKjouTovP4QeliH2pDv6T+Paf3NVzv35mMBBsCc= X-Gm-Gg: AfdE7cmP7w0qffdzBYPc/UlFBNiN3OtVPiHy7Wl06kTwtweR4lmUeWgK5t4loACjIIS WalW8PrFruePk19HFhGgLZBCx87+H05XZ9YBtq58ZiNH3NGS3lJxJGxI7/mIGhUv0NjHYlAsW4l 4ZXLXquMW+c5nRdjsLLQlPCEFUOFKNOtsZZ7dALBGjj0twf0jhSkSNaTfDBu0ogVL77PZQUu9s5 0/+gBqexCbN7/br5m2xrQhZsUjH8ifZIiZJUmTDOED/TEx21KC47MH8KZlydCEH9p3PtbwP74P4 WIWBT0ecPjGEIo1KLy5SQis86Pr29CXZBYm6M/sndyJbu2FnwlC/zRnvfodIwAXg4Dv27LqQ5h1 RtmBf88vmzh8Urcv7SujVXjC5K3mLH4jpsqSVCBvqpOBCKC7ple6f18jSwkKoWn4qjWt2miLuuM yR3GHk+iGTiQz26NI/ X-Received: by 2002:a05:6a20:6a21:b0:3b2:86c9:baa5 with SMTP id adf61e73a8af0-3bb6c4644e7mr15489071637.38.1782150387218; Mon, 22 Jun 2026 10:46:27 -0700 (PDT) Received: from p14s ([2604:3d09:148c:c800:b36f:3823:3d45:dff4]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-c8bc563c6e1sm7587091a12.15.2026.06.22.10.46.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jun 2026 10:46:26 -0700 (PDT) Date: Mon, 22 Jun 2026 11:46:24 -0600 From: Mathieu Poirier To: tanmay.shah@amd.com Cc: andersson@kernel.org, linux-remoteproc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 2/2] remoteproc: xlnx: add crash detection mechanism Message-ID: References: <20260610182711.1758544-1-tanmay.shah@amd.com> <20260610182711.1758544-3-tanmay.shah@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Tue, Jun 16, 2026 at 02:52:50PM -0500, Shah, Tanmay wrote: > Hello, > > Please find my comments below: > > On 6/16/2026 11:01 AM, Mathieu Poirier wrote: > > On Wed, Jun 10, 2026 at 11:27:11AM -0700, Tanmay Shah wrote: > >> Remote processor will report the crash reason via the resource table > >> and notify the host via mailbox notification. The host checks this > >> crash reason on every mailbox notification from the remote and report > >> to the rproc core framework. Then the rproc core framework will start > >> the recovery process. > >> > >> Signed-off-by: Tanmay Shah > >> --- > >> > >> changes in v5 > >> - use local variable to access crash report pointer > >> - End crash report string with '\0' without relying on fw > >> > >> drivers/remoteproc/xlnx_r5_remoteproc.c | 73 ++++++++++++++++++++++++- > >> 1 file changed, 72 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c > >> index 3349d1877751..86afff9f3b40 100644 > >> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > >> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > >> @@ -112,6 +112,10 @@ struct rsc_tbl_data { > >> const uintptr_t rsc_tbl; > >> } __packed; > >> > >> +enum xlnx_rproc_fw_rsc { > >> + XLNX_RPROC_FW_CRASH_REASON = RSC_VENDOR_START, > > > > I would call this XLNX_RPROC_FW_CRASH_REPORT > > > > Ack. > > >> +}; > >> + > >> /* > >> * Hardcoded TCM bank values. This will stay in driver to maintain backward > >> * compatibility with device-tree that does not have TCM information. > >> @@ -131,9 +135,27 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = { > >> {0xffe30000UL, 0x30000, 0x10000UL, PD_R5_1_BTCM, "btcm1"}, > >> }; > >> > >> +#define CRASH_REASON_STR_LEN 16 > >> + > >> +/** > >> + * struct xlnx_rproc_crash_report - resource to know crash status and reason > >> + * > >> + * @version: version of this resource > >> + * @crash_state: if true, the rproc is notifying crash, time to recover > >> + * @crash_reason: number to describe reason of crash > >> + * @crash_reason_str: short string description of crash reason > >> + */ > >> +struct xlnx_rproc_crash_report { > >> + u8 version; > >> + u8 crash_state; > >> + u8 crash_reason; > > > > Do you need 2 variables? Would it be possible for @crash_reason != 0 to > > indicate a crash, making @cash_state irrelevant? > > > > Actually initially I had defined crash_reason only, but when I looked at > how crash reason/type is defined in the kernel, I got idea to keep > crash_state separate than crash_reason: > > https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git/tree/include/linux/remoteproc.h?h=for-next#n183 > > enum rproc_crash_type { > RPROC_MMUFAULT, > RPROC_WATCHDOG, > RPROC_FATAL_ERROR, > }; > > So, if crash_reason is defined like above, then it's easy to make > mistake and not define no_crash = 0. Different firmware projects can > treat crash_reason differently. I would like to keep crash_state and > crash_reason separate and not impose policy on firmware projects on how > to define crash_reason. > I agree with your assessment. Please move "crash_state" to "crashed". > >> + char crash_reason_str[CRASH_REASON_STR_LEN]; > >> +} __packed; > >> + > >> /** > >> * struct zynqmp_r5_core - remoteproc core's internal data > >> * > >> + * @crash_report: rproc crash state and reason > >> * @rsc_tbl_va: resource table virtual address > >> * @sram: Array of sram memories assigned to this core > >> * @num_sram: number of sram for this core > >> @@ -147,6 +169,7 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = { > >> * @ipi: pointer to mailbox information > >> */ > >> struct zynqmp_r5_core { > >> + struct xlnx_rproc_crash_report *crash_report; > >> void __iomem *rsc_tbl_va; > >> struct zynqmp_sram_bank *sram; > >> int num_sram; > >> @@ -204,11 +227,27 @@ static int event_notified_idr_cb(int id, void *ptr, void *data) > >> */ > >> static void handle_event_notified(struct work_struct *work) > >> { > >> + struct xlnx_rproc_crash_report *report; > >> + struct zynqmp_r5_core *r5_core; > >> struct mbox_info *ipi; > >> struct rproc *rproc; > >> > >> ipi = container_of(work, struct mbox_info, mbox_work); > >> rproc = ipi->r5_core->rproc; > >> + r5_core = ipi->r5_core; > >> + report = r5_core->crash_report; > >> + > >> + /* report crash only if expected */ > >> + if (report && report->crash_state) { > >> + if (rproc->state == RPROC_ATTACHED || rproc->state == RPROC_RUNNING) { > >> + report->crash_reason_str[CRASH_REASON_STR_LEN - 1] = '\0'; > >> + dev_warn(&rproc->dev, "crash reason id: %d %s\n", > >> + report->crash_reason, report->crash_reason_str); > >> + rproc_report_crash(rproc, RPROC_FATAL_ERROR); > >> + report->crash_state = false; > >> + return; > >> + } > >> + } > >> > >> /* > >> * We only use IPI for interrupt. The RPU firmware side may or may > >> @@ -448,6 +487,13 @@ static int zynqmp_r5_rproc_stop(struct rproc *rproc) > >> if (ret) > >> dev_err(r5_core->dev, "core force power down failed\n"); > >> > >> + /* > >> + * Clear attach on recovery flag during stop operation. The next state > >> + * of the remote processor is expected to be "Running" state. In this > >> + * state boot recovery method must take place over attach on recovery. > >> + */ > >> + test_and_clear_bit(RPROC_FEAT_ATTACH_ON_RECOVERY, rproc->features); > > > > Why is there a need to play set/clear the RPROC_FEAT_ATTACH_ON_RECOVERY flag > > here and in zynqmp_r5_attach() and zynqmp_r5_detach()? I think we talked about > > that before but can't remember the outcome of that conversation. > > > > The remote processor can go through following life cycle: > > attach() -> stop() -> start(). When this happens, ATTACH_ON_RECOVERY is > not valid anymore. At this point, it should be BOOT_RECOVERY which is > indicated by clearing ATTACH_RECOVERY flag. That is why it is important > to clear this bit on stop(). I'm good with that part... > > Now in detach() it is not needed. I am just clearing the flag as part of > a cleanup sequence. > > >> + > >> return ret; > >> } > >> > >> @@ -869,6 +915,9 @@ static int zynqmp_r5_get_rsc_table_va(struct zynqmp_r5_core *r5_core) > >> > >> static int zynqmp_r5_attach(struct rproc *rproc) > >> { > >> + /* Enable attach on recovery method. Clear it during rproc stop. */ > >> + rproc_set_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY); > >> + > >> dev_dbg(&rproc->dev, "rproc %d attached\n", rproc->index); > >> > >> return 0; > >> @@ -883,9 +932,30 @@ static int zynqmp_r5_detach(struct rproc *rproc) > >> */ > >> zynqmp_r5_rproc_kick(rproc, 0); > >> > >> + clear_bit(RPROC_FEAT_ATTACH_ON_RECOVERY, rproc->features); > >> + ... but why is this needed? Why can't it just be set once in zynqmp_r5_add_rproc_core() ? The only time the bit should be cleared is in zynqmp_r5_rproc_stop(). > >> return 0; > >> } > >> > >> +static int zynqmp_r5_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, > >> + int offset, int avail) > >> +{ > >> + struct zynqmp_r5_core *r5_core = rproc->priv; > >> + void *rsc_offset = (r5_core->rsc_tbl_va + offset); > >> + > > > > if (rsc_type != XLNX_RPROC_FW_CRASH_REASON) > > return RSC_IGNORED; > > > > Ack. > > >> + if (rsc_type == XLNX_RPROC_FW_CRASH_REASON) { > >> + r5_core->crash_report = rsc_offset; > >> + /* reset all values */ > >> + r5_core->crash_report->crash_state = false; > >> + r5_core->crash_report->crash_reason = 0; > >> + r5_core->crash_report->crash_reason_str[0] = '\0'; > >> + } else { > >> + return RSC_IGNORED; > >> + } > >> + > >> + return RSC_HANDLED; > >> +} > >> + > >> static const struct rproc_ops zynqmp_r5_rproc_ops = { > >> .prepare = zynqmp_r5_rproc_prepare, > >> .unprepare = zynqmp_r5_rproc_unprepare, > >> @@ -900,6 +970,7 @@ static const struct rproc_ops zynqmp_r5_rproc_ops = { > >> .get_loaded_rsc_table = zynqmp_r5_get_loaded_rsc_table, > >> .attach = zynqmp_r5_attach, > >> .detach = zynqmp_r5_detach, > >> + .handle_rsc = zynqmp_r5_handle_rsc, > >> }; > >> > >> /** > >> @@ -939,7 +1010,7 @@ static struct zynqmp_r5_core *zynqmp_r5_alloc_rproc_core(struct device *cdev) > >> > >> rproc_coredump_set_elf_info(r5_rproc, ELFCLASS32, EM_ARM); > >> > >> - r5_rproc->recovery_disabled = true; > >> + r5_rproc->recovery_disabled = false; > > > > I'm good with the overall architecture of this set. > > > > Okay. If above comments looks good, I will send v6 soon. > > > Regards, > > Mathieu > > > >> r5_rproc->has_iommu = false; > >> r5_rproc->auto_boot = false; > >> > >> -- > >> 2.34.1 > >> >