From: Oded Gabbay <ogabbay@kernel.org>
To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: Tomer Tayar <ttayar@habana.ai>
Subject: [PATCH 6/6] accel/habanalabs: abort device reset for consecutive heartbeat failures
Date: Tue, 2 Jan 2024 17:06:54 +0200
Message-ID: <20240102150654.522555-6-ogabbay@kernel.org>
In-Reply-To: <20240102150654.522555-1-ogabbay@kernel.org>

From: Tomer Tayar <ttayar@habana.ai>

The mechanism of aborting device reset for consecutive fatal errors
currently covers only fatal errors that are reported by FW.
A non-responsive FW and consecutive heartbeat failures are also
considered fatal, so add them to this mechanism as well to avoid
recurring device resets in such a case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
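For readers following the logic, here is a minimal standalone sketch of the
check this patch extends. It is not driver code: the enum values, the
struct, and the helpers record_trigger() and should_abort_reset() are
hypothetical stand-ins, and the way the "repeated" flag is computed here is
an assumption made for illustration only. The real driver keeps this state
in hdev->reset_info and uses its own HL_DRV_RESET_* flags.

```c
#include <stdbool.h>

/* Hypothetical trigger values; the real driver uses HL_DRV_RESET_* flag bits. */
enum reset_trigger {
	TRIGGER_NONE = 0,
	TRIGGER_FW_FATAL_ERR,
	TRIGGER_HEARTBEAT,
	TRIGGER_OTHER,
};

/* Simplified stand-in for the two reset_info fields the patched condition reads. */
struct reset_info {
	enum reset_trigger prev_reset_trigger;
	bool reset_trigger_repeated;
};

/*
 * Assumed bookkeeping: remember the cause of the current reset and note
 * whether it matches the cause of the previous one.
 */
static void record_trigger(struct reset_info *info, enum reset_trigger cur)
{
	info->reset_trigger_repeated = (cur == info->prev_reset_trigger);
	info->prev_reset_trigger = cur;
}

/*
 * Same shape as the patched condition: give up on the reset only when the
 * trigger repeated and it is one of the two causes treated as fatal.
 */
static bool should_abort_reset(const struct reset_info *info)
{
	return info->reset_trigger_repeated &&
	       (info->prev_reset_trigger == TRIGGER_FW_FATAL_ERR ||
		info->prev_reset_trigger == TRIGGER_HEARTBEAT);
}
```

In this model, two heartbeat-triggered resets in a row make
should_abort_reset() return true, while a repeated trigger of any other kind
does not.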
 drivers/accel/habanalabs/common/device.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 15891de6cf39..581fc99ad89b 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -1769,14 +1769,16 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 		hdev->device_cpu_disabled = false;
 		hdev->reset_info.hard_reset_pending = false;
+		/*
+		 * Put the device in an unusable state if there are 2 back to back resets due to
+		 * fatal errors.
+		 */
 		if (hdev->reset_info.reset_trigger_repeated &&
-		    (hdev->reset_info.prev_reset_trigger ==
-		     HL_DRV_RESET_FW_FATAL_ERR)) {
-			/* if there 2 back to back resets from FW,
-			 * ensure driver puts the driver in a unusable state
-			 */
+		    (hdev->reset_info.prev_reset_trigger == HL_DRV_RESET_FW_FATAL_ERR ||
+		     hdev->reset_info.prev_reset_trigger ==
+		     HL_DRV_RESET_HEARTBEAT)) {
 			dev_crit(hdev->dev,
-				 "%s Consecutive FW fatal errors received, stopping hard reset\n",
+				 "%s Consecutive fatal errors, stopping hard reset\n",
 				 dev_name(&(hdev)->pdev->dev));
 			rc = -EIO;
 			goto out_err;
--
2.34.1