From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Tal Cohen <talcohen@habana.ai>
Subject: [PATCH 11/15] habanalabs: no consecutive err when user context is enabled
Date: Thu, 27 Oct 2022 12:10:03 +0300 [thread overview]
Message-ID: <20221027091007.664797-11-ogabbay@kernel.org> (raw)
In-Reply-To: <20221027091007.664797-1-ogabbay@kernel.org>
From: Tal Cohen <talcohen@habana.ai>
Consecutive error protects a device reset loop from being triggered
due to h/w issues and enters the device into an unavailable state.
When user may cause the error, an unavailable state
will prevent the user from running its workloads.
The commit prevents entering consecutive state when a user context
is enabled.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/device.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index bcd959924971..61ddcb1ce508 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -1320,6 +1320,10 @@ static void handle_reset_trigger(struct hl_device *hdev, u32 flags)
{
u32 cur_reset_trigger = HL_RESET_TRIGGER_DEFAULT;
+ /* No consecutive mechanism when user context exists */
+ if (hdev->is_compute_ctx_active)
+ return;
+
/*
* 'reset cause' is being updated here, because getting here
* means that it's the 1st time and the last time we're here
--
2.25.1
next prev parent reply other threads:[~2022-10-27 9:11 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-27 9:09 [PATCH 01/15] habanalabs: fix using freed pointer Oded Gabbay
2022-10-27 9:09 ` [PATCH 02/15] habanalabs: allow setting HBM BAR to other regions Oded Gabbay
2022-10-27 9:09 ` [PATCH 03/15] habanalabs/gaudi2: remove configurations to access the MSI-X doorbell Oded Gabbay
2022-10-27 9:09 ` [PATCH 04/15] habanalabs: fix user mappings calculation in case of page fault Oded Gabbay
2022-10-27 9:09 ` [PATCH 05/15] habanalabs: avoid divide by zero in device utilization Oded Gabbay
2022-10-27 9:09 ` [PATCH 06/15] habanalabs: add support for graceful hard reset Oded Gabbay
2022-10-27 9:09 ` [PATCH 07/15] habanalabs: add an option to control watchdog timeout via debugfs Oded Gabbay
2022-10-27 9:10 ` [PATCH 08/15] habanalabs/gaudi: use graceful hard reset for F/W events Oded Gabbay
2022-10-27 9:10 ` [PATCH 09/15] habanalabs/gaudi2: " Oded Gabbay
2022-10-27 9:10 ` [PATCH 10/15] habanalabs: use graceful hard reset for CS timeouts Oded Gabbay
2022-10-27 9:10 ` Oded Gabbay [this message]
2022-10-27 9:10 ` [PATCH 12/15] habanalabs: zero ts registration buff when allocated Oded Gabbay
2022-10-27 9:10 ` [PATCH 13/15] habanalabs: fix PCIe access to SRAM via debugfs Oded Gabbay
2022-10-27 9:10 ` [PATCH 14/15] habanalabs: add warning print upon a PCI error Oded Gabbay
2022-10-27 9:10 ` [PATCH 15/15] habanalabs: remove redundant gaudi2_sec asic type Oded Gabbay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221027091007.664797-11-ogabbay@kernel.org \
--to=ogabbay@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=talcohen@habana.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.