From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Tal Cohen <talcohen@habana.ai>
Subject: [PATCH 12/17] habanalabs/gaudi: notify user process on device unavailable
Date: Mon, 20 Jun 2022 16:04:27 +0300
Message-ID: <20220620130432.1180451-12-ogabbay@kernel.org>
In-Reply-To: <20220620130432.1180451-1-ogabbay@kernel.org>
From: Tal Cohen <talcohen@habana.ai>
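When a device error occurs, the user process may want to learn more
about it by reading device HW info. If the device is unavailable,
the user process cannot read anything from the HW.

Add HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE and raise it together with
HL_NOTIFIER_EVENT_DEVICE_RESET when the reset is triggered by the FW,
so the user process knows the device cannot be read.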
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/gaudi/gaudi.c | 5 ++++-
include/uapi/misc/habanalabs.h | 12 +++++++-----
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 1156ec7dacc1..939d2636b9ed 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -8063,7 +8063,10 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
if (hdev->asic_prop.fw_security_enabled && !reset_direct) {
flags = HL_DRV_RESET_HARD | HL_DRV_RESET_BYPASS_REQ_TO_FW | fw_fatal_err_flag;
- event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
+
+ /* notify on device unavailable when the reset is triggered by FW */
+ event_mask |= (HL_NOTIFIER_EVENT_DEVICE_RESET |
+ HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE);
} else if (hdev->hard_reset_on_fw_events) {
flags = HL_DRV_RESET_HARD | HL_DRV_RESET_DELAY | fw_fatal_err_flag;
event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 18f86d259421..78aecea4684d 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -1432,15 +1432,17 @@ struct hl_debug_args {
/*
* Notifier event values - for the notification mechanism and the HL_INFO_GET_EVENTS command
*
- * HL_NOTIFIER_EVENT_TPC_ASSERT - Indicates TPC assert event
- * HL_NOTIFIER_EVENT_UNDEFINED_OPCODE - Indicates undefined operation code
- * HL_NOTIFIER_EVENT_DEVICE_RESET - Indicates device requires a reset
- * HL_NOTIFIER_EVENT_CS_TIMEOUT - Indicates CS timeout error
+ * HL_NOTIFIER_EVENT_TPC_ASSERT - Indicates TPC assert event
+ * HL_NOTIFIER_EVENT_UNDEFINED_OPCODE - Indicates undefined operation code
+ * HL_NOTIFIER_EVENT_DEVICE_RESET - Indicates device requires a reset
+ * HL_NOTIFIER_EVENT_CS_TIMEOUT - Indicates CS timeout error
+ * HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE - Indicates device is unavailable
*/
#define HL_NOTIFIER_EVENT_TPC_ASSERT (1ULL << 0)
#define HL_NOTIFIER_EVENT_UNDEFINED_OPCODE (1ULL << 1)
#define HL_NOTIFIER_EVENT_DEVICE_RESET (1ULL << 2)
-#define HL_NOTIFIER_EVENT_CS_TIMEOUT (1ULL << 3)
+#define HL_NOTIFIER_EVENT_CS_TIMEOUT (1ULL << 3)
+#define HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE (1ULL << 4)
/*
* Various information operations such as:
--
2.25.1
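
For readers on the user-space side, here is a minimal sketch of how a
process might interpret the event mask once it has obtained it (for
example via the notification mechanism or HL_INFO_GET_EVENTS, neither
of which is shown here). The bit values are copied from the uapi hunk
above; the two hl_example_* helpers are hypothetical names used only
for illustration and are not part of the driver or its uAPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/misc/habanalabs.h by this patch. */
#define HL_NOTIFIER_EVENT_TPC_ASSERT		(1ULL << 0)
#define HL_NOTIFIER_EVENT_UNDEFINED_OPCODE	(1ULL << 1)
#define HL_NOTIFIER_EVENT_DEVICE_RESET		(1ULL << 2)
#define HL_NOTIFIER_EVENT_CS_TIMEOUT		(1ULL << 3)
#define HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE	(1ULL << 4)

/*
 * Hypothetical helper (assumption, not a driver API): returns true when
 * the mask does not report the device as unavailable, i.e. when a HW info
 * read is at least worth attempting.
 */
bool hl_example_can_read_hw_info(uint64_t event_mask)
{
	return (event_mask & HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE) == 0;
}

/*
 * Hypothetical helper (assumption, not a driver API): returns true when
 * the mask reports that the device requires a reset.
 */
bool hl_example_reset_pending(uint64_t event_mask)
{
	return (event_mask & HL_NOTIFIER_EVENT_DEVICE_RESET) != 0;
}
```

With this patch, a FW-triggered reset sets both the DEVICE_RESET and
DEVICE_UNAVAILABLE bits, so the first helper returns false and the
second returns true for that mask. In the hard_reset_on_fw_events
branch only DEVICE_RESET is set.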
Thread overview: 17+ messages
2022-06-20 13:04 [PATCH 01/17] habanalabs/gaudi: collect undefined opcode error info Oded Gabbay
2022-06-20 13:04 ` [PATCH 02/17] habanalabs: expose undefined opcode status via info ioctl Oded Gabbay
2022-06-20 13:04 ` [PATCH 03/17] habanalabs/gaudi: invoke device reset from one code block Oded Gabbay
2022-06-20 13:04 ` [PATCH 04/17] habanalabs/gaudi: send device reset notification Oded Gabbay
2022-06-20 13:04 ` [PATCH 05/17] habanalabs: send an event notification when CS timeout occurs Oded Gabbay
2022-06-20 13:04 ` [PATCH 06/17] habanalabs: avoid unnecessary error print Oded Gabbay
2022-06-20 13:04 ` [PATCH 07/17] habanalabs/gaudi: fix incorrect MME offset calculation Oded Gabbay
2022-06-20 13:04 ` [PATCH 08/17] habanalabs: add validity check for cq counter offset Oded Gabbay
2022-06-20 13:04 ` [PATCH 09/17] habanalabs/gaudi: fix shift out of bounds Oded Gabbay
2022-06-20 13:04 ` [PATCH 10/17] habanalabs: fix NULL dereference on cs timeout Oded Gabbay
2022-06-20 13:04 ` [PATCH 11/17] habanalabs: remove unused get_dma_desc_list_size Oded Gabbay
2022-06-20 13:04 ` Oded Gabbay [this message]
2022-06-20 13:04 ` [PATCH 13/17] habanalabs: add critical indication in sram ecc Oded Gabbay
2022-06-20 13:04 ` [PATCH 14/17] habanalabs: check fence pointer before use Oded Gabbay
2022-06-20 13:04 ` [PATCH 15/17] habanalabs: print pointer with correct modifier Oded Gabbay
2022-06-20 13:04 ` [PATCH 16/17] habanalabs: use kvcalloc when possible Oded Gabbay
2022-06-20 13:04 ` [PATCH 17/17] habanalabs: fix comment style Oded Gabbay