public inbox for linux-kernel@vger.kernel.org
From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Tal Cohen <talcohen@habana.ai>
Subject: [PATCH 06/13] habanalabs/gaudi2: add device unavailable notification
Date: Thu,  6 Oct 2022 11:23:01 +0300
Message-ID: <20221006082308.1266716-6-ogabbay@kernel.org>
In-Reply-To: <20221006082308.1266716-1-ogabbay@kernel.org>

From: Tal Cohen <talcohen@habana.ai>

The device-unavailable notification tells the user that debug
information can no longer be retrieved from the device.
When a critical device error occurs and the f/w performs the device
reset itself (f/w security enabled), a device-unavailable notification
is sent to the user process together with the device-reset notification.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/gaudi2/gaudi2.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c
index 90e1d7fcb17a..e05ffaa047a2 100644
--- a/drivers/misc/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c
@@ -8576,7 +8576,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
 {
 	u32 ctl, reset_flags = HL_DRV_RESET_HARD | HL_DRV_RESET_DELAY;
 	struct gaudi2_device *gaudi2 = hdev->asic_specific;
-	bool reset_required = false, skip_reset = false;
+	bool reset_required = false, skip_reset = false, is_critical = false;
 	int index, sbte_index;
 	u64 event_mask = 0;
 	u16 event_type;
@@ -8602,6 +8602,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
 		reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 		event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 		reset_required = gaudi2_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
+		is_critical = eq_entry->ecc_data.is_critical;
 		break;
 
 	case GAUDI2_EVENT_TPC0_QM ... GAUDI2_EVENT_PDMA1_QM:
@@ -8976,9 +8977,16 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
 	return;
 
 reset_device:
-	if (hdev->hard_reset_on_fw_events) {
+	if (hdev->asic_prop.fw_security_enabled && is_critical) {
+		reset_flags = HL_DRV_RESET_HARD | HL_DRV_RESET_BYPASS_REQ_TO_FW;
+
+		/* notify that the device is unavailable, as the reset is triggered by f/w */
+		event_mask |= (HL_NOTIFIER_EVENT_DEVICE_RESET |
+					HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE);
 		hl_device_reset(hdev, reset_flags);
+	} else if (hdev->hard_reset_on_fw_events) {
 		event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
+		hl_device_reset(hdev, reset_flags);
 	} else {
 		if (!gaudi2_irq_map_table[event_type].msg)
 			hl_fw_unmask_irq(hdev, event_type);
-- 
2.25.1

