public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] habanalabs/gaudi: reset device upon BMC request
@ 2021-03-09 18:28 Oded Gabbay
  2021-03-09 18:28 ` [PATCH 2/2] habanalabs/gaudi: always use single-msi mode Oded Gabbay
  0 siblings, 1 reply; 2+ messages in thread
From: Oded Gabbay @ 2021-03-09 18:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

In case the BMC of the devices' box wants to initiate a reset of
a specific device, it must go through driver.
Once driver will receive the request it will initiate a hard reset
flow.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/gaudi/gaudi.c                         | 4 ++++
 drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h    | 1 +
 .../habanalabs/include/gaudi/gaudi_async_ids_map_extended.h   | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 8be75f7c97f8..099c51350be6 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7575,6 +7575,10 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 			event_type, cause);
 		break;
 
+	case GAUDI_EVENT_BMC_RESET_CMD:
+		gaudi_print_irq_info(hdev, event_type, false);
+		goto reset_device;
+
 	case GAUDI_EVENT_PKT_QUEUE_OUT_SYNC:
 		gaudi_print_irq_info(hdev, event_type, false);
 		gaudi_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
diff --git a/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h b/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h
index 0a0fa57024f8..25f835ba2cd6 100644
--- a/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h
+++ b/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h
@@ -303,6 +303,7 @@ enum gaudi_async_event_id {
 	GAUDI_EVENT_NIC3_QP1 = 619,
 	GAUDI_EVENT_NIC4_QP0 = 620,
 	GAUDI_EVENT_NIC4_QP1 = 621,
+	GAUDI_EVENT_BMC_RESET_CMD = 646,
 	GAUDI_EVENT_PKT_QUEUE_OUT_SYNC = 647,
 	GAUDI_EVENT_FIX_POWER_ENV_S = 658,
 	GAUDI_EVENT_FIX_POWER_ENV_E = 659,
diff --git a/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h b/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h
index 9106395eb920..079be1fb8f70 100644
--- a/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h
+++ b/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h
@@ -670,7 +670,7 @@ static struct gaudi_async_events_ids_map gaudi_irq_map_table[] = {
 	{ .fc_id = 643, .cpu_id = 492, .valid = 0, .name = "" },
 	{ .fc_id = 644, .cpu_id = 493, .valid = 0, .name = "" },
 	{ .fc_id = 645, .cpu_id = 494, .valid = 0, .name = "" },
-	{ .fc_id = 646, .cpu_id = 495, .valid = 0, .name = "" },
+	{ .fc_id = 646, .cpu_id = 495, .valid = 1, .name = "BMC_RST_CMD" },
 	{ .fc_id = 647, .cpu_id = 496, .valid = 0, .name = "" },
 	{ .fc_id = 648, .cpu_id = 497, .valid = 0, .name = "" },
 	{ .fc_id = 649, .cpu_id = 498, .valid = 0, .name = "" },
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH 2/2] habanalabs/gaudi: always use single-msi mode
  2021-03-09 18:28 [PATCH 1/2] habanalabs/gaudi: reset device upon BMC request Oded Gabbay
@ 2021-03-09 18:28 ` Oded Gabbay
  0 siblings, 0 replies; 2+ messages in thread
From: Oded Gabbay @ 2021-03-09 18:28 UTC (permalink / raw)
  To: linux-kernel

The device can get into deadlock in case it use indirect mode for MSI
interrupts (multi-msi) and have hard-reset during interrupt storm.

To prevent that, always use direct mode which means single-msi mode.

The F/W will prevent the host from writing to the indirect MSI
registers to prevent any malicious user from causing this scenario.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/gaudi/gaudi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 099c51350be6..0bcee675e1db 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -1766,8 +1766,7 @@ static int gaudi_enable_msi(struct hl_device *hdev)
 	if (gaudi->hw_cap_initialized & HW_CAP_MSI)
 		return 0;
 
-	rc = pci_alloc_irq_vectors(hdev->pdev, 1, GAUDI_MSI_ENTRIES,
-					PCI_IRQ_MSI);
+	rc = pci_alloc_irq_vectors(hdev->pdev, 1, 1, PCI_IRQ_MSI);
 	if (rc < 0) {
 		dev_err(hdev->dev, "MSI: Failed to enable support %d\n", rc);
 		return rc;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-03-09 18:29 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-09 18:28 [PATCH 1/2] habanalabs/gaudi: reset device upon BMC request Oded Gabbay
2021-03-09 18:28 ` [PATCH 2/2] habanalabs/gaudi: always use single-msi mode Oded Gabbay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox