public inbox for linux-kernel@vger.kernel.org
From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Koby Elbaz <kelbaz@habana.ai>
Subject: [PATCH 07/17] habanalabs/gaudi: fix incorrect MME offset calculation
Date: Mon, 20 Jun 2022 16:04:22 +0300
Message-ID: <20220620130432.1180451-7-ogabbay@kernel.org>
In-Reply-To: <20220620130432.1180451-1-ogabbay@kernel.org>

From: Koby Elbaz <kelbaz@habana.ai>

When the FW raised an event following an MME2 QMAN error, the driver
should have read the corresponding status registers to gather more
info on the error, but it was accidentally accessing the MME1 QMAN
address space.
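The bug comes down to simple index arithmetic, which can be sketched in isolation. This is a minimal sketch, not driver code. It assumes, as the description above implies, that the MME0 and MME2 QMAN event IDs are consecutive. Every numeric value and name in it (`EVENT_MME0_QM`, `EVENT_MME2_QM`, `MME0_QM_BASE`, `MME_QMAN_OFFSET`, and the helper functions) is a hypothetical stand-in for the real Gaudi definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical values: only their relationships matter. The two event
 * IDs are assumed to be consecutive, as the bug description implies.
 */
enum {
	EVENT_MME0_QM = 100,
	EVENT_MME2_QM = 101,
};

#define MME0_QM_BASE	0x1000000ULL	/* hypothetical MME0 QMAN base */
#define MME_QMAN_OFFSET	0x80000ULL	/* hypothetical per-MME stride */

/* Old logic: the MME index is the distance from the first MME event. */
static unsigned int old_mme_index(unsigned int event_type)
{
	return event_type - EVENT_MME0_QM;	/* MME2 event -> 1 (a SLAVE) */
}

/* Fixed logic: map each event explicitly to its MASTER MME index. */
static unsigned int new_mme_index(unsigned int event_type)
{
	assert(event_type == EVENT_MME0_QM || event_type == EVENT_MME2_QM);
	return event_type == EVENT_MME0_QM ? 0 : 2;
}

/* Same base + index * stride formula the driver keeps using. */
static uint64_t mme_qman_base(unsigned int index)
{
	return MME0_QM_BASE + (uint64_t)index * MME_QMAN_OFFSET;
}
```

With these placeholders, `old_mme_index(EVENT_MME2_QM)` yields 1, so the computed base lands in MME1 (SLAVE) space. `new_mme_index()` yields 2, and `mme_qman_base()` then returns the MME2 base.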

Gaudi has four MMEs: 0 and 2 are marked MASTER, while 1 and 3 are
marked SLAVE. A MASTER can be addressed, but addressing a SLAVE is
considered an access violation and results in a hung system, which is
what unintentionally happened above.
Note that this cannot happen in a secured system, since these
registers are protected by range registers.
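The MASTER/SLAVE rule can be expressed as a guard that a caller might apply before touching an MME's QMAN registers. The sketch below is hypothetical: the driver does not necessarily contain such a helper, and `NUM_MMES`, `MME0_QM_BASE`, `MME_QMAN_OFFSET`, `mme_is_master()` and `mme_qman_base_checked()` are illustrative names with placeholder values. It only encodes the rule stated above: MMEs 0 and 2 are MASTERs, and MMEs 1 and 3 are SLAVEs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_MMES	4
#define MME0_QM_BASE	0x1000000ULL	/* hypothetical MME0 QMAN base */
#define MME_QMAN_OFFSET	0x80000ULL	/* hypothetical per-MME stride */

/* MMEs 0 and 2 are MASTERs; MMEs 1 and 3 are SLAVEs. */
static bool mme_is_master(unsigned int mme)
{
	return mme < NUM_MMES && mme % 2 == 0;
}

/*
 * Return the QMAN base of a MASTER MME, or 0 for a SLAVE or out-of-range
 * index, so a caller never computes an address in SLAVE QMAN space.
 */
static uint64_t mme_qman_base_checked(unsigned int mme)
{
	if (!mme_is_master(mme))
		return 0;
	return MME0_QM_BASE + (uint64_t)mme * MME_QMAN_OFFSET;
}
```

Returning 0 as a sentinel keeps the sketch short. Real driver code would more likely report an error or warn instead.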

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/gaudi/gaudi.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index b7460c30aa51..8b9ff7fa51ea 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -454,7 +454,7 @@ static const int gaudi_queue_id_to_engine_id[] = {
 	[GAUDI_QUEUE_ID_DMA_6_0...GAUDI_QUEUE_ID_DMA_6_3] = GAUDI_ENGINE_ID_DMA_6,
 	[GAUDI_QUEUE_ID_DMA_7_0...GAUDI_QUEUE_ID_DMA_7_3] = GAUDI_ENGINE_ID_DMA_7,
 	[GAUDI_QUEUE_ID_MME_0_0...GAUDI_QUEUE_ID_MME_0_3] = GAUDI_ENGINE_ID_MME_0,
-	[GAUDI_QUEUE_ID_MME_1_0...GAUDI_QUEUE_ID_MME_1_3] = GAUDI_ENGINE_ID_MME_1,
+	[GAUDI_QUEUE_ID_MME_1_0...GAUDI_QUEUE_ID_MME_1_3] = GAUDI_ENGINE_ID_MME_2,
 	[GAUDI_QUEUE_ID_TPC_0_0...GAUDI_QUEUE_ID_TPC_0_3] = GAUDI_ENGINE_ID_TPC_0,
 	[GAUDI_QUEUE_ID_TPC_1_0...GAUDI_QUEUE_ID_TPC_1_3] = GAUDI_ENGINE_ID_TPC_1,
 	[GAUDI_QUEUE_ID_TPC_2_0...GAUDI_QUEUE_ID_TPC_2_3] = GAUDI_ENGINE_ID_TPC_2,
@@ -7383,8 +7383,13 @@ static void gaudi_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *e
 		snprintf(desc, ARRAY_SIZE(desc), "%s%d", "TPC_QM", index);
 		break;
 	case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
-		index = event_type - GAUDI_EVENT_MME0_QM;
-		qid_base = GAUDI_QUEUE_ID_MME_0_0 + index * QMAN_STREAMS;
+		if (event_type == GAUDI_EVENT_MME0_QM) {
+			index = 0;
+			qid_base = GAUDI_QUEUE_ID_MME_0_0;
+		} else if (event_type == GAUDI_EVENT_MME2_QM) {
+			index = 2;
+			qid_base = GAUDI_QUEUE_ID_MME_1_0;
+		}
 		qman_base = mmMME0_QM_BASE + index * MME_QMAN_OFFSET;
 		snprintf(desc, ARRAY_SIZE(desc), "%s%d", "MME_QM", index);
 		break;
-- 
2.25.1
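The first hunk applies the same reasoning to the queue-to-engine table. The second group of MME queues (`GAUDI_QUEUE_ID_MME_1_*`) belongs to MME2, the second MASTER, rather than to MME1. The sketch below models that table with trimmed-down, hypothetical enums and a hypothetical `queue_to_engine()` helper. Instead of using the GNU `[a ... b]` range designators that the driver uses, it spells out each entry, so it also builds as standard C99.

```c
#include <assert.h>

/* Hypothetical, trimmed-down versions of the driver's queue/engine IDs. */
enum queue_id {
	QUEUE_ID_MME_0_0, QUEUE_ID_MME_0_1, QUEUE_ID_MME_0_2, QUEUE_ID_MME_0_3,
	QUEUE_ID_MME_1_0, QUEUE_ID_MME_1_1, QUEUE_ID_MME_1_2, QUEUE_ID_MME_1_3,
	QUEUE_ID_SIZE
};

enum engine_id {
	ENGINE_ID_MME_0,
	ENGINE_ID_MME_1,	/* SLAVE: no queues map here */
	ENGINE_ID_MME_2,
	ENGINE_ID_MME_3,	/* SLAVE: no queues map here */
};

/* Queue group 0 is served by MME0, queue group 1 by MME2 (both MASTERs). */
static const enum engine_id queue_id_to_engine_id[QUEUE_ID_SIZE] = {
	[QUEUE_ID_MME_0_0] = ENGINE_ID_MME_0,
	[QUEUE_ID_MME_0_1] = ENGINE_ID_MME_0,
	[QUEUE_ID_MME_0_2] = ENGINE_ID_MME_0,
	[QUEUE_ID_MME_0_3] = ENGINE_ID_MME_0,
	[QUEUE_ID_MME_1_0] = ENGINE_ID_MME_2,
	[QUEUE_ID_MME_1_1] = ENGINE_ID_MME_2,
	[QUEUE_ID_MME_1_2] = ENGINE_ID_MME_2,
	[QUEUE_ID_MME_1_3] = ENGINE_ID_MME_2,
};

/* Bounds-checked lookup of the engine that serves a given queue. */
static enum engine_id queue_to_engine(enum queue_id q)
{
	assert(q < QUEUE_ID_SIZE);
	return queue_id_to_engine_id[q];
}
```

In this model, a lookup for any `QUEUE_ID_MME_1_*` queue now returns `ENGINE_ID_MME_2`. Before the fix, such a lookup would have returned the SLAVE engine.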


Thread overview: 17+ messages
2022-06-20 13:04 [PATCH 01/17] habanalabs/gaudi: collect undefined opcode error info Oded Gabbay
2022-06-20 13:04 ` [PATCH 02/17] habanalabs: expose undefined opcode status via info ioctl Oded Gabbay
2022-06-20 13:04 ` [PATCH 03/17] habanalabs/gaudi: invoke device reset from one code block Oded Gabbay
2022-06-20 13:04 ` [PATCH 04/17] habanalabs/gaudi: send device reset notification Oded Gabbay
2022-06-20 13:04 ` [PATCH 05/17] habanalabs: send an event notification when CS timeout occurs Oded Gabbay
2022-06-20 13:04 ` [PATCH 06/17] habanalabs: avoid unnecessary error print Oded Gabbay
2022-06-20 13:04 ` [PATCH 07/17] habanalabs/gaudi: fix incorrect MME offset calculation Oded Gabbay
2022-06-20 13:04 ` [PATCH 08/17] habanalabs: add validity check for cq counter offset Oded Gabbay
2022-06-20 13:04 ` [PATCH 09/17] habanalabs/gaudi: fix shift out of bounds Oded Gabbay
2022-06-20 13:04 ` [PATCH 10/17] habanalabs: fix NULL dereference on cs timeout Oded Gabbay
2022-06-20 13:04 ` [PATCH 11/17] habanalabs: remove unused get_dma_desc_list_size Oded Gabbay
2022-06-20 13:04 ` [PATCH 12/17] habanalabs/gaudi: notify user process on device unavailable Oded Gabbay
2022-06-20 13:04 ` [PATCH 13/17] habanalabs: add critical indication in sram ecc Oded Gabbay
2022-06-20 13:04 ` [PATCH 14/17] habanalabs: check fence pointer before use Oded Gabbay
2022-06-20 13:04 ` [PATCH 15/17] habanalabs: print pointer with correct modifier Oded Gabbay
2022-06-20 13:04 ` [PATCH 16/17] habanalabs: use kvcalloc when possible Oded Gabbay
2022-06-20 13:04 ` [PATCH 17/17] habanalabs: fix comment style Oded Gabbay
