From: Oded Gabbay <ogabbay@kernel.org>
To: dri-devel@lists.freedesktop.org
Cc: Koby Elbaz <kelbaz@habana.ai>
Subject: [PATCH 06/12] accel/habanalabs: upon DMA errors, use FW-extracted error cause
Date: Tue, 16 May 2023 12:30:24 +0300 [thread overview]
Message-ID: <20230516093030.1220526-6-ogabbay@kernel.org> (raw)
In-Reply-To: <20230516093030.1220526-1-ogabbay@kernel.org>
From: Koby Elbaz <kelbaz@habana.ai>
Initially, the driver used to read the error cause data directly from
the ASIC. However, the FW now clears it before the driver could read
it. Therefore we should use the error cause data that is extracted by
the FW.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/accel/habanalabs/gaudi2/gaudi2.c | 37 +++++-------------------
1 file changed, 8 insertions(+), 29 deletions(-)
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index e900017f4ff7..b8644d87f817 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -8807,13 +8807,13 @@ static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u16 event_type,
return error_count;
}
-static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type, int sts_addr)
+static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type, u64 intr_cause)
{
- u32 error_count = 0, sts_val = RREG32(sts_addr);
+ u32 error_count = 0;
int i;
for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++)
- if (sts_val & BIT(i)) {
+ if (intr_cause & BIT(i)) {
gaudi2_print_event(hdev, event_type, true,
"err cause: %s", gaudi2_dma_core_interrupts_cause[i]);
error_count++;
@@ -8824,27 +8824,6 @@ static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type,
return error_count;
}
-static int gaudi2_handle_pdma_core_event(struct hl_device *hdev, u16 event_type, int pdma_idx)
-{
- u32 sts_addr;
-
- sts_addr = mmPDMA0_CORE_ERR_CAUSE + pdma_idx * PDMA_OFFSET;
- return gaudi2_handle_dma_core_event(hdev, event_type, sts_addr);
-}
-
-static int gaudi2_handle_edma_core_event(struct hl_device *hdev, u16 event_type, int edma_idx)
-{
- static const int edma_event_index_map[] = {2, 3, 0, 1, 6, 7, 4, 5};
- u32 sts_addr, index;
-
- index = edma_event_index_map[edma_idx];
-
- sts_addr = mmDCORE0_EDMA0_CORE_ERR_CAUSE +
- DCORE_OFFSET * (index / NUM_OF_EDMA_PER_DCORE) +
- DCORE_EDMA_OFFSET * (index % NUM_OF_EDMA_PER_DCORE);
- return gaudi2_handle_dma_core_event(hdev, event_type, sts_addr);
-}
-
static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, u64 *event_mask)
{
u32 mstr_if_base_addr = mmPCIE_MSTR_RR_MSTR_IF_RR_SHRD_HBW_BASE, razwi_happened_addr;
@@ -9725,19 +9704,19 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
case GAUDI2_EVENT_KDMA_CH0_AXI_ERR_RSP:
case GAUDI2_EVENT_KDMA0_CORE:
error_count = gaudi2_handle_kdma_core_event(hdev, event_type,
- le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
break;
case GAUDI2_EVENT_HDMA2_CORE ... GAUDI2_EVENT_HDMA5_CORE:
- index = event_type - GAUDI2_EVENT_HDMA2_CORE;
- error_count = gaudi2_handle_edma_core_event(hdev, event_type, index);
+ error_count = gaudi2_handle_dma_core_event(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
break;
case GAUDI2_EVENT_PDMA0_CORE ... GAUDI2_EVENT_PDMA1_CORE:
- index = event_type - GAUDI2_EVENT_PDMA0_CORE;
- error_count = gaudi2_handle_pdma_core_event(hdev, event_type, index);
+ error_count = gaudi2_handle_dma_core_event(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
break;
--
2.40.1
next prev parent reply other threads:[~2023-05-16 9:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-16 9:30 [PATCH 01/12] accel/habanalabs: rename security functions related arguments Oded Gabbay
2023-05-16 9:30 ` [PATCH 02/12] accel/habanalabs: set unused bit as reserved Oded Gabbay
2023-05-17 18:03 ` Ofir Bitton
2023-05-16 9:30 ` [PATCH 03/12] accel/habanalabs: fix mem leak in capture user mappings Oded Gabbay
2023-05-16 9:30 ` [PATCH 04/12] accel/habanalabs: align to latest firmware specs Oded Gabbay
2023-05-17 18:03 ` Ofir Bitton
2023-05-16 9:30 ` [PATCH 05/12] accel/habanalabs: print max timeout value on CS stuck Oded Gabbay
2023-05-17 18:01 ` Ofir Bitton
2023-05-16 9:30 ` Oded Gabbay [this message]
2023-05-16 9:30 ` [PATCH 07/12] accel/habanalabs: remove support for mmu disable Oded Gabbay
2023-05-16 9:30 ` [PATCH 08/12] accel/habanalabs: use binning info when handling razwi Oded Gabbay
2023-05-16 9:30 ` [PATCH 09/12] accel/habanalabs: use lower QM in QM errors handling Oded Gabbay
2023-05-16 9:30 ` [PATCH 10/12] accel/habanalabs: print qman data on error only for lower qman Oded Gabbay
2023-05-16 9:30 ` [PATCH 11/12] accel/habanalabs: update state when loading boot fit Oded Gabbay
2023-05-16 9:30 ` [PATCH 12/12] accel/habanalabs: mask part of hmmu page fault captured address Oded Gabbay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230516093030.1220526-6-ogabbay@kernel.org \
--to=ogabbay@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=kelbaz@habana.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.