From: Oded Gabbay <oded.gabbay@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: gregkh@linuxfoundation.org, Omer Shpigelman <oshpigelman@habana.ai>
Subject: [PATCH 3/4] habanalabs: complete user context cleanup before hard reset
Date: Sat, 16 Mar 2019 22:10:46 +0200 [thread overview]
Message-ID: <20190316201047.22516-3-oded.gabbay@gmail.com> (raw)
In-Reply-To: <20190316201047.22516-1-oded.gabbay@gmail.com>
From: Omer Shpigelman <oshpigelman@habana.ai>
This patch fixes a bug which led to a crash during hard reset flow.
Before a hard reset is executed, we wait a few seconds for the user
context cleanup to complete.
If it wasn't completed, we kill the user process and move on to the reset
flow.
Upon killing the user process, the context cleanup flow begins and may
take a while due to MMU unmaps.
Meanwhile, in the driver reset flow, we change the PCI DRAM bar location
which can interfere with the MMU that uses the bar.
If the context cleanup flow didn't finish quickly, a crash may occur due
to PCI DRAM bar mislocation during the MMU unmap.
Hence adding a wait between killing the user process and the start of the
reset flow.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
drivers/misc/habanalabs/device.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index de46aa6ed154..93d67983ddba 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -11,6 +11,8 @@
#include <linux/sched/signal.h>
#include <linux/hwmon.h>
+#define HL_PLDM_PENDING_RESET_PER_SEC (HL_PENDING_RESET_PER_SEC * 10)
+
bool hl_device_disabled_or_in_reset(struct hl_device *hdev)
{
if ((hdev->disabled) || (atomic_read(&hdev->in_reset)))
@@ -462,9 +464,16 @@ static void hl_device_hard_reset_pending(struct work_struct *work)
struct hl_device_reset_work *device_reset_work =
container_of(work, struct hl_device_reset_work, reset_work);
struct hl_device *hdev = device_reset_work->hdev;
- u16 pending_cnt = HL_PENDING_RESET_PER_SEC;
+ u16 pending_total, pending_cnt;
struct task_struct *task = NULL;
+ if (hdev->pldm)
+ pending_total = HL_PLDM_PENDING_RESET_PER_SEC;
+ else
+ pending_total = HL_PENDING_RESET_PER_SEC;
+
+ pending_cnt = pending_total;
+
/* Flush all processes that are inside hl_open */
mutex_lock(&hdev->fd_open_cnt_lock);
@@ -489,6 +498,19 @@ static void hl_device_hard_reset_pending(struct work_struct *work)
}
}
+ pending_cnt = pending_total;
+
+ while ((atomic_read(&hdev->fd_open_cnt)) && (pending_cnt)) {
+
+ pending_cnt--;
+
+ ssleep(1);
+ }
+
+ if (atomic_read(&hdev->fd_open_cnt))
+ dev_crit(hdev->dev,
+ "Going to hard reset with open user contexts\n");
+
mutex_unlock(&hdev->fd_open_cnt_lock);
hl_device_reset(hdev, true, true);
--
2.17.1
next prev parent reply other threads:[~2019-03-16 20:11 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-16 20:10 [PATCH 1/4] habanalabs: fix MMU number of pages calculation Oded Gabbay
2019-03-16 20:10 ` [PATCH 2/4] habanalabs: fix bug when mapping very large memory area Oded Gabbay
2019-03-16 20:10 ` Oded Gabbay [this message]
2019-03-16 20:10 ` [PATCH 4/4] habanalabs: fix mapping with page size bigger than 4KB Oded Gabbay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190316201047.22516-3-oded.gabbay@gmail.com \
--to=oded.gabbay@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oshpigelman@habana.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox