public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Oded Gabbay <oded.gabbay@gmail.com>
To: linux-kernel@vger.kernel.org, SW_Drivers@habana.ai
Cc: gregkh@linuxfoundation.org, Omer Shpigelman <oshpigelman@habana.ai>
Subject: [PATCH 4/4] habanalabs: don't allow hard reset with open processes
Date: Thu, 21 May 2020 10:02:05 +0300	[thread overview]
Message-ID: <20200521070205.26673-4-oded.gabbay@gmail.com> (raw)
In-Reply-To: <20200521070205.26673-1-oded.gabbay@gmail.com>

From: Omer Shpigelman <oshpigelman@habana.ai>

When the MMU is heavily used by the engines, unmapping might take a lot of
time due to a full MMU cache invalidation done as part of the unmap flow.
Hence we might not be able to kill all open processes before going to hard
reset the device, as it involves unmapping of all user memory.
In case of a failure in killing all open processes, we should stop the
hard reset flow as it might lead to a kernel crash - one thread (killing
of a process) is updating MMU structures that other thread (hard reset) is
freeing.
Stopping a hard reset flow leaves the device as nonoperational and the
user can then initiate a hard reset via sysfs to reinitialize the device.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/device.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index 4a4a446f479e..2b38a119704c 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -726,7 +726,7 @@ int hl_device_resume(struct hl_device *hdev)
 	return rc;
 }
 
-static void device_kill_open_processes(struct hl_device *hdev)
+static int device_kill_open_processes(struct hl_device *hdev)
 {
 	u16 pending_total, pending_cnt;
 	struct hl_fpriv	*hpriv;
@@ -779,9 +779,7 @@ static void device_kill_open_processes(struct hl_device *hdev)
 		ssleep(1);
 	}
 
-	if (!list_empty(&hdev->fpriv_list))
-		dev_crit(hdev->dev,
-			"Going to hard reset with open user contexts\n");
+	return list_empty(&hdev->fpriv_list) ? 0 : -EBUSY;
 }
 
 static void device_hard_reset_pending(struct work_struct *work)
@@ -908,7 +906,12 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
 		 * process can't really exit until all its CSs are done, which
 		 * is what we do in cs rollback
 		 */
-		device_kill_open_processes(hdev);
+		rc = device_kill_open_processes(hdev);
+		if (rc) {
+			dev_crit(hdev->dev,
+				"Failed to kill all open processes, stopping hard reset\n");
+			goto out_err;
+		}
 
 		/* Flush the Event queue workers to make sure no other thread is
 		 * reading or writing to registers during the reset
@@ -1391,7 +1394,9 @@ void hl_device_fini(struct hl_device *hdev)
 	 * can't really exit until all its CSs are done, which is what we
 	 * do in cs rollback
 	 */
-	device_kill_open_processes(hdev);
+	rc = device_kill_open_processes(hdev);
+	if (rc)
+		dev_crit(hdev->dev, "Failed to kill all open processes\n");
 
 	hl_cb_pool_fini(hdev);
 
-- 
2.17.1


      parent reply	other threads:[~2020-05-21  7:02 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-21  7:02 [PATCH 1/4] habanalabs: improve MMU cache invalidation code Oded Gabbay
2020-05-21  7:02 ` [PATCH 2/4] habanalabs: add print for soft reset due to event Oded Gabbay
2020-05-21  7:02 ` [PATCH 3/4] habanalabs: GAUDI does not support soft-reset Oded Gabbay
2020-05-21  7:41   ` Tomer Tayar
2020-05-21  7:02 ` Oded Gabbay [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200521070205.26673-4-oded.gabbay@gmail.com \
    --to=oded.gabbay@gmail.com \
    --cc=SW_Drivers@habana.ai \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oshpigelman@habana.ai \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox