From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Dani Liberman <dliberman@habana.ai>
Subject: [PATCH 1/3] habanalabs: handle race in driver fini
Date: Mon, 9 May 2022 11:22:16 +0300 [thread overview]
Message-ID: <20220509082218.956916-1-ogabbay@kernel.org> (raw)
From: Dani Liberman <dliberman@habana.ai>
Scenario:
1. During hard reset, driver executes device_kill_open_processes.
2. Drivers file descriptor is not closed yet (user process is alive),
hence we are starting loop on all open file descriptors.
3. Just before getting task struct of user process, according to
pid, SIGKILL is sent to the user process, hence get_pid_task
fails, driver prints a warning and device_kill_open_processes
returns an error.
4. Returned error causing driver fini do disable the device object
of the process which causes a kernel crash.
The fix is to handle this case not as an error and continue fini flow
as normal, since the killed process (by the SIGKILL) will release its
resources just like it will do when the driver sends him the sigkill.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/device.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index dbec98736a31..15df89b31e1b 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -1024,10 +1024,13 @@ static int device_kill_open_processes(struct hl_device *hdev, u32 timeout, bool
put_task_struct(task);
} else {
- dev_warn(hdev->dev,
- "Can't get task struct for PID so giving up on killing process\n");
- mutex_unlock(fd_lock);
- return -ETIME;
+ /*
+ * If we got here, it means that process was killed from outside the driver
+ * right after it started looping on fd_list and before get_pid_task, thus
+ * we don't need to kill it.
+ */
+ dev_dbg(hdev->dev,
+ "Can't get task struct for user process, assuming process was killed from outside the driver\n");
}
}
--
2.25.1
next reply other threads:[~2022-05-09 8:57 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-09 8:22 Oded Gabbay [this message]
2022-05-09 8:22 ` [PATCH 2/3] habanalabs: add topic to memory manager buffer Oded Gabbay
2022-05-09 8:22 ` [PATCH 3/3] habanalabs: add support for notification via eventfd Oded Gabbay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220509082218.956916-1-ogabbay@kernel.org \
--to=ogabbay@kernel.org \
--cc=dliberman@habana.ai \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox