From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Ofir Bitton <obitton@habana.ai>
Subject: [PATCH 1/2] habanalabs: fix potential race in interrupt wait ioctl
Date: Wed, 1 Sep 2021 19:33:08 +0300 [thread overview]
Message-ID: <20210901163309.112126-1-ogabbay@kernel.org> (raw)
From: Ofir Bitton <obitton@habana.ai>
We have a potential race where a user interrupt can be received
in between user thread value comparison and before request was
added to wait list. This means that if no consecutive interrupt
will be received, user thread will timeout and fail.
The solution is to add the request to wait list before we
perform the comparison.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
.../habanalabs/common/command_submission.c | 35 +++++++++++--------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 7b0516cf808b..9a8b9191c28c 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -2740,10 +2740,20 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
else
interrupt = &hdev->user_interrupt[interrupt_offset];
+ /* Add pending user interrupt to relevant list for the interrupt
+ * handler to monitor
+ */
+ spin_lock_irqsave(&interrupt->wait_list_lock, flags);
+ list_add_tail(&pend->wait_list_node, &interrupt->wait_list_head);
+ spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
+
+ /* We check for completion value as interrupt could have been received
+ * before we added the node to the wait list
+ */
if (copy_from_user(&completion_value, u64_to_user_ptr(user_address), 4)) {
dev_err(hdev->dev, "Failed to copy completion value from user\n");
rc = -EFAULT;
- goto free_fence;
+ goto remove_pending_user_interrupt;
}
if (completion_value >= target_value)
@@ -2752,14 +2762,7 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
*status = CS_WAIT_STATUS_BUSY;
if (!timeout_us || (*status == CS_WAIT_STATUS_COMPLETED))
- goto free_fence;
-
- /* Add pending user interrupt to relevant list for the interrupt
- * handler to monitor
- */
- spin_lock_irqsave(&interrupt->wait_list_lock, flags);
- list_add_tail(&pend->wait_list_node, &interrupt->wait_list_head);
- spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
+ goto remove_pending_user_interrupt;
wait_again:
/* Wait for interrupt handler to signal completion */
@@ -2770,6 +2773,15 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
* If comparison fails, keep waiting until timeout expires
*/
if (completion_rc > 0) {
+ spin_lock_irqsave(&interrupt->wait_list_lock, flags);
+ /* reinit_completion must be called before we check for user
+ * completion value, otherwise, if interrupt is received after
+ * the comparison and before the next wait_for_completion,
+ * we will reach timeout and fail
+ */
+ reinit_completion(&pend->fence.completion);
+ spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
+
if (copy_from_user(&completion_value, u64_to_user_ptr(user_address), 4)) {
dev_err(hdev->dev, "Failed to copy completion value from user\n");
rc = -EFAULT;
@@ -2780,11 +2792,7 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
if (completion_value >= target_value) {
*status = CS_WAIT_STATUS_COMPLETED;
} else {
- spin_lock_irqsave(&interrupt->wait_list_lock, flags);
- reinit_completion(&pend->fence.completion);
timeout = completion_rc;
-
- spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
goto wait_again;
}
} else if (completion_rc == -ERESTARTSYS) {
@@ -2802,7 +2810,6 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
list_del(&pend->wait_list_node);
spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
-free_fence:
kfree(pend);
hl_ctx_put(ctx);
--
2.17.1
next reply other threads:[~2021-09-01 16:35 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-01 16:33 Oded Gabbay [this message]
2021-09-01 16:33 ` [PATCH 2/2] habanalabs: fix kernel OOPs related to staged cs Oded Gabbay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210901163309.112126-1-ogabbay@kernel.org \
--to=ogabbay@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=obitton@habana.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox