linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] tee: optee: Fix supplicant wait loop
@ 2025-02-03  8:00 Sumit Garg
  2025-02-03  8:16 ` Arnd Bergmann
  2025-02-03  9:31 ` Jens Wiklander
  0 siblings, 2 replies; 6+ messages in thread
From: Sumit Garg @ 2025-02-03  8:00 UTC (permalink / raw)
  To: jens.wiklander, arnd
  Cc: op-tee, jerome.forissier, dannenberg, javier, linux-kernel,
	Sumit Garg, stable

OP-TEE supplicant is a user-space daemon and it's possible for it
being hung or crashed or killed in the middle of processing an OP-TEE
RPC call. It becomes more complicated when there is incorrect shutdown
ordering of the supplicant process vs the OP-TEE client application which
can eventually lead to system hang-up waiting for the closure of the
client application.

Allow the client process waiting in kernel for supplicant response to
be killed rather than indefinitetly waiting in an unkillable state. This
fixes issues observed during system reboot/shutdown when supplicant got
hung for some reason or gets crashed/killed which lead to client getting
hung in an unkillable state. It in turn lead to system being in hung up
state requiring hard power off/on to recover.

Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
---

Changes in v2:
- Switch to killable wait instead as suggested by Arnd instead
  of supplicant timeout. It atleast allow the client to wait for
  supplicant in killable state which in turn allows system to reboot
  or shutdown gracefully.

 drivers/tee/optee/supp.c | 32 +++++++-------------------------
 1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
index 322a543b8c27..3fbfa9751931 100644
--- a/drivers/tee/optee/supp.c
+++ b/drivers/tee/optee/supp.c
@@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	struct optee *optee = tee_get_drvdata(ctx->teedev);
 	struct optee_supp *supp = &optee->supp;
 	struct optee_supp_req *req;
-	bool interruptable;
 	u32 ret;
 
 	/*
@@ -111,36 +110,19 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	/*
 	 * Wait for supplicant to process and return result, once we've
 	 * returned from wait_for_completion(&req->c) successfully we have
-	 * exclusive access again.
+	 * exclusive access again. Allow the wait to be killable such that
+	 * the wait doesn't turn into an indefinite state if the supplicant
+	 * gets hung for some reason.
 	 */
-	while (wait_for_completion_interruptible(&req->c)) {
-		mutex_lock(&supp->mutex);
-		interruptable = !supp->ctx;
-		if (interruptable) {
-			/*
-			 * There's no supplicant available and since the
-			 * supp->mutex currently is held none can
-			 * become available until the mutex released
-			 * again.
-			 *
-			 * Interrupting an RPC to supplicant is only
-			 * allowed as a way of slightly improving the user
-			 * experience in case the supplicant hasn't been
-			 * started yet. During normal operation the supplicant
-			 * will serve all requests in a timely manner and
-			 * interrupting then wouldn't make sense.
-			 */
+	if (wait_for_completion_killable(&req->c)) {
+		if (!mutex_lock_killable(&supp->mutex))	{
 			if (req->in_queue) {
 				list_del(&req->link);
 				req->in_queue = false;
 			}
+			mutex_unlock(&supp->mutex);
 		}
-		mutex_unlock(&supp->mutex);
-
-		if (interruptable) {
-			req->ret = TEEC_ERROR_COMMUNICATION;
-			break;
-		}
+		req->ret = TEEC_ERROR_COMMUNICATION;
 	}
 
 	ret = req->ret;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] tee: optee: Fix supplicant wait loop
  2025-02-03  8:00 [PATCH v2] tee: optee: Fix supplicant wait loop Sumit Garg
@ 2025-02-03  8:16 ` Arnd Bergmann
  2025-02-03 10:55   ` Sumit Garg
  2025-02-03  9:31 ` Jens Wiklander
  1 sibling, 1 reply; 6+ messages in thread
From: Arnd Bergmann @ 2025-02-03  8:16 UTC (permalink / raw)
  To: Sumit Garg, Jens Wiklander
  Cc: op-tee, Jerome Forissier, dannenberg, javier, linux-kernel,
	stable

On Mon, Feb 3, 2025, at 09:00, Sumit Garg wrote:
> OP-TEE supplicant is a user-space daemon and it's possible for it
> being hung or crashed or killed in the middle of processing an OP-TEE
> RPC call. It becomes more complicated when there is incorrect shutdown
> ordering of the supplicant process vs the OP-TEE client application which
> can eventually lead to system hang-up waiting for the closure of the
> client application.
>
> Allow the client process waiting in kernel for supplicant response to
> be killed rather than indefinitetly waiting in an unkillable state.

It would be good to mention that the existing version ends up in
a busy-loop here because of the broken wait_for_completion_interruptible()
loop.

A normal uninterruptible wait should not have resulted in the hung-task
watchdog getting triggered, but the endless loop would.

> +	if (wait_for_completion_killable(&req->c)) {
> +		if (!mutex_lock_killable(&supp->mutex))	{
>  			if (req->in_queue) {

Using mutex_trylock() here is probably clearer than
mutex_lock_killable(), since a task that is already in the
process of getting killed won't ever wait for the mutex.
mutex_lock_killable() does try to get the lock first though
if nobody else holds it already, which is the same as trylock.

     Arnd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] tee: optee: Fix supplicant wait loop
  2025-02-03  8:00 [PATCH v2] tee: optee: Fix supplicant wait loop Sumit Garg
  2025-02-03  8:16 ` Arnd Bergmann
@ 2025-02-03  9:31 ` Jens Wiklander
  2025-02-03 10:53   ` Arnd Bergmann
  1 sibling, 1 reply; 6+ messages in thread
From: Jens Wiklander @ 2025-02-03  9:31 UTC (permalink / raw)
  To: Sumit Garg
  Cc: arnd, op-tee, jerome.forissier, dannenberg, javier, linux-kernel,
	stable

On Mon, Feb 3, 2025 at 9:00 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> OP-TEE supplicant is a user-space daemon and it's possible for it
> being hung or crashed or killed in the middle of processing an OP-TEE
> RPC call. It becomes more complicated when there is incorrect shutdown
> ordering of the supplicant process vs the OP-TEE client application which
> can eventually lead to system hang-up waiting for the closure of the
> client application.
>
> Allow the client process waiting in kernel for supplicant response to
> be killed rather than indefinitetly waiting in an unkillable state. This
> fixes issues observed during system reboot/shutdown when supplicant got
> hung for some reason or gets crashed/killed which lead to client getting
> hung in an unkillable state. It in turn lead to system being in hung up
> state requiring hard power off/on to recover.
>
> Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver")
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> ---
>
> Changes in v2:
> - Switch to killable wait instead as suggested by Arnd instead
>   of supplicant timeout. It atleast allow the client to wait for
>   supplicant in killable state which in turn allows system to reboot
>   or shutdown gracefully.
>
>  drivers/tee/optee/supp.c | 32 +++++++-------------------------
>  1 file changed, 7 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> index 322a543b8c27..3fbfa9751931 100644
> --- a/drivers/tee/optee/supp.c
> +++ b/drivers/tee/optee/supp.c
> @@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>         struct optee *optee = tee_get_drvdata(ctx->teedev);
>         struct optee_supp *supp = &optee->supp;
>         struct optee_supp_req *req;
> -       bool interruptable;
>         u32 ret;
>
>         /*
> @@ -111,36 +110,19 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>         /*
>          * Wait for supplicant to process and return result, once we've
>          * returned from wait_for_completion(&req->c) successfully we have
> -        * exclusive access again.
> +        * exclusive access again. Allow the wait to be killable such that
> +        * the wait doesn't turn into an indefinite state if the supplicant
> +        * gets hung for some reason.
>          */
> -       while (wait_for_completion_interruptible(&req->c)) {
> -               mutex_lock(&supp->mutex);
> -               interruptable = !supp->ctx;
> -               if (interruptable) {
> -                       /*
> -                        * There's no supplicant available and since the
> -                        * supp->mutex currently is held none can
> -                        * become available until the mutex released
> -                        * again.
> -                        *
> -                        * Interrupting an RPC to supplicant is only
> -                        * allowed as a way of slightly improving the user
> -                        * experience in case the supplicant hasn't been
> -                        * started yet. During normal operation the supplicant
> -                        * will serve all requests in a timely manner and
> -                        * interrupting then wouldn't make sense.
> -                        */
> +       if (wait_for_completion_killable(&req->c)) {
> +               if (!mutex_lock_killable(&supp->mutex)) {

Why not mutex_lock()? If we fail to acquire the mutex here, we will
quite likely free the req list item below at the end of this function
while it remains in the list.

Cheers,
Jens

>                         if (req->in_queue) {
>                                 list_del(&req->link);
>                                 req->in_queue = false;
>                         }
> +                       mutex_unlock(&supp->mutex);
>                 }
> -               mutex_unlock(&supp->mutex);
> -
> -               if (interruptable) {
> -                       req->ret = TEEC_ERROR_COMMUNICATION;
> -                       break;
> -               }
> +               req->ret = TEEC_ERROR_COMMUNICATION;
>         }
>
>         ret = req->ret;
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] tee: optee: Fix supplicant wait loop
  2025-02-03  9:31 ` Jens Wiklander
@ 2025-02-03 10:53   ` Arnd Bergmann
  2025-02-03 11:14     ` Sumit Garg
  0 siblings, 1 reply; 6+ messages in thread
From: Arnd Bergmann @ 2025-02-03 10:53 UTC (permalink / raw)
  To: Jens Wiklander, Sumit Garg
  Cc: op-tee, Jerome Forissier, dannenberg, javier, linux-kernel,
	stable

On Mon, Feb 3, 2025, at 10:31, Jens Wiklander wrote:
> On Mon, Feb 3, 2025 at 9:00 AM Sumit Garg <sumit.garg@linaro.org> wrote:

> Why not mutex_lock()? If we fail to acquire the mutex here, we will
> quite likely free the req list item below at the end of this function
> while it remains in the list.

Right, I had mentioned mutex_lock_killable in an earlier reply,
as I didn't know exactly where it hang. If we know that the
wait_event_interruptible() was causing the hang, then the
normal mutex_lock should be fine.

     Arnd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] tee: optee: Fix supplicant wait loop
  2025-02-03  8:16 ` Arnd Bergmann
@ 2025-02-03 10:55   ` Sumit Garg
  0 siblings, 0 replies; 6+ messages in thread
From: Sumit Garg @ 2025-02-03 10:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jens Wiklander, op-tee, Jerome Forissier, dannenberg, javier,
	linux-kernel, stable

Hi Arnd,

On Mon, 3 Feb 2025 at 13:46, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Feb 3, 2025, at 09:00, Sumit Garg wrote:
> > OP-TEE supplicant is a user-space daemon and it's possible for it
> > being hung or crashed or killed in the middle of processing an OP-TEE
> > RPC call. It becomes more complicated when there is incorrect shutdown
> > ordering of the supplicant process vs the OP-TEE client application which
> > can eventually lead to system hang-up waiting for the closure of the
> > client application.
> >
> > Allow the client process waiting in kernel for supplicant response to
> > be killed rather than indefinitetly waiting in an unkillable state.
>
> It would be good to mention that the existing version ends up in
> a busy-loop here because of the broken wait_for_completion_interruptible()
> loop.
>
> A normal uninterruptible wait should not have resulted in the hung-task
> watchdog getting triggered, but the endless loop would.

Sure, I will add that.

>
> > +     if (wait_for_completion_killable(&req->c)) {
> > +             if (!mutex_lock_killable(&supp->mutex)) {
> >                       if (req->in_queue) {
>
> Using mutex_trylock() here is probably clearer than
> mutex_lock_killable(), since a task that is already in the
> process of getting killed won't ever wait for the mutex.
> mutex_lock_killable() does try to get the lock first though
> if nobody else holds it already, which is the same as trylock.

Let's follow-up on this in the other thread.

-Sumit

>
>      Arnd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] tee: optee: Fix supplicant wait loop
  2025-02-03 10:53   ` Arnd Bergmann
@ 2025-02-03 11:14     ` Sumit Garg
  0 siblings, 0 replies; 6+ messages in thread
From: Sumit Garg @ 2025-02-03 11:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jens Wiklander, op-tee, Jerome Forissier, dannenberg, javier,
	linux-kernel, stable

On Mon, 3 Feb 2025 at 16:23, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Feb 3, 2025, at 10:31, Jens Wiklander wrote:
> > On Mon, Feb 3, 2025 at 9:00 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> > Why not mutex_lock()? If we fail to acquire the mutex here, we will
> > quite likely free the req list item below at the end of this function
> > while it remains in the list.
>
> Right, I had mentioned mutex_lock_killable in an earlier reply,
> as I didn't know exactly where it hang. If we know that the
> wait_event_interruptible() was causing the hang, then the
> normal mutex_lock should be fine.
>

Yeah for my current test scenario mutex_lock() works fine too but I
added mutex_lock_killable() just in another theoretical corner case
being stuck acquiring mutex while someone is trying to kill the
process. But let's avoid over complexity for the time being with a
simpler mutex_lock() approach.

-Sumit

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-02-03 11:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-03  8:00 [PATCH v2] tee: optee: Fix supplicant wait loop Sumit Garg
2025-02-03  8:16 ` Arnd Bergmann
2025-02-03 10:55   ` Sumit Garg
2025-02-03  9:31 ` Jens Wiklander
2025-02-03 10:53   ` Arnd Bergmann
2025-02-03 11:14     ` Sumit Garg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).