* [PATCH v2] optee: Do not send requests to supplicant during shutdown
@ 2022-01-20 10:40 Lars Persson
2022-01-21 16:47 ` Tyler Hicks
2022-01-24 5:56 ` Sumit Garg
0 siblings, 2 replies; 8+ messages in thread
From: Lars Persson @ 2022-01-20 10:40 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 3645 bytes --]
The addition of a shutdown hook by commit f25889f93184 ("optee:
fix tee out of memory failure seen during kexec reboot") introduced a
kernel shutdown regression that can be triggered after running the
xtest suites.
Once the shutdown hook is called it is not possible to communicate any
more with the supplicant process because the system is not scheduling
task any longer. Thus if the optee driver shutdown path receives a
supplicant RPC request from the OP-TEE we will deadlock the kernel's
shutdown.
This unexpected event will in fact occur after the xtest suite has
been run. It seems some cached SHM kept alive a context object which
in turn kept alive a session towards a PTA or TA. Closing the session
results in a socket RPC command being sent back from OP-TEE.
This sequence of events is captured by a 5.15 kernel annotated with
extra prints:
Calling OPTEE_SMC_DISABLE_SHM_CACHE
OPTEE_SMC_DISABLE_SHM_CACHE returned 0
freeing SHM ptr 0xFFFFFF8001079380
Calling OPTEE_SMC_DISABLE_SHM_CACHE
OPTEE_SMC_DISABLE_SHM_CACHE returned 0
freeing SHM ptr 0xFFFFFF8001CC5580
Calling OPTEE_SMC_DISABLE_SHM_CACHE
OPTEE_SMC_DISABLE_SHM_CACHE returned 0
freeing SHM ptr 0xFFFFFF8006308A80
Calling OPTEE_SMC_DISABLE_SHM_CACHE
OPTEE_SMC_DISABLE_SHM_CACHE returned 0
freeing SHM ptr 0xFFFFFF8006308B00
optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
optee: handle_rpc_func_cmd: cmd = 0XA
optee_supp_thrd_req: func=0XA
Introduce a shutdown state in the optee device object to return an
immediate error to all RPC requests in the shutdown path.
Fixes: f25889f93184 ("optee: fix tee out of memory failure seen during kexec reboot")
Signed-off-by: Lars Persson <larper@axis.com>
---
drivers/tee/optee/optee_private.h | 1 +
drivers/tee/optee/smc_abi.c | 5 ++++-
drivers/tee/optee/supp.c | 8 ++++++++
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
index 46f74ab07c7e..9eb72931e11f 100644
--- a/drivers/tee/optee/optee_private.h
+++ b/drivers/tee/optee/optee_private.h
@@ -162,6 +162,7 @@ struct optee {
struct tee_shm_pool *pool;
unsigned int rpc_arg_count;
bool scan_bus_done;
+ bool shutting_down;
struct workqueue_struct *scan_bus_wq;
struct work_struct scan_bus_work;
};
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index 449d6a72d289..10af747da816 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1356,7 +1356,10 @@ static int optee_smc_remove(struct platform_device *pdev)
*/
static void optee_shutdown(struct platform_device *pdev)
{
- optee_disable_shm_cache(platform_get_drvdata(pdev));
+ struct optee *optee = platform_get_drvdata(pdev);
+
+ optee->shutting_down = true;
+ optee_disable_shm_cache(optee);
}
static int optee_probe(struct platform_device *pdev)
diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
index 322a543b8c27..801b4ec659e8 100644
--- a/drivers/tee/optee/supp.c
+++ b/drivers/tee/optee/supp.c
@@ -83,6 +83,14 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
bool interruptable;
u32 ret;
+ /*
+ * When the system is shutting down we cannot talk
+ * to the supplicant anymore even if we seem to have
+ * an open session to it.
+ */
+ if (optee->shutting_down)
+ return TEEC_ERROR_COMMUNICATION;
+
/*
* Return in case there is no supplicant available and
* non-blocking request.
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH v2] optee: Do not send requests to supplicant during shutdown
2022-01-20 10:40 [PATCH v2] optee: Do not send requests to supplicant during shutdown Lars Persson
@ 2022-01-21 16:47 ` Tyler Hicks
2022-01-24 5:56 ` Sumit Garg
1 sibling, 0 replies; 8+ messages in thread
From: Tyler Hicks @ 2022-01-21 16:47 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 4311 bytes --]
On 2022-01-20 11:40:00, Lars Persson wrote:
> The addition of a shutdown hook by commit f25889f93184 ("optee:
> fix tee out of memory failure seen during kexec reboot") introduced a
> kernel shutdown regression that can be triggered after running the
> xtest suites.
>
> Once the shutdown hook is called it is not possible to communicate any
> more with the supplicant process because the system is not scheduling
> task any longer. Thus if the optee driver shutdown path receives a
> supplicant RPC request from the OP-TEE we will deadlock the kernel's
> shutdown.
>
> This unexpected event will in fact occur after the xtest suite has
> been run. It seems some cached SHM kept alive a context object which
> in turn kept alive a session towards a PTA or TA. Closing the session
> results in a socket RPC command being sent back from OP-TEE.
>
> This sequence of events is captured by a 5.15 kernel annotated with
> extra prints:
>
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8001079380
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8001CC5580
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8006308A80
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8006308B00
> optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
> optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
> optee: handle_rpc_func_cmd: cmd = 0XA
> optee_supp_thrd_req: func=0XA
>
> Introduce a shutdown state in the optee device object to return an
> immediate error to all RPC requests in the shutdown path.
>
> Fixes: f25889f93184 ("optee: fix tee out of memory failure seen during kexec reboot")
> Signed-off-by: Lars Persson <larper@axis.com>
I think that this change looks alright to fix the deadlock but I don't
feel qualified to definitively say that this is the right fix. Something
makes me wonder why the session is kept alive and if there's an
underlying bug that we're masking over but I don't understand the
lifetime of the objects, sessions, etc., when the supplicant is
involved. I'm going to defer to Jens and Sumit on the design aspects.
Tyler
> ---
> drivers/tee/optee/optee_private.h | 1 +
> drivers/tee/optee/smc_abi.c | 5 ++++-
> drivers/tee/optee/supp.c | 8 ++++++++
> 3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
> index 46f74ab07c7e..9eb72931e11f 100644
> --- a/drivers/tee/optee/optee_private.h
> +++ b/drivers/tee/optee/optee_private.h
> @@ -162,6 +162,7 @@ struct optee {
> struct tee_shm_pool *pool;
> unsigned int rpc_arg_count;
> bool scan_bus_done;
> + bool shutting_down;
> struct workqueue_struct *scan_bus_wq;
> struct work_struct scan_bus_work;
> };
> diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
> index 449d6a72d289..10af747da816 100644
> --- a/drivers/tee/optee/smc_abi.c
> +++ b/drivers/tee/optee/smc_abi.c
> @@ -1356,7 +1356,10 @@ static int optee_smc_remove(struct platform_device *pdev)
> */
> static void optee_shutdown(struct platform_device *pdev)
> {
> - optee_disable_shm_cache(platform_get_drvdata(pdev));
> + struct optee *optee = platform_get_drvdata(pdev);
> +
> + optee->shutting_down = true;
> + optee_disable_shm_cache(optee);
> }
>
> static int optee_probe(struct platform_device *pdev)
> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> index 322a543b8c27..801b4ec659e8 100644
> --- a/drivers/tee/optee/supp.c
> +++ b/drivers/tee/optee/supp.c
> @@ -83,6 +83,14 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> bool interruptable;
> u32 ret;
>
> + /*
> + * When the system is shutting down we cannot talk
> + * to the supplicant anymore even if we seem to have
> + * an open session to it.
> + */
> + if (optee->shutting_down)
> + return TEEC_ERROR_COMMUNICATION;
> +
> /*
> * Return in case there is no supplicant available and
> * non-blocking request.
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH v2] optee: Do not send requests to supplicant during shutdown
2022-01-20 10:40 [PATCH v2] optee: Do not send requests to supplicant during shutdown Lars Persson
2022-01-21 16:47 ` Tyler Hicks
@ 2022-01-24 5:56 ` Sumit Garg
1 sibling, 0 replies; 8+ messages in thread
From: Sumit Garg @ 2022-01-24 5:56 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 4534 bytes --]
Hi Lars,
On Thu, 20 Jan 2022 at 16:10, Lars Persson <larper@axis.com> wrote:
>
> The addition of a shutdown hook by commit f25889f93184 ("optee:
> fix tee out of memory failure seen during kexec reboot") introduced a
> kernel shutdown regression that can be triggered after running the
> xtest suites.
>
> Once the shutdown hook is called it is not possible to communicate any
> more with the supplicant process because the system is not scheduling
> task any longer. Thus if the optee driver shutdown path receives a
> supplicant RPC request from the OP-TEE we will deadlock the kernel's
> shutdown.
>
> This unexpected event will in fact occur after the xtest suite has
> been run. It seems some cached SHM kept alive a context object which
> in turn kept alive a session towards a PTA or TA. Closing the session
> results in a socket RPC command being sent back from OP-TEE.
>
> This sequence of events is captured by a 5.15 kernel annotated with
> extra prints:
>
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8001079380
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8001CC5580
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8006308A80
> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> freeing SHM ptr 0xFFFFFF8006308B00
> optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
> optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
> optee: handle_rpc_func_cmd: cmd = 0XA
> optee_supp_thrd_req: func=0XA
>
This looks like another side effect (earlier known one is here [1]) of
shared memory cache being allocated via RPC in a particular client's
context. There is an appropriate fix from Jens here [2] to rather use
driver internal tee_contex for RPC allocations. Can you try that and
let us know if you still observe this issue?
[1] https://github.com/OP-TEE/optee_os/issues/1918
[2] https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/
-Sumit
> Introduce a shutdown state in the optee device object to return an
> immediate error to all RPC requests in the shutdown path.
>
> Fixes: f25889f93184 ("optee: fix tee out of memory failure seen during kexec reboot")
> Signed-off-by: Lars Persson <larper@axis.com>
> ---
> drivers/tee/optee/optee_private.h | 1 +
> drivers/tee/optee/smc_abi.c | 5 ++++-
> drivers/tee/optee/supp.c | 8 ++++++++
> 3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
> index 46f74ab07c7e..9eb72931e11f 100644
> --- a/drivers/tee/optee/optee_private.h
> +++ b/drivers/tee/optee/optee_private.h
> @@ -162,6 +162,7 @@ struct optee {
> struct tee_shm_pool *pool;
> unsigned int rpc_arg_count;
> bool scan_bus_done;
> + bool shutting_down;
> struct workqueue_struct *scan_bus_wq;
> struct work_struct scan_bus_work;
> };
> diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
> index 449d6a72d289..10af747da816 100644
> --- a/drivers/tee/optee/smc_abi.c
> +++ b/drivers/tee/optee/smc_abi.c
> @@ -1356,7 +1356,10 @@ static int optee_smc_remove(struct platform_device *pdev)
> */
> static void optee_shutdown(struct platform_device *pdev)
> {
> - optee_disable_shm_cache(platform_get_drvdata(pdev));
> + struct optee *optee = platform_get_drvdata(pdev);
> +
> + optee->shutting_down = true;
> + optee_disable_shm_cache(optee);
> }
>
> static int optee_probe(struct platform_device *pdev)
> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> index 322a543b8c27..801b4ec659e8 100644
> --- a/drivers/tee/optee/supp.c
> +++ b/drivers/tee/optee/supp.c
> @@ -83,6 +83,14 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> bool interruptable;
> u32 ret;
>
> + /*
> + * When the system is shutting down we cannot talk
> + * to the supplicant anymore even if we seem to have
> + * an open session to it.
> + */
> + if (optee->shutting_down)
> + return TEEC_ERROR_COMMUNICATION;
> +
> /*
> * Return in case there is no supplicant available and
> * non-blocking request.
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: < <CAFA6WYPNQnKqseZTbb=NyOmswUO=x-CE1CcADZa+Kga_fozB9g@mail.gmail.com>]
* Re: [PATCH v2] optee: Do not send requests to supplicant during shutdown
[not found] < <CAFA6WYPNQnKqseZTbb=NyOmswUO=x-CE1CcADZa+Kga_fozB9g@mail.gmail.com>
@ 2022-01-24 17:47 ` Lars Persson
2022-01-24 20:27 ` Jens Wiklander
0 siblings, 1 reply; 8+ messages in thread
From: Lars Persson @ 2022-01-24 17:47 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 2701 bytes --]
On 2022-01-24 06:56, Sumit Garg wrote:
> Hi Lars,
>
> On Thu, 20 Jan 2022 at 16:10, Lars Persson <larper@axis.com> wrote:
>>
>> The addition of a shutdown hook by commit f25889f93184 ("optee:
>> fix tee out of memory failure seen during kexec reboot") introduced a
>> kernel shutdown regression that can be triggered after running the
>> xtest suites.
>>
>> Once the shutdown hook is called it is not possible to communicate any
>> more with the supplicant process because the system is not scheduling
>> task any longer. Thus if the optee driver shutdown path receives a
>> supplicant RPC request from the OP-TEE we will deadlock the kernel's
>> shutdown.
>>
>> This unexpected event will in fact occur after the xtest suite has
>> been run. It seems some cached SHM kept alive a context object which
>> in turn kept alive a session towards a PTA or TA. Closing the session
>> results in a socket RPC command being sent back from OP-TEE.
>>
>> This sequence of events is captured by a 5.15 kernel annotated with
>> extra prints:
>>
>> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> freeing SHM ptr 0xFFFFFF8001079380
>> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> freeing SHM ptr 0xFFFFFF8001CC5580
>> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> freeing SHM ptr 0xFFFFFF8006308A80
>> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> freeing SHM ptr 0xFFFFFF8006308B00
>> optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
>> optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
>> optee: handle_rpc_func_cmd: cmd = 0XA
>> optee_supp_thrd_req: func=0XA
>>
>
> This looks like another side effect (earlier known one is here [1]) of
> shared memory cache being allocated via RPC in a particular client's
> context. There is an appropriate fix from Jens here [2] to rather use
> driver internal tee_contex for RPC allocations. Can you try that and
> let us know if you still observe this issue?
>
> [1] https://github.com/OP-TEE/optee_os/issues/1918
> <https://github.com/OP-TEE/optee_os/issues/1918>
> [2]
> https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/
> <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander@linaro.org/>
>
> -Sumit
>
Indeed it seems this is the root cause. Backporting the patches to 5.15
also fixes our shutdown deadlock.
Nevertheless I prefer to have this extra guard in the supplicant
interface to prevent other shutdown deadlocks in the future.
/Lars
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH v2] optee: Do not send requests to supplicant during shutdown
2022-01-24 17:47 ` Lars Persson
@ 2022-01-24 20:27 ` Jens Wiklander
0 siblings, 0 replies; 8+ messages in thread
From: Jens Wiklander @ 2022-01-24 20:27 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 3078 bytes --]
On Mon, Jan 24, 2022 at 6:48 PM Lars Persson <larper@axis.com> wrote:
>
>
>
> On 2022-01-24 06:56, Sumit Garg wrote:
> > Hi Lars,
> >
> > On Thu, 20 Jan 2022 at 16:10, Lars Persson <larper@axis.com> wrote:
> >>
> >> The addition of a shutdown hook by commit f25889f93184 ("optee:
> >> fix tee out of memory failure seen during kexec reboot") introduced a
> >> kernel shutdown regression that can be triggered after running the
> >> xtest suites.
> >>
> >> Once the shutdown hook is called it is not possible to communicate any
> >> more with the supplicant process because the system is not scheduling
> >> task any longer. Thus if the optee driver shutdown path receives a
> >> supplicant RPC request from the OP-TEE we will deadlock the kernel's
> >> shutdown.
> >>
> >> This unexpected event will in fact occur after the xtest suite has
> >> been run. It seems some cached SHM kept alive a context object which
> >> in turn kept alive a session towards a PTA or TA. Closing the session
> >> results in a socket RPC command being sent back from OP-TEE.
> >>
> >> This sequence of events is captured by a 5.15 kernel annotated with
> >> extra prints:
> >>
> >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> freeing SHM ptr 0xFFFFFF8001079380
> >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> freeing SHM ptr 0xFFFFFF8001CC5580
> >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> freeing SHM ptr 0xFFFFFF8006308A80
> >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> freeing SHM ptr 0xFFFFFF8006308B00
> >> optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
> >> optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
> >> optee: handle_rpc_func_cmd: cmd = 0XA
> >> optee_supp_thrd_req: func=0XA
> >>
> >
> > This looks like another side effect (earlier known one is here [1]) of
> > shared memory cache being allocated via RPC in a particular client's
> > context. There is an appropriate fix from Jens here [2] to rather use
> > driver internal tee_contex for RPC allocations. Can you try that and
> > let us know if you still observe this issue?
> >
> > [1] https://github.com/OP-TEE/optee_os/issues/1918
> > <https://github.com/OP-TEE/optee_os/issues/1918>
> > [2]
> > https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/
> > <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander@linaro.org/>
> >
> > -Sumit
> >
>
> Indeed it seems this is the root cause. Backporting the patches to 5.15
> also fixes our shutdown deadlock.
That's good to hear.
>
> Nevertheless I prefer to have this extra guard in the supplicant
> interface to prevent other shutdown deadlocks in the future.
Yes, I agree, this still makes sense. Perhaps with an updated
description to avoid unnecessary confusion when applied on top of my
"tee: shared memory updates" patches.
Thanks,
Jens
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: < <CAHUa44FQAtR_vXHckSujPOp3M7bAtho93x+b8fmS5tVxvTX1nA@mail.gmail.com>]
* Re: [PATCH v2] optee: Do not send requests to supplicant during shutdown
[not found] < <CAHUa44FQAtR_vXHckSujPOp3M7bAtho93x+b8fmS5tVxvTX1nA@mail.gmail.com>
@ 2022-01-25 6:05 ` Sumit Garg
0 siblings, 0 replies; 8+ messages in thread
From: Sumit Garg @ 2022-01-25 6:05 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 3389 bytes --]
On Tue, 25 Jan 2022 at 01:57, Jens Wiklander <jens.wiklander@linaro.org> wrote:
>
> On Mon, Jan 24, 2022 at 6:48 PM Lars Persson <larper@axis.com> wrote:
> >
> >
> >
> > On 2022-01-24 06:56, Sumit Garg wrote:
> > > Hi Lars,
> > >
> > > On Thu, 20 Jan 2022 at 16:10, Lars Persson <larper@axis.com> wrote:
> > >>
> > >> The addition of a shutdown hook by commit f25889f93184 ("optee:
> > >> fix tee out of memory failure seen during kexec reboot") introduced a
> > >> kernel shutdown regression that can be triggered after running the
> > >> xtest suites.
> > >>
> > >> Once the shutdown hook is called it is not possible to communicate any
> > >> more with the supplicant process because the system is not scheduling
> > >> task any longer. Thus if the optee driver shutdown path receives a
> > >> supplicant RPC request from the OP-TEE we will deadlock the kernel's
> > >> shutdown.
> > >>
> > >> This unexpected event will in fact occur after the xtest suite has
> > >> been run. It seems some cached SHM kept alive a context object which
> > >> in turn kept alive a session towards a PTA or TA. Closing the session
> > >> results in a socket RPC command being sent back from OP-TEE.
> > >>
> > >> This sequence of events is captured by a 5.15 kernel annotated with
> > >> extra prints:
> > >>
> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> > >> freeing SHM ptr 0xFFFFFF8001079380
> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> > >> freeing SHM ptr 0xFFFFFF8001CC5580
> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> > >> freeing SHM ptr 0xFFFFFF8006308A80
> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> > >> freeing SHM ptr 0xFFFFFF8006308B00
> > >> optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
> > >> optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
> > >> optee: handle_rpc_func_cmd: cmd = 0XA
> > >> optee_supp_thrd_req: func=0XA
> > >>
> > >
> > > This looks like another side effect (earlier known one is here [1]) of
> > > shared memory cache being allocated via RPC in a particular client's
> > > context. There is an appropriate fix from Jens here [2] to rather use
> > > driver internal tee_contex for RPC allocations. Can you try that and
> > > let us know if you still observe this issue?
> > >
> > > [1] https://github.com/OP-TEE/optee_os/issues/1918
> > > <https://github.com/OP-TEE/optee_os/issues/1918>
> > > [2]
> > > https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/
> > > <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/>
> > >
> > > -Sumit
> > >
> >
> > Indeed it seems this is the root cause. Backporting the patches to 5.15
> > also fixes our shutdown deadlock.
>
> That's good to hear.
>
> >
> > Nevertheless I prefer to have this extra guard in the supplicant
> > interface to prevent other shutdown deadlocks in the future.
>
> Yes, I agree, this still makes sense. Perhaps with an updated
> description to avoid unnecessary confusion when applied on top of my
> "tee: shared memory updates" patches.
>
That's fine with me and we should drop the fixes tag as well.
-Sumit
> Thanks,
> Jens
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: < <CAFA6WYN4+s7XjVpRa-L28YHjEuoR2hR+3GqV-NZCwe12dK1H7w@mail.gmail.com>]
* Re: [PATCH v2] optee: Do not send requests to supplicant during shutdown
[not found] < <CAFA6WYN4+s7XjVpRa-L28YHjEuoR2hR+3GqV-NZCwe12dK1H7w@mail.gmail.com>
@ 2022-01-26 10:48 ` Lars Persson
2022-01-27 7:10 ` Sumit Garg
0 siblings, 1 reply; 8+ messages in thread
From: Lars Persson @ 2022-01-26 10:48 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 3970 bytes --]
On 2022-01-25 07:05, Sumit Garg wrote:
> On Tue, 25 Jan 2022 at 01:57, Jens Wiklander <jens.wiklander@linaro.org>
> wrote:
>>
>> On Mon, Jan 24, 2022 at 6:48 PM Lars Persson <larper@axis.com> wrote:
>> >
>> >
>> >
>> > On 2022-01-24 06:56, Sumit Garg wrote:
>> > > Hi Lars,
>> > >
>> > > On Thu, 20 Jan 2022 at 16:10, Lars Persson <larper@axis.com> wrote:
>> > >>
>> > >> The addition of a shutdown hook by commit f25889f93184 ("optee:
>> > >> fix tee out of memory failure seen during kexec reboot") introduced a
>> > >> kernel shutdown regression that can be triggered after running the
>> > >> xtest suites.
>> > >>
>> > >> Once the shutdown hook is called it is not possible to communicate any
>> > >> more with the supplicant process because the system is not scheduling
>> > >> task any longer. Thus if the optee driver shutdown path receives a
>> > >> supplicant RPC request from the OP-TEE we will deadlock the kernel's
>> > >> shutdown.
>> > >>
>> > >> This unexpected event will in fact occur after the xtest suite has
>> > >> been run. It seems some cached SHM kept alive a context object which
>> > >> in turn kept alive a session towards a PTA or TA. Closing the session
>> > >> results in a socket RPC command being sent back from OP-TEE.
>> > >>
>> > >> This sequence of events is captured by a 5.15 kernel annotated with
>> > >> extra prints:
>> > >>
>> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> > >> freeing SHM ptr 0xFFFFFF8001079380
>> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> > >> freeing SHM ptr 0xFFFFFF8001CC5580
>> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> > >> freeing SHM ptr 0xFFFFFF8006308A80
>> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
>> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
>> > >> freeing SHM ptr 0xFFFFFF8006308B00
>> > >> optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
>> > >> optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
>> > >> optee: handle_rpc_func_cmd: cmd = 0XA
>> > >> optee_supp_thrd_req: func=0XA
>> > >>
>> > >
>> > > This looks like another side effect (earlier known one is here [1]) of
>> > > shared memory cache being allocated via RPC in a particular client's
>> > > context. There is an appropriate fix from Jens here [2] to rather use
>> > > driver internal tee_contex for RPC allocations. Can you try that and
>> > > let us know if you still observe this issue?
>> > >
>> > > [1] https://github.com/OP-TEE/optee_os/issues/1918
> <https://github.com/OP-TEE/optee_os/issues/1918>
>> > > <https://github.com/OP-TEE/optee_os/issues/1918
> <https://github.com/OP-TEE/optee_os/issues/1918>>
>> > > [2]
>> > > https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/
> <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander@linaro.org/>
>> > > <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/
> <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander@linaro.org/>>
>> > >
>> > > -Sumit
>> > >
>> >
>> > Indeed it seems this is the root cause. Backporting the patches to 5.15
>> > also fixes our shutdown deadlock.
>>
>> That's good to hear.
>>
>> >
>> > Nevertheless I prefer to have this extra guard in the supplicant
>> > interface to prevent other shutdown deadlocks in the future.
>>
>> Yes, I agree, this still makes sense. Perhaps with an updated
>> description to avoid unnecessary confusion when applied on top of my
>> "tee: shared memory updates" patches.
>>
>
> That's fine with me and we should drop the fixes tag as well.
>
Should we really drop the Fixes ? Depending on if and when the SHM
owning context patch is backported, this fix may still be relevant for
the 5.15 LTS branch.
/Lars
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH v2] optee: Do not send requests to supplicant during shutdown
2022-01-26 10:48 ` Lars Persson
@ 2022-01-27 7:10 ` Sumit Garg
0 siblings, 0 replies; 8+ messages in thread
From: Sumit Garg @ 2022-01-27 7:10 UTC (permalink / raw)
To: op-tee
[-- Attachment #1: Type: text/plain, Size: 4433 bytes --]
On Wed, 26 Jan 2022 at 16:18, Lars Persson <larper@axis.com> wrote:
>
>
>
> On 2022-01-25 07:05, Sumit Garg wrote:
> > On Tue, 25 Jan 2022 at 01:57, Jens Wiklander <jens.wiklander@linaro.org>
> > wrote:
> >>
> >> On Mon, Jan 24, 2022 at 6:48 PM Lars Persson <larper@axis.com> wrote:
> >> >
> >> >
> >> >
> >> > On 2022-01-24 06:56, Sumit Garg wrote:
> >> > > Hi Lars,
> >> > >
> >> > > On Thu, 20 Jan 2022 at 16:10, Lars Persson <larper@axis.com> wrote:
> >> > >>
> >> > >> The addition of a shutdown hook by commit f25889f93184 ("optee:
> >> > >> fix tee out of memory failure seen during kexec reboot") introduced a
> >> > >> kernel shutdown regression that can be triggered after running the
> >> > >> xtest suites.
> >> > >>
> >> > >> Once the shutdown hook is called it is not possible to communicate any
> >> > >> more with the supplicant process because the system is not scheduling
> >> > >> task any longer. Thus if the optee driver shutdown path receives a
> >> > >> supplicant RPC request from the OP-TEE we will deadlock the kernel's
> >> > >> shutdown.
> >> > >>
> >> > >> This unexpected event will in fact occur after the xtest suite has
> >> > >> been run. It seems some cached SHM kept alive a context object which
> >> > >> in turn kept alive a session towards a PTA or TA. Closing the session
> >> > >> results in a socket RPC command being sent back from OP-TEE.
> >> > >>
> >> > >> This sequence of events is captured by a 5.15 kernel annotated with
> >> > >> extra prints:
> >> > >>
> >> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> > >> freeing SHM ptr 0xFFFFFF8001079380
> >> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> > >> freeing SHM ptr 0xFFFFFF8001CC5580
> >> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> > >> freeing SHM ptr 0xFFFFFF8006308A80
> >> > >> Calling OPTEE_SMC_DISABLE_SHM_CACHE
> >> > >> OPTEE_SMC_DISABLE_SHM_CACHE returned 0
> >> > >> freeing SHM ptr 0xFFFFFF8006308B00
> >> > >> optee: optee_handle_rpc: a0=0XFFFF0000 a1=0XA0 a2=0X0
> >> > >> optee: optee_handle_rpc: a0=0XFFFF0005 a1=0XFFFFFF80 a2=0X61E6500
> >> > >> optee: handle_rpc_func_cmd: cmd = 0XA
> >> > >> optee_supp_thrd_req: func=0XA
> >> > >>
> >> > >
> >> > > This looks like another side effect (earlier known one is here [1]) of
> >> > > shared memory cache being allocated via RPC in a particular client's
> >> > > context. There is an appropriate fix from Jens here [2] to rather use
> >> > > driver internal tee_contex for RPC allocations. Can you try that and
> >> > > let us know if you still observe this issue?
> >> > >
> >> > > [1] https://github.com/OP-TEE/optee_os/issues/1918
> > <https://github.com/OP-TEE/optee_os/issues/1918>
> >> > > <https://github.com/OP-TEE/optee_os/issues/1918
> > <https://github.com/OP-TEE/optee_os/issues/1918>>
> >> > > [2]
> >> > > https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander(a)linaro.org/
> > <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander@linaro.org/>
> >> > > <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander@linaro.org/
> > <https://lore.kernel.org/lkml/20220114150824.3578829-8-jens.wiklander@linaro.org/>>
> >> > >
> >> > > -Sumit
> >> > >
> >> >
> >> > Indeed it seems this is the root cause. Backporting the patches to 5.15
> >> > also fixes our shutdown deadlock.
> >>
> >> That's good to hear.
> >>
> >> >
> >> > Nevertheless I prefer to have this extra guard in the supplicant
> >> > interface to prevent other shutdown deadlocks in the future.
> >>
> >> Yes, I agree, this still makes sense. Perhaps with an updated
> >> description to avoid unnecessary confusion when applied on top of my
> >> "tee: shared memory updates" patches.
> >>
> >
> > That's fine with me and we should drop the fixes tag as well.
> >
>
> Should we really drop the Fixes ? Depending on if and when the SHM
> owning context patch is backported, this fix may still be relevant for
> the 5.15 LTS branch.
I rather consider this as a robustness patch where the actual problem
is being fixed by Jens patch. I hope we should be able to target Jens
patch for 5.17 as a fix which should be backported to affected stable
releases.
-Sumit
>
> /Lars
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-01-27 7:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-20 10:40 [PATCH v2] optee: Do not send requests to supplicant during shutdown Lars Persson
2022-01-21 16:47 ` Tyler Hicks
2022-01-24 5:56 ` Sumit Garg
[not found] < <CAFA6WYPNQnKqseZTbb=NyOmswUO=x-CE1CcADZa+Kga_fozB9g@mail.gmail.com>
2022-01-24 17:47 ` Lars Persson
2022-01-24 20:27 ` Jens Wiklander
[not found] < <CAHUa44FQAtR_vXHckSujPOp3M7bAtho93x+b8fmS5tVxvTX1nA@mail.gmail.com>
2022-01-25 6:05 ` Sumit Garg
[not found] < <CAFA6WYN4+s7XjVpRa-L28YHjEuoR2hR+3GqV-NZCwe12dK1H7w@mail.gmail.com>
2022-01-26 10:48 ` Lars Persson
2022-01-27 7:10 ` Sumit Garg
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.