Linux cryptographic layer development
* [PATCH v2] crypto: Fix hungtask for PADATA_RESET
From: Lu Jialin @ 2023-08-23  7:30 UTC
  To: Steffen Klassert, Daniel Jordan, Herbert Xu, David S . Miller
  Cc: Lu Jialin, Guo Zihua, linux-crypto

We found a hung task in test_aead_vec_cfg(), reported as follows:

INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call trace:
 __switch_to+0x98/0xe0
 __schedule+0x6c4/0xf40
 schedule+0xd8/0x1b4
 schedule_timeout+0x474/0x560
 wait_for_common+0x368/0x4e0
 wait_for_completion+0x20/0x30
 test_aead_vec_cfg+0xab4/0xd50
 test_aead+0x144/0x1f0
 alg_test_aead+0xd8/0x1e0
 alg_test+0x634/0x890
 cryptomgr_test+0x40/0x70
 kthread+0x1e0/0x220
 ret_from_fork+0x10/0x18
 Kernel panic - not syncing: hung_task: blocked tasks

When padata_do_parallel() returns 0 or -EBUSY, test_aead_vec_cfg() calls
wait_for_completion(&wait->completion). In the normal case,
padata_do_parallel() returns 0 and aead_request_complete() is called from
pcrypt_aead_serial(). But when PADATA_RESET is set in pinst->flags,
padata_do_parallel() returns -EBUSY and aead_request_complete() is never
called. test_aead_vec_cfg() therefore hangs in
wait_for_completion(&wait->completion), which triggers the hung task
report.
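
A minimal user-space sketch of the hang condition described above. The
helper names (caller_waits, completion_will_fire, old_pcrypt_return,
caller_hangs) are hypothetical and exist only for illustration; the
waiting rule is modelled on how crypto_wait_req() treats -EINPROGRESS and
-EBUSY, and the pass-through of non-zero errors mirrors the pre-patch
pcrypt_aead_encrypt() shown in the diff:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: the caller sleeps on the completion for these codes. */
static bool caller_waits(int err)
{
	return err == -EINPROGRESS || err == -EBUSY;
}

/*
 * Hypothetical model: only a request that padata actually queued
 * (padata_do_parallel() returned 0) reaches pcrypt_aead_serial() and
 * therefore aead_request_complete().
 */
static bool completion_will_fire(int padata_err)
{
	return padata_err == 0;
}

/* Pre-patch behaviour: 0 becomes -EINPROGRESS, other codes pass through. */
static int old_pcrypt_return(int padata_err)
{
	return padata_err ? padata_err : -EINPROGRESS;
}

/* The caller hangs when it waits for a completion that never arrives. */
static bool caller_hangs(int padata_err)
{
	return caller_waits(old_pcrypt_return(padata_err)) &&
	       !completion_will_fire(padata_err);
}
```

In this model only -EBUSY from padata_do_parallel() produces a hang: 0 is
completed later, and other errors are returned without waiting.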

The race occurs as follows:
(padata_do_parallel)                 |
    rcu_read_lock_bh();              |
    err = -EINVAL;                   |   (padata_replace)
                                     |     pinst->flags |= PADATA_RESET;
    err = -EBUSY                     |
    if (pinst->flags & PADATA_RESET) |
        rcu_read_unlock_bh()         |
        return err

To resolve the problem, retry up to 5 times when padata_do_parallel()
returns -EBUSY. If it still returns -EBUSY after 5 attempts, replace
-EBUSY with -EAGAIN, which tells the caller that parallel_data is
changing and that the call should be repeated.

v2:
introduce padata_try_do_parallel() in pcrypt_aead_encrypt and
pcrypt_aead_decrypt to solve the hungtask

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Guo Zihua <guozihua@huawei.com>
---
 crypto/pcrypt.c | 33 +++++++++++++++++++++++++++------
 kernel/padata.c |  2 +-
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 8c1d0ca41213..9d4470482165 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -74,6 +74,31 @@ static void pcrypt_aead_done(void *data, int err)
 	padata_do_serial(padata);
 }
 
+/*
+ * Retry at most 5 times while padata_do_parallel() returns -EBUSY.
+ * If it still returns -EBUSY after that, replace -EBUSY with -EAGAIN,
+ * which means parallel_data is changing and the caller should retry.
+ */
+static int padata_try_do_paralell(struct padata_shell *ps,
+				  struct padata_priv *padata, int *cb_cpu)
+{
+	int err = 0;
+	int nr_retries = 5;
+
+	while (nr_retries--) {
+		err = padata_do_parallel(ps, padata, cb_cpu);
+		if (err != -EBUSY)
+			break;
+	}
+
+	if (err == 0)
+		err = -EINPROGRESS;
+	else if (err == -EBUSY)
+		err = -EAGAIN;
+
+	return err;
+}
+
 static void pcrypt_aead_enc(struct padata_priv *padata)
 {
 	struct pcrypt_request *preq = pcrypt_padata_request(padata);
@@ -114,9 +139,7 @@ static int pcrypt_aead_encrypt(struct aead_request *req)
 			       req->cryptlen, req->iv);
 	aead_request_set_ad(creq, req->assoclen);
 
-	err = padata_do_parallel(ictx->psenc, padata, &ctx->cb_cpu);
-	if (!err)
-		return -EINPROGRESS;
+	err = padata_try_do_paralell(ictx->psenc, padata, &ctx->cb_cpu);
 
 	return err;
 }
@@ -161,9 +184,7 @@ static int pcrypt_aead_decrypt(struct aead_request *req)
 			       req->cryptlen, req->iv);
 	aead_request_set_ad(creq, req->assoclen);
 
-	err = padata_do_parallel(ictx->psdec, padata, &ctx->cb_cpu);
-	if (!err)
-		return -EINPROGRESS;
+	err = padata_try_do_paralell(ictx->psdec, padata, &ctx->cb_cpu);
 
 	return err;
 }
diff --git a/kernel/padata.c b/kernel/padata.c
index 222d60195de6..81c8183f3176 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -202,7 +202,7 @@ int padata_do_parallel(struct padata_shell *ps,
 		*cb_cpu = cpu;
 	}
 
-	err =  -EBUSY;
+	err = -EBUSY;
 	if ((pinst->flags & PADATA_RESET))
 		goto out;
 
-- 
2.34.1



* Re: [PATCH v2] crypto: Fix hungtask for PADATA_RESET
From: Herbert Xu @ 2023-08-23  9:28 UTC
  To: Lu Jialin
  Cc: Steffen Klassert, Daniel Jordan, David S . Miller, Guo Zihua,
	linux-crypto

On Wed, Aug 23, 2023 at 07:30:47AM +0000, Lu Jialin wrote:
> In order to resolve the problem, we retry at most 5 times when
> padata_do_parallel return -EBUSY. For more than 5 times, we replace the
> return err -EBUSY with -EAGAIN, which means parallel_data is changing, and
> the caller should call it again.

Steffen, should we retry this at all? Or should it just fail as it
did before?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v2] crypto: Fix hungtask for PADATA_RESET
From: Guozihua (Scott) @ 2023-09-01  2:28 UTC
  To: Herbert Xu, Lu Jialin
  Cc: Steffen Klassert, Daniel Jordan, David S . Miller, linux-crypto

On 2023/8/23 17:28, Herbert Xu wrote:
> On Wed, Aug 23, 2023 at 07:30:47AM +0000, Lu Jialin wrote:
>> In order to resolve the problem, we retry at most 5 times when
>> padata_do_parallel return -EBUSY. For more than 5 times, we replace the
>> return err -EBUSY with -EAGAIN, which means parallel_data is changing, and
>> the caller should call it again.
> 
> Steffen, should we retry this at all? Or should it just fail as it
> did before?
> 
> Thanks,

It should be fine if we don't retry and just fail with -EAGAIN and let
the caller handle it. It should not break the meaning of the error code.
-- 
Best
GUO Zihua



* Re: [PATCH v2] crypto: Fix hungtask for PADATA_RESET
From: Steffen Klassert @ 2023-09-04  5:40 UTC
  To: Guozihua (Scott)
  Cc: Herbert Xu, Lu Jialin, Daniel Jordan, David S . Miller,
	linux-crypto

On Fri, Sep 01, 2023 at 10:28:08AM +0800, Guozihua (Scott) wrote:
> On 2023/8/23 17:28, Herbert Xu wrote:
> > On Wed, Aug 23, 2023 at 07:30:47AM +0000, Lu Jialin wrote:
> >>
> >> In order to resolve the problem, we retry at most 5 times when
> >> padata_do_parallel return -EBUSY. For more than 5 times, we replace the
> >> return err -EBUSY with -EAGAIN, which means parallel_data is changing, and
> >> the caller should call it again.
> > 
> > Steffen, should we retry this at all? Or should it just fail as it
> > did before?
> > 
> > Thanks,
> 
> It should be fine if we don't retry and just fail with -EAGAIN and let
> caller handles it. It should not break the meaning of the error code.

Just failing without a retry should be ok.


* Re: [PATCH v2] crypto: Fix hungtask for PADATA_RESET
From: Lu Jialin @ 2023-09-04  6:58 UTC
  To: Steffen Klassert, Guozihua (Scott)
  Cc: Herbert Xu, Daniel Jordan, David S . Miller, linux-crypto

Thanks for your suggestion. I will update the patch and remove the retry in v3.
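
A minimal sketch of the no-retry variant discussed above, assuming the
only change is how the result of padata_do_parallel() is translated
before it is returned to the caller. The helper name
pcrypt_map_padata_err() is hypothetical and is not taken from any posted
patch:

```c
#include <errno.h>

/*
 * Hypothetical helper: translate the result of a single
 * padata_do_parallel() call, with no retry loop.
 */
static int pcrypt_map_padata_err(int err)
{
	/* Request was queued; completion is reported asynchronously. */
	if (err == 0)
		return -EINPROGRESS;

	/* Instance is being reset; fail now so the caller does not wait. */
	if (err == -EBUSY)
		return -EAGAIN;

	/* Any other error is passed through unchanged. */
	return err;
}
```

With this mapping the caller never sees -EBUSY from pcrypt, so it does
not wait for a completion that will not arrive.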

On 2023/9/4 13:40, Steffen Klassert wrote:
> On Fri, Sep 01, 2023 at 10:28:08AM +0800, Guozihua (Scott) wrote:
>> On 2023/8/23 17:28, Herbert Xu wrote:
>>> On Wed, Aug 23, 2023 at 07:30:47AM +0000, Lu Jialin wrote:
>>>>
>>>> In order to resolve the problem, we retry at most 5 times when
>>>> padata_do_parallel return -EBUSY. For more than 5 times, we replace the
>>>> return err -EBUSY with -EAGAIN, which means parallel_data is changing, and
>>>> the caller should call it again.
>>>
>>> Steffen, should we retry this at all? Or should it just fail as it
>>> did before?
>>>
>>> Thanks,
>>
>> It should be fine if we don't retry and just fail with -EAGAIN and let
>> caller handles it. It should not break the meaning of the error code.
> 
> Just failing without a retry should be ok.
> 

