* [BUG] More issues with arm/aes-neonbs
@ 2024-08-05 21:42 Russell King (Oracle)
From: Russell King (Oracle) @ 2024-08-05 21:42 UTC
  To: Horia Geantă, Ard Biesheuvel
  Cc: Herbert Xu, David S. Miller, linux-crypto

Hi,

I see there have been multiple attempts to fix this module, but sadly
the problems persist.

On my i.MX6 platforms, I enabled aes-arm-bs support as of 6.9, and
I've been getting a load of hung tasks at boot ever since. I started
debugging this evening under 6.10, hacking the kernel code to get
useful information out of it. I've ended up dumping the entire state
of all threads when the hung task detector fires.

What I find is this - the aes-arm-bs module is being probed, and
this is its trace:

[   74.803096] task:modprobe        state:D stack:0     pid:613   tgid:613   ppid:37     flags:0x00000000
[   74.812620] Call trace:
[   74.812636] [<c0b784cc>] (__schedule) from [<c0b78bbc>] (schedule+0x50/0x128)
[   74.822586] [<c0b78bbc>] (schedule) from [<c0b82fac>] (schedule_timeout+0xb0/0x1b8)
[   74.830444] [<c0b82fac>] (schedule_timeout) from [<c0b79420>] (__wait_for_common+0x74/0x170)
[   74.839110] [<c0b79420>] (__wait_for_common) from [<c0488b8c>] (crypto_larval_wait+0x14/0x98)
[   74.847852] [<c0488b8c>] (crypto_larval_wait) from [<c0488e14>] (crypto_alg_mod_lookup+0x204/0x20c)
[   74.857118] [<c0488e14>] (crypto_alg_mod_lookup) from [<c0488f5c>] (crypto_alloc_tfm_node+0x48/0xb4)
[   74.866468] [<c0488f5c>] (crypto_alloc_tfm_node) from [<c048c478>] (crypto_alloc_skcipher+0x28/0x30)
[   74.875857] [<c048c478>] (crypto_alloc_skcipher) from [<bf3e88b8>] (cbc_init+0x1c/0x38 [aes_arm_bs])
[   74.885264] [<bf3e88b8>] (cbc_init [aes_arm_bs]) from [<c04889c0>] (crypto_create_tfm_node+0x34/0xd4)
[   74.894736] [<c04889c0>] (crypto_create_tfm_node) from [<c0488f74>] (crypto_alloc_tfm_node+0x60/0xb4)
[   74.894770] [<c0488f74>] (crypto_alloc_tfm_node) from [<c048c478>] (crypto_alloc_skcipher+0x28/0x30)
[   74.894800] [<c048c478>] (crypto_alloc_skcipher) from [<bf3de61c>] (simd_skcipher_create_compat+0x20/0x17c [crypto_simd])
[   74.894849] [<bf3de61c>] (simd_skcipher_create_compat [crypto_simd]) from [<bf3ef06c>] (aes_init+0x6c/0x1000 [aes_arm_bs])
[   74.894896] [<bf3ef06c>] (aes_init [aes_arm_bs]) from [<c0009ffc>] (do_one_initcall+0x60/0x2c0)
[   74.894933] [<c0009ffc>] (do_one_initcall) from [<c00e6640>] (do_init_module+0x54/0x1fc)
[   74.894962] [<c00e6640>] (do_init_module) from [<c00e8644>] (init_module_from_file+0x84/0xa4)
[   74.961860] [<c00e8644>] (init_module_from_file) from [<c00e892c>] (sys_finit_module+0x170/0x21c)
[   74.961897] [<c00e892c>] (sys_finit_module) from [<c0008320>] (ret_fast_syscall+0x0/0x1c)

What seems to be happening here is that we have registered all the
main ciphers using crypto_register_skciphers(), and then we walk the
array of algos, calling simd_skcipher_create_compat() on each.

We get to the __cbc(aes) entry, and this one seems to trigger the
larval_wait thing. With debug in crypto_alg_mod_lookup(), I find
this:

[   25.131852] modprobe:613: crypto_alg_mod_lookup: name=cbc(aes) type=0x5 mask=0x218e ok=32769
...
[   87.015070]   name=cbc(aes) alg=0xffffff92

and 0xffffff92 is an error-pointer for ETIMEDOUT.
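
For anyone wanting to check the alg= values by hand, here is a minimal
userspace sketch of the decoding. It assumes 32-bit pointers (as on
i.MX6) and mirrors the kernel's convention that the top 4095 addresses
encode a negative errno. The helper name errptr32_to_errno() is made up
for this illustration and is not a kernel function.

```c
#include <stdint.h>

/* Same bound the kernel uses for error pointers (MAX_ERRNO). */
#define MAX_ERRNO 4095u

/*
 * Hypothetical helper: return the positive errno encoded in a 32-bit
 * pointer value, or 0 if the value is an ordinary pointer.
 */
int errptr32_to_errno(uint32_t value)
{
    /* Error pointers occupy the range [-MAX_ERRNO, -1] as unsigned. */
    if (value >= (uint32_t)(0u - MAX_ERRNO))
        return (int)(0u - value);   /* two's complement negate */
    return 0;
}
```

Feeding it 0xffffff92 gives 110, which is ETIMEDOUT in the generic
Linux errno numbering; 0xfffffffe gives 2 (ENOENT); and a real pointer
such as 0xc4b318c0 gives 0.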

i.MX6 does have the CAAM hardware that can do cbc(aes), so thinking
that may be the issue, I decided to try blacklisting the CAAM modules.
This made no difference.

It seems that the issue is centred around the aes-arm-bs module. Even
after boot, and having removed the module, manually reloading it also
causes the same problem:

# time modprobe aes-arm-bs
modprobe: ERROR: could not insert 'aes_arm_bs': Connection timed out

real    1m1.731s
user    0m0.004s
sys     0m0.052s

The interesting thing is... if I blacklist the aes-arm module, then
aes-arm-bs doesn't behave this way and loads successfully. If I pre-
load the aes-arm module, then the hanging behaviour returns.

So... with my debug in place, loading aes-arm-bs with aes-arm
blacklisted gives me:

[ 4289.026431] modprobe:1786: crypto_alg_mod_lookup: name=cbc(aes) type=0x5 mask=0x218e ok=32769
[ 4289.084516] cryptomgr_probe:1788: crypto_alg_mod_lookup: name=aes type=0x20004 mask=0x218f ok=0
[ 4289.084556]   name=aes alg=0xfffffffe
[ 4289.114602] cryptomgr_probe:1788: crypto_alg_mod_lookup: name=ecb(aes) type=0x20004 mask=0x218f ok=32769
[ 4289.163489] cryptomgr_probe:1793: crypto_alg_mod_lookup: name=aes type=0x20004 mask=0x218f ok=0
[ 4289.163530]   name=aes alg=0xfffffffe
[ 4289.165187]   name=ecb(aes) alg=0xc4b318c0
[ 4289.165367]   name=cbc(aes) alg=0xc4b31cc0

Hence, looking up "aes" returns an immediate -ENOENT (and this is the
only "name" that aes-arm provides.) With aes-arm loaded:

[ 3926.164204] modprobe:1691: crypto_alg_mod_lookup: name=cbc(aes) type=0x5 mask=0x218e ok=32769
[ 3926.212563] cryptomgr_probe:1693: crypto_alg_mod_lookup: name=aes type=0x20004 mask=0x218f ok=0
[ 3926.212605]   name=aes alg=0xfffffffe
[ 3988.209746]   name=cbc(aes) alg=0xffffff92
[ 3988.412691] cryptomgr_probe:1693: crypto_alg_mod_lookup: name=ecb(aes) type=0x20004 mask=0x218f ok=32769
[ 3988.462116] cryptomgr_probe:1708: crypto_alg_mod_lookup: name=aes type=0x20004 mask=0x218f ok=0
[ 3988.462159]   name=aes alg=0xfffffffe
[ 3988.462292]   name=ecb(aes) alg=0xc4b320c0

It's interesting that, in the case where aes-arm is not loaded, the
cbc(aes) lookup only succeeds _after_ ecb(aes) has, whereas in the
failing case we're clearly waiting for cbc(aes) to time out before
proceeding to ecb(aes).
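
The length of that wait can be read straight off the printk timestamps.
Below is a small userspace sketch that does the subtraction; the helper
names parse_printk_ts() and printk_delta() are invented for this
example. It only assumes the usual "[ seconds.microseconds]" prefix
seen in the logs above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the "[ seconds.micros]" prefix of a
 * printk line. Returns 0 on success, -1 if no timestamp is found.
 */
int parse_printk_ts(const char *line, double *secs)
{
    return sscanf(line, " [ %lf]", secs) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: seconds elapsed between two log lines, or
 * -1.0 if either line does not start with a timestamp.
 */
double printk_delta(const char *start, const char *end)
{
    double t0, t1;

    if (parse_printk_ts(start, &t0) || parse_printk_ts(end, &t1))
        return -1.0;
    return t1 - t0;
}
```

Applied to the cbc(aes) lookup at 3926.164204 and its ETIMEDOUT result
at 3988.209746, this gives about 62.05 seconds. The boot-time pair
(25.131852 and 87.015070) gives about 61.88 seconds. Both are in line
with the 1m1.731s that modprobe reports, which suggests a timeout of
roughly a minute in the larval wait, though that is an inference from
the logs rather than something I've confirmed in the code.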

This is about as far as I've managed to get debugging this, and I'm
starting to hit the maze that is crypto probing/manager code that
isn't easy to understand... at least not on a late Monday evening.
Any suggestions?

Right now, though, from what I can see the aes-arm-bs module is
entirely unusable, and the only way I can get a reasonably bootable
system is to avoid loading this module (either by disabling it in
the kernel build or blacklisting it in modprobe - the latter being
my current workaround for this bug).
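
For reference, the modprobe side of that workaround can be sketched as
the fragment below. The file name is hypothetical; any .conf file
under /etc/modprobe.d/ should do.

```
# /etc/modprobe.d/aes-arm-bs.conf  (hypothetical file name)
# Stop alias-based autoloading of the module.
blacklist aes_arm_bs
```

Note that "blacklist" only stops loading via module aliases; an
explicit "modprobe aes-arm-bs" still loads the module and hits the
hang.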

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


