* [linus:master] [lib/crypto] e5046823f8: stress-ng.urandom.ops_per_sec 4.3% regression
From: kernel test robot @ 2026-04-15 8:45 UTC
To: Eric Biggers; +Cc: oe-lkp, lkp, Ard Biesheuvel, linux-crypto, oliver.sang
Hello,
kernel test robot noticed a 4.3% regression of stress-ng.urandom.ops_per_sec on:
commit: e5046823f8fa3677341b541a25af2fcb99a5b1e0 ("lib/crypto: chacha: Zeroize permuted_state before it leaves scope")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master e774d5f1bc27a85f858bce7688509e866f8e8a4e]
[still regression on linux-next/master 66672af7a095d89f082c5327f3b15bc2f93d558e]
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: urandom
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202604151657.8e26ef70-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260415/202604151657.8e26ef70-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-icl-2sp8/urandom/stress-ng/60s
commit:
v7.0-rc5
e5046823f8 ("lib/crypto: chacha: Zeroize permuted_state before it leaves scope")
v7.0-rc5 e5046823f8fa3677341b541a25a
---------------- ---------------------------
(columns: v7.0-rc5 value ± %stddev | %change | e5046823f8 value ± %stddev | metric)
92086428 -4.3% 88086482 stress-ng.time.minor_page_faults
1869 -5.2% 1773 stress-ng.urandom.million_random_bits_per_sec
94288 -4.3% 90188 stress-ng.urandom.million_random_bits_read
92078928 -4.3% 88078900 stress-ng.urandom.ops
1534939 -4.3% 1468337 stress-ng.urandom.ops_per_sec
1.198e+08 ± 2% +14.2% 1.368e+08 ± 12% cpuidle..time
177224 -1.8% 173977 vmstat.system.in
1.87 ± 29% +1.3 3.14 mpstat.cpu.all.idle%
0.43 ± 2% -0.0 0.41 ± 2% mpstat.cpu.all.soft%
2.99 ± 2% +0.4 3.40 ± 11% turbostat.C1%
2.86 ± 2% +14.3% 3.26 ± 12% turbostat.CPU%c1
2.114e+10 -4.4% 2.021e+10 perf-stat.i.branch-instructions
0.34 +0.0 0.35 perf-stat.i.branch-miss-rate%
68731688 -3.1% 66627770 perf-stat.i.cache-references
0.62 +4.1% 0.65 perf-stat.i.cpi
3.538e+11 -4.2% 3.39e+11 perf-stat.i.instructions
1.61 -4.0% 1.54 perf-stat.i.ipc
70.83 -4.4% 67.69 perf-stat.i.metric.K/sec
1510879 -4.4% 1443951 perf-stat.i.minor-faults
3020690 -4.4% 2886892 perf-stat.i.page-faults
0.30 +0.0 0.30 perf-stat.overall.branch-miss-rate%
0.62 +4.1% 0.65 perf-stat.overall.cpi
1.61 -4.0% 1.54 perf-stat.overall.ipc
2.081e+10 -4.4% 1.99e+10 perf-stat.ps.branch-instructions
68408155 -3.2% 66202291 perf-stat.ps.cache-references
3.483e+11 -4.2% 3.337e+11 perf-stat.ps.instructions
1487270 -4.4% 1421509 perf-stat.ps.minor-faults
2973479 -4.4% 2842012 perf-stat.ps.page-faults
2.135e+13 -3.8% 2.053e+13 perf-stat.total.instructions
57.02 -2.9 54.15 perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.vfs_read.ksys_read
13.65 -0.5 13.12 perf-profile.calltrace.cycles-pp._copy_to_iter.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64
4.78 -0.1 4.65 perf-profile.calltrace.cycles-pp.__mmap
4.25 -0.1 4.13 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
4.30 -0.1 4.17 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
5.35 -0.1 5.26 perf-profile.calltrace.cycles-pp.__munmap
5.11 -0.1 5.02 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
5.08 -0.1 5.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
5.01 -0.1 4.92 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
4.82 -0.1 4.73 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
4.93 -0.1 4.84 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.02 -0.1 4.93 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
3.06 -0.1 2.98 perf-profile.calltrace.cycles-pp.__mprotect
2.75 -0.1 2.68 perf-profile.calltrace.cycles-pp.do_mprotect_pkey.__x64_sys_mprotect.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mprotect
2.84 -0.1 2.77 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mprotect
2.86 -0.1 2.79 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mprotect
2.77 -0.1 2.70 perf-profile.calltrace.cycles-pp.__x64_sys_mprotect.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mprotect
2.15 -0.1 2.08 perf-profile.calltrace.cycles-pp.mprotect_fixup.do_mprotect_pkey.__x64_sys_mprotect.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.33 -0.0 1.28 perf-profile.calltrace.cycles-pp.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.__get_unmapped_area.do_mmap
1.14 -0.0 1.10 perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.28 -0.0 1.24 perf-profile.calltrace.cycles-pp.vma_modify.vma_modify_flags.mprotect_fixup.do_mprotect_pkey.__x64_sys_mprotect
1.33 -0.0 1.29 perf-profile.calltrace.cycles-pp.vma_modify_flags.mprotect_fixup.do_mprotect_pkey.__x64_sys_mprotect.do_syscall_64
1.04 -0.0 1.00 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.35 -0.0 1.31 perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.__get_unmapped_area.do_mmap.vm_mmap_pgoff
1.12 -0.0 1.09 perf-profile.calltrace.cycles-pp.__split_vma.vma_modify.vma_modify_flags.mprotect_fixup.do_mprotect_pkey
0.83 -0.0 0.80 perf-profile.calltrace.cycles-pp.__get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.78 -0.0 0.76 perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.__get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
1.37 -0.0 1.35 perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user
0.59 -0.0 0.57 perf-profile.calltrace.cycles-pp.__x64_sys_pselect6.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.82 +0.1 1.88 perf-profile.calltrace.cycles-pp.crng_make_state.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64
1.73 +0.1 1.79 perf-profile.calltrace.cycles-pp.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.vfs_read.ksys_read
1.54 +0.1 1.60 perf-profile.calltrace.cycles-pp.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.vfs_read
83.26 +0.4 83.62 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
84.30 +0.4 84.66 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
84.21 +0.4 84.57 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
83.12 +0.4 83.48 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
82.09 +0.4 82.49 perf-profile.calltrace.cycles-pp.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
64.23 +0.9 65.16 perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64
58.57 -2.9 55.67 perf-profile.children.cycles-pp.chacha_permute
15.25 -0.6 14.69 perf-profile.children.cycles-pp._copy_to_iter
4.91 -0.1 4.78 perf-profile.children.cycles-pp.__mmap
4.00 -0.1 3.89 perf-profile.children.cycles-pp.vm_mmap_pgoff
3.76 -0.1 3.66 perf-profile.children.cycles-pp.do_mmap
5.42 -0.1 5.32 perf-profile.children.cycles-pp.__munmap
5.02 -0.1 4.93 perf-profile.children.cycles-pp.__vm_munmap
5.02 -0.1 4.93 perf-profile.children.cycles-pp.__x64_sys_munmap
4.93 -0.1 4.84 perf-profile.children.cycles-pp.do_vmi_munmap
4.82 -0.1 4.73 perf-profile.children.cycles-pp.do_vmi_align_munmap
3.13 -0.1 3.05 perf-profile.children.cycles-pp.__mprotect
2.77 -0.1 2.70 perf-profile.children.cycles-pp.do_mprotect_pkey
2.78 -0.1 2.70 perf-profile.children.cycles-pp.__x64_sys_mprotect
2.16 -0.1 2.10 perf-profile.children.cycles-pp.mprotect_fixup
1.34 -0.0 1.30 perf-profile.children.cycles-pp.unmapped_area_topdown
1.50 -0.0 1.46 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
1.59 -0.0 1.55 perf-profile.children.cycles-pp.__get_unmapped_area
1.15 -0.0 1.11 perf-profile.children.cycles-pp.ksys_mmap_pgoff
1.35 -0.0 1.32 perf-profile.children.cycles-pp.vm_unmapped_area
1.28 -0.0 1.25 perf-profile.children.cycles-pp.vma_modify
1.33 -0.0 1.30 perf-profile.children.cycles-pp.vma_modify_flags
1.14 -0.0 1.11 perf-profile.children.cycles-pp.__split_vma
0.82 ± 2% -0.0 0.80 perf-profile.children.cycles-pp.mas_empty_area_rev
1.06 -0.0 1.04 perf-profile.children.cycles-pp.mas_find
1.02 -0.0 1.00 perf-profile.children.cycles-pp.clear_bhb_loop
0.38 -0.0 0.36 ± 2% perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
0.60 -0.0 0.58 perf-profile.children.cycles-pp.__x64_sys_pselect6
0.50 -0.0 0.48 perf-profile.children.cycles-pp.ioctl
0.63 -0.0 0.61 perf-profile.children.cycles-pp.mas_next_slot
0.07 -0.0 0.06 perf-profile.children.cycles-pp.vma_set_page_prot
1.83 +0.1 1.89 perf-profile.children.cycles-pp.crng_make_state
1.74 +0.1 1.80 perf-profile.children.cycles-pp.crng_fast_key_erasure
83.14 +0.4 83.50 perf-profile.children.cycles-pp.vfs_read
83.27 +0.4 83.64 perf-profile.children.cycles-pp.ksys_read
82.61 +0.4 82.99 perf-profile.children.cycles-pp.get_random_bytes_user
66.11 +1.0 67.10 perf-profile.children.cycles-pp.chacha_block_generic
57.97 -2.9 55.10 perf-profile.self.cycles-pp.chacha_permute
10.80 -0.5 10.27 perf-profile.self.cycles-pp._copy_to_iter
1.31 -0.1 1.26 perf-profile.self.cycles-pp.get_random_bytes_user
1.01 -0.0 0.98 perf-profile.self.cycles-pp.clear_bhb_loop
0.49 -0.0 0.47 perf-profile.self.cycles-pp.mas_rev_awalk
0.22 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
7.32 +3.9 11.18 perf-profile.self.cycles-pp.chacha_block_generic
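The middle column of the table mixes two kinds of change. For counters such as stress-ng.urandom.ops_per_sec it is a relative change in percent, while for rows that are already percentages (the perf-profile, mpstat and turbostat C1% rows) it appears to be an absolute difference in percentage points. The sketch below reproduces both from the values in the table; the helper names `pct_change` and `point_delta` are made up for illustration and are not part of lkp-tests.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change, in percent, from the base value to the new value. */
static double pct_change(double base, double now)
{
	assert(base != 0.0);	/* a zero base has no relative change */
	return (now - base) / base * 100.0;
}

/* Absolute change, in percentage points, between two percentage values. */
static double point_delta(double base_pct, double now_pct)
{
	return now_pct - base_pct;
}

/* Print the rows discussed above, using the numbers from the table. */
static void print_examples(void)
{
	printf("ops_per_sec:               %+.1f%%\n",
	       pct_change(1534939.0, 1468337.0));
	printf("self chacha_permute:       %+.1f points\n",
	       point_delta(57.97, 55.10));
	printf("self chacha_block_generic: %+.1f points\n",
	       point_delta(7.32, 11.18));
}
```

Calling `print_examples()` from a `main()` prints the three deltas in the same rounded form the table uses.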
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [lib/crypto] e5046823f8: stress-ng.urandom.ops_per_sec 4.3% regression
From: Eric Biggers @ 2026-04-15 17:47 UTC
To: kernel test robot
Cc: oe-lkp, lkp, Ard Biesheuvel, linux-crypto, Jason A. Donenfeld,
Theodore Ts'o
[+Cc Jason and Ted]
On Wed, Apr 15, 2026 at 04:45:48PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 4.3% regression of stress-ng.urandom.ops_per_sec on:
>
>
> commit: e5046823f8fa3677341b541a25af2fcb99a5b1e0 ("lib/crypto: chacha: Zeroize permuted_state before it leaves scope")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
This commit fixed the forward secrecy of the RNG, so it needed to go in.
For large RNG requests, we could get most of this performance back by
refactoring the chacha20_block() API to move the allocation of the
temporary state array into the caller.
We could also get much better performance than before by using the
architecture-optimized ChaCha20 code instead of the generic ChaCha20
code.
However, neither would be a simple change.
- Eric
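[Editor's sketch, not part of the original message.] The first idea above, moving the temporary state array into the caller, could look roughly like the userspace C below. It is a minimal sketch under stated assumptions and is not the kernel's code: `block_with_scratch`, `fill_blocks` and `wipe` are hypothetical names, and `wipe` merely stands in for the kernel's `memzero_explicit()`. The point is that the permuted words live in a caller-owned buffer, so a multi-block request pays for one wipe at the end instead of one wipe per block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 64

static uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

/* ChaCha quarter round on four words of x. */
#define QR(x, a, b, c, d) do {					\
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);		\
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);		\
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);		\
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);		\
} while (0)

/* Stand-in for memzero_explicit(): volatile stores are not elided as dead. */
static void wipe(void *p, size_t len)
{
	volatile unsigned char *b = p;

	while (len--)
		*b++ = 0;
}

/*
 * Hypothetical refactored block function: generates one ChaCha20 block.
 * The permuted state is kept in caller-owned scratch and is NOT wiped here.
 */
static void block_with_scratch(uint32_t state[16], uint32_t scratch[16],
			       uint8_t out[BLOCK_BYTES])
{
	int i;

	memcpy(scratch, state, 16 * sizeof(uint32_t));
	for (i = 0; i < 20; i += 2) {
		/* column round */
		QR(scratch, 0, 4, 8, 12);
		QR(scratch, 1, 5, 9, 13);
		QR(scratch, 2, 6, 10, 14);
		QR(scratch, 3, 7, 11, 15);
		/* diagonal round */
		QR(scratch, 0, 5, 10, 15);
		QR(scratch, 1, 6, 11, 12);
		QR(scratch, 2, 7, 8, 13);
		QR(scratch, 3, 4, 9, 14);
	}
	for (i = 0; i < 16; i++) {
		uint32_t w = scratch[i] + state[i];

		/* serialize little-endian */
		out[4 * i + 0] = (uint8_t)w;
		out[4 * i + 1] = (uint8_t)(w >> 8);
		out[4 * i + 2] = (uint8_t)(w >> 16);
		out[4 * i + 3] = (uint8_t)(w >> 24);
	}
	state[12]++;	/* advance the block counter */
}

/* Hypothetical caller: one scratch buffer, one wipe for the whole request. */
static void fill_blocks(uint32_t state[16], uint8_t *out, size_t nblocks)
{
	uint32_t scratch[16];
	size_t i;

	assert(out != NULL || nblocks == 0);
	for (i = 0; i < nblocks; i++)
		block_with_scratch(state, scratch, out + i * BLOCK_BYTES);
	wipe(scratch, sizeof(scratch));
}
```

With this shape the wipe cost is amortized over the request length, which is why it would mainly help large reads; a single-block request still pays for one wipe.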
* Re: [linus:master] [lib/crypto] e5046823f8: stress-ng.urandom.ops_per_sec 4.3% regression
From: Jason A. Donenfeld @ 2026-04-16 0:45 UTC
To: Eric Biggers
Cc: kernel test robot, oe-lkp, lkp, Ard Biesheuvel, linux-crypto,
Theodore Ts'o
On Wed, Apr 15, 2026 at 7:47 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> [+Cc Jason and Ted]
>
> On Wed, Apr 15, 2026 at 04:45:48PM +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 4.3% regression of stress-ng.urandom.ops_per_sec on:
> >
> >
> > commit: e5046823f8fa3677341b541a25af2fcb99a5b1e0 ("lib/crypto: chacha: Zeroize permuted_state before it leaves scope")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> This commit fixed the forward secrecy of the RNG, so it needed to go in.
>
> For large RNG requests, we could get most of this performance back by
> refactoring the chacha20_block() API to move the allocation of the
> temporary state array into the caller.
>
> We could also get much better performance than before by using the
> architecture-optimized ChaCha20 code instead of the generic ChaCha20
> code.
>
> However, neither would be a simple change.
I saw this commit when you were making it and also benched it and it
didn't seem like a big deal. (Otherwise I would have piped up or tried
to come up with a different solution.) For a while, I was thinking
that arch-optimized code in random.c would be neat, but with
getrandom() being in the vDSO, we already get architecture-optimized
code there, by necessity. So I think practically speaking, this is not
a big deal. I had also looked into what happens to that stack in the
context of the RNG, and it gets pretty quickly corrupted (and
remember, you don't need to erase all of it for it to become
practically non-invertible). But why should we play games with that
sort of thing? Zeroing is the right move.
Jason