public inbox for linux-crypto@vger.kernel.org
* [linus:master] [lib/crypto]  e5046823f8: stress-ng.urandom.ops_per_sec 4.3% regression
From: kernel test robot @ 2026-04-15  8:45 UTC
  To: Eric Biggers; +Cc: oe-lkp, lkp, Ard Biesheuvel, linux-crypto, oliver.sang



Hello,

kernel test robot noticed a 4.3% regression of stress-ng.urandom.ops_per_sec on:


commit: e5046823f8fa3677341b541a25af2fcb99a5b1e0 ("lib/crypto: chacha: Zeroize permuted_state before it leaves scope")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[still a regression on linus/master      e774d5f1bc27a85f858bce7688509e866f8e8a4e]
[still a regression on linux-next/master 66672af7a095d89f082c5327f3b15bc2f93d558e]

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: urandom
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202604151657.8e26ef70-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260415/202604151657.8e26ef70-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-icl-2sp8/urandom/stress-ng/60s

commit: 
  v7.0-rc5
  e5046823f8 ("lib/crypto: chacha: Zeroize permuted_state before it leaves scope")

        v7.0-rc5 e5046823f8fa3677341b541a25a 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  92086428            -4.3%   88086482        stress-ng.time.minor_page_faults
      1869            -5.2%       1773        stress-ng.urandom.million_random_bits_per_sec
     94288            -4.3%      90188        stress-ng.urandom.million_random_bits_read
  92078928            -4.3%   88078900        stress-ng.urandom.ops
   1534939            -4.3%    1468337        stress-ng.urandom.ops_per_sec
 1.198e+08 ±  2%     +14.2%  1.368e+08 ± 12%  cpuidle..time
    177224            -1.8%     173977        vmstat.system.in
      1.87 ± 29%      +1.3        3.14        mpstat.cpu.all.idle%
      0.43 ±  2%      -0.0        0.41 ±  2%  mpstat.cpu.all.soft%
      2.99 ±  2%      +0.4        3.40 ± 11%  turbostat.C1%
      2.86 ±  2%     +14.3%       3.26 ± 12%  turbostat.CPU%c1
 2.114e+10            -4.4%  2.021e+10        perf-stat.i.branch-instructions
      0.34            +0.0        0.35        perf-stat.i.branch-miss-rate%
  68731688            -3.1%   66627770        perf-stat.i.cache-references
      0.62            +4.1%       0.65        perf-stat.i.cpi
 3.538e+11            -4.2%   3.39e+11        perf-stat.i.instructions
      1.61            -4.0%       1.54        perf-stat.i.ipc
     70.83            -4.4%      67.69        perf-stat.i.metric.K/sec
   1510879            -4.4%    1443951        perf-stat.i.minor-faults
   3020690            -4.4%    2886892        perf-stat.i.page-faults
      0.30            +0.0        0.30        perf-stat.overall.branch-miss-rate%
      0.62            +4.1%       0.65        perf-stat.overall.cpi
      1.61            -4.0%       1.54        perf-stat.overall.ipc
 2.081e+10            -4.4%   1.99e+10        perf-stat.ps.branch-instructions
  68408155            -3.2%   66202291        perf-stat.ps.cache-references
 3.483e+11            -4.2%  3.337e+11        perf-stat.ps.instructions
   1487270            -4.4%    1421509        perf-stat.ps.minor-faults
   2973479            -4.4%    2842012        perf-stat.ps.page-faults
 2.135e+13            -3.8%  2.053e+13        perf-stat.total.instructions
     57.02            -2.9       54.15        perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.vfs_read.ksys_read
     13.65            -0.5       13.12        perf-profile.calltrace.cycles-pp._copy_to_iter.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64
      4.78            -0.1        4.65        perf-profile.calltrace.cycles-pp.__mmap
      4.25            -0.1        4.13        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      4.30            -0.1        4.17        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
      5.35            -0.1        5.26        perf-profile.calltrace.cycles-pp.__munmap
      5.11            -0.1        5.02        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      5.08            -0.1        5.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      5.01            -0.1        4.92        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      4.82            -0.1        4.73        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      4.93            -0.1        4.84        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.02            -0.1        4.93        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      3.06            -0.1        2.98        perf-profile.calltrace.cycles-pp.__mprotect
      2.75            -0.1        2.68        perf-profile.calltrace.cycles-pp.do_mprotect_pkey.__x64_sys_mprotect.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mprotect
      2.84            -0.1        2.77        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mprotect
      2.86            -0.1        2.79        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mprotect
      2.77            -0.1        2.70        perf-profile.calltrace.cycles-pp.__x64_sys_mprotect.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mprotect
      2.15            -0.1        2.08        perf-profile.calltrace.cycles-pp.mprotect_fixup.do_mprotect_pkey.__x64_sys_mprotect.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.33            -0.0        1.28        perf-profile.calltrace.cycles-pp.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.__get_unmapped_area.do_mmap
      1.14            -0.0        1.10        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.28            -0.0        1.24        perf-profile.calltrace.cycles-pp.vma_modify.vma_modify_flags.mprotect_fixup.do_mprotect_pkey.__x64_sys_mprotect
      1.33            -0.0        1.29        perf-profile.calltrace.cycles-pp.vma_modify_flags.mprotect_fixup.do_mprotect_pkey.__x64_sys_mprotect.do_syscall_64
      1.04            -0.0        1.00        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.35            -0.0        1.31        perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.__get_unmapped_area.do_mmap.vm_mmap_pgoff
      1.12            -0.0        1.09        perf-profile.calltrace.cycles-pp.__split_vma.vma_modify.vma_modify_flags.mprotect_fixup.do_mprotect_pkey
      0.83            -0.0        0.80        perf-profile.calltrace.cycles-pp.__get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.78            -0.0        0.76        perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.__get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
      1.37            -0.0        1.35        perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user
      0.59            -0.0        0.57        perf-profile.calltrace.cycles-pp.__x64_sys_pselect6.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.82            +0.1        1.88        perf-profile.calltrace.cycles-pp.crng_make_state.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64
      1.73            +0.1        1.79        perf-profile.calltrace.cycles-pp.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.vfs_read.ksys_read
      1.54            +0.1        1.60        perf-profile.calltrace.cycles-pp.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.vfs_read
     83.26            +0.4       83.62        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     84.30            +0.4       84.66        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     84.21            +0.4       84.57        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     83.12            +0.4       83.48        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     82.09            +0.4       82.49        perf-profile.calltrace.cycles-pp.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     64.23            +0.9       65.16        perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.vfs_read.ksys_read.do_syscall_64
     58.57            -2.9       55.67        perf-profile.children.cycles-pp.chacha_permute
     15.25            -0.6       14.69        perf-profile.children.cycles-pp._copy_to_iter
      4.91            -0.1        4.78        perf-profile.children.cycles-pp.__mmap
      4.00            -0.1        3.89        perf-profile.children.cycles-pp.vm_mmap_pgoff
      3.76            -0.1        3.66        perf-profile.children.cycles-pp.do_mmap
      5.42            -0.1        5.32        perf-profile.children.cycles-pp.__munmap
      5.02            -0.1        4.93        perf-profile.children.cycles-pp.__vm_munmap
      5.02            -0.1        4.93        perf-profile.children.cycles-pp.__x64_sys_munmap
      4.93            -0.1        4.84        perf-profile.children.cycles-pp.do_vmi_munmap
      4.82            -0.1        4.73        perf-profile.children.cycles-pp.do_vmi_align_munmap
      3.13            -0.1        3.05        perf-profile.children.cycles-pp.__mprotect
      2.77            -0.1        2.70        perf-profile.children.cycles-pp.do_mprotect_pkey
      2.78            -0.1        2.70        perf-profile.children.cycles-pp.__x64_sys_mprotect
      2.16            -0.1        2.10        perf-profile.children.cycles-pp.mprotect_fixup
      1.34            -0.0        1.30        perf-profile.children.cycles-pp.unmapped_area_topdown
      1.50            -0.0        1.46        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      1.59            -0.0        1.55        perf-profile.children.cycles-pp.__get_unmapped_area
      1.15            -0.0        1.11        perf-profile.children.cycles-pp.ksys_mmap_pgoff
      1.35            -0.0        1.32        perf-profile.children.cycles-pp.vm_unmapped_area
      1.28            -0.0        1.25        perf-profile.children.cycles-pp.vma_modify
      1.33            -0.0        1.30        perf-profile.children.cycles-pp.vma_modify_flags
      1.14            -0.0        1.11        perf-profile.children.cycles-pp.__split_vma
      0.82 ±  2%      -0.0        0.80        perf-profile.children.cycles-pp.mas_empty_area_rev
      1.06            -0.0        1.04        perf-profile.children.cycles-pp.mas_find
      1.02            -0.0        1.00        perf-profile.children.cycles-pp.clear_bhb_loop
      0.38            -0.0        0.36 ±  2%  perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      0.60            -0.0        0.58        perf-profile.children.cycles-pp.__x64_sys_pselect6
      0.50            -0.0        0.48        perf-profile.children.cycles-pp.ioctl
      0.63            -0.0        0.61        perf-profile.children.cycles-pp.mas_next_slot
      0.07            -0.0        0.06        perf-profile.children.cycles-pp.vma_set_page_prot
      1.83            +0.1        1.89        perf-profile.children.cycles-pp.crng_make_state
      1.74            +0.1        1.80        perf-profile.children.cycles-pp.crng_fast_key_erasure
     83.14            +0.4       83.50        perf-profile.children.cycles-pp.vfs_read
     83.27            +0.4       83.64        perf-profile.children.cycles-pp.ksys_read
     82.61            +0.4       82.99        perf-profile.children.cycles-pp.get_random_bytes_user
     66.11            +1.0       67.10        perf-profile.children.cycles-pp.chacha_block_generic
     57.97            -2.9       55.10        perf-profile.self.cycles-pp.chacha_permute
     10.80            -0.5       10.27        perf-profile.self.cycles-pp._copy_to_iter
      1.31            -0.1        1.26        perf-profile.self.cycles-pp.get_random_bytes_user
      1.01            -0.0        0.98        perf-profile.self.cycles-pp.clear_bhb_loop
      0.49            -0.0        0.47        perf-profile.self.cycles-pp.mas_rev_awalk
      0.22 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      7.32            +3.9       11.18        perf-profile.self.cycles-pp.chacha_block_generic
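
How to read the %change column: entries that carry a % sign appear to be relative changes, while entries without one (rows that are already percentages, such as mpstat and perf-profile) appear to be absolute differences in percentage points. Checked against the table's own numbers:

```latex
\begin{aligned}
\text{stress-ng.urandom.ops\_per\_sec:}\quad
  & \frac{1468337 - 1534939}{1534939} \times 100
    = \frac{-66602}{1534939} \times 100 \approx -4.34\% \\
\text{self, chacha\_permute:}\quad
  & 55.10 - 57.97 = -2.87 \approx -2.9 \text{ points} \\
\text{self, chacha\_block\_generic:}\quad
  & 11.18 - 7.32 = +3.86 \approx +3.9 \text{ points}
\end{aligned}
```

So in the self-cycle profile, time attributed to chacha_permute itself fell by about 2.9 points while time attributed to chacha_block_generic itself rose by about 3.9 points, which is consistent with the added zeroization being accounted to chacha_block_generic rather than to the permutation.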




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [linus:master] [lib/crypto]  e5046823f8: stress-ng.urandom.ops_per_sec 4.3% regression
From: Eric Biggers @ 2026-04-15 17:47 UTC
  To: kernel test robot
  Cc: oe-lkp, lkp, Ard Biesheuvel, linux-crypto, Jason A. Donenfeld,
	Theodore Ts'o

[+Cc Jason and Ted]

On Wed, Apr 15, 2026 at 04:45:48PM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 4.3% regression of stress-ng.urandom.ops_per_sec on:
> 
> 
> commit: e5046823f8fa3677341b541a25af2fcb99a5b1e0 ("lib/crypto: chacha: Zeroize permuted_state before it leaves scope")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

This commit fixed the forward secrecy of the RNG, so it needed to go in.
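
Why a leftover permuted_state matters: every step of the ChaCha quarter round (add, xor, rotate) can be undone, so the 20-round permutation is a bijection, and only the final addition of the input state makes the block function one-way. A copy of the permuted state that survives on the stack therefore lets anyone who later reads that memory run the rounds backwards and recover the input state, key words included, which is what forward secrecy is supposed to rule out. A minimal user-space sketch following the RFC 7539 quarter round; permute(), unpermute() and recover_input() are illustrative names, not the kernel's lib/crypto functions.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative ChaCha20 permutation per RFC 7539; not the kernel's code. */

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }
static uint32_t rotr32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* One ChaCha quarter round on words a, b, c, d of x. */
static void qr(uint32_t *x, int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 16);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 12);
	x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 8);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 7);
}

/* Exact inverse of qr(): undo the same steps in reverse order. */
static void inv_qr(uint32_t *x, int a, int b, int c, int d)
{
	x[b] = rotr32(x[b], 7);  x[b] ^= x[c]; x[c] -= x[d];
	x[d] = rotr32(x[d], 8);  x[d] ^= x[a]; x[a] -= x[b];
	x[b] = rotr32(x[b], 12); x[b] ^= x[c]; x[c] -= x[d];
	x[d] = rotr32(x[d], 16); x[d] ^= x[a]; x[a] -= x[b];
}

/* 20 rounds: 10 column rounds interleaved with 10 diagonal rounds. */
static void permute(uint32_t x[16])
{
	for (int i = 0; i < 20; i += 2) {
		qr(x, 0, 4, 8, 12); qr(x, 1, 5, 9, 13);
		qr(x, 2, 6, 10, 14); qr(x, 3, 7, 11, 15);
		qr(x, 0, 5, 10, 15); qr(x, 1, 6, 11, 12);
		qr(x, 2, 7, 8, 13); qr(x, 3, 4, 9, 14);
	}
}

/* Run permute() backwards: undo the diagonal round, then the column round. */
static void unpermute(uint32_t x[16])
{
	for (int i = 0; i < 20; i += 2) {
		inv_qr(x, 3, 4, 9, 14); inv_qr(x, 2, 7, 8, 13);
		inv_qr(x, 1, 6, 11, 12); inv_qr(x, 0, 5, 10, 15);
		inv_qr(x, 3, 7, 11, 15); inv_qr(x, 2, 6, 10, 14);
		inv_qr(x, 1, 5, 9, 13); inv_qr(x, 0, 4, 8, 12);
	}
}

/* Hypothetical attacker view: a leaked permuted state yields the full
 * input state, including the key in words 4..11. The real block output
 * is permuted + input, which cannot be unwound without knowing the input. */
static void recover_input(const uint32_t leaked[16], uint32_t input[16])
{
	memcpy(input, leaked, 16 * sizeof(uint32_t));
	unpermute(input);
}
```

Running recover_input() on the permuted state of the RFC 7539 section 2.3.2 test input gives that input back word for word, key included.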

For large RNG requests, we could get most of this performance back by
refactoring the chacha20_block() API to move the allocation of the
temporary state array into the caller.
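
One possible shape for the refactoring described above, as a hedged sketch: the caller owns the 16-word scratch array that holds the permuted state, so a loop producing many blocks wipes it once per request instead of once per 64-byte block. chacha20_block_scratch(), chacha20_block_local(), chacha20_blocks() and zeroize() are hypothetical names; zeroize() stands in for the kernel's memzero_explicit(), the per-call variant is modelled on the commit title rather than copied from lib/crypto, and the block function follows RFC 7539.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* RFC 7539 quarter round on four lvalues. */
#define QR(a, b, c, d) do {                   \
	a += b; d ^= a; d = rotl32(d, 16);    \
	c += d; b ^= c; b = rotl32(b, 12);    \
	a += b; d ^= a; d = rotl32(d, 8);     \
	c += d; b ^= c; b = rotl32(b, 7);     \
} while (0)

/* Stand-in for memzero_explicit(): volatile stores the compiler must keep. */
static void zeroize(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

/* Hypothetical API: caller supplies scratch x[16] for the permuted state
 * and is responsible for wiping it. Emits one 64-byte block and advances
 * the 32-bit block counter in state[12]. */
static void chacha20_block_scratch(uint32_t state[16], uint8_t out[64],
				   uint32_t x[16])
{
	memcpy(x, state, 16 * sizeof(uint32_t));
	for (int i = 0; i < 20; i += 2) {
		QR(x[0], x[4], x[8], x[12]);  QR(x[1], x[5], x[9], x[13]);
		QR(x[2], x[6], x[10], x[14]); QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]); QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8], x[13]);  QR(x[3], x[4], x[9], x[14]);
	}
	for (int i = 0; i < 16; i++) {
		uint32_t w = x[i] + state[i];	/* feed-forward addition */

		out[4 * i + 0] = (uint8_t)w;	/* little-endian serialization */
		out[4 * i + 1] = (uint8_t)(w >> 8);
		out[4 * i + 2] = (uint8_t)(w >> 16);
		out[4 * i + 3] = (uint8_t)(w >> 24);
	}
	state[12]++;
}

/* Per-call shape: the temporary lives in the callee, wiped every block. */
static void chacha20_block_local(uint32_t state[16], uint8_t out[64])
{
	uint32_t x[16];

	chacha20_block_scratch(state, out, x);
	zeroize(x, sizeof(x));
}

/* Refactored shape: one caller-owned temporary, wiped once per request. */
static void chacha20_blocks(uint32_t state[16], uint8_t *out, size_t nblocks)
{
	uint32_t x[16];

	for (size_t i = 0; i < nblocks; i++)
		chacha20_block_scratch(state, out + 64 * i, x);
	zeroize(x, sizeof(x));
}
```

Both paths produce the same keystream; what changes is how often the scratch is cleared (once per block versus once per request), which is the per-block cost such a refactoring would amortize for large requests.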

We could also get much better performance than before by using the
architecture-optimized ChaCha20 code instead of the generic ChaCha20
code.

However, neither would be a simple change.

- Eric

* Re: [linus:master] [lib/crypto] e5046823f8: stress-ng.urandom.ops_per_sec 4.3% regression
From: Jason A. Donenfeld @ 2026-04-16  0:45 UTC
  To: Eric Biggers
  Cc: kernel test robot, oe-lkp, lkp, Ard Biesheuvel, linux-crypto,
	Theodore Ts'o

On Wed, Apr 15, 2026 at 7:47 PM Eric Biggers <ebiggers@kernel.org> wrote:
> This commit fixed the forward secrecy of the RNG, so it needed to go in.
>
> For large RNG requests, we could get most of this performance back by
> refactoring the chacha20_block() API to move the allocation of the
> temporary state array into the caller.
>
> We could also get much better performance than before by using the
> architecture-optimized ChaCha20 code instead of the generic ChaCha20
> code.
>
> However, neither would be a simple change.

I saw this commit when you were making it, benched it myself, and it
didn't seem like a big deal. (Otherwise I would have piped up or tried
to come up with a different solution.) For a while, I was thinking
that arch-optimized code in random.c would be neat, but with
getrandom() in the vDSO, we already get architecture-optimized code
there, by necessity. So, practically speaking, I don't think this is a
big deal. I had also looked into what happens to that stack in the
context of the RNG, and it gets corrupted pretty quickly (and
remember, you don't need to erase all of it for it to become
practically non-invertible). But why play games with that sort of
thing? Zeroing is the right move.
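
For comparison, the two user-space entry points contrasted above: reading /dev/urandom goes through read(2) and the kernel path that dominates the robot's profile (vfs_read -> get_random_bytes_user), while getrandom() may be served from the vDSO where the kernel, architecture and C library support it. A minimal sketch assuming Linux and glibc 2.25 or later (for <sys/random.h>); fill_getrandom() and fill_urandom() are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf via getrandom(2); retry on EINTR and on short returns. */
static int fill_getrandom(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Fill buf via read(2) on /dev/urandom, i.e. the vfs_read path. */
static int fill_urandom(void *buf, size_t len)
{
	unsigned char *p = buf;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd < 0)
		return -1;
	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0) {	/* error, or unexpected EOF */
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	close(fd);
	return 0;
}
```

Whether the vDSO fast path is taken is invisible at this level: the getrandom() call is the same either way.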

Jason
