Linux-RISC-V Archive on lore.kernel.org
From: Ryan Roberts <ryan.roberts@arm.com>
To: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Paul Walmsley <pjw@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Kees Cook <kees@kernel.org>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Mark Rutland <mark.rutland@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Jeremy Linton <jeremy.linton@arm.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline
Date: Mon, 5 Jan 2026 10:36:34 +0000
Message-ID: <3062af1d-48b3-4d64-8528-3470e07069bb@arm.com>
In-Reply-To: <563a5d0d-c27a-45de-9495-a82403026886@kernel.org>

On 03/01/2026 08:00, Christophe Leroy (CS GROUP) wrote:
> 
> 
> On 02/01/2026 at 15:09, Ryan Roberts wrote:
>> On 02/01/2026 13:39, Jason A. Donenfeld wrote:
>>> Hi Ryan,
>>>
>>> On Fri, Jan 2, 2026 at 2:12 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>> context. Given the function is just a handful of operations and doesn't
>>>
>>> How many? What's this looking like in terms of assembly?
>>
>> 25 instructions on arm64:
> 
> 31 instructions on powerpc:
> 
> 00000000 <prandom_u32_state>:
>    0:    7c 69 1b 78     mr      r9,r3
>    4:    80 63 00 00     lwz     r3,0(r3)
>    8:    80 89 00 08     lwz     r4,8(r9)
>    c:    81 69 00 04     lwz     r11,4(r9)
>   10:    80 a9 00 0c     lwz     r5,12(r9)
>   14:    54 67 30 32     slwi    r7,r3,6
>   18:    7c e7 1a 78     xor     r7,r7,r3
>   1c:    55 66 10 3a     slwi    r6,r11,2
>   20:    54 88 68 24     slwi    r8,r4,13
>   24:    54 63 90 18     rlwinm  r3,r3,18,0,12
>   28:    7d 6b 32 78     xor     r11,r11,r6
>   2c:    7d 08 22 78     xor     r8,r8,r4
>   30:    54 aa 18 38     slwi    r10,r5,3
>   34:    54 e7 9b 7e     srwi    r7,r7,13
>   38:    7c e7 1a 78     xor     r7,r7,r3
>   3c:    51 66 2e fe     rlwimi  r6,r11,5,27,31
>   40:    54 84 38 28     rlwinm  r4,r4,7,0,20
>   44:    7d 4a 2a 78     xor     r10,r10,r5
>   48:    55 08 5d 7e     srwi    r8,r8,21
>   4c:    7d 08 22 78     xor     r8,r8,r4
>   50:    7c e3 32 78     xor     r3,r7,r6
>   54:    54 a5 68 16     rlwinm  r5,r5,13,0,11
>   58:    55 4a a3 3e     srwi    r10,r10,12
>   5c:    7d 4a 2a 78     xor     r10,r10,r5
>   60:    7c 63 42 78     xor     r3,r3,r8
>   64:    90 e9 00 00     stw     r7,0(r9)
>   68:    90 c9 00 04     stw     r6,4(r9)
>   6c:    91 09 00 08     stw     r8,8(r9)
>   70:    91 49 00 0c     stw     r10,12(r9)
>   74:    7c 63 52 78     xor     r3,r3,r10
>   78:    4e 80 00 20     blr
> 
> Among those, 8 instructions (4 lwz, 4 stw) read and write the state in
> memory. They can disappear when the function is inlined.
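
For readers mapping the listing back to source, here is a minimal userspace sketch of the generator step. The shift and mask constants are read off the disassembly above (for example `slwi 6`, `srwi 13` and `rlwinm 18,0,12` for the first word). The struct layout, the macro and the function name `tausworthe_step` are assumptions made for illustration, not a copy of lib/random32.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four 32-bit state words, matching the four lwz/stw pairs in the listing. */
struct rnd_state {
	uint32_t s1, s2, s3, s4;
};

/*
 * One generator component: mask the word with c and shift it left by d,
 * then XOR in the feedback term ((s << a) ^ s) >> b.
 */
#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Hypothetical name; advances all four words and returns their XOR. */
static uint32_t tausworthe_step(struct rnd_state *state)
{
	assert(state != NULL);

	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 0xFFFFFFFEu, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 0xFFFFFFF8u,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 0xFFFFFFF0u,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 0xFFFFFF80u, 13U);

	return state->s1 ^ state->s2 ^ state->s3 ^ state->s4;
}
```

The four loads at the top and the four stores at the bottom of the listing are the accesses to `state->s1` through `state->s4`. If the caller keeps the state in registers, an inlined copy does not need them.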
> 
>>
>>> It'd also be
>>> nice to have some brief analysis of other call sites to have
>>> confirmation this isn't blowing up other users.
>>
>> I compiled defconfig before and after this patch on arm64 and compared the text
>> sizes:
>>
>> $ ./scripts/bloat-o-meter -t vmlinux.before vmlinux.after
>> add/remove: 3/4 grow/shrink: 4/1 up/down: 836/-128 (708)
>> Function                                     old     new   delta
>> prandom_seed_full_state                      364     932    +568
>> pick_next_task_fair                         1940    2036     +96
>> bpf_user_rnd_u32                             104     196     +92
>> prandom_bytes_state                          204     260     +56
>> e843419@0f2b_00012d69_e34                      -       8      +8
>> e843419@0db7_00010ec3_23ec                     -       8      +8
>> e843419@02cb_00003767_25c                      -       8      +8
>> bpf_prog_select_runtime                      448     444      -4
>> e843419@0aa3_0000cfd1_1580                     8       -      -8
>> e843419@0aa2_0000cfba_147c                     8       -      -8
>> e843419@075f_00008d8c_184                      8       -      -8
>> prandom_u32_state                            100       -    -100
>> Total: Before=19078072, After=19078780, chg +0.00%
>>
>> So 708 bytes more after inlining. The main cost is prandom_seed_full_state(),
>> which calls prandom_u32_state() 10 times (via prandom_warmup()). I expect we
>> could turn that into a loop to save ~450 bytes overall.
>>
> With the following change the increase of prandom_seed_full_state() remains
> reasonable, and performance-wise it is a lot better as it avoids the read/write
> of the state via the stack:
> 
> diff --git a/lib/random32.c b/lib/random32.c
> index 24e7acd9343f6..28a5b109c9018 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -94,17 +94,11 @@ EXPORT_SYMBOL(prandom_bytes_state);
> 
>  static void prandom_warmup(struct rnd_state *state)
>  {
> +    int i;
> +
>      /* Calling RNG ten times to satisfy recurrence condition */
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> -    prandom_u32_state(state);
> +    for (i = 0; i < 10; i++)
> +        prandom_u32_state(state);
>  }
> 
>  void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
> 
> The loop is:
> 
>  248:    38 e0 00 0a     li      r7,10
>  24c:    7c e9 03 a6     mtctr   r7
>  250:    55 05 30 32     slwi    r5,r8,6
>  254:    55 46 68 24     slwi    r6,r10,13
>  258:    55 27 18 38     slwi    r7,r9,3
>  25c:    7c a5 42 78     xor     r5,r5,r8
>  260:    7c c6 52 78     xor     r6,r6,r10
>  264:    7c e7 4a 78     xor     r7,r7,r9
>  268:    54 8b 10 3a     slwi    r11,r4,2
>  26c:    7d 60 22 78     xor     r0,r11,r4
>  270:    54 a5 9b 7e     srwi    r5,r5,13
>  274:    55 08 90 18     rlwinm  r8,r8,18,0,12
>  278:    54 c6 5d 7e     srwi    r6,r6,21
>  27c:    55 4a 38 28     rlwinm  r10,r10,7,0,20
>  280:    54 e7 a3 3e     srwi    r7,r7,12
>  284:    55 29 68 16     rlwinm  r9,r9,13,0,11
>  288:    7d 64 5b 78     mr      r4,r11
>  28c:    7c a8 42 78     xor     r8,r5,r8
>  290:    7c ca 52 78     xor     r10,r6,r10
>  294:    7c e9 4a 78     xor     r9,r7,r9
>  298:    50 04 2e fe     rlwimi  r4,r0,5,27,31
>  29c:    42 00 ff b4     bdnz    250 <prandom_seed_full_state+0x7c>
> 
> This replaces the 10 calls to prandom_u32_state():
> 
>   fc:    91 3f 00 0c     stw     r9,12(r31)
>  100:    7f e3 fb 78     mr      r3,r31
>  104:    48 00 00 01     bl      104 <prandom_seed_full_state+0x88>
>             104: R_PPC_REL24    prandom_u32_state
>  108:    7f e3 fb 78     mr      r3,r31
>  10c:    48 00 00 01     bl      10c <prandom_seed_full_state+0x90>
>             10c: R_PPC_REL24    prandom_u32_state
>  110:    7f e3 fb 78     mr      r3,r31
>  114:    48 00 00 01     bl      114 <prandom_seed_full_state+0x98>
>             114: R_PPC_REL24    prandom_u32_state
>  118:    7f e3 fb 78     mr      r3,r31
>  11c:    48 00 00 01     bl      11c <prandom_seed_full_state+0xa0>
>             11c: R_PPC_REL24    prandom_u32_state
>  120:    7f e3 fb 78     mr      r3,r31
>  124:    48 00 00 01     bl      124 <prandom_seed_full_state+0xa8>
>             124: R_PPC_REL24    prandom_u32_state
>  128:    7f e3 fb 78     mr      r3,r31
>  12c:    48 00 00 01     bl      12c <prandom_seed_full_state+0xb0>
>             12c: R_PPC_REL24    prandom_u32_state
>  130:    7f e3 fb 78     mr      r3,r31
>  134:    48 00 00 01     bl      134 <prandom_seed_full_state+0xb8>
>             134: R_PPC_REL24    prandom_u32_state
>  138:    7f e3 fb 78     mr      r3,r31
>  13c:    48 00 00 01     bl      13c <prandom_seed_full_state+0xc0>
>             13c: R_PPC_REL24    prandom_u32_state
>  140:    7f e3 fb 78     mr      r3,r31
>  144:    48 00 00 01     bl      144 <prandom_seed_full_state+0xc8>
>             144: R_PPC_REL24    prandom_u32_state
>  148:    80 01 00 24     lwz     r0,36(r1)
>  14c:    7f e3 fb 78     mr      r3,r31
>  150:    83 e1 00 1c     lwz     r31,28(r1)
>  154:    7c 08 03 a6     mtlr    r0
>  158:    38 21 00 20     addi    r1,r1,32
>  15c:    48 00 00 00     b       15c <prandom_seed_full_state+0xe0>
>             15c: R_PPC_REL24    prandom_u32_state
> 
> 
> So approximately the same size in instructions, with better performance.
> 
>> I'm not really sure if 708 is good or bad...
> 
> That's in the noise compared to the overall size of vmlinux, but if we change it
> to a loop we also reduce pressure on the cache.

Thanks for the analysis; I'm going to follow David's suggestion and refactor
this into both an __always_inline and an out-of-line version. That way the
existing callsites can continue to use the out-of-line version and we will only
use the inline version for the kstack offset randomization.
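
A rough userspace sketch of that split, to make the shape concrete. The name `prandom_u32_state_inline` and the `ALWAYS_INLINE` macro are hypothetical stand-ins (the final patch may name and place things differently), and the attributes assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rnd_state {
	uint32_t s1, s2, s3, s4;
};

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Userspace stand-in for the kernel's __always_inline. */
#define ALWAYS_INLINE static inline __attribute__((always_inline))

/*
 * Hypothetical inline variant: the body is expanded at every call site,
 * intended only for hot paths such as kstack offset randomization.
 */
ALWAYS_INLINE uint32_t prandom_u32_state_inline(struct rnd_state *state)
{
	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 0xFFFFFFFEu, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 0xFFFFFFF8u,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 0xFFFFFFF0u,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 0xFFFFFF80u, 13U);

	return state->s1 ^ state->s2 ^ state->s3 ^ state->s4;
}

/*
 * Out-of-line variant: a single shared copy of the body, so existing
 * call sites keep their current text size.
 */
__attribute__((noinline))
uint32_t prandom_u32_state(struct rnd_state *state)
{
	assert(state != NULL);
	return prandom_u32_state_inline(state);
}
```

With this shape, callers such as prandom_seed_full_state() keep calling `prandom_u32_state()` unchanged, so the +568 bytes seen in the bloat-o-meter output would not be expected to appear.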

Thanks,
Ryan

> 
> Christophe



