From: David Laight <david.laight.linux@gmail.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Huacai Chen <chenhuacai@kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Paul Walmsley <pjw@kernel.org>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Kees Cook <kees@kernel.org>,
"Gustavo A. R. Silva" <gustavoars@kernel.org>,
Arnd Bergmann <arnd@arndb.de>,
Mark Rutland <mark.rutland@arm.com>,
"Jason A. Donenfeld" <Jason@zx2c4.com>,
Ard Biesheuvel <ardb@kernel.org>,
Jeremy Linton <jeremy.linton@arm.com>,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches
Date: Mon, 5 Jan 2026 14:45:45 +0000 [thread overview]
Message-ID: <20260105144545.45f2b0ba@pumpkin> (raw)
In-Reply-To: <60c5d7b1-1ab7-490c-8cb8-dfd50cf23856@arm.com>
On Mon, 5 Jan 2026 11:05:18 +0000
Ryan Roberts <ryan.roberts@arm.com> wrote:
> On 04/01/2026 23:01, David Laight wrote:
> > On Fri, 2 Jan 2026 13:11:54 +0000
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >
> >> Previously different architectures were using random sources of
> >> differing strength and cost to decide the random kstack offset. A number
> >> of architectures (loongarch, powerpc, s390, x86) were using their
> >> timestamp counter, at whatever the frequency happened to be. Other
> >> arches (arm64, riscv) were using entropy from the crng via
> >> get_random_u16().
> >>
> >> There have been concerns that in some cases the timestamp counters may
> >> be too weak, because they can be easily guessed or influenced by user
> >> space. And get_random_u16() has been shown to be too costly for the
> >> level of protection kstack offset randomization provides.
> >>
> >> So let's use a common, architecture-agnostic source of entropy; a
> >> per-cpu prng, seeded at boot-time from the crng. This has a few
> >> benefits:
> >>
> >> - We can remove choose_random_kstack_offset(); That was only there to
> >> try to make the timestamp counter value a bit harder to influence
> >> from user space.
> >>
> >> - The architecture code is simplified. All it has to do now is call
> >> add_random_kstack_offset() in the syscall path.
> >>
> >> - The strength of the randomness can be reasoned about independently
> >> of the architecture.
> >>
> >> - Arches previously using get_random_u16() now have much faster
> >> syscall paths, see below results.
> >>
> >> There have been some claims that a prng may be less strong than the
> >> timestamp counter if not regularly reseeded. But the prng has a period
> >> of about 2^113. So as long as the prng state remains secret, it should
> >> not be possible to guess. If the prng state can be accessed, we have
> >> bigger problems.
> >
> > If you have 128 bits of output from consecutive outputs I think you
> > can trivially determine the full state using (almost) 'school boy' maths
> > that could be done on pencil and paper.
> > (Most of the work only has to be done once.)
> >
> > The underlying problem is that the TAUSWORTHE() transformation is 'linear'
> > So that TAUSWORTHE(x ^ y) == TAUSWORTHE(x) ^ TAUSWORTHE(y).
> > (This is true of a LFSR/CRC and TOUSWORTH() is doing some subset of CRCs.)
> > This means that each output bit is the 'xor' of some of the input bits.
> > The four new 'state' values are just xor of the the bits of the old ones.
> > The final xor of the four states gives a 32bit value with each bit just
> > an xor of some of the 128 state bits.
> > Get four consecutive 32 bit values and you can solve the 128 simultaneous
> > equations (by trivial substitution) and get the initial state.
> > The solution gives you the 128 128bit constants for:
> > u128 state = 0;
> > u128 val = 'value returned from 4 calls';
> > for (int i = 0; i < 128; i++)
> > state |= parity(const128[i] ^ val) << i;
>
> What is const128[] here?
Some values you prepared earlier :-)
> > You done need all 32bits, just accumulate 128 bits.
> > So if you can get the 5bit stack offset from 26 system calls you know the
> > value that will be used for all the subsequent calls.
>
> It's not immediately obvious to me how user space would do this, but I'll take
> it on faith that it may be possible.
It shouldn't be possible, but anything that leaks a stack address would
give it away.
It is also pretty much why you care about the cycle length of the PRNG.
(If the length is short a rogue application can remember all the values.)
> >
> > Simply changing the final line to use + not ^ makes the output non-linear
> > and solving the equations a lot harder.
>
> There has been pushback on introducing new primitives [1] but I don't think
> that's a reason not to considder it.
That is a more general issue with the PRNG.
ISTR it was true for the previous version that explicitly used four CRC.
Jason should know more about whether the xor are a good idea.
>
> [1] https://lore.kernel.org/all/aRyppb8PCxFKVphr@zx2c4.com/
>
> >
> > I might sit down tomorrow and see if I can actually code it...
>
> Thanks for the analysis! I look forward to seeing your conclusion... although
> I'm not sure I'll be qualified to evaluate it mathematically.
I need to drag out the brian cells from when I learnt about CRC (actually
relating to burst error correction) over 40 years ago...
> FWIW, I previously had a go at various schemes using siphash to calculate some
> random bits. I found it to be significantly slower than this prng. I've so far
> taken the view that 6 bits of randomness is not much of a defence against brute
> force so we really shouldn't be spending too many cycles to generate the bits.
> If we can get to approach to work, I think that's best.
Indeed.
A single 32bit CRC using (crc + (crc >> 16)) & 0x3f could be 'good enough'.
Especially if the value is 'perturbed' during (say) context switch.
The '16' might need adjusting for the actual CRC, especially if TAUSWORTHE()
is used - you don't want the value to match one of the shifts it uses.
prandom_u32_state() is defined as:
#define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b)
state->s1 = TAUSWORTHE(state->s1, 6U, 13U, 4294967294U, 18U);
state->s2 = TAUSWORTHE(state->s2, 2U, 27U, 4294967288U, 2U);
state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U, 7U);
state->s4 = TAUSWORTHE(state->s4, 3U, 12U, 4294967168U, 13U);
return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4);
This is equivalent to:
#define TAUSWORTHE(s, a, b, c, d) ((s & ~c) << d) ^ (s >> a) ^ (s >> b)
state->s1 = TAUSWORTHE(state->s1, 7, 13, 1, 18);
state->s2 = TAUSWORTHE(state->s2, 25, 27, 7, 2);
state->s3 = TAUSWORTHE(state->s3, 8, 21, 15, 7);
state->s4 = TAUSWORTHE(state->s4, 9, 12, 127, 13);
which makes it clear that some low bits of each 's' get discarded reducing
the length of each CRC to (I think) 31, 29, 28 and 25.
Since 'b + d' matches the bits discarded by 'c', two of those shifts are
actually just a rotate, so there isn't really much 'bit stirring' going on.
By comparison CRC-16 (for hdlc comms like x25, isdn and ss7) reduces to:
u32 crc_step(u32 crc, u8 byte_val)
{
u8 t = crc ^ byte_val;
t = (t ^ t << 4);
return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4;
}
Much more 'stirring'.
David
WARNING: multiple messages have this Message-ID (diff)
From: David Laight <david.laight.linux@gmail.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Huacai Chen <chenhuacai@kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Paul Walmsley <pjw@kernel.org>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Kees Cook <kees@kernel.org>,
"Gustavo A. R. Silva" <gustavoars@kernel.org>,
Arnd Bergmann <arnd@arndb.de>,
Mark Rutland <mark.rutland@arm.com>,
"Jason A. Donenfeld" <Jason@zx2c4.com>,
Ard Biesheuvel <ardb@kernel.org>,
Jeremy Linton <jeremy.linton@arm.com>,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH v3 3/3] randomize_kstack: Unify random source across arches
Date: Mon, 5 Jan 2026 14:45:45 +0000 [thread overview]
Message-ID: <20260105144545.45f2b0ba@pumpkin> (raw)
In-Reply-To: <60c5d7b1-1ab7-490c-8cb8-dfd50cf23856@arm.com>
On Mon, 5 Jan 2026 11:05:18 +0000
Ryan Roberts <ryan.roberts@arm.com> wrote:
> On 04/01/2026 23:01, David Laight wrote:
> > On Fri, 2 Jan 2026 13:11:54 +0000
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >
> >> Previously different architectures were using random sources of
> >> differing strength and cost to decide the random kstack offset. A number
> >> of architectures (loongarch, powerpc, s390, x86) were using their
> >> timestamp counter, at whatever the frequency happened to be. Other
> >> arches (arm64, riscv) were using entropy from the crng via
> >> get_random_u16().
> >>
> >> There have been concerns that in some cases the timestamp counters may
> >> be too weak, because they can be easily guessed or influenced by user
> >> space. And get_random_u16() has been shown to be too costly for the
> >> level of protection kstack offset randomization provides.
> >>
> >> So let's use a common, architecture-agnostic source of entropy; a
> >> per-cpu prng, seeded at boot-time from the crng. This has a few
> >> benefits:
> >>
> >> - We can remove choose_random_kstack_offset(); That was only there to
> >> try to make the timestamp counter value a bit harder to influence
> >> from user space.
> >>
> >> - The architecture code is simplified. All it has to do now is call
> >> add_random_kstack_offset() in the syscall path.
> >>
> >> - The strength of the randomness can be reasoned about independently
> >> of the architecture.
> >>
> >> - Arches previously using get_random_u16() now have much faster
> >> syscall paths, see below results.
> >>
> >> There have been some claims that a prng may be less strong than the
> >> timestamp counter if not regularly reseeded. But the prng has a period
> >> of about 2^113. So as long as the prng state remains secret, it should
> >> not be possible to guess. If the prng state can be accessed, we have
> >> bigger problems.
> >
> > If you have 128 bits of output from consecutive outputs I think you
> > can trivially determine the full state using (almost) 'school boy' maths
> > that could be done on pencil and paper.
> > (Most of the work only has to be done once.)
> >
> > The underlying problem is that the TAUSWORTHE() transformation is 'linear'
> > So that TAUSWORTHE(x ^ y) == TAUSWORTHE(x) ^ TAUSWORTHE(y).
> > (This is true of a LFSR/CRC and TOUSWORTH() is doing some subset of CRCs.)
> > This means that each output bit is the 'xor' of some of the input bits.
> > The four new 'state' values are just xor of the the bits of the old ones.
> > The final xor of the four states gives a 32bit value with each bit just
> > an xor of some of the 128 state bits.
> > Get four consecutive 32 bit values and you can solve the 128 simultaneous
> > equations (by trivial substitution) and get the initial state.
> > The solution gives you the 128 128bit constants for:
> > u128 state = 0;
> > u128 val = 'value returned from 4 calls';
> > for (int i = 0; i < 128; i++)
> > state |= parity(const128[i] ^ val) << i;
>
> What is const128[] here?
Some values you prepared earlier :-)
> > You done need all 32bits, just accumulate 128 bits.
> > So if you can get the 5bit stack offset from 26 system calls you know the
> > value that will be used for all the subsequent calls.
>
> It's not immediately obvious to me how user space would do this, but I'll take
> it on faith that it may be possible.
It shouldn't be possible, but anything that leaks a stack address would
give it away.
It is also pretty much why you care about the cycle length of the PRNG.
(If the length is short a rogue application can remember all the values.)
> >
> > Simply changing the final line to use + not ^ makes the output non-linear
> > and solving the equations a lot harder.
>
> There has been pushback on introducing new primitives [1] but I don't think
> that's a reason not to considder it.
That is a more general issue with the PRNG.
ISTR it was true for the previous version that explicitly used four CRC.
Jason should know more about whether the xor are a good idea.
>
> [1] https://lore.kernel.org/all/aRyppb8PCxFKVphr@zx2c4.com/
>
> >
> > I might sit down tomorrow and see if I can actually code it...
>
> Thanks for the analysis! I look forward to seeing your conclusion... although
> I'm not sure I'll be qualified to evaluate it mathematically.
I need to drag out the brian cells from when I learnt about CRC (actually
relating to burst error correction) over 40 years ago...
> FWIW, I previously had a go at various schemes using siphash to calculate some
> random bits. I found it to be significantly slower than this prng. I've so far
> taken the view that 6 bits of randomness is not much of a defence against brute
> force so we really shouldn't be spending too many cycles to generate the bits.
> If we can get to approach to work, I think that's best.
Indeed.
A single 32bit CRC using (crc + (crc >> 16)) & 0x3f could be 'good enough'.
Especially if the value is 'perturbed' during (say) context switch.
The '16' might need adjusting for the actual CRC, especially if TAUSWORTHE()
is used - you don't want the value to match one of the shifts it uses.
prandom_u32_state() is defined as:
#define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b)
state->s1 = TAUSWORTHE(state->s1, 6U, 13U, 4294967294U, 18U);
state->s2 = TAUSWORTHE(state->s2, 2U, 27U, 4294967288U, 2U);
state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U, 7U);
state->s4 = TAUSWORTHE(state->s4, 3U, 12U, 4294967168U, 13U);
return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4);
This is equivalent to:
#define TAUSWORTHE(s, a, b, c, d) ((s & ~c) << d) ^ (s >> a) ^ (s >> b)
state->s1 = TAUSWORTHE(state->s1, 7, 13, 1, 18);
state->s2 = TAUSWORTHE(state->s2, 25, 27, 7, 2);
state->s3 = TAUSWORTHE(state->s3, 8, 21, 15, 7);
state->s4 = TAUSWORTHE(state->s4, 9, 12, 127, 13);
which makes it clear that some low bits of each 's' get discarded reducing
the length of each CRC to (I think) 31, 29, 28 and 25.
Since 'b + d' matches the bits discarded by 'c', two of those shifts are
actually just a rotate, so there isn't really much 'bit stirring' going on.
By comparison CRC-16 (for hdlc comms like x25, isdn and ss7) reduces to:
u32 crc_step(u32 crc, u8 byte_val)
{
u8 t = crc ^ byte_val;
t = (t ^ t << 4);
return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4;
}
Much more 'stirring'.
David
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2026-01-05 14:45 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-02 13:11 [PATCH v3 0/3] Fix bugs and performance of kstack offset randomisation Ryan Roberts
2026-01-02 13:11 ` Ryan Roberts
2026-01-02 13:11 ` [PATCH v3 1/3] randomize_kstack: Maintain kstack_offset per task Ryan Roberts
2026-01-02 13:11 ` Ryan Roberts
2026-01-02 22:44 ` David Laight
2026-01-02 22:44 ` David Laight
2026-01-05 10:30 ` Ryan Roberts
2026-01-05 10:30 ` Ryan Roberts
2026-01-19 10:23 ` Mark Rutland
2026-01-19 10:23 ` Mark Rutland
2026-01-02 13:11 ` [PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline Ryan Roberts
2026-01-02 13:11 ` Ryan Roberts
2026-01-02 13:39 ` Jason A. Donenfeld
2026-01-02 13:39 ` Jason A. Donenfeld
2026-01-02 14:09 ` Ryan Roberts
2026-01-02 14:09 ` Ryan Roberts
2026-01-03 8:00 ` Christophe Leroy (CS GROUP)
2026-01-03 8:00 ` Christophe Leroy (CS GROUP)
2026-01-05 10:36 ` Ryan Roberts
2026-01-05 10:36 ` Ryan Roberts
2026-01-03 10:46 ` David Laight
2026-01-03 10:46 ` David Laight
2026-01-05 10:34 ` Ryan Roberts
2026-01-05 10:34 ` Ryan Roberts
2026-01-02 22:54 ` David Laight
2026-01-02 22:54 ` David Laight
2026-01-19 10:26 ` Mark Rutland
2026-01-19 10:26 ` Mark Rutland
2026-01-02 13:11 ` [PATCH v3 3/3] randomize_kstack: Unify random source across arches Ryan Roberts
2026-01-02 13:11 ` Ryan Roberts
2026-01-04 23:01 ` David Laight
2026-01-04 23:01 ` David Laight
2026-01-05 11:05 ` Ryan Roberts
2026-01-05 11:05 ` Ryan Roberts
2026-01-05 14:45 ` David Laight [this message]
2026-01-05 14:45 ` David Laight
2026-01-07 14:05 ` David Laight
2026-01-07 14:05 ` David Laight
2026-01-12 12:26 ` Ryan Roberts
2026-01-12 12:26 ` Ryan Roberts
2026-01-12 13:36 ` David Laight
2026-01-12 13:36 ` David Laight
2026-01-19 10:48 ` Mark Rutland
2026-01-19 10:48 ` Mark Rutland
2026-01-19 10:52 ` [PATCH v3 0/3] Fix bugs and performance of kstack offset randomisation Mark Rutland
2026-01-19 10:52 ` Mark Rutland
2026-01-19 12:22 ` David Laight
2026-01-19 12:22 ` David Laight
2026-01-19 12:58 ` Ryan Roberts
2026-01-19 12:58 ` Ryan Roberts
2026-01-19 12:59 ` Ryan Roberts
2026-01-19 12:59 ` Ryan Roberts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260105144545.45f2b0ba@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=Jason@zx2c4.com \
--cc=agordeev@linux.ibm.com \
--cc=aou@eecs.berkeley.edu \
--cc=ardb@kernel.org \
--cc=arnd@arndb.de \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=chenhuacai@kernel.org \
--cc=dave.hansen@linux.intel.com \
--cc=gor@linux.ibm.com \
--cc=gustavoars@kernel.org \
--cc=hca@linux.ibm.com \
--cc=jeremy.linton@arm.com \
--cc=kees@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-hardening@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=loongarch@lists.linux.dev \
--cc=maddy@linux.ibm.com \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=palmer@dabbelt.com \
--cc=pjw@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.