From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev, Jiayuan Chen <jiayuan.chen@shopee.com>,
Andrew Morton <akpm@linux-foundation.org>,
Shu Anzai <shu17az@gmail.com>,
Quanmin Yan <yanquanmin1@huawei.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/damon: replace damon_rand() with a per-ctx lockless PRNG
Date: Thu, 7 May 2026 07:41:11 +0800 [thread overview]
Message-ID: <d053e344-7643-4504-b766-a04f2b4a6569@linux.dev> (raw)
In-Reply-To: <20260506163701.6288-1-sj@kernel.org>
Hello SJ,
On 5/7/26 12:37 AM, SeongJae Park wrote:
> Hello Jiayuan,
>
>
> Thank you for posting this second version!
>
> On Tue, 5 May 2026 22:52:06 +0800 Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
>> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>>
>> damon_rand() on the sampling_addr hot path called
>> get_random_u32_below(), which takes a local_lock_irqsave() around a
>> per-CPU batched entropy pool and periodically refills it with
>> ChaCha20. At elevated nr_regions counts (20k+), the lock_acquire /
>> local_lock pair plus __get_random_u32_below() dominate kdamond perf
>> profiles.
>>
>> Replace the helper with a lockless lfsr113 generator (struct rnd_state)
>> held per damon_ctx and seeded from get_random_u64() in damon_new_ctx().
>> kdamond is the single consumer of a given ctx, so no synchronization
>> is required. Range mapping uses Lemire's (u64)rnd * span >> 32 on the
>> fast path;
> Could we add some links to the algorithm? I found a blog [1] from Google and
> guessing that is what you're saying, but I'm not really sure since I failed at
> finding time to thoroughly read it.
The link covers a related family of techniques, but I don't
think we need an external reference. The pattern is just the
multiply-shift fast path that the kernel already uses in
__get_random_u32_below() (drivers/char/random.c) -- (u64)rnd *
span >> 32 -- which the existing comment there calls "traditional
reciprocal multiplication". Of course, I can add a simple comment like
"traditional reciprocal multiplication, similar to get_random_u32_below()".
>> for spans larger than U32_MAX (only reachable on 64-bit) the
>> slow path combines two u32 outputs and uses mul_u64_u64_shr() at 64-bit
>> width. On 32-bit the slow path is dead code and gets eliminated by
>> the compiler.
>>
>> The new helper takes a ctx parameter; damon_split_regions_of() and
>> the kunit tests that call it directly are updated accordingly.
>>
>> lfsr113 is a linear PRNG and MUST NOT be used for anything
>> security-sensitive. DAMON's sampling_addr is not exposed to userspace
>> and is only consumed as a probe point for PTE accessed-bit sampling,
>> so a non-cryptographic PRNG is appropriate here.
>>
>> Tested with paddr monitoring and max_nr_regions=20000: kdamond CPU
>> usage reduced from ~72% to ~50% of one core.
>>
>> Link: https://lore.kernel.org/damon/20260426173346.86238-1-sj@kernel.org/T/#m4f1fd74112728f83a41511e394e8c3fef703039c
> Why are you adding the above link? It would be good to add description of the
> link on the commit message.
>
> Other than that, commit message looks good to me.
I'll move it to the patch changelog (after the "---" marker) instead of
the commit message.
>> Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
>> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
>> ---
>> include/linux/damon.h | 27 +++++++++++++++++++++------
>> mm/damon/core.c | 12 ++++++++----
>> mm/damon/paddr.c | 8 ++++----
>> mm/damon/tests/core-kunit.h | 28 ++++++++++++++++++++++------
>> mm/damon/vaddr.c | 7 ++++---
>> 5 files changed, 59 insertions(+), 23 deletions(-)
>>
>> diff --git a/include/linux/damon.h b/include/linux/damon.h
>> index f2cdb7c3f5e6..e16012a7f41a 100644
>> --- a/include/linux/damon.h
>> +++ b/include/linux/damon.h
>> @@ -8,8 +8,10 @@
>> #ifndef _DAMON_H_
>> #define _DAMON_H_
>>
>> +#include <linux/math64.h>
>> #include <linux/memcontrol.h>
>> #include <linux/mutex.h>
>> +#include <linux/prandom.h>
>> #include <linux/time64.h>
>> #include <linux/types.h>
>> #include <linux/random.h>
> Maybe we can remove random.h?
Right!
>> @@ -19,12 +21,6 @@
>> /* Max priority score for DAMON-based operation schemes */
>> #define DAMOS_MAX_SCORE (99)
>>
>> -/* Get a random number in [l, r) */
>> -static inline unsigned long damon_rand(unsigned long l, unsigned long r)
>> -{
>> - return l + get_random_u32_below(r - l);
>> -}
>> -
>> /**
>> * struct damon_addr_range - Represents an address region of [@start, @end).
>> * @start: Start address of the region (inclusive).
>> @@ -843,8 +839,27 @@ struct damon_ctx {
>>
>> struct list_head adaptive_targets;
>> struct list_head schemes;
>> +
>> + /* Per-ctx PRNG state for damon_rand(); kdamond is the sole consumer. */
>> + struct rnd_state rnd_state;
>> };
>>
>> +/* Get a random number in [@l, @r) using @ctx's lockless PRNG. */
>> +static inline unsigned long damon_rand(struct damon_ctx *ctx,
>> + unsigned long l, unsigned long r)
>> +{
>> + unsigned long span = r - l;
>> + u64 rnd;
>> +
>> + if (span <= U32_MAX) {
>> + rnd = prandom_u32_state(&ctx->rnd_state);
>> + return l + (unsigned long)((rnd * span) >> 32);
>> + }
>> + rnd = ((u64)prandom_u32_state(&ctx->rnd_state) << 32) |
>> + prandom_u32_state(&ctx->rnd_state);
>> + return l + mul_u64_u64_shr(rnd, span, 64);
>> +}
>> +
> I was unable to find time to thoroughly read the link I found, or understand
> the algorithm with only the code, so tested by myself like below.
>
> '''
> $ cat ./foo.c
> #include <stdio.h>
> #include <stdbool.h>
> #include <stdlib.h>
> #include <stdint.h>
>
> int main(int argc, char *argv[])
> {
> unsigned long span;
> int i;
>
> if (argc != 2) {
> printf("Usage: %s <span>\n", argv[0]);
> return -1;
> }
> span = atoi(argv[1]);
>
> for (i = 0; i < 1000; i++) {
> uint64_t rnd = rand();
>
> printf("%lu\n", (uint64_t)((rnd * span) >> 32));
> }
> return 0;
> }
> $ gcc ./foo.c
> $ ./a.out 10 | sort -n | uniq --count
> 195 0
> 194 1
> 188 2
> 223 3
> 200 4
> '''
>
> So I expected the program to generate number 0-9 randomly with similar
> proportions. But it is generating only numbers 0-4. I confirmed doubling
> 'span' value makes it work in my expected way. Should we do that here, too?
I wrote a kmod to verify. The helper is a verbatim copy of
damon_rand() from the patch:

#include <linux/module.h>
#include <linux/prandom.h>
#include <linux/random.h>
#include <linux/math64.h>

#define SPAN		10UL
#define ITERATIONS	1000UL

static inline unsigned long damon_rand_test(struct rnd_state *state,
					    unsigned long l, unsigned long r)
{
	unsigned long span = r - l;
	u64 rnd;

	if (span <= U32_MAX) {
		rnd = prandom_u32_state(state);
		return l + (unsigned long)((rnd * span) >> 32);
	}
	rnd = ((u64)prandom_u32_state(state) << 32) |
	      prandom_u32_state(state);
	return l + mul_u64_u64_shr(rnd, span, 64);
}

static int __init damon_rand_test_init(void)
{
	struct rnd_state state;
	unsigned long counts[SPAN] = {0};
	unsigned long i, v;
	int j;

	prandom_seed_state(&state, get_random_u64());
	for (i = 0; i < ITERATIONS; i++) {
		v = damon_rand_test(&state, 0, SPAN);
		if (v >= SPAN) {
			pr_err("out of range: %lu\n", v);
			return -EINVAL;
		}
		counts[v]++;
	}
	pr_info("damon_rand_test: span=%lu iterations=%lu\n",
		SPAN, ITERATIONS);
	for (j = 0; j < SPAN; j++)
		pr_info("damon_rand_test: %d -> %lu\n", j, counts[j]);
	return 0;
}
module_init(damon_rand_test_init);
MODULE_LICENSE("GPL");
dmesg output (span=10, 1000 iterations):
damon_rand_test: span=10 iterations=1000
damon_rand_test: 0 -> 99
damon_rand_test: 1 -> 107
damon_rand_test: 2 -> 79
damon_rand_test: 3 -> 94
damon_rand_test: 4 -> 97
damon_rand_test: 5 -> 97
damon_rand_test: 6 -> 94
damon_rand_test: 7 -> 100
damon_rand_test: 8 -> 114
damon_rand_test: 9 -> 119
All 10 buckets get hits, roughly uniform.
Looking into the userspace test program, the issue is that libc rand()
returns a value in [0, RAND_MAX]. On glibc, RAND_MAX is
2^31 - 1 (not 2^32 - 1), so the top input bit is always zero and
the multiply-shift only fills the lower half of the output
range. prandom_u32_state() returns a full u32 in [0, 2^32),
so the in-kernel formula behaves correctly without doubling.
> [...]
>
> Other than above, code changes look good to me.
>
> [1] https://lemire.me/blog/2024/08/17/faster-random-integer-generation-with-batching/
>
>
> Thanks,
> SJ
Thread overview: 4+ messages
2026-05-05 14:52 [PATCH v2] mm/damon: replace damon_rand() with a per-ctx lockless PRNG Jiayuan Chen
2026-05-06 16:37 ` SeongJae Park
2026-05-06 23:41 ` Jiayuan Chen [this message]
2026-05-07 8:10 ` SeongJae Park