From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Charlie Jenkins <thecharlesjenkins@gmail.com>
Cc: "Thomas Gleixner" <tglx@kernel.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Paul Walmsley" <pjw@kernel.org>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Albert Ou" <aou@eecs.berkeley.edu>,
"Guo Ren" <guoren@kernel.org>,
"Darren Hart" <dvhart@infradead.org>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"André Almeida" <andrealmeid@igalia.com>,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org, linux-riscv@lists.infradead.org,
linux-arm-kernel@lists.infradead.org,
"Alexandre Ghiti" <alex@ghiti.fr>,
"Charlie Jenkins" <charlie@rivosinc.com>,
"Jisheng Zhang" <jszhang@kernel.org>,
"Charles Mirabile" <cmirabil@redhat.com>
Subject: Re: [PATCH v4 5/8] riscv/runtime-const: Introduce runtime_const_mask_32()
Date: Tue, 23 Jun 2026 11:43:39 +0530
Message-ID: <ff9678fb-4cca-4849-8ffb-7cb76db60e1a@amd.com>
In-Reply-To: <178219229643.10927.7189200920480581019.b4-review@b4>
Hello Charlie,
On 6/23/2026 10:54 AM, Charlie Jenkins wrote:
> On Thu, 30 Apr 2026 09:47:27 +0000, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>> Futex hash computation requires a mask operation with read-only after
>> init data that will be converted to a runtime constant in the subsequent
>> commit.
>>
>> Introduce runtime_const_mask_32 to further optimize the mask operation
>> in the futex hash computation hot path. GCC generates a:
>>
>> lui a0, 0x12346 # upper; +0x800 then >>12 for correct rounding
>> addi a0, a0, 0x678 # lower 12 bits
>> and a1, a1, a0 # a1 = a1 & a0
>>
>> pattern to tackle arbitrary 32-bit masks and the same was also suggested
>> by Claude which is implemented here. The final (__ret & val) operation
>> is intentionally placed outside of asm block to allow compilers to
>> further optimize it if possible.
>
> If the mask fits in 12 bits, we can nop the lui and the addi and just
> patch an "andi" instruction with the 12 bits of the mask. We already do
> this with the lui+addi block and nop the lui if val fits in 12 bits. I
> would be happy to help draft that optimization.
>
> But I think the better solution would be to take the power of 2
> assumption since that will also benefit arm. We should still only emit
> an andi if val fits in 12 bits, but if it doesn't we can patch in
> shifts:
>
> slli a0,a0,x
> srli a0,a0,x
>
> Where x is the constant (arch_size - _futex_shift - 1)
I can do that for the next version and use ubfx for ARM. I can put a
BUG_ON() in the arch-specific __runtime_fixup_mask() for masks that
break the power-of-2 assumption and, if a new use case arises that hits
it, we can perhaps move on to the dynamic nop patching scheme that you
mentioned earlier.
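
To make the shift-pair idea concrete, here is a minimal userspace
sketch, not the actual patch. It assumes the mask has the form 2^n - 1
and a 64-bit register width (XLEN). The helper names is_low_bit_mask(),
mask_to_shift() and mask_by_shifts() are hypothetical, and the exact
shift expression in the series may be written differently depending on
how the futex shift is defined:

```c
#include <stdbool.h>
#include <stdint.h>

#define XLEN 64 /* assumption: 64-bit registers (e.g. RV64) */

/* Hypothetical check: true only for masks of the form 2^n - 1, n >= 1. */
static bool is_low_bit_mask(uint32_t mask)
{
	return mask != 0 && (mask & (mask + 1)) == 0;
}

/*
 * Hypothetical helper: shift amount for the slli/srli pair.
 * Callers must first check is_low_bit_mask(); this is where an
 * arch fixup could refuse (e.g. BUG_ON) an unsupported mask.
 */
static unsigned int mask_to_shift(uint32_t mask)
{
	unsigned int bits = 0;

	while (mask) {
		bits++;
		mask >>= 1;
	}
	return XLEN - bits;
}

/* Models "slli reg, reg, x; srli reg, reg, x" on a 64-bit register. */
static uint64_t mask_by_shifts(uint64_t val, unsigned int shift)
{
	return (val << shift) >> shift;
}
```

For any mask accepted by is_low_bit_mask(), mask_by_shifts(val,
mask_to_shift(mask)) gives the same result as (val & mask), which is why
no immediate needs to be materialized in a scratch register.
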
Let me know if that works and I can pivot to that scheme in v5 and send
it out post -rc1 after some testing.
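
For reference, the lui+addi split from the quoted commit message and
the andi shortcut can be sketched as below. This is an illustration
under stated assumptions rather than the kernel code: struct lui_addi,
split_lui_addi(), combine_lui_addi() and mask_fits_andi() are
hypothetical names. Because addi and andi sign-extend their 12-bit
immediates, the upper part is rounded by adding 0x800 first, and a
positive mask only fits a single andi when it is at most 0x7ff:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two immediates to patch. */
struct lui_addi {
	uint32_t hi20;	/* 20-bit immediate for lui */
	int32_t lo12;	/* sign-extended 12-bit immediate for addi */
};

/* Split a 32-bit value so that (hi20 << 12) + lo12 == val (mod 2^32). */
static struct lui_addi split_lui_addi(uint32_t val)
{
	struct lui_addi p;

	/* Round up when the low 12 bits will be treated as negative. */
	p.hi20 = ((val + 0x800u) >> 12) & 0xfffffu;
	p.lo12 = (int32_t)(val & 0xfffu);
	if (p.lo12 >= 0x800)
		p.lo12 -= 0x1000;
	return p;
}

/* Recombine the immediates, looking at the low 32 bits only. */
static uint32_t combine_lui_addi(struct lui_addi p)
{
	return (p.hi20 << 12) + (uint32_t)p.lo12;
}

/* A positive mask fits one andi only if bit 11 and above are clear. */
static bool mask_fits_andi(uint32_t mask)
{
	return mask <= 0x7ffu;
}
```

With this split, 0x12345678 yields lui 0x12345 and addi 0x678, while
0x12345fff yields lui 0x12346 and addi -1.
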
--
Thanks and Regards,
Prateek