From: Charlie Jenkins <thecharlesjenkins@gmail.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: "Thomas Gleixner" <tglx@kernel.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Paul Walmsley" <pjw@kernel.org>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Albert Ou" <aou@eecs.berkeley.edu>,
"Guo Ren" <guoren@kernel.org>,
"Darren Hart" <dvhart@infradead.org>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"André Almeida" <andrealmeid@igalia.com>,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org, linux-riscv@lists.infradead.org,
linux-arm-kernel@lists.infradead.org,
"Alexandre Ghiti" <alex@ghiti.fr>,
"Charlie Jenkins" <charlie@rivosinc.com>,
"Jisheng Zhang" <jszhang@kernel.org>,
"Charles Mirabile" <cmirabil@redhat.com>
Subject: Re: [PATCH v4 5/8] riscv/runtime-const: Introduce runtime_const_mask_32()
Date: Tue, 23 Jun 2026 00:01:08 -0700
Message-ID: <ajovNDH2uo6V4NJx@blinky>
In-Reply-To: <ff9678fb-4cca-4849-8ffb-7cb76db60e1a@amd.com>
On Tue, Jun 23, 2026 at 11:43:39AM +0530, K Prateek Nayak wrote:
> Hello Charlie,
>
> On 6/23/2026 10:54 AM, Charlie Jenkins wrote:
> > On Thu, 30 Apr 2026 09:47:27 +0000, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >> Futex hash computation requires a mask operation with read-only after
> >> init data that will be converted to a runtime constant in the subsequent
> >> commit.
> >>
> >> Introduce runtime_const_mask_32 to further optimize the mask operation
> >> in the futex hash computation hot path. GCC generates a:
> >>
> >>   lui   a0, 0x12346      # upper; +0x800 then >>12 for correct rounding
> >>   addi  a0, a0, 0x678    # lower 12 bits
> >>   and   a1, a1, a0       # a1 = a1 & a0
> >>
> >> pattern to tackle arbitrary 32-bit masks and the same was also suggested
> >> by Claude which is implemented here. The final (__ret & val) operation
> >> is intentionally placed outside of asm block to allow compilers to
> >> further optimize it if possible.
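
For reference, the "+0x800 then >>12" rounding quoted above can be sketched
in C. This is only an illustration under stated assumptions, not the code
from the patch: split_imm32(), rebuild() and check_split() are hypothetical
names, and the sketch models 32-bit arithmetic only (on RV64, lui
sign-extends its 32-bit result, which is not modelled here).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two immediates of a lui+addi pair. */
struct lui_addi {
	uint32_t hi20;	/* 20-bit immediate for lui */
	int32_t  lo12;	/* sign-extended 12-bit immediate for addi */
};

/* Split a 32-bit constant into lui/addi immediates. */
static struct lui_addi split_imm32(uint32_t val)
{
	struct lui_addi s;

	/*
	 * addi sign-extends its 12-bit immediate, so when bit 11 of val
	 * is set the upper part must be rounded up by one to compensate.
	 */
	s.hi20 = ((val + 0x800u) >> 12) & 0xfffffu;

	/* Low 12 bits, interpreted as a signed value in [-2048, 2047]. */
	s.lo12 = (int32_t)(val & 0xfffu);
	if (s.lo12 >= 0x800)
		s.lo12 -= 0x1000;

	return s;
}

/* Recombine the way "lui; addi" would, with 32-bit wraparound. */
static uint32_t rebuild(struct lui_addi s)
{
	return (s.hi20 << 12) + (uint32_t)s.lo12;
}

/* Round-trip check: the pair must reproduce the original constant. */
static void check_split(uint32_t val)
{
	assert(rebuild(split_imm32(val)) == val);
}
```

With val = 0x12346678 this yields the 0x12346 / 0x678 pair shown in the
quoted listing; a value such as 0x12345fff exercises the rounding case,
giving lui 0x12346 and addi -1.
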
> >
> > If the mask fits in 12 bits, we can nop the lui and the addi and just
> > patch an "andi" instruction with the 12 bits of the mask. We already do
> > this with the lui+addi block and nop the lui if val fits in 12 bits. I
> > would be happy to help draft that optimization.
> >
> > But I think the better solution would be to take the power of 2
> > assumption since that will also benefit arm. We should still only emit
> > an andi if val fits in 12 bits, but if it doesn't we can patch in
> > shifts:
> >
> >   slli a0,a0,x
> >   srli a0,a0,x
> >
> > Where x is the constant (arch_size - _futex_shift - 1)
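
A rough C model of the selection described above (andi when the mask fits
in 12 bits, a slli/srli pair otherwise) might look as follows. It is a
sketch under assumptions: the mask is taken to be of the form (1 << n) - 1,
XLEN is fixed at 64, and plan_mask(), apply_plan() and the struct are
hypothetical names. The shift count is written here as XLEN - n, with n the
number of mask bits; how that maps onto the expression quoted above depends
on how the shift variable is defined, which is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define XLEN 64	/* assumption: RV64 */

enum mask_kind { MASK_ANDI, MASK_SHIFTS };

/* Hypothetical description of what would be patched in. */
struct mask_plan {
	enum mask_kind kind;
	uint32_t imm;	/* andi immediate when kind == MASK_ANDI */
	unsigned shamt;	/* slli/srli amount when kind == MASK_SHIFTS */
};

/* Count the set bits of a mask of the form (1 << n) - 1. */
static unsigned mask_bits(uint32_t mask)
{
	unsigned n = 0;

	/* Only contiguous low-bit masks are supported in this sketch. */
	assert((mask & (mask + 1)) == 0);
	while (mask) {
		n++;
		mask >>= 1;
	}
	return n;
}

static struct mask_plan plan_mask(uint32_t mask)
{
	struct mask_plan p = { MASK_ANDI, 0, 0 };

	/*
	 * andi takes a sign-extended 12-bit immediate, so only masks
	 * up to 0x7ff can be encoded without setting the upper bits.
	 */
	if (mask <= 0x7ffu) {
		p.imm = mask;
		return p;
	}

	p.kind = MASK_SHIFTS;
	p.shamt = XLEN - mask_bits(mask);
	return p;
}

/* Model the effect of the patched instruction(s) on a register value. */
static uint64_t apply_plan(uint64_t reg, struct mask_plan p)
{
	if (p.kind == MASK_ANDI)
		return reg & p.imm;

	/* slli then srli: shift the unwanted upper bits out, then back. */
	return (reg << p.shamt) >> p.shamt;
}
```

For example, a mask of 0xff would select andi, while 0xffff would select
the shift pair with a shift amount of 48 under these assumptions.
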
>
> I can do that for the next version and use ubfx for ARM. I can just put
> in a BUG_ON() at the arch/ specific __runtime_fixup_mask() and if a
> new use case arises which hits that, we can perhaps move on to the dynamic
> nop patching scheme that you mentioned earlier.
>
> Let me know if that works and I can pivot to that scheme in v5 and send
> it out post -rc1 after some testing.
That sounds like a great plan :)
- Charlie
>
> --
> Thanks and Regards,
> Prateek
>