From: "Alexandre Courbot" <acourbot@nvidia.com>
To: "Benno Lossin" <lossin@kernel.org>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Alex Gaynor" <alex.gaynor@gmail.com>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Benno Lossin" <benno.lossin@proton.me>,
"Andreas Hindborg" <a.hindborg@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"Trevor Gross" <tmgross@umich.edu>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>
Cc: "John Hubbard" <jhubbard@nvidia.com>,
"Ben Skeggs" <bskeggs@nvidia.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"Timur Tabi" <ttabi@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
<linux-kernel@vger.kernel.org>, <rust-for-linux@vger.kernel.org>,
<nouveau@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH v4 04/20] rust: add new `num` module with useful integer operations
Date: Fri, 13 Jun 2025 14:31:02 +0900
Message-ID: <DAL5QC6RHXQB.GDEGYBUJJAJO@nvidia.com>
In-Reply-To: <DAKMZL839IBG.1D43UR9NNWZSM@kernel.org>
On Thu Jun 12, 2025 at 11:49 PM JST, Benno Lossin wrote:
> On Thu Jun 12, 2025 at 3:27 PM CEST, Alexandre Courbot wrote:
>> On Thu Jun 12, 2025 at 10:17 PM JST, Alexandre Courbot wrote:
>>> On Wed Jun 4, 2025 at 4:18 PM JST, Benno Lossin wrote:
>>>> On Wed Jun 4, 2025 at 2:05 AM CEST, Alexandre Courbot wrote:
>>>>> On Wed Jun 4, 2025 at 8:02 AM JST, Benno Lossin wrote:
>>>>>> On Mon Jun 2, 2025 at 3:09 PM CEST, Alexandre Courbot wrote:
>>>>>>> On Thu May 29, 2025 at 4:27 PM JST, Benno Lossin wrote:
>>>>>>>> On Thu May 29, 2025 at 3:18 AM CEST, Alexandre Courbot wrote:
>>>>>>>>> On Thu May 29, 2025 at 5:17 AM JST, Benno Lossin wrote:
>>>>>>>>>> On Wed May 21, 2025 at 8:44 AM CEST, Alexandre Courbot wrote:
>>>>>>>>>>> + /// Align `self` up to `alignment`.
>>>>>>>>>>> + ///
>>>>>>>>>>> + /// `alignment` must be a power of 2 for accurate results.
>>>>>>>>>>> + ///
>>>>>>>>>>> + /// Wraps around to `0` if the requested alignment pushes the result above the type's limits.
>>>>>>>>>>> + ///
>>>>>>>>>>> + /// # Examples
>>>>>>>>>>> + ///
>>>>>>>>>>> + /// ```
>>>>>>>>>>> + /// use kernel::num::NumExt;
>>>>>>>>>>> + ///
>>>>>>>>>>> + /// assert_eq!(0x4fffu32.align_up(0x1000), 0x5000);
>>>>>>>>>>> + /// assert_eq!(0x4000u32.align_up(0x1000), 0x4000);
>>>>>>>>>>> + /// assert_eq!(0x0u32.align_up(0x1000), 0x0);
>>>>>>>>>>> + /// assert_eq!(0xffffu16.align_up(0x100), 0x0);
>>>>>>>>>>> + /// assert_eq!(0x4fffu32.align_up(0x0), 0x0);
>>>>>>>>>>> + /// ```
>>>>>>>>>>> + fn align_up(self, alignment: Self) -> Self;
>>>>>>>>>>
>>>>>>>>>> Isn't this `next_multiple_of` [1] (it also allows non power of 2
>>>>>>>>>> inputs).
>>>>>>>>>>
>>>>>>>>>> [1]: https://doc.rust-lang.org/std/primitive.u32.html#method.next_multiple_of
>>>>>>>>>
>>>>>>>>> It is, however the fact that `next_multiple_of` works with non powers of
>>>>>>>>> two also means it needs to perform a modulo operation. That operation
>>>>>>>>> might well be optimized away by the compiler, but AFAICT we have no way
>>>>>>>>> of proving it will always be the case, hence the always-optimal
>>>>>>>>> implementation here.
>>>>>>>>
>>>>>>>> When you use a power of 2 constant, then I'm very sure that it will get
>>>>>>>> optimized [1]. Even with non-powers of 2, you don't get a division [2].
>>>>>>>> If you find some code that is not optimized, then sure add a custom
>>>>>>>> function.
>>>>>>>>
>>>>>>>> [1]: https://godbolt.org/z/57M9e36T3
>>>>>>>> [2]: https://godbolt.org/z/9P4P8zExh
>>>>>>>
>>>>>>> That's impressive and would definitely work well with a constant. But
>>>>>>> when the value is not known at compile-time, the division does occur
>>>>>>> unfortunately: https://godbolt.org/z/WK1bPMeEx
>>>>>>>
>>>>>>> So I think we will still need a kernel-optimized version of these
>>>>>>> alignment functions.
>>>>>>
>>>>>> Hmm what exactly is the use-case for a variable align amount? Could you
>>>>>> store it in const generics?
>>>>>
>>>>> Say you have an IOMMU with support for different page sizes: the size
>>>>> of a particular page can be decided at runtime.
>>>>>
>>>>>>
>>>>>> If not, there are also these two variants that are more efficient:
>>>>>>
>>>>>> * option: https://godbolt.org/z/ecnb19zaM
>>>>>> * unsafe: https://godbolt.org/z/EqTaGov71
>>>>>>
>>>>>> So if the compiler can infer it from context it still optimizes it :)
>>>>>
>>>>> I think the `Option` (and subsequent `unwrap`) is something we want to
>>>>> avoid on such a common operation.
>>>>
>>>> Makes sense.
>>>>
>>>>>> But yeah to be extra sure, you need your version. By the way, what
>>>>>> happens if `align` is not a power of 2 in your version?
>>>>>
>>>>> It will just return `(self + (alignment - 1)) & !(alignment - 1)`, which
>>>>> will likely be a value you don't want.
>>>>
>>>> So wouldn't it be better to make users validate that they gave a
>>>> power-of-2 alignment?
>>>>
>>>>> So yes, for this particular operation we would prefer to only use powers
>>>>> of 2 as inputs - if we can ensure that then it solves most of our
>>>>> problems (can use `next_multiple_of`, no `Option`, etc).
>>>>>
>>>>> Maybe we can introduce a new integer type that, similarly to `NonZero`,
>>>>> guarantees that the value it stores is a power of 2? Users with const
>>>>> values (90+% of uses) won't see any difference, and if working with a
>>>>> runtime-generated value we will want to validate it anyway...
>>>>
>>>> I like this idea. But it will mean that we have to have a custom
>>>> function that is either standalone and const or in an extension trait :(
>>>> But for this one we can use the name `align_up` :)
>>>>
>>>> Here is a cool idea for the implementation: https://godbolt.org/z/x6navM5WK
>>>
>>> Yeah that's close to what I had in mind. Actually, we can also define
>>> `align_up` and `align_down` within this new type, and these methods can
>>> now be const since they are not implemented via a trait!
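
For illustration, here is a minimal sketch of what such a type could look
like. The name `PowerOfTwo`, its `new`/`get` methods and the choice of `u32`
are assumptions made for this example, not the API that was eventually
posted. The point is that inherent methods can be `const fn`, so a constant
alignment is validated and applied at compile time.

```rust
/// Hypothetical wrapper guaranteeing that the stored value is a power of two
/// (illustrative only; name and methods are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PowerOfTwo(u32);

impl PowerOfTwo {
    /// Returns `Some` only if `value` is a power of two (zero is rejected).
    pub const fn new(value: u32) -> Option<Self> {
        if value.is_power_of_two() {
            Some(Self(value))
        } else {
            None
        }
    }

    /// Returns the wrapped power of two.
    pub const fn get(self) -> u32 {
        self.0
    }

    /// Rounds `value` down to a multiple of `self`.
    pub const fn align_down(self, value: u32) -> u32 {
        // `self.0` is at least 1, so the subtraction cannot underflow.
        value & !(self.0 - 1)
    }

    /// Rounds `value` up to a multiple of `self`, wrapping to 0 on overflow.
    pub const fn align_up(self, value: u32) -> u32 {
        self.align_down(value.wrapping_add(self.0 - 1))
    }
}

// A constant alignment is checked once, at compile time.
const PAGE: PowerOfTwo = match PowerOfTwo::new(0x1000) {
    Some(p) => p,
    None => panic!("alignment is not a power of two"),
};

// Usable in const context because the methods are not trait methods.
const ALIGNED: u32 = PAGE.align_up(0x4fff);

fn main() {
    assert_eq!(ALIGNED, 0x5000);
    assert_eq!(PAGE.align_down(0x4fff), 0x4000);
    // A runtime-provided value has to be validated before use.
    assert!(PowerOfTwo::new(0x1800).is_none());
    assert!(PowerOfTwo::new(0).is_none());
}
```

With this shape, the only fallible step is constructing the type; the
alignment operations themselves need neither `Option` nor `unsafe`.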
>
> That sounds like a good idea.
>
>> ... with one difference though: I would like to avoid the use of
>> `unsafe` for something so basic, so the implementation is close to the C
>> one (using masks and logical operations). I think it's a great
>> demonstration of the compiler's abilities that we can generate an
>> always-optimized version of `next_multiple_of`, but for our use-case it
>> feels like jumping through hoops just to show that we can jump through
>> these hoops. I'll reconsider if there is pushback on v5 though. :)
>
> It's always a balance when to use `unsafe` vs when not to. For me using
> `hint::unreachable` & `next_multiple_of` is much easier to read than
>
> self.wrapping_add(alignment.wrapping_sub(1)).align_down(alignment)
>
> given that `align_down` is
>
> self & !alignment.wrapping_sub(1)
>
> But that is totally due to my lack of experience with raw bit
> operations. I also looked at the resulting assembly again and it seems
> like (not an assembly expert at all :) your safe version produces better
> code: https://godbolt.org/z/qhMbG7Mqd
Thanks for checking it! My x86 assembly literacy dates from a time when
32-bit registers were considered fancy, but it does indeed seem to be
slightly more compact and faster. Together with the lack of an `unsafe`
block, this makes me favor this version for now.
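
For reference, the safe mask-based version discussed above can be sketched
as standalone functions. This is a simplified illustration on `u32` using
free functions (an assumption; the patch implements these as methods), and
it only gives meaningful results when `alignment` is a power of two.

```rust
/// Rounds `value` down to a multiple of `alignment`.
/// `alignment` must be a power of two for the result to be meaningful.
fn align_down(value: u32, alignment: u32) -> u32 {
    value & !alignment.wrapping_sub(1)
}

/// Rounds `value` up to a multiple of `alignment`, wrapping to 0 if the
/// result does not fit in a `u32`.
fn align_up(value: u32, alignment: u32) -> u32 {
    align_down(value.wrapping_add(alignment.wrapping_sub(1)), alignment)
}

fn main() {
    // Same cases as the doc example in the patch.
    assert_eq!(align_up(0x4fff, 0x1000), 0x5000);
    assert_eq!(align_up(0x4000, 0x1000), 0x4000);
    assert_eq!(align_up(0x0, 0x1000), 0x0);
    assert_eq!(align_up(0x4fff, 0x0), 0x0);
    // Overflow wraps around to 0.
    assert_eq!(align_up(u32::MAX, 0x100), 0x0);

    // For power-of-two alignments (and no overflow), the result matches
    // the standard library's `next_multiple_of`.
    for value in [0u32, 1, 0xfff, 0x1000, 0x1001, 0x4fff] {
        for shift in 0..16 {
            let alignment = 1u32 << shift;
            assert_eq!(align_up(value, alignment), value.next_multiple_of(alignment));
        }
    }
}
```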