Re: [PATCH v4 2/3] Adds Rust Bitmap API.

rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Alice Ryhl <aliceryhl@google.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Burak Emir <bqe@google.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	 Viresh Kumar <viresh.kumar@linaro.org>,
	Miguel Ojeda <ojeda@kernel.org>,
	 Alex Gaynor <alex.gaynor@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>,  Gary Guo <gary@garyguo.net>,
	Bjorn Roy Baron <bjorn3_gh@protonmail.com>,
	 Benno Lossin <benno.lossin@proton.me>,
	Andreas Hindborg <a.hindborg@kernel.org>,
	 Trevor Gross <tmgross@umich.edu>,
	rust-for-linux@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 2/3] Adds Rust Bitmap API.
Date: Wed, 19 Mar 2025 16:03:45 +0000	[thread overview]
Message-ID: <Z9rq4aqY3PR8XzEN@google.com> (raw)
In-Reply-To: <Z9rVMJMlx4klpt4h@thinkpad>

On Wed, Mar 19, 2025 at 10:31:12AM -0400, Yury Norov wrote:
> On Wed, Mar 19, 2025 at 10:39:57AM +0000, Alice Ryhl wrote:
> > On Tue, Mar 18, 2025 at 04:17:18PM -0400, Yury Norov wrote:
> > > On Tue, Mar 18, 2025 at 05:51:53PM +0000, Burak Emir wrote:
> 
> [...]
> 
> > > Are those 'drop' and 'new' something like a special rust words? If not -
> > > can you use alloc and free wording? Would be nice to have rust part
> > > looking similar to C. Nobody wants to keep two sets of names in mind.
> > 
> > Rewording `new` to `alloc` seems reasonable.
> > 
> > As for "drop", that word is special. It's the destructor that is called
> > automatically when the bitmap goes out of scope - you can't call it
> > directly. It must be called "drop".
> 
> OK, then drop and new.
> 
> > > > +        if let Ok(nbits_u32) = u32::try_from(nbits) {
> > > > +            // SAFETY: `nbits == 0` is permitted and `nbits <= u32::MAX`.
> > > > +            let ptr = unsafe { bindings::bitmap_zalloc(nbits_u32, flags.as_raw()) };
> > > > +            // Zero-size allocation is ok and yields a dangling pointer.
> > > 
> > > Maybe it's OK, but I'm not aware of any a correct algorithm that needs
> > > 0-sized bitmaps. I already asked for it on previous iteration, right?
> > > So unless you can show me such an example and explain what for you need
> > > 0-sized bitmaps, please drop this wording.
> > 
> > Thinking about it, I actually think there is a good reason to support
> > zero-sized bitmaps for the Binder use-case. Right now, we always
> > allocate at least one long worth of bits even if they're all 0. But we
> > can improve the memory usage of this code by not allocating any memory
> > for the bitmap until the first time we use it.
> > 
> > The reason that dbitmap.h doesn't do this is historical. Originally, the
> > bitmap started out having BIT(0) set to 1, so we needed an allocation to
> > hold that from the very beginning. But that was changed in commit
> > 11512c197d38 ("binder: fix descriptor lookup for context manager"), so
> > the bitmap now starts out empty.
> 
> Empty bitmap is not a 0-length bitmap, right?
> 
> If it was me inventing dynamic bitmaps, and if I was concerned about
> usability and performance, I would think carefully what exactly the
> request for 0-bits Bitmap object means.
> 
> I would probably consider it as a caller error, which makes total
> sense.
> 
> Or I would consider it as a special hint, like 'give me something to
> begin with'.
> 
> If I decided to go with the 2nd option, I'd probably avoid memory
> allocation at all, and re-use the pointer as the bitmap of the length
> BITS_PER_LONG. That would save _a lot_ on useless small kmalloc()
> calls.

Okay that's clever. We would probably avoid the need for allocations in
the vast vast majority of cases with that approach.

> The whole business of dynamic arrays is about potentially infinite
> number of elements. User of your array doesn't even care about the
> exact length, because it may get changed anytime, right?
> 
> In that case, the comment would sound like:
> 
>   // Zero-size Bitmap request yields a bitmap of an arbitrary
>   // non-zero length.
> 
> [...]
>  
> > The clippy linter emits a warning if you have a `len` method but don't
> > have an `is_empty` method, since Rust containers usually have both.
> > 
> > https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty
> > 
> > But the confusion with "no set bits" seems like a good reason to silence
> > that warning for bitmap.
>  
> So the comment at your link says:
> 
>  It is good custom to have both methods, because for some data structures,
>  asking about the length will be a costly operation, whereas .is_empty()
>  can usually answer in constant time.
> 
> In this case both len() and is_empty() are O(1), but the last one
> is totally misleading.
> 
> You guys would better teach your linter to understand when the len()
> is cheap, so is_empty() is useless.

Heh, I'm not sure I agree with the lint's explanation. But all stdlib
Rust data structures have both `len` and `is_empty` even if computing
the length is O(1). In my eyes, it's just a convention, similar to
calling the constructor "new".

> > > > +    /// Sets bit with index `index`.
> > > > +    ///
> > > > +    /// # Panics
> > > > +    ///
> > > > +    /// Panics if `index` is greater than or equal to `self.nbits`.
> > > > +    #[inline]
> > > > +    pub fn set_bit(&mut self, index: usize) {
> > > 
> > > set_bit() is an atomic name, but you wire it to a non-atomic operation.
> > > This is a mess.
> > 
> > Hmm. I do generally agree that we should try to match C names, but I'm
> > unsure about this particular case due to the underscores.
> > 
> > This method takes "&mut self" rather than "&self", which means that the
> > caller must have exclusive access to the bitmap to call this method.
> > Attempting to call it when the bitmap is shared will result in a
> > compilation failure, so it is impossible to call the non-atomic method
> > in cases where you must use the atomic version.
> 
> Does mutual access implies memory barriers? Your code doesn't add
> them. Is rust compiler so smart to realize that you need it?
> 
> I mean the following scenario:
> 
> CPU1                            CPU2
> a.set_bit(1)
>                                 a.test_bit(1) -> 0
> cache_flush()
>                                 a.test_bit(1) -> 1
> 
> If the above is impossible, then yes, your set_bit is atomic.

Yes, if a method is "&mut self" then this case is impossible.

Basically, the way it works is that the only APIs that hand out `&mut`
references to a value must somehow enforce that the access is exclusive.
How it enforces that is up to the API.

So for example, if you have a local variable that you know is not
shared, you can just obtain a `&mut` reference to it. No barriers needed
since it's a local variable that is not shared.

On the other hand, if you have a variable protected by a spinlock, then
the spinlock API will only hand out `&mut` references when the spinlock
is locked. And the borrow-checker enforces that the `&mut` can't be used
after you unlock it again. In this case, the spin_lock / spin_unlock
calls are sufficient barriers to use __set_bit.

And so on for each synchronization method.

Point is, any time you have a `&mut` reference to a value, you know that
it's exclusive, because whichever API created that reference has some
mechanism to ensure that.

> > We could call this method __set_bit, but using underscore prefixes for
> > methods is very very rare in Rust code because prefixing a name with an
> > underscore usually means "do not emit warnings if this thing is unused".
> > For example, when locking a mutex, you might write this:
> > 
> > {
> >     let _lock = my_mutex.lock();
> >     // do stuff ...
> > 
> >     // _lock goes out of scope here, unlocking the mutex
> > }
> > 
> > Here, if you called the variable "lock" you would get an unused variable
> > warning, but prefixing the variable name with an underscore silences
> > that warning.
> 
> But underscored method is not a local underscored variable. It doesn't
> have scope.

If a method isn't public, then Rust will perform a similar check to the
entire compilation unit and emit a warning if it's unused. The
underscore silences that.

Though as Miguel points out, this particular method is public, so the
check is not performed - there might be callers in other compilation
units that this rustc invocation doesn't know about.

> > We can still call the method __set_bit if you think that is best, but
> 
> No I don't. Nobody likes underscored notation for non-atomic bit ops,
> but it's a historical thing.
> 
> > because underscore prefixes usually mean something different in Rust, I
> > wonder if we should use slightly different names in Rust. Thoughts?
> 
> This is a new project, and you may decide to change notation. Just
> please be very specific about it.
> 
> So I'd suggest the following:
> 
> 1. Mirror C interface: __set_bit() and set_bit()
> 2. set_bit_nonatomic() and set_bit()
> 3. set_bit() and set_bit_atomic()
> 
> The last one looks the best to me, except that it just the opposite
> to what people got used to in Linux. If you go with #3, please add
> both atomic and non-atomic set_bit()s in the same patch, together with
> a bunch of comments, so people will be aware.

Solution #3 sounds good to me.

Alice

next prev parent reply	other threads:[~2025-03-19 16:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-18 16:42 [PATCH v4 0/3] Rust: Add Rust bitmap API, ID pool and bindings Burak Emir
2025-03-18 17:45 ` [PATCH v4 1/3] Adds bitmap.c and bitops.c Rust bindings Burak Emir
2025-03-18 18:49   ` Yury Norov
2025-03-18 17:51 ` [PATCH v4 2/3] Adds Rust Bitmap API Burak Emir
2025-03-18 20:17   ` Yury Norov
2025-03-19 10:39     ` Alice Ryhl
2025-03-19 11:41       ` Miguel Ojeda
2025-03-19 14:31       ` Yury Norov
2025-03-19 16:03         ` Alice Ryhl [this message]
2025-03-18 17:54 ` [PATCH v4 3/3] Add DynamicIdPool Rust API Burak Emir

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9rq4aqY3PR8XzEN@google.com \
    --to=aliceryhl@google.com \
    --cc=a.hindborg@kernel.org \
    --cc=alex.gaynor@gmail.com \
    --cc=benno.lossin@proton.me \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=bqe@google.com \
    --cc=gary@garyguo.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=tmgross@umich.edu \
    --cc=viresh.kumar@linaro.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).