From: Alice Ryhl <aliceryhl@google.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Arve Hjønnevåg" <arve@android.com>,
	"Todd Kjos" <tkjos@android.com>,
	"Martijn Coenen" <maco@android.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Christian Brauner" <brauner@kernel.org>,
	"Carlos Llamas" <cmllamas@google.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Burak Emir" <bqe@google.com>, "Miguel Ojeda" <ojeda@kernel.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Danilo Krummrich" <dakr@kernel.org>,
	rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 6/6] rust_binder: use bitmap for allocation of handles
Date: Wed, 19 Nov 2025 23:16:34 +0000
Message-ID: <aR5P0p-7SQ8dGrmz@google.com>
In-Reply-To: <aRYaYlX1vQG_GUAy@yury>

On Thu, Nov 13, 2025 at 12:50:37PM -0500, Yury Norov wrote:
> On Wed, Nov 12, 2025 at 11:35:07PM +0000, Alice Ryhl wrote:
> > On Wed, Nov 12, 2025 at 02:09:19PM -0500, Yury Norov wrote:
> > > On Wed, Nov 12, 2025 at 12:47:24PM +0000, Alice Ryhl wrote:
> > > > To find an unused Binder handle, Rust Binder currently iterates the
> > > > red/black tree from the beginning until it finds a gap in the keys. This
> > > > is extremely slow.
> > > 
> > > Can you share performance numbers? 
> > 
> > I have not benchmarked it in the Rust driver, but it replaces
> > potentially thousands of calls to rb_next() with a single call to
> > find_unused_id(), so I'm feeling good about the performance. And the
> 
> Feelings are good, but numbers are better. In the original dbitmap
> patch, Carlos shared the experiment details and rough numbers, and
> it's enough for me. Can you just reproduce his steps?

Unfortunately Carlos doesn't have his benchmark files anymore, but I
made my own benchmark. It's a test where a client repeatedly sends a
transaction containing a node to a server, and the server stashes it
forever. This way, the number of nodes keeps growing for the duration
of the test (up to 100k nodes).

Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
0 - 10,000        │          176.88 │         92.93 │          99.41
10,000 - 20,000   │          437.37 │         87.74 │          98.55
20,000 - 30,000   │          677.49 │         76.24 │          96.37
30,000 - 40,000   │          901.76 │         83.39 │          96.73
40,000 - 50,000   │         1126.62 │        100.44 │          94.57
50,000 - 60,000   │         1288.98 │         94.38 │          96.64
60,000 - 70,000   │         1588.74 │         88.27 │          96.36
70,000 - 80,000   │         1812.97 │         93.97 │          91.24
80,000 - 90,000   │         2062.95 │         92.22 │         102.01
90,000 - 100,000  │         2330.03 │         97.18 │         100.31

The numbers are the roundtrip latency, in microseconds, from sending the
transaction containing the node to receiving the response.

I would not necessarily trust the precise numbers. Rust seems slightly
better here, but I ran it on an emulator so it may be noisy. Regardless,
I think it's pretty clear that an improvement has happened.
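
For reference, the measurement loop looks roughly like the sketch below
(a userspace pseudo-benchmark; `send_node_transaction` is a hypothetical
stand-in for the actual binder client call that sends the transaction
carrying a node and waits for the reply):

    use std::time::Instant;

    /// Runs `total` roundtrips and prints the average latency for each
    /// bucket of `bucket` transactions, matching the table above.
    fn bench(total: usize, bucket: usize, mut send_node_transaction: impl FnMut()) {
        let mut sum_us = 0.0;
        for i in 0..total {
            let start = Instant::now();
            // The server stashes the node forever, so the number of
            // live nodes grows by one with every iteration.
            send_node_transaction();
            sum_us += start.elapsed().as_secs_f64() * 1_000_000.0;
            if (i + 1) % bucket == 0 {
                println!("{:6} - {:6}: {:8.2} us", i + 1 - bucket, i + 1, sum_us / bucket as f64);
                sum_us = 0.0;
            }
        }
    }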

> > equivalent change in the C driver was done because this particular piece
> > of code was causing contention issues by holding the spinlock for a long
> > time.
> > 
> > The key characteristic of this collection is that there is one per
> > process using the driver. Most processes have only a few entries in
> > this bitmap, so the inline representation will apply in most cases.
> > However, there are a few processes that have a large number of entries,
> > in the four- to five-figure range. Those processes are what caused the
> > contention issue mentioned above.
> > 
> > > > To improve the performance, add a bitmap that keeps track of which
> > > > indices are actually in use. This allows us to quickly find an unused
> > > > key in the red/black tree.
> > > > 
> > > > This logic matches the approach used by C Binder. It was chosen
> > > > partially because it's the most memory efficient solution.
> > > 
> > > That's inaccurate. You are adding a new data structure (a bitmap),
> > > advocating for it with an improvement on the search side, and that
> > > makes sense.
> > > 
> > > But now you're saying it's also a more memory efficient approach, which
> > > doesn't sound trivial because the most memory efficient solution is to
> > > bring no new data structures at all.
> > > 
> > > I guess you meant that a bitmap is the most efficient data structure
> > > for indexing used/unused nodes. If so, can you please rephrase the
> > > sentence?
> > 
> > Yes I can rephrase.
> > 
> > Adding more data is of course always less memory efficient. What I meant
> > is that this is more memory efficient than the competing solution of
> > using an augmented rbtree that Carlos mentioned here:
> > 
> > https://lore.kernel.org/p/aC1PQ7tmcqMSmbHc@google.com
> > 
> > > > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > > > +                match refs.by_handle.entry(res.as_u32()) {
> > > > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > > > +                    rbtree::Entry::Occupied(_) => {
> > > > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > > > +                        res.acquire();
> > > > +                        continue;
> > > 
> > > At this point you've detected mismatch between two linked data
> > > structures. It means that one of them or both are corrupted. To
> > > me it looks fatal, and your pr_err() confirms it. How could you
> > > continue then?
> > 
> > Although we should never hit this codepath in real code, I don't think
> > we need to kill the kernel. We can treat the r/b tree as the source of
> > truth and adjust the bitmap when mismatches are detected.
> 
> There's no such thing as a 'source of truth' here, and the rb-tree is not
> even close to being one.
> 
> You add the bitmap for performance reasons, but with that you also bring
> in some redundancy. Now, you cross-check two data structures and see a
> discrepancy. At this point you can only say that one of them or both are
> corrupted.
> 
> Relying on the rb-tree over the bitmap is simply wrong. Statistically
> speaking, there is a greater chance of corrupting the rb-tree - just
> because it is scattered and takes more memory.
> 
> > I could add a kernel warning, though. That shouldn't kill an Android
> > device.
> 
> Assuming you're talking about WARN(), I agree. But it looks like my
> reasoning differs.
>  
> You never hit this path during normal operation, as you said. So if
> you're there, it means that something has already gone wrong. If you
> issue WARN(), you let those focused on system integrity leverage
> panic_on_warn and shut the system down to minimize any possible damage.
> 
> With that precaution, you're free to do whatever you find practical to
> 'recover', or even do nothing. But please avoid referring to the rb-tree
> as a source of truth - it's a misleading and dangerous misconception.

Ok.

I picked the rb-tree for recovery because going the other way isn't
possible - the bitmap doesn't store the auxiliary data that we need.
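
Concretely, the mismatch arm could warn and then bring the bitmap back in
line with the rb-tree entry it collided with. A sketch only - the exact
warning macro and wording may end up different:

    match refs.by_handle.entry(res.as_u32()) {
        rbtree::Entry::Vacant(entry) => break (res, entry),
        rbtree::Entry::Occupied(_) => {
            // Should be unreachable. Warn so that systems running with
            // panic_on_warn can stop here instead of continuing with
            // inconsistent state.
            pr_warn!("mismatch between handle_is_present and by_handle");
            // Mark the colliding index as used in the bitmap so that the
            // next find_unused_id() call skips it, then retry the search.
            res.acquire();
            continue;
        }
    }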

> > > > +                    }
> > > > +                }
> > > > +            }
> > > > +
> > > > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > > > +            drop(refs_lock);
> > > > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > > > +            refs_lock = self.node_refs.lock();
> > > > +            refs = &mut *refs_lock;
> > > > +            refs.handle_is_present.grow(resizer);
> > > 
> > > Is it possible to turn this block into a less wordy statement? Maybe a
> > > wrapper function for it? Ideally, the grow request should be handled
> > > transparently in .find_unused_id().
> > 
> > I can extract this block into a separate function, but I think it would
> > be tricky to move the entire logic inside .find_unused_id() because of
> > the mutex lock/unlock situation.
> > 
> > > > @@ -905,6 +924,16 @@ pub(crate) fn update_ref(
> > > >                  let id = info.node_ref().node.global_id();
> > > >                  refs.by_handle.remove(&handle);
> > > >                  refs.by_node.remove(&id);
> > > > +                refs.handle_is_present.release_id(handle as usize);
> > > > +
> > > > +                if let Some(shrink) = refs.handle_is_present.shrink_request() {
> > > 
> > > This is questionable. Shrinking is usually the very slow path, and we
> > > don't shrink unless we're really close to, or even inside, the OOM
> > > condition.
> > > 
> > > In this case, shrink_request() on average returns false, but it's
> > > O(N), which makes _every_ release_id() O(N), while it should be O(1).
> > 
> > The current implementation of shrink_request() will refuse to shrink the
> > pool unless the highest set bit is below 1/4 of the capacity, so it
> > should not perform the expensive operation very often.
> > 
> > That said, it does call find_last_bit() each time, which I guess is
> > O(N). But my assumption was that find_last_bit() is cheap enough that it
> > wouldn't be a problem.
> 
> It's O(N), while the expectation for release_id() is to be O(1). But if
> you think it's OK for you - I'm OK as well. Can you explicitly mention
> it in function description?

Sure I will mention it.
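
Something along these lines as a doc-comment sketch (exact wording to be
adjusted):

    /// Determines whether the backing allocation can be shrunk.
    ///
    /// Note: this scans for the highest set bit via `find_last_bit()`,
    /// so every call is O(capacity) even when no shrinking happens.
    /// Callers such as release_id() paths are therefore O(N) rather
    /// than O(1).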

> > > Consider a very realistic case: you're destroying every object, and thus
> > > removing every ID in the associated ID pool, doing it in LIFO order. That
> > > way you will need to call shrink_request() about O(log(N)) times, making
> > > the whole complexity ~O(N*log(N)); and you'll have to make log(N)
> > > realloc()s for nothing. If you release IDs in FIFO order, you don't
> > > call realloc(), but your shrink_request() total complexity will be O(N^2). 
> > 
> > Even if we end up making log(N) reallocs, the total complexity of the
> > reallocs is O(N) because the amount of data being reallocd halves each
> > time. So the total number of bytes copied by reallocs ends up being:
> > 
> >     1 + 2 + 4 + 8 + ... + 2^log(N) <= 2^(1+log(N)) = 2*N
> > 
> > which is O(N).
> 
> OK, I trust your math better than mine. :)
> 
> > Of course, deleting the corresponding entry from the red/black tree is
> > O(log N), so it's still O(N*log(N)) for the N deletions from the rb
> > tree.
> > 
> > > Can you compare performance numbers with and without shrinking under a
> > > typical payload? Is there any mechanism to inspect the ID pools at runtime,
> > > like expose via procfs?
> > 
> > We expose the contents of the red/black tree via the binder_logs
> > mechanism, but that doesn't show the *capacity* of the bitmap - only the
> > index of the highest set bit.
> > 
> > > Can you make your shrinking logic conditional on some reasonable OOM
> > > heuristics, maybe OOM event driven?
> > > 
> > > And even without OOM, you can safely skip shrinking if the number of IDs in
> > > use is greater than 1/4 of the capacity, or there's any used ID with
> > > the index greater than the 1/2 capacity.
> > 
> > I guess we could register a shrinker, but I don't think it's worth it.
> 
> OK, if you're fine with O(N) for release_id() - I'm fine as well. Can
> you mention OOM-driven shrinking as a possible alternative for future
> improvements?

Sure.

Alice
