From: Danilo Krummrich <dakr@kernel.org>
To: Alexandre Courbot <acourbot@nvidia.com>
Cc: "Miguel Ojeda" <ojeda@kernel.org>,
"Alex Gaynor" <alex.gaynor@gmail.com>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Benno Lossin" <benno.lossin@proton.me>,
"Andreas Hindborg" <a.hindborg@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"Trevor Gross" <tmgross@umich.edu>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"John Hubbard" <jhubbard@nvidia.com>,
rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] rust: alloc: implement `extend` for `Vec`
Date: Tue, 22 Apr 2025 19:03:53 +0200 [thread overview]
Message-ID: <aAfL-e6qA9oBce5t@cassiopeiae> (raw)
In-Reply-To: <D9C61DDI99JX.31T59XPQGYBB1@nvidia.com>
On Mon, Apr 21, 2025 at 05:15:29PM +0900, Alexandre Courbot wrote:
> On Tue Apr 8, 2025 at 10:34 PM JST, Alexandre Courbot wrote:
> > On Mon Apr 7, 2025 at 8:01 PM JST, Danilo Krummrich wrote:
> >>> + /// Extends the vector by the elements of `iter`.
> >>> + ///
> >>> + /// This uses [`Iterator::size_hint`] to optimize memory reallocations, but will work even with
> >>> + /// imprecise implementations - albeit in a non-optimal way.
> >>> + ///
> >>> + /// This method returns an error if a memory reallocation required to accommodate the new items
> >>> + /// failed. In this case, callers must assume that some (but not all) elements of `iter` might
> >>> + /// have been added to the vector.
> >>> + ///
> >>> + /// # Note on optimal behavior and correctness
> >>> + ///
> >>> + /// The efficiency of this method depends on how reliable the [`Iterator::size_hint`]
> >>> + /// implementation of the `iter` is.
> >>> + ///
> >>> + /// It performs optimally with at most a single memory reallocation if the lower bound of
> >>> + /// `size_hint` is the exact number of items actually yielded.
> >>> + ///
> >>> + /// If `size_hint` is more vague, there may be as many memory reallocations as necessary to
> >>> + /// cover the whole iterator from the successive lower bounds returned by `size_hint`.
> >>> + ///
> >>> + /// If `size_hint` signals more items than actually yielded by the iterator, some unused memory
> >>> + /// might be reserved.
> >>> + ///
> >>> + /// Finally, whenever `size_hint` returns `(0, Some(0))`, the method assumes that no more items
> >>> + /// are yielded by the iterator and returns. This may result in some items not being added if
> >>> + /// there were still some remaining.
> >>> + ///
> >>> + /// In the kernel most iterators are expected to have a precise and correct `size_hint`
> >>> + /// implementation, so this should nicely optimize out for these cases.
> >>
> >> I agree, hence I think we should enforce to be provided with a guaranteed
> >> correct size hint and simplify the code. I think we should extend the signature.
> >>
> >> pub fn extend<I>(&mut self, iter: I, flags: Flags) -> Result<(), AllocError>
> >> where
> >> I: IntoIterator<Item = T>,
> >> I::IntoIter: ExactSizeIterator,
> >>
> >> And implement ExactSizeIterator for IntoIter.
> >>
> >> The only thing that bothers me a bit is that the documentation [1] of
> >> ExactSizeIterator sounds a bit ambiguous.
> >>
> >> It says: "When implementing an ExactSizeIterator, you must also implement
> >> Iterator. When doing so, the implementation of Iterator::size_hint *must*
> >> return the exact size of the iterator."
> >>
> >> But it also says: "Note that this trait is a safe trait and as such does not and
> >> cannot guarantee that the returned length is correct. This means that unsafe
> >> code must not rely on the correctness of Iterator::size_hint. The unstable and
> >> unsafe TrustedLen trait gives this additional guarantee."
> >
> > Yeah ExactSizeIterator is not the solution to this, since it can be
> > implemented without an unsafe block and the implementer is perfectly
> > free to provide an incorrect value - so we cannot trust its result.
> >
> >>
> >> Acknowledging the latter, I think we should implement our own trait for this
> >> instead. Our own version of TrustedLen seems reasonable to me.
> >
> > That sounds reasonable and would greatly simplify the code (and remove
> > most of my fears about its optimization). Let me explore that direction.
>
> Well, that turned out to be an interesting rabbit hole.
>
> Leveraging the existing traits seems a bit difficult:
>
> - `ExactSizeIterator` cannot be implemented for adapters that increase the
> length of their iterators, because if one of them is already `usize::MAX` long
> then the size wouldn't be exact anymore. [1]
>
> - And `TrustedLen` cannot be implemented for adapters that make an iterator
> shorter, because if the iterator returns more than `usize::MAX` items (i.e.
> has an upper bound set to `None`) then the adapter can't predict the actual
> length. [2]
Why is this a problem for the above implementation of Vec::extend()?
I just looked it up and it seems that std [1] does the same thing. Do I miss
anything?
[1] https://github.com/rust-lang/rust/blob/master/library/alloc/src/vec/spec_extend.rs#L25
>
> So in both cases, the model breaks at the limit. OTOH, in our case we want to
> gather items into some collection, meaning that we are quite unlikely to ever
> reach that limit, as doing so would likely trigger an OOM anyway.
>
> Which means that we need to come with our own unsafe trait
> (`ExactSizeCollectible`?), which will have its own limits. It shall only be
> used to collect things (because we are unlikely to reach a size of `usize::MAX`
> in that context), and will take the lower bound of `size_hint` at face value,
> meaning it might collect less than the whole collection if the lower bound of
> the iterator or one of its adapters ever reaches `usize::MAX`. Again in the
> context of a collection this should never happen, but it's still a limitation.
>
> If we can live with this, then with a bit of code (because the new trait would
> need to be implemented for every iterator and adapter we want to collect out
> there) we should be able to provide an efficient, one-pass `collect()` method.
>
> Thoughts?
>
> [1] https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#when-shouldnt-an-adapter-be-exactsizeiterator
> [2] https://doc.rust-lang.org/core/iter/trait.TrustedLen.html#when-shouldnt-an-adapter-be-trustedlen
>
next prev parent reply other threads:[~2025-04-22 17:04 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-06 13:01 [PATCH v3] rust: alloc: implement `extend` for `Vec` Alexandre Courbot
2025-04-07 11:01 ` Danilo Krummrich
2025-04-08 13:34 ` Alexandre Courbot
2025-04-21 8:15 ` Alexandre Courbot
2025-04-22 17:03 ` Danilo Krummrich [this message]
2025-04-23 1:02 ` Alexandre Courbot
2025-04-23 8:51 ` Alice Ryhl
2025-04-23 9:40 ` Alexandre Courbot
2025-04-23 16:03 ` Boqun Feng
2025-04-24 11:50 ` Alice Ryhl
2025-04-24 13:36 ` Boqun Feng
2025-04-23 9:47 ` Danilo Krummrich
2025-04-23 13:15 ` Alexandre Courbot
-- strict thread matches above, loose matches on Subject: below --
2025-04-07 16:33 Benno Lossin
2025-04-08 14:00 ` Alexandre Courbot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aAfL-e6qA9oBce5t@cassiopeiae \
--to=dakr@kernel.org \
--cc=a.hindborg@kernel.org \
--cc=acourbot@nvidia.com \
--cc=alex.gaynor@gmail.com \
--cc=aliceryhl@google.com \
--cc=benno.lossin@proton.me \
--cc=bjorn3_gh@protonmail.com \
--cc=boqun.feng@gmail.com \
--cc=gary@garyguo.net \
--cc=jhubbard@nvidia.com \
--cc=joelagnelf@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=ojeda@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=tmgross@umich.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox