Re: [PATCH v3] rust: alloc: implement `extend` for `Vec`

From: Danilo Krummrich <dakr@kernel.org>
To: Alexandre Courbot <acourbot@nvidia.com>
Cc: "Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <benno.lossin@proton.me>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] rust: alloc: implement `extend` for `Vec`
Date: Tue, 22 Apr 2025 19:03:53 +0200
Message-ID: <aAfL-e6qA9oBce5t@cassiopeiae>
In-Reply-To: <D9C61DDI99JX.31T59XPQGYBB1@nvidia.com>

On Mon, Apr 21, 2025 at 05:15:29PM +0900, Alexandre Courbot wrote:
> On Tue Apr 8, 2025 at 10:34 PM JST, Alexandre Courbot wrote:
> > On Mon Apr 7, 2025 at 8:01 PM JST, Danilo Krummrich wrote:
> >>> +    /// Extends the vector by the elements of `iter`.
> >>> +    ///
> >>> +    /// This uses [`Iterator::size_hint`] to optimize memory reallocations, but will work even with
> >>> +    /// imprecise implementations - albeit in a non-optimal way.
> >>> +    ///
> >>> +    /// This method returns an error if a memory reallocation required to accommodate the new items
> >>> +    /// failed. In this case, callers must assume that some (but not all) elements of `iter` might
> >>> +    /// have been added to the vector.
> >>> +    ///
> >>> +    /// # Note on optimal behavior and correctness
> >>> +    ///
> >>> +    /// The efficiency of this method depends on how reliable the [`Iterator::size_hint`]
> >>> +    /// implementation of the `iter` is.
> >>> +    ///
> >>> +    /// It performs optimally with at most a single memory reallocation if the lower bound of
> >>> +    /// `size_hint` is the exact number of items actually yielded.
> >>> +    ///
> >>> +    /// If `size_hint` is more vague, there may be as many memory reallocations as necessary to
> >>> +    /// cover the whole iterator from the successive lower bounds returned by `size_hint`.
> >>> +    ///
> >>> +    /// If `size_hint` signals more items than actually yielded by the iterator, some unused memory
> >>> +    /// might be reserved.
> >>> +    ///
> >>> +    /// Finally, whenever `size_hint` returns `(0, Some(0))`, the method assumes that no more items
> >>> +    /// are yielded by the iterator and returns. This may result in some items not being added if
> >>> +    /// there were still some remaining.
> >>> +    ///
> >>> +    /// In the kernel most iterators are expected to have a precise and correct `size_hint`
> >>> +    /// implementation, so this should nicely optimize out for these cases.
> >>
> >> I agree, hence I think we should enforce to be provided with a guaranteed
> >> correct size hint and simplify the code. I think we should extend the signature.
> >>
> >>      pub fn extend<I>(&mut self, iter: I, flags: Flags) -> Result<(), AllocError>
> >>      where
> >>          I: IntoIterator<Item = T>,
> >>          I::IntoIter: ExactSizeIterator,
> >>
> >> And implement ExactSizeIterator for IntoIter.
> >>
> >> The only thing that bothers me a bit is that the documentation [1] of
> >> ExactSizeIterator sounds a bit ambiguous.
> >>
> >> It says: "When implementing an ExactSizeIterator, you must also implement
> >> Iterator. When doing so, the implementation of Iterator::size_hint *must*
> >> return the exact size of the iterator."
> >>
> >> But it also says: "Note that this trait is a safe trait and as such does not and
> >> cannot guarantee that the returned length is correct. This means that unsafe
> >> code must not rely on the correctness of Iterator::size_hint. The unstable and
> >> unsafe TrustedLen trait gives this additional guarantee."
> >
> > Yeah ExactSizeIterator is not the solution to this, since it can be
> > implemented without an unsafe block and the implementer is perfectly
> > free to provide an incorrect value - so we cannot trust its result.
> >
> >>
> >> Acknowledging the latter, I think we should implement our own trait for this
> >> instead. Our own version of TrustedLen seems reasonable to me.
> >
> > That sounds reasonable and would greatly simplify the code (and remove
> > most of my fears about its optimization). Let me explore that direction.
> 
> Well, that turned out to be an interesting rabbit hole.
> 
> Leveraging the existing traits seems a bit difficult:
> 
> - `ExactSizeIterator` cannot be implemented for adapters that increase the
>   length of their iterators, because if one of them is already `usize::MAX` long
>   then the size wouldn't be exact anymore. [1]
> 
> - And `TrustedLen` cannot be implemented for adapters that make an iterator
>   shorter, because if the iterator returns more than `usize::MAX` items (i.e.
>   has an upper bound set to `None`) then the adapter can't predict the actual
>   length. [2]

Why is this a problem for the above implementation of Vec::extend()?

I just looked it up and it seems that std [1] does the same thing. Am I missing
something?

[1] https://github.com/rust-lang/rust/blob/master/library/alloc/src/vec/spec_extend.rs#L25
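
To make the comparison concrete, here is a minimal sketch of that approach,
not the code from [1]. As far as I can tell, std takes the upper bound of
`size_hint`, reserves once, and treats an upper bound of `None` as a capacity
overflow. Everything named below is an assumption for illustration: the local
`TrustedLen` trait stands in for the unstable `core::iter::TrustedLen`,
`ExtendError` and `extend_trusted` are hypothetical, std's `Vec::try_reserve`
stands in for the kernel `Vec` allocation taking `flags`, and an error is
returned where std would panic.

```rust
/// Hypothetical stand-in for the unstable `core::iter::TrustedLen`.
///
/// # Safety
///
/// `size_hint()` must return `(n, Some(n))`, with `n` the exact number of
/// remaining items, or an upper bound of `None` if more than `usize::MAX`
/// items remain.
unsafe trait TrustedLen: Iterator {}

// SAFETY: both iterators report their exact remaining length.
unsafe impl<T> TrustedLen for std::vec::IntoIter<T> {}
unsafe impl TrustedLen for std::ops::Range<usize> {}

// SAFETY: `Chain` adds the two upper bounds with a checked addition, so the
// result is either exact or `None` on overflow, which the contract allows.
unsafe impl<A, B> TrustedLen for std::iter::Chain<A, B>
where
    A: TrustedLen,
    B: TrustedLen<Item = A::Item>,
{
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ExtendError {
    /// The iterator reported more than `usize::MAX` items.
    TooLong,
    /// Reserving memory failed.
    Alloc,
}

/// Extends `v` with a single reservation, trusting the upper bound.
fn extend_trusted<T, I>(v: &mut Vec<T>, iter: I) -> Result<(), ExtendError>
where
    I: IntoIterator<Item = T>,
    I::IntoIter: TrustedLen,
{
    let iter = iter.into_iter();

    // An upper bound of `None` means the length does not fit in `usize`.
    let additional = match iter.size_hint() {
        (_, Some(upper)) => upper,
        (_, None) => return Err(ExtendError::TooLong),
    };

    v.try_reserve(additional).map_err(|_| ExtendError::Alloc)?;

    // As long as the `TrustedLen` contract holds, `push` never reallocates
    // here because the capacity was reserved above.
    for item in iter {
        v.push(item);
    }

    Ok(())
}
```

With this shape, the overflow case of a length-increasing adapter such as
`chain()` is rejected up front instead of being silently truncated.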

> 
> So in both cases, the model breaks at the limit. OTOH, in our case we want to
> gather items into some collection, meaning that we are quite unlikely to ever
> reach that limit, as doing so would likely trigger an OOM anyway.
> 
> Which means that we need to come up with our own unsafe trait
> (`ExactSizeCollectible`?), which will have its own limits. It shall only be
> used to collect things (because we are unlikely to reach a size of `usize::MAX`
> in that context), and will take the lower bound of `size_hint` at face value,
> meaning it might collect less than the whole collection if the lower bound of
> the iterator or one of its adapters ever reaches `usize::MAX`. Again in the
> context of a collection this should never happen, but it's still a limitation.
> 
> If we can live with this, then with a bit of code (because the new trait would
> need to be implemented for every iterator and adapter we want to collect out
> there) we should be able to provide an efficient, one-pass `collect()` method.
> 
> Thoughts?
> 
> [1] https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#when-shouldnt-an-adapter-be-exactsizeiterator
> [2] https://doc.rust-lang.org/core/iter/trait.TrustedLen.html#when-shouldnt-an-adapter-be-trustedlen
>

