public inbox for rust-for-linux@vger.kernel.org
 help / color / mirror / Atom feed
From: "Alexandre Courbot" <acourbot@nvidia.com>
To: "Alexandre Courbot" <acourbot@nvidia.com>,
	"Danilo Krummrich" <dakr@kernel.org>
Cc: "Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <benno.lossin@proton.me>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] rust: alloc: implement `extend` for `Vec`
Date: Mon, 21 Apr 2025 17:15:29 +0900	[thread overview]
Message-ID: <D9C61DDI99JX.31T59XPQGYBB1@nvidia.com> (raw)
In-Reply-To: <D91AOKOTDXWC.7R5K5DI87QU4@nvidia.com>

On Tue Apr 8, 2025 at 10:34 PM JST, Alexandre Courbot wrote:
> On Mon Apr 7, 2025 at 8:01 PM JST, Danilo Krummrich wrote:
>>> +    /// Extends the vector by the elements of `iter`.
>>> +    ///
>>> +    /// This uses [`Iterator::size_hint`] to optimize memory reallocations, but will work even with
>>> +    /// imprecise implementations - albeit in a non-optimal way.
>>> +    ///
>>> +    /// This method returns an error if a memory reallocation required to accommodate the new items
>>> +    /// failed. In this case, callers must assume that some (but not all) elements of `iter` might
>>> +    /// have been added to the vector.
>>> +    ///
>>> +    /// # Note on optimal behavior and correctness
>>> +    ///
>>> +    /// The efficiency of this method depends on how reliable the [`Iterator::size_hint`]
>>> +    /// implementation of the `iter` is.
>>> +    ///
>>> +    /// It performs optimally with at most a single memory reallocation if the lower bound of
>>> +    /// `size_hint` is the exact number of items actually yielded.
>>> +    ///
>>> +    /// If `size_hint` is more vague, there may be as many memory reallocations as necessary to
>>> +    /// cover the whole iterator from the successive lower bounds returned by `size_hint`.
>>> +    ///
>>> +    /// If `size_hint` signals more items than actually yielded by the iterator, some unused memory
>>> +    /// might be reserved.
>>> +    ///
>>> +    /// Finally, whenever `size_hint` returns `(0, Some(0))`, the method assumes that no more items
>>> +    /// are yielded by the iterator and returns. This may result in some items not being added if
>>> +    /// there were still some remaining.
>>> +    ///
>>> +    /// In the kernel most iterators are expected to have a precise and correct `size_hint`
>>> +    /// implementation, so this should nicely optimize out for these cases.
>>
>> I agree, hence I think we should enforce to be provided with a guaranteed
>> correct size hint and simplify the code. I think we should extend the signature.
>>
>>      pub fn extend<I>(&mut self, iter: I, flags: Flags) -> Result<(), AllocError>
>>      where
>>          I: IntoIterator<Item = T>,
>>          I::IntoIter: ExactSizeIterator,
>>
>> And implement ExactSizeIterator for IntoIter.
>>
>> The only thing that bothers me a bit is that the documentation [1] of
>> ExactSizeIterator sounds a bit ambiguous.
>>
>> It says: "When implementing an ExactSizeIterator, you must also implement
>> Iterator. When doing so, the implementation of Iterator::size_hint *must*
>> return the exact size of the iterator."
>>
>> But it also says: "Note that this trait is a safe trait and as such does not and
>> cannot guarantee that the returned length is correct. This means that unsafe
>> code must not rely on the correctness of Iterator::size_hint. The unstable and
>> unsafe TrustedLen trait gives this additional guarantee."
>
> Yeah ExactSizeIterator is not the solution to this, since it can be
> implemented without an unsafe block and the implementer is perfectly
> free to provide an incorrect value - so we cannot trust its result.
>
>>
>> Acknowledging the latter, I think we should implement our own trait for this
>> instead. Our own version of TrustedLen seems reasonable to me.
>
> That sounds reasonable and would greatly simplify the code (and remove
> most of my fears about its optimization). Let me explore that direction.

Well, that turned out to be an interesting rabbit hole.

Leveraging the existing traits seems a bit difficult:

- `ExactSizeIterator` cannot be implemented for adapters that increase the
  length of their iterators, because if one of them is already `usize::MAX` long
  then the size wouldn't be exact anymore. [1]

- And `TrustedLen` cannot be implemented for adapters that make an iterator
  shorter, because if the iterator returns more than `usize::MAX` items (i.e.
  has an upper bound set to `None`) then the adapter can't predict the actual
  length. [2]

So in both cases, the model breaks at the limit. OTOH, in our case we want to
gather items into some collection, meaning that we are quite unlikely to ever
reach that limit, as doing so would likely trigger an OOM anyway.

Which means that we need to come with our own unsafe trait
(`ExactSizeCollectible`?), which will have its own limits. It shall only be
used to collect things (because we are unlikely to reach a size of `usize::MAX`
in that context), and will take the lower bound of `size_hint` at face value,
meaning it might collect less than the whole collection if the lower bound of
the iterator or one of its adapters ever reaches `usize::MAX`. Again in the
context of a collection this should never happen, but it's still a limitation.

If we can live with this, then with a bit of code (because the new trait would
need to be implemented for every iterator and adapter we want to collect out
there) we should be able to provide an efficient, one-pass `collect()` method.

Thoughts?

[1] https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#when-shouldnt-an-adapter-be-exactsizeiterator
[2] https://doc.rust-lang.org/core/iter/trait.TrustedLen.html#when-shouldnt-an-adapter-be-trustedlen


  reply	other threads:[~2025-04-21  8:15 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-06 13:01 [PATCH v3] rust: alloc: implement `extend` for `Vec` Alexandre Courbot
2025-04-07 11:01 ` Danilo Krummrich
2025-04-08 13:34   ` Alexandre Courbot
2025-04-21  8:15     ` Alexandre Courbot [this message]
2025-04-22 17:03       ` Danilo Krummrich
2025-04-23  1:02         ` Alexandre Courbot
2025-04-23  8:51           ` Alice Ryhl
2025-04-23  9:40             ` Alexandre Courbot
2025-04-23 16:03               ` Boqun Feng
2025-04-24 11:50                 ` Alice Ryhl
2025-04-24 13:36                   ` Boqun Feng
2025-04-23  9:47           ` Danilo Krummrich
2025-04-23 13:15             ` Alexandre Courbot
  -- strict thread matches above, loose matches on Subject: below --
2025-04-07 16:33 Benno Lossin
2025-04-08 14:00 ` Alexandre Courbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D9C61DDI99JX.31T59XPQGYBB1@nvidia.com \
    --to=acourbot@nvidia.com \
    --cc=a.hindborg@kernel.org \
    --cc=alex.gaynor@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=benno.lossin@proton.me \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=dakr@kernel.org \
    --cc=gary@garyguo.net \
    --cc=jhubbard@nvidia.com \
    --cc=joelagnelf@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=tmgross@umich.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox