From: Andreas Hindborg <a.hindborg@kernel.org>
To: "Alice Ryhl" <aliceryhl@google.com>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Arnd Bergmann" <arnd@arndb.de>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Trevor Gross" <tmgross@umich.edu>,
"Danilo Krummrich" <dakr@kernel.org>,
"Matthew Maurer" <mmaurer@google.com>,
"Lee Jones" <lee@kernel.org>,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
"Benno Lossin" <lossin@kernel.org>
Subject: Re: [PATCH v2 1/4] rust: iov: add iov_iter abstractions for ITER_SOURCE
Date: Wed, 09 Jul 2025 13:56:37 +0200 [thread overview]
Message-ID: <87ecuplgqy.fsf@kernel.org> (raw)
In-Reply-To: <aG5NdqmUdvUHqUju@google.com> (Alice Ryhl's message of "Wed, 09 Jul 2025 11:07:34 +0000")
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Tue, Jul 08, 2025 at 04:45:14PM +0200, Andreas Hindborg wrote:
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>> > +/// # Invariants
>> > +///
>> > +/// Must hold a valid `struct iov_iter` with `data_source` set to `ITER_SOURCE`. For the duration
>> > +/// of `'data`, it must be safe to read the data in this IO vector.
>>
>> In my opinion, the phrasing you had in v1 was better:
>>
>> The buffers referenced by the IO vector must be valid for reading for
>> the duration of `'data`.
>>
>> That is, I would prefer "must be valid for reading" over "it must be
>> safe to read ...".
>
> If it's backed by userspace data, then technically there aren't any
> buffers that are valid for reading in the usual sense. We need to call
> into special assembly to read it, and a normal pointer dereference would
> be illegal.
If you go with "safe to read" for this reason, I think you should expand
the statement along the lines you used here.
What is the special assembly that is used to read this data? From a
quick scan it looks like that if `CONFIG_UACCESS_MEMCPY` is enabled, a
regular `memcpy` call is used.
>
>> > + /// Returns the number of bytes available in this IO vector.
>> > + ///
>> > + /// Note that this may overestimate the number of bytes. For example, reading from userspace
>> > + /// memory could fail with `EFAULT`, which will be treated as the end of the IO vector.
>> > + #[inline]
>> > + pub fn len(&self) -> usize {
>> > + // SAFETY: It is safe to access the `count` field.
>>
>> Reiterating my comment from v1: Why?
>
> It's the same reason as why this is safe:
>
> struct HasLength {
> length: usize,
> }
> impl HasLength {
> fn len(&self) -> usize {
> // why is this safe?
> self.length
> }
> }
>
> I'm not sure how to say it concisely. I guess it's because all access to
> the iov_iter goes through the &IovIterSource.
So "By existence of a shared reference to `self`, `count` is valid for read."?
>
>> > + unsafe {
>> > + (*self.iov.get())
>> > + .__bindgen_anon_1
>> > + .__bindgen_anon_1
>> > + .as_ref()
>> > + .count
>> > + }
>> > + }
>> > +
>> > + /// Returns whether there are any bytes left in this IO vector.
>> > + ///
>> > + /// This may return `true` even if there are no more bytes available. For example, reading from
>> > + /// userspace memory could fail with `EFAULT`, which will be treated as the end of the IO vector.
>> > + #[inline]
>> > + pub fn is_empty(&self) -> bool {
>> > + self.len() == 0
>> > + }
>> > +
>> > + /// Advance this IO vector by `bytes` bytes.
>> > + ///
>> > + /// If `bytes` is larger than the size of this IO vector, it is advanced to the end.
>> > + #[inline]
>> > + pub fn advance(&mut self, bytes: usize) {
>> > + // SAFETY: `self.iov` is a valid IO vector.
>> > + unsafe { bindings::iov_iter_advance(self.as_raw(), bytes) };
>> > + }
>> > +
>> > + /// Advance this IO vector backwards by `bytes` bytes.
>> > + ///
>> > + /// # Safety
>> > + ///
>> > + /// The IO vector must not be reverted to before its beginning.
>> > + #[inline]
>> > + pub unsafe fn revert(&mut self, bytes: usize) {
>> > + // SAFETY: `self.iov` is a valid IO vector, and `bytes` is in bounds.
>> > + unsafe { bindings::iov_iter_revert(self.as_raw(), bytes) };
>> > + }
>> > +
>> > + /// Read data from this IO vector.
>> > + ///
>> > + /// Returns the number of bytes that have been copied.
>> > + #[inline]
>> > + pub fn copy_from_iter(&mut self, out: &mut [u8]) -> usize {
>> > + // SAFETY: We will not write uninitialized bytes to `out`.
>>
>> Can you provide something to back this claim?
>
> I guess the logic could go along these lines:
>
> * If the iov_iter reads from userspace, then it's because we always
> consider such reads to produce initialized data.
I don't think it is enough to just state that we consider the reads to
produce initialized data.
> * If the iov_iter reads from a kernel buffer, then the creator of the
> iov_iter must provide an initialized buffer.
>
> Ultimately, if we don't know that the bytes are initialized, then it's
> impossible to use the API correctly because you can never inspect the
> bytes in any way. I.e., any implementation of copy_from_iter that
> produces uninit data is necessarily buggy.
I would agree. How do we fix that? You are more knowledgeable than me in
this field, so you probably have a better shot than me, at finding a
solution.
As far as I can tell, we need to read from a place unknown to the rust
abstract machine, and we need to be able to have the abstract machine
consider the data initialized after the read.
Is this volatile memcpy [1], or would that only solve the data race
problem, not uninitialized data problem?
Best regards,
Andreas Hindborg
[1] https://lore.kernel.org/all/25e7e425-ae72-4370-ae95-958882a07df9@ralfj.de
next prev parent reply other threads:[~2025-07-09 11:56 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-04 9:25 [PATCH v2 0/4] Rust support for `struct iov_iter` Alice Ryhl
2025-07-04 9:26 ` [PATCH v2 1/4] rust: iov: add iov_iter abstractions for ITER_SOURCE Alice Ryhl
2025-07-08 14:45 ` Andreas Hindborg
2025-07-09 11:07 ` Alice Ryhl
2025-07-09 11:56 ` Andreas Hindborg [this message]
2025-07-09 12:35 ` Alice Ryhl
2025-07-09 17:05 ` Andreas Hindborg
2025-07-14 12:18 ` Alice Ryhl
2025-08-05 10:48 ` Andreas Hindborg
2025-07-04 9:26 ` [PATCH v2 2/4] rust: iov: add iov_iter abstractions for ITER_DEST Alice Ryhl
2025-07-08 14:47 ` Andreas Hindborg
2025-07-09 10:58 ` Alice Ryhl
2025-07-04 9:26 ` [PATCH v2 3/4] rust: miscdevice: Provide additional abstractions for iov_iter and kiocb structures Alice Ryhl
2025-07-08 14:51 ` Andreas Hindborg
2025-07-09 11:09 ` Alice Ryhl
2025-07-09 11:58 ` Andreas Hindborg
2025-07-08 14:53 ` Andreas Hindborg
2025-07-09 11:12 ` Alice Ryhl
2025-07-09 11:59 ` Andreas Hindborg
2025-07-04 9:26 ` [PATCH v2 4/4] samples: rust_misc_device: Expand the sample to support read()ing from userspace Alice Ryhl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ecuplgqy.fsf@kernel.org \
--to=a.hindborg@kernel.org \
--cc=aliceryhl@google.com \
--cc=arnd@arndb.de \
--cc=bjorn3_gh@protonmail.com \
--cc=boqun.feng@gmail.com \
--cc=dakr@kernel.org \
--cc=gary@garyguo.net \
--cc=gregkh@linuxfoundation.org \
--cc=lee@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lossin@kernel.org \
--cc=mmaurer@google.com \
--cc=ojeda@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=tmgross@umich.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).