From: Andreas Hindborg <a.hindborg@kernel.org>
To: "Alice Ryhl" <aliceryhl@google.com>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Matthew Maurer" <mmaurer@google.com>,
	"Lee Jones" <lee@kernel.org>,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	"Benno Lossin" <lossin@kernel.org>
Subject: Re: [PATCH v2 1/4] rust: iov: add iov_iter abstractions for ITER_SOURCE
Date: Wed, 09 Jul 2025 13:56:37 +0200
Message-ID: <87ecuplgqy.fsf@kernel.org>
In-Reply-To: <aG5NdqmUdvUHqUju@google.com> (Alice Ryhl's message of "Wed, 09 Jul 2025 11:07:34 +0000")

"Alice Ryhl" <aliceryhl@google.com> writes:

> On Tue, Jul 08, 2025 at 04:45:14PM +0200, Andreas Hindborg wrote:
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>> > +/// # Invariants
>> > +///
>> > +/// Must hold a valid `struct iov_iter` with `data_source` set to `ITER_SOURCE`. For the duration
>> > +/// of `'data`, it must be safe to read the data in this IO vector.
>>
>> In my opinion, the phrasing you had in v1 was better:
>>
>>   The buffers referenced by the IO vector must be valid for reading for
>>   the duration of `'data`.
>>
>> That is, I would prefer "must be valid for reading" over "it must be
>> safe to read ...".
>
> If it's backed by userspace data, then technically there aren't any
> buffers that are valid for reading in the usual sense. We need to call
> into special assembly to read it, and a normal pointer dereference would
> be illegal.

If you go with "safe to read" for this reason, I think you should expand
the statement along the lines you used here.

What is the special assembly that is used to read this data? From a
quick scan, it looks like a regular `memcpy` call is used if
`CONFIG_UACCESS_MEMCPY` is enabled.

>
>> > +    /// Returns the number of bytes available in this IO vector.
>> > +    ///
>> > +    /// Note that this may overestimate the number of bytes. For example, reading from userspace
>> > +    /// memory could fail with `EFAULT`, which will be treated as the end of the IO vector.
>> > +    #[inline]
>> > +    pub fn len(&self) -> usize {
>> > +        // SAFETY: It is safe to access the `count` field.
>>
>> Reiterating my comment from v1: Why?
>
> It's the same reason as why this is safe:
>
> struct HasLength {
>     length: usize,
> }
> impl HasLength {
>     fn len(&self) -> usize {
>         // why is this safe?
>         self.length
>     }
> }
>
> I'm not sure how to say it concisely. I guess it's because all access to
> the iov_iter goes through the &IovIterSource.

So "By existence of a shared reference to `self`, `count` is valid for read."?
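
To make that concrete, here is a minimal userspace sketch, not the
patch's code. `RawIter` and `IterSource` are hypothetical stand-ins for
`struct iov_iter` and `IovIterSource`, and `std::cell::UnsafeCell` stands
in for the kernel's `Opaque`. It assumes nothing else writes to `count`
while Rust holds a shared reference, which is the thing the real safety
comment would need to justify.

```rust
use std::cell::UnsafeCell;

/// Hypothetical stand-in for the C `struct iov_iter`; only `count` is modelled.
#[repr(C)]
struct RawIter {
    count: usize,
}

/// Hypothetical stand-in for `IovIterSource`.
struct IterSource {
    iov: UnsafeCell<RawIter>,
}

impl IterSource {
    fn new(count: usize) -> Self {
        Self { iov: UnsafeCell::new(RawIter { count }) }
    }

    /// Returns the number of bytes left.
    fn len(&self) -> usize {
        // SAFETY: By existence of a shared reference to `self`, `count` is
        // valid for read. Every write in this sketch goes through
        // `&mut self`, so no write can overlap with this read.
        unsafe { (*self.iov.get()).count }
    }

    /// Advances by `bytes`, clamping at the end (models `iov_iter_advance`).
    fn advance(&mut self, bytes: usize) {
        let raw = self.iov.get_mut();
        raw.count -= bytes.min(raw.count);
    }
}
```

The point of the sketch is only that the justification rests on the
`&self` / `&mut self` split, so the comment can say so explicitly.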

>
>> > +        unsafe {
>> > +            (*self.iov.get())
>> > +                .__bindgen_anon_1
>> > +                .__bindgen_anon_1
>> > +                .as_ref()
>> > +                .count
>> > +        }
>> > +    }
>> > +
>> > +    /// Returns whether there are any bytes left in this IO vector.
>> > +    ///
>> > +    /// This may return `true` even if there are no more bytes available. For example, reading from
>> > +    /// userspace memory could fail with `EFAULT`, which will be treated as the end of the IO vector.
>> > +    #[inline]
>> > +    pub fn is_empty(&self) -> bool {
>> > +        self.len() == 0
>> > +    }
>> > +
>> > +    /// Advance this IO vector by `bytes` bytes.
>> > +    ///
>> > +    /// If `bytes` is larger than the size of this IO vector, it is advanced to the end.
>> > +    #[inline]
>> > +    pub fn advance(&mut self, bytes: usize) {
>> > +        // SAFETY: `self.iov` is a valid IO vector.
>> > +        unsafe { bindings::iov_iter_advance(self.as_raw(), bytes) };
>> > +    }
>> > +
>> > +    /// Advance this IO vector backwards by `bytes` bytes.
>> > +    ///
>> > +    /// # Safety
>> > +    ///
>> > +    /// The IO vector must not be reverted to before its beginning.
>> > +    #[inline]
>> > +    pub unsafe fn revert(&mut self, bytes: usize) {
>> > +        // SAFETY: `self.iov` is a valid IO vector, and `bytes` is in bounds.
>> > +        unsafe { bindings::iov_iter_revert(self.as_raw(), bytes) };
>> > +    }
>> > +
>> > +    /// Read data from this IO vector.
>> > +    ///
>> > +    /// Returns the number of bytes that have been copied.
>> > +    #[inline]
>> > +    pub fn copy_from_iter(&mut self, out: &mut [u8]) -> usize {
>> > +        // SAFETY: We will not write uninitialized bytes to `out`.
>>
>> Can you provide something to back this claim?
>
> I guess the logic could go along these lines:
>
> * If the iov_iter reads from userspace, then it's because we always
>   consider such reads to produce initialized data.

I don't think it is enough to just state that we consider the reads to
produce initialized data.

> * If the iov_iter reads from a kernel buffer, then the creator of the
>   iov_iter must provide an initialized buffer.
>
> Ultimately, if we don't know that the bytes are initialized, then it's
> impossible to use the API correctly because you can never inspect the
> bytes in any way. I.e., any implementation of copy_from_iter that
> produces uninit data is necessarily buggy.

I would agree. How do we fix that? You are more knowledgeable than me in
this field, so you probably have a better shot than I do at finding a
solution.

As far as I can tell, we need to read from a place unknown to the Rust
abstract machine, and we need to be able to have the abstract machine
consider the data initialized after the read.

Is this volatile memcpy [1], or would that only solve the data race
problem, not the uninitialized data problem?
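
One way to frame the question in code, as a userspace sketch only: the
destination is typed as `MaybeUninit<u8>`, and only the prefix that the
copy routine actually wrote is handed back as `&mut [u8]`. `copy_raw` is
a hypothetical stand-in for the C copy helper, and the source here is an
ordinary initialized slice. Whether bytes coming from userspace may be
treated the same way is exactly the open question above, and this sketch
does not answer it.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the C copy helper: copies as many bytes as fit
/// from `src` into `out` and returns the part of `out` that was written.
fn copy_raw<'a>(src: &[u8], out: &'a mut [MaybeUninit<u8>]) -> &'a mut [u8] {
    let n = src.len().min(out.len());
    for (dst, byte) in out[..n].iter_mut().zip(src) {
        // `MaybeUninit::write` initializes the slot without reading it.
        dst.write(*byte);
    }
    // SAFETY: The first `n` elements of `out` were written by the loop
    // above, and `MaybeUninit<u8>` has the same layout as `u8`.
    unsafe { std::slice::from_raw_parts_mut(out.as_mut_ptr().cast::<u8>(), n) }
}
```

With this shape, the safety comment only has to argue that the returned
prefix was written, rather than that the whole of `out` stays
initialized.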


Best regards,
Andreas Hindborg

[1] https://lore.kernel.org/all/25e7e425-ae72-4370-ae95-958882a07df9@ralfj.de

