rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Trevor Gross <tmgross@umich.edu>
To: Alice Ryhl <aliceryhl@google.com>
Cc: "Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Wedson Almeida Filho" <wedsonaf@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <benno.lossin@proton.me>,
	"Andreas Hindborg" <a.hindborg@samsung.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Christian Brauner" <brauner@kernel.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Arve Hjønnevåg" <arve@android.com>,
	"Todd Kjos" <tkjos@android.com>,
	"Martijn Coenen" <maco@android.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Carlos Llamas" <cmllamas@google.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Daniel Xu" <dxu@dxuuu.xyz>,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	"Martin Rodriguez Reboredo" <yakoyoku@gmail.com>
Subject: Re: [PATCH v5 8/9] rust: file: add `DeferredFdCloser`
Date: Sat, 10 Feb 2024 01:47:36 -0600	[thread overview]
Message-ID: <CALNs47vrm+kh=xH=57TWPuxXb_HL+ws2GqOmMFA3O4xLGS6yoQ@mail.gmail.com> (raw)
In-Reply-To: <20240209-alice-file-v5-8-a37886783025@google.com>

On Fri, Feb 9, 2024 at 5:22 AM Alice Ryhl <aliceryhl@google.com> wrote:
>
> To close an fd from kernel space, we could call `ksys_close`. However,
> if we do this to an fd that is held using `fdget`, then we may trigger a
> use-after-free. Introduce a helper that can be used to close an fd even
> if the fd is currently held with `fdget`. This is done by grabbing an
> extra refcount to the file and dropping it in a task work once we return
> to userspace.
>
> This is necessary for Rust Binder because otherwise the user might try
> to have Binder close its fd for /dev/binder, which would cause problems
> as this happens inside an ioctl on /dev/binder, and ioctls hold the fd
> using `fdget`.
>
> Additional motivation can be found in commit 80cd795630d6 ("binder: fix
> use-after-free due to ksys_close() during fdget()") and in the comments
> on `binder_do_fd_close`.
>
> If there is some way to detect whether an fd is currently held with
> `fdget`, then this could be optimized to skip the allocation and task
> work when this is not the case. Another possible optimization would be
> to combine several fds into a single task work, since this is used with
> fd arrays that might hold several fds.
>
> That said, it might not be necessary to optimize it, because Rust Binder
> has two ways to send fds: BINDER_TYPE_FD and BINDER_TYPE_FDA. With
> BINDER_TYPE_FD, it is userspace's responsibility to close the fd, so
> this mechanism is used only by BINDER_TYPE_FDA, but fd arrays are used
> rarely these days.
>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---

Reviewed-by: Trevor Gross <tmgross@umich.edu>

> +    /// Schedule a task work that closes the file descriptor when this task returns to userspace.
> +    ///
> +    /// Fails if this is called from a context where we cannot run work when returning to
> +    /// userspace. (E.g., from a kthread.)
> +    pub fn close_fd(self, fd: u32) -> Result<(), DeferredFdCloseError> {
> +        use bindings::task_work_notify_mode_TWA_RESUME as TWA_RESUME;
> +
> +        // In this method, we schedule the task work before closing the file. This is because
> +        // scheduling a task work is fallible, and we need to know whether it will fail before we
> +        // attempt to close the file.
> +
> +        // Task works are not available on kthreads.
> +        let current = crate::current!();
> +        if current.is_kthread() {
> +            return Err(DeferredFdCloseError::TaskWorkUnavailable);
> +        }
> +
> +        // Transfer ownership of the box's allocation to a raw pointer. This disables the
> +        // destructor, so we must manually convert it back to a Box to drop it.
> +        //
> +        // Until we convert it back to a `Box`, there are no aliasing requirements on this
> +        // pointer.
> +        let inner = Box::into_raw(self.inner);
> +
> +        // The `callback_head` field is first in the struct, so this cast correctly gives us a
> +        // pointer to the field.
> +        let callback_head = inner.cast::<bindings::callback_head>();
> +        // SAFETY: This pointer offset operation does not go out-of-bounds.
> +        let file_field = unsafe { core::ptr::addr_of_mut!((*inner).file) };
> +
> +        let current = current.as_raw();
> +
> +        // SAFETY: This function currently has exclusive access to the `DeferredFdCloserInner`, so
> +        // it is okay for us to perform unsynchronized writes to its `callback_head` field.
> +        unsafe { bindings::init_task_work(callback_head, Some(Self::do_close_fd)) };
> +
> +        // SAFETY: This inserts the `DeferredFdCloserInner` into the task workqueue for the current
> +        // task. If this operation is successful, then this transfers exclusive ownership of the
> +        // `callback_head` field to the C side until it calls `do_close_fd`, and we don't touch or
> +        // invalidate the field during that time.
> +        //
> +        // When the C side calls `do_close_fd`, the safety requirements of that method are
> +        // satisfied because when a task work is executed, the callback is given ownership of the
> +        // pointer.
> +        //
> +        // The file pointer is currently null. If it is changed to be non-null before `do_close_fd`
> +        // is called, then that change happens due to the write at the end of this function, and
> +        // that write has a safety comment that explains why the refcount can be dropped when
> +        // `do_close_fd` runs.
> +        let res = unsafe { bindings::task_work_add(current, callback_head, TWA_RESUME) };
> +
> +        if res != 0 {
> +            // SAFETY: Scheduling the task work failed, so we still have ownership of the box, so
> +            // we may destroy it.
> +            unsafe { drop(Box::from_raw(inner)) };
> +
> +            return Err(DeferredFdCloseError::TaskWorkUnavailable);
> +        }
> +
> +        // This removes the fd from the fd table in `current`. The file is not fully closed until
> +        // `filp_close` is called. We are given ownership of one refcount to the file.
> +        //
> +        // SAFETY: This is safe no matter what `fd` is. If the `fd` is valid (that is, if the
> +        // pointer is non-null), then we call `filp_close` on the returned pointer as required by
> +        // `file_close_fd`.
> +        let file = unsafe { bindings::file_close_fd(fd) };
> +        if file.is_null() {
> +            // We don't clean up the task work since that might be expensive if the task work queue
> +            // is long. Just let it execute and let it clean up for itself.
> +            return Err(DeferredFdCloseError::BadFd);
> +        }
> +
> +        // Acquire a second refcount to the file.
> +        //
> +        // SAFETY: The `file` pointer points at a file with a non-zero refcount.
> +        unsafe { bindings::get_file(file) };
> +
> +        // This method closes the fd, consuming one of our two refcounts. There could be active
> +        // light refcounts created from that fd, so we must ensure that the file has a positive
> +        // refcount for the duration of those active light refcounts. We do that by holding on to
> +        // the second refcount until the current task returns to userspace.
> +        //
> +        // SAFETY: The `file` pointer is valid. Passing `current->files` as the file table to close
> +        // it in is correct, since we just got the `fd` from `file_close_fd` which also uses
> +        // `current->files`.
> +        //
> +        // Note: fl_owner_t is currently a void pointer.
> +        unsafe { bindings::filp_close(file, (*current).files as bindings::fl_owner_t) };
> +
> +        // We update the file pointer that the task work is supposed to fput. This transfers
> +        // ownership of our last refcount.
> +        //
> +        // INVARIANT: This changes the `file` field of a `DeferredFdCloserInner` from null to
> +        // non-null. This doesn't break the type invariant for `DeferredFdCloserInner` because we
> +        // still own a refcount to the file, so we can pass ownership of that refcount to the
> +        // `DeferredFdCloserInner`.
> +        //
> +        // When `do_close_fd` runs, it must be safe for it to `fput` the refcount. However, this is
> +        // the case because all light refcounts that are associated with the fd we closed
> +        // previously must be dropped when `do_close_fd`, since light refcounts must be dropped
> +        // before returning to userspace.
> +        //
> +        // SAFETY: Task works are executed on the current thread right before we return to
> +        // userspace, so this write is guaranteed to happen before `do_close_fd` is called, which
> +        // means that a race is not possible here.
> +        unsafe { *file_field = file };
> +
> +        Ok(())
> +    }

This documentation is quite a work of art :)

  reply	other threads:[~2024-02-10  7:47 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-09 11:18 [PATCH v5 0/9] File abstractions needed by Rust Binder Alice Ryhl
2024-02-09 11:18 ` [PATCH v5 1/9] rust: types: add `NotThreadSafe` Alice Ryhl
2024-02-09 11:49   ` Alice Ryhl
2024-02-09 11:18 ` [PATCH v5 2/9] rust: task: add `Task::current_raw` Alice Ryhl
2024-02-09 11:18 ` [PATCH v5 3/9] rust: file: add Rust abstraction for `struct file` Alice Ryhl
2024-03-20 15:20   ` Christian Brauner
2024-03-20 18:09     ` Alice Ryhl
2024-03-31 12:55       ` Christian Brauner
2024-04-01 12:09         ` Alice Ryhl
2024-04-01 15:26           ` Christian Brauner
2024-04-02  9:39             ` Alice Ryhl
2024-04-03  6:01               ` Christian Brauner
2024-04-03 11:33                 ` Alice Ryhl
2024-04-08  7:45                 ` Alice Ryhl
2024-04-17  8:22                   ` Alice Ryhl
2024-02-09 11:18 ` [PATCH v5 4/9] rust: cred: add Rust abstraction for `struct cred` Alice Ryhl
2024-02-09 11:18 ` [PATCH v5 5/9] rust: security: add abstraction for secctx Alice Ryhl
2024-02-09 11:18 ` [PATCH v5 6/9] rust: file: add `FileDescriptorReservation` Alice Ryhl
2024-02-10  7:41   ` Trevor Gross
2024-03-20 15:33   ` Christian Brauner
2024-02-09 11:18 ` [PATCH v5 7/9] rust: file: add `Kuid` wrapper Alice Ryhl
2024-02-10  7:43   ` Trevor Gross
2024-02-12 10:04     ` Alice Ryhl
2024-02-12 16:00       ` Boqun Feng
2024-02-09 11:18 ` [PATCH v5 8/9] rust: file: add `DeferredFdCloser` Alice Ryhl
2024-02-10  7:47   ` Trevor Gross [this message]
2024-03-20 14:21   ` Christian Brauner
2024-03-21 13:28     ` Alice Ryhl
2024-03-31 10:26       ` Christian Brauner
2024-02-09 11:18 ` [PATCH v5 9/9] rust: file: add abstraction for `poll_table` Alice Ryhl
2024-02-10  7:52   ` Trevor Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALNs47vrm+kh=xH=57TWPuxXb_HL+ws2GqOmMFA3O4xLGS6yoQ@mail.gmail.com' \
    --to=tmgross@umich.edu \
    --cc=a.hindborg@samsung.com \
    --cc=alex.gaynor@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=arve@android.com \
    --cc=benno.lossin@proton.me \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=brauner@kernel.org \
    --cc=cmllamas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dxu@dxuuu.xyz \
    --cc=gary@garyguo.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maco@android.com \
    --cc=ojeda@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=tkjos@android.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wedsonaf@gmail.com \
    --cc=willy@infradead.org \
    --cc=yakoyoku@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).