* [PATCH v10 0/8] File abstractions needed by Rust Binder
@ 2024-09-15 14:31 Alice Ryhl
2024-09-15 14:31 ` [PATCH v10 1/8] rust: types: add `NotThreadSafe` Alice Ryhl
` (8 more replies)
0 siblings, 9 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
To: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner
Cc: Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Alice Ryhl,
Kees Cook
This patchset contains the file abstractions needed by the Rust
implementation of the Binder driver.
Please see the Rust Binder RFC for usage examples:
https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-0-08ba9197f637@google.com/
Users of "rust: types: add `NotThreadSafe`":
[PATCH 5/9] rust: file: add `FileDescriptorReservation`
Users of "rust: task: add `Task::current_raw`":
[PATCH 7/9] rust: file: add `Kuid` wrapper
[PATCH 8/9] rust: file: add `DeferredFdCloser`
Users of "rust: file: add Rust abstraction for `struct file`":
[PATCH RFC 02/20] rust_binder: add binderfs support to Rust binder
[PATCH RFC 03/20] rust_binder: add threading support
Users of "rust: cred: add Rust abstraction for `struct cred`":
[PATCH RFC 05/20] rust_binder: add nodes and context managers
[PATCH RFC 06/20] rust_binder: add oneway transactions
[PATCH RFC 11/20] rust_binder: send nodes in transaction
[PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
Users of "rust: security: add abstraction for secctx":
[PATCH RFC 06/20] rust_binder: add oneway transactions
Users of "rust: file: add `FileDescriptorReservation`":
[PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
[PATCH RFC 14/20] rust_binder: add BINDER_TYPE_FDA support
Users of "rust: file: add `Kuid` wrapper":
[PATCH RFC 05/20] rust_binder: add nodes and context managers
[PATCH RFC 06/20] rust_binder: add oneway transactions
Users of "rust: file: add abstraction for `poll_table`":
[PATCH RFC 07/20] rust_binder: add epoll support
This patchset has some uses of read_volatile in place of READ_ONCE.
Please see the following RFC for context on this:
https://lore.kernel.org/all/20231025195339.1431894-1-boqun.feng@gmail.com/
LSM: Please see patches 4 and 5 that add Rust abstractions for cred and
secctx. I did not CC the LSM list on earlier versions of this patchset,
sorry about that.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
Changes in v10:
- Rebase series on top of the changes queued for 6.12.
- This required changes in rust/helpers.
- Add more info to commit messages of cred/secctx patches.
- Link to v9: https://lore.kernel.org/r/20240808-alice-file-v9-0-2cb7b934e0e1@google.com
Changes in v9:
- Rebase on top of v6.11-rc2.
- Reorder things in file.rs
- Fix minor typo in file.rs
- Add Reviewed-bys.
- Link to v8: https://lore.kernel.org/r/20240725-alice-file-v8-0-55a2e80deaa8@google.com
Changes in v8:
- Rename File::from_ptr to File::from_raw_file.
- Mention that NotThreadSafe also affects Sync.
- Fix copyright lines.
- Move rust/kernel/file.rs to rust/kernel/fs/file.rs to reduce conflicts
with Wedson's vfs patches.
- Link to v7: https://lore.kernel.org/r/20240628-alice-file-v7-0-4d701f6335f3@google.com
Changes in v7:
- Replace file sharing modes with File / LocalFile.
- Link to v6: https://lore.kernel.org/r/20240517-alice-file-v6-0-b25bafdc9b97@google.com
Changes in v6:
- Introduce file sharing modes.
- Rewrite most documentation for `struct file` wrapper.
- Drop `DeferredFdCloser`. It will be sent later when it can be placed
somewhere where only Rust Binder can use it.
- Rebase on top of rust-next: 97ab3e8eec0c ("rust: alloc: fix dangling pointer in VecExt<T>::reserve()")
- Link to v5: https://lore.kernel.org/r/20240209-alice-file-v5-0-a37886783025@google.com
Changes in v5:
- Pass a null pointer to task_tgid_nr_ns.
- Fix some typos and other formatting issues.
- Add Reviewed-by where appropriate.
- Link to v4: https://lore.kernel.org/r/20240202-alice-file-v4-0-fc9c2080663b@google.com
Changes in v4:
- Moved the two really simple patches to the beginning of the patchset.
- Update Send safety comments.
- Use srctree relative links.
- Mention that `Credential::euid` is immutable.
- Update some safety comments to mention the invariant on Self.
- Use new name for close_fd_get_file.
- Move safety comments on DeferredFdCloser around and be more explicit
about how many refcounts we own.
- Reword safety comments related to _qproc.
- Add Reviewed-by where appropriate.
- Link to v3: https://lore.kernel.org/r/20240118-alice-file-v3-0-9694b6f9580c@google.com
Changes in v3:
- Completely rewrite comments about refcounting in the first patch.
- And add a note to the documentation in fs/file.c.
- Discuss speculation gadgets in commit message for the Kuid wrapper.
- Introduce NotThreadSafe and Task::current_raw patches and use them in
later patches.
- Improve safety comments in DeferredFdCloser.
- Some other minor changes.
- Link to v2: https://lore.kernel.org/r/20231206-alice-file-v2-0-af617c0d9d94@google.com
Changes in v2:
- Update various docs and safety comments.
- Rename method names to match the C name.
- Use ordinary read instead of READ_ONCE in File::cred.
- Changed null check in secctx.
- Add type alias for PhantomData in FileDescriptorReservation.
- Use Kuid::from_raw in Kuid::current_euid.
- Make DeferredFdCloser fallible if it is unable to schedule a task
work. And also schedule the task work *before* closing the file.
- Moved PollCondVar to rust/kernel/sync.
- Updated PollCondVar to use wake_up_pollfree.
- Link to v1: https://lore.kernel.org/all/20231129-alice-file-v1-0-f81afe8c7261@google.com/
Link to RFC:
https://lore.kernel.org/all/20230720152820.3566078-1-aliceryhl@google.com/
---
Alice Ryhl (5):
rust: types: add `NotThreadSafe`
rust: task: add `Task::current_raw`
rust: security: add abstraction for secctx
rust: file: add `Kuid` wrapper
rust: file: add abstraction for `poll_table`
Wedson Almeida Filho (3):
rust: file: add Rust abstraction for `struct file`
rust: cred: add Rust abstraction for `struct cred`
rust: file: add `FileDescriptorReservation`
fs/file.c | 7 +
rust/bindings/bindings_helper.h | 6 +
rust/helpers/cred.c | 13 ++
rust/helpers/fs.c | 12 ++
rust/helpers/helpers.c | 3 +
rust/helpers/security.c | 20 ++
rust/helpers/task.c | 38 ++++
rust/kernel/cred.rs | 85 ++++++++
rust/kernel/fs.rs | 8 +
rust/kernel/fs/file.rs | 461 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/lib.rs | 3 +
rust/kernel/security.rs | 74 +++++++
rust/kernel/sync.rs | 1 +
rust/kernel/sync/lock.rs | 13 +-
rust/kernel/sync/poll.rs | 121 +++++++++++
rust/kernel/task.rs | 91 +++++++-
rust/kernel/types.rs | 21 ++
17 files changed, 965 insertions(+), 12 deletions(-)
---
base-commit: d077242d68a31075ef5f5da041bf8f6fc19aa231
change-id: 20231123-alice-file-525b98e8a724
Best regards,
--
Alice Ryhl <aliceryhl@google.com>
* [PATCH v10 1/8] rust: types: add `NotThreadSafe`
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 15:38 ` Gary Guo
2024-09-24 19:45 ` Serge E. Hallyn
2024-09-15 14:31 ` [PATCH v10 2/8] rust: task: add `Task::current_raw` Alice Ryhl
` (7 subsequent siblings)
8 siblings, 2 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
This introduces a new marker type for types that shouldn't be thread
safe. By adding a field of this type to a struct, it becomes non-Send
and non-Sync, which means that it cannot be accessed in any way from
threads other than the one it was created on.
This is useful for APIs that require globals such as `current` to remain
constant while the value exists.
We update two existing users in the kernel to use this helper:
* `Task::current()` - moving the value returned by this function to a
different thread would not be safe, as it could no longer be guaranteed
that the `current` pointer remains valid.
* Lock guards. Mutexes and spinlocks should be unlocked on the same
thread as where they were locked, so we enforce this using the Send
trait.
There are also additional users in later patches of this patchset. See
[1] and [2] for the discussion that led to the introduction of this
patch.
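As a rough userland illustration of the marker described above (not the
kernel code itself), the sketch below copies the alias and the constant of
the same name, then uses them in a made-up struct. `CurrentGuard`,
`PlainGuard`, `Probe` and `Fallback` are hypothetical names introduced only
for this example; `Probe` is a small compile-time trick to report whether a
type implements an auto trait.

```rust
use std::marker::PhantomData;

/// Userland copy of the alias: raw pointers are neither `Send` nor `Sync`,
/// so a `PhantomData` of a raw pointer is neither as well.
pub type NotThreadSafe = PhantomData<*mut ()>;

/// Value with the same name, so the marker can be written like a unit struct.
#[allow(non_upper_case_globals)]
pub const NotThreadSafe: NotThreadSafe = PhantomData;

/// Hypothetical type that must stay on the thread that created it.
pub struct CurrentGuard {
    pub id: u32,
    _not_send: NotThreadSafe,
}

impl CurrentGuard {
    pub fn new(id: u32) -> Self {
        CurrentGuard { id, _not_send: NotThreadSafe }
    }
}

/// Same data without the marker; the auto traits make it `Send + Sync`.
pub struct PlainGuard {
    pub id: u32,
}

/// Hypothetical probe: the inherent constants are only available when the
/// bound holds, otherwise the defaults from `Fallback` are used.
pub struct Probe<T: ?Sized>(pub PhantomData<T>);

pub trait Fallback {
    const IS_SEND: bool = false;
    const IS_SYNC: bool = false;
}
impl<T: ?Sized> Fallback for Probe<T> {}

impl<T: ?Sized + Send> Probe<T> {
    pub const IS_SEND: bool = true;
}
impl<T: ?Sized + Sync> Probe<T> {
    pub const IS_SYNC: bool = true;
}

/// Returns `(is_send, is_sync)` for the type carrying the marker.
pub fn marked_traits() -> (bool, bool) {
    (<Probe<CurrentGuard>>::IS_SEND, <Probe<CurrentGuard>>::IS_SYNC)
}

/// Returns `(is_send, is_sync)` for the type without the marker.
pub fn plain_traits() -> (bool, bool) {
    (<Probe<PlainGuard>>::IS_SEND, <Probe<PlainGuard>>::IS_SYNC)
}
```

In this sketch, passing a `CurrentGuard` to `std::thread::spawn` would be
rejected at compile time, while a `PlainGuard` would be accepted.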
Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [1]
Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [2]
Suggested-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Björn Roy Baron <bjorn3_gh@protonmail.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/sync/lock.rs | 13 +++++++++----
rust/kernel/task.rs | 10 ++++++----
rust/kernel/types.rs | 21 +++++++++++++++++++++
3 files changed, 36 insertions(+), 8 deletions(-)
diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
index f6c34ca4d819..d6e9bab114b8 100644
--- a/rust/kernel/sync/lock.rs
+++ b/rust/kernel/sync/lock.rs
@@ -6,8 +6,13 @@
//! spinlocks, raw spinlocks) to be provided with minimal effort.
use super::LockClassKey;
-use crate::{init::PinInit, pin_init, str::CStr, types::Opaque, types::ScopeGuard};
-use core::{cell::UnsafeCell, marker::PhantomData, marker::PhantomPinned};
+use crate::{
+ init::PinInit,
+ pin_init,
+ str::CStr,
+ types::{NotThreadSafe, Opaque, ScopeGuard},
+};
+use core::{cell::UnsafeCell, marker::PhantomPinned};
use macros::pin_data;
pub mod mutex;
@@ -139,7 +144,7 @@ pub fn lock(&self) -> Guard<'_, T, B> {
pub struct Guard<'a, T: ?Sized, B: Backend> {
pub(crate) lock: &'a Lock<T, B>,
pub(crate) state: B::GuardState,
- _not_send: PhantomData<*mut ()>,
+ _not_send: NotThreadSafe,
}
// SAFETY: `Guard` is sync when the data protected by the lock is also sync.
@@ -191,7 +196,7 @@ pub(crate) unsafe fn new(lock: &'a Lock<T, B>, state: B::GuardState) -> Self {
Self {
lock,
state,
- _not_send: PhantomData,
+ _not_send: NotThreadSafe,
}
}
}
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 55dff7e088bf..278c623de0c6 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -4,10 +4,12 @@
//!
//! C header: [`include/linux/sched.h`](srctree/include/linux/sched.h).
-use crate::types::Opaque;
+use crate::{
+ bindings,
+ types::{NotThreadSafe, Opaque},
+};
use core::{
ffi::{c_int, c_long, c_uint},
- marker::PhantomData,
ops::Deref,
ptr,
};
@@ -106,7 +108,7 @@ impl Task {
pub unsafe fn current() -> impl Deref<Target = Task> {
struct TaskRef<'a> {
task: &'a Task,
- _not_send: PhantomData<*mut ()>,
+ _not_send: NotThreadSafe,
}
impl Deref for TaskRef<'_> {
@@ -125,7 +127,7 @@ fn deref(&self) -> &Self::Target {
// that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
// (where it could potentially outlive the caller).
task: unsafe { &*ptr.cast() },
- _not_send: PhantomData,
+ _not_send: NotThreadSafe,
}
}
diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
index 9e7ca066355c..3238ffaab031 100644
--- a/rust/kernel/types.rs
+++ b/rust/kernel/types.rs
@@ -532,3 +532,24 @@ unsafe impl AsBytes for str {}
// does not have any uninitialized portions either.
unsafe impl<T: AsBytes> AsBytes for [T] {}
unsafe impl<T: AsBytes, const N: usize> AsBytes for [T; N] {}
+
+/// Zero-sized type to mark types not [`Send`].
+///
+/// Add this type as a field to your struct if your type should not be sent to a different task.
+/// Since [`Send`] is an auto trait, adding a single field that is `!Send` will ensure that the
+/// whole type is `!Send`.
+///
+/// If a type is `!Send` it is impossible to give control over an instance of the type to another
+/// task. This is useful to include in types that store or reference task-local information. A file
+/// descriptor is an example of such task-local information.
+///
+/// This type also makes the type `!Sync`, which prevents immutable access to the value from
+/// several threads in parallel.
+pub type NotThreadSafe = PhantomData<*mut ()>;
+
+/// Used to construct instances of type [`NotThreadSafe`] similar to how `PhantomData` is
+/// constructed.
+///
+/// [`NotThreadSafe`]: type@NotThreadSafe
+#[allow(non_upper_case_globals)]
+pub const NotThreadSafe: NotThreadSafe = PhantomData;
--
2.46.0.662.g92d0881bb0-goog
* [PATCH v10 2/8] rust: task: add `Task::current_raw`
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
2024-09-15 14:31 ` [PATCH v10 1/8] rust: types: add `NotThreadSafe` Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 14:31 ` [PATCH v10 3/8] rust: file: add Rust abstraction for `struct file` Alice Ryhl
` (6 subsequent siblings)
8 siblings, 0 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
Introduces a safe function for getting a raw pointer to the current
task.
When writing bindings that need to access the current task, it is often
more convenient to call a method that directly returns a raw pointer
than to use the existing `Task::current` method. However, the only way
to do that is `bindings::get_current()` which is unsafe since it calls
into C. By introducing `Task::current_raw()`, it becomes possible to
obtain a pointer to the current task without using unsafe.
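The pattern can be sketched in userland as follows. This is only a model
under stated assumptions: the `bindings` module here is a hypothetical
stand-in for the generated C bindings, and its `get_current` returns the
address of a static rather than the real per-CPU `current` pointer.

```rust
/// Hypothetical stand-in for the generated C bindings (userland only).
pub mod bindings {
    #[allow(non_camel_case_types)]
    pub struct task_struct {
        pub pid: i32,
    }

    static CURRENT: task_struct = task_struct { pid: 42 };

    /// Models the C getter; marked `unsafe` the way FFI functions are.
    pub unsafe fn get_current() -> *mut task_struct {
        &CURRENT as *const task_struct as *mut task_struct
    }
}

pub struct Task;

impl Task {
    /// Safe wrapper: obtaining the pointer has no preconditions, so the
    /// `unsafe` block is confined to this one place.
    #[inline]
    pub fn current_raw() -> *mut bindings::task_struct {
        // SAFETY: In this model the getter only returns the address of a
        // static, which is always valid to compute.
        unsafe { bindings::get_current() }
    }
}

/// Callers get the pointer without `unsafe`; only dereferencing needs it.
pub fn current_pid() -> i32 {
    let ptr = Task::current_raw();
    // SAFETY: In this model the pointer refers to a static that lives for
    // the whole program and is never written to.
    unsafe { (*ptr).pid }
}
```

Using the pointer is still up to the caller: producing it is safe,
dereferencing it is not.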
Link: https://lore.kernel.org/all/CAH5fLgjT48X-zYtidv31mox3C4_Ogoo_2cBOCmX0Ang3tAgGHA@mail.gmail.com/
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/task.rs | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 278c623de0c6..367b4bbddd9f 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -97,6 +97,15 @@ unsafe impl Sync for Task {}
type Pid = bindings::pid_t;
impl Task {
+ /// Returns a raw pointer to the current task.
+ ///
+ /// It is up to the user to use the pointer correctly.
+ #[inline]
+ pub fn current_raw() -> *mut bindings::task_struct {
+ // SAFETY: Getting the current pointer is always safe.
+ unsafe { bindings::get_current() }
+ }
+
/// Returns a task reference for the currently executing task/thread.
///
/// The recommended way to get the current task/thread is to use the
@@ -119,14 +128,12 @@ fn deref(&self) -> &Self::Target {
}
}
- // SAFETY: Just an FFI call with no additional safety requirements.
- let ptr = unsafe { bindings::get_current() };
-
+ let current = Task::current_raw();
TaskRef {
// SAFETY: If the current thread is still running, the current task is valid. Given
// that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
// (where it could potentially outlive the caller).
- task: unsafe { &*ptr.cast() },
+ task: unsafe { &*current.cast() },
_not_send: NotThreadSafe,
}
}
--
2.46.0.662.g92d0881bb0-goog
* [PATCH v10 3/8] rust: file: add Rust abstraction for `struct file`
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
2024-09-15 14:31 ` [PATCH v10 1/8] rust: types: add `NotThreadSafe` Alice Ryhl
2024-09-15 14:31 ` [PATCH v10 2/8] rust: task: add `Task::current_raw` Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 21:51 ` Gary Guo
2024-09-15 14:31 ` [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred` Alice Ryhl
` (5 subsequent siblings)
8 siblings, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
From: Wedson Almeida Filho <wedsonaf@gmail.com>
This abstraction makes it possible to manipulate the open files for a
process. The new `File` struct wraps the C `struct file`. When accessing
it using the smart pointer `ARef<File>`, the pointer will own a
reference count to the file. When accessing it as `&File`, then the
reference does not own a refcount, but the borrow checker will ensure
that the reference count does not hit zero while the `&File` is live.
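The difference between the owning pointer and the plain reference can be
illustrated with a userland analogy, assuming `std::sync::Arc` as a
stand-in for `ARef` (the two are not the same type; `ARef` uses the
refcount stored inside the C object). The `File` struct and the function
names here are hypothetical.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the wrapped `struct file`.
pub struct File {
    pub flags: u32,
}

/// Takes a plain reference: no refcount is taken or released here.
pub fn flags_of(file: &File) -> u32 {
    file.flags
}

/// Returns the refcount observed before borrowing, while borrowed, and
/// after taking a second owning pointer.
pub fn observe_counts() -> (usize, usize, usize) {
    let owner = Arc::new(File { flags: 0 });
    let before = Arc::strong_count(&owner);

    // Borrowing: the borrow checker keeps `owner` alive for as long as
    // `borrowed` is used, so the count cannot reach zero in the meantime.
    let borrowed: &File = &owner;
    let _ = flags_of(borrowed);
    let during = Arc::strong_count(&owner);

    // Cloning the owning pointer is what increments the count.
    let second = Arc::clone(&owner);
    let after = Arc::strong_count(&owner);
    drop(second);

    (before, during, after)
}
```

In this model only the clone changes the count; the borrow leaves it at one.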
Since this is intended to manipulate the open files of a process, we
introduce an `fget` constructor that corresponds to the C `fget`
method. In future patches, it will become possible to create a new fd in
a process and bind it to a `File`. Rust Binder will use these to send
fds from one process to another.
We also provide a method for accessing the file's flags. Rust Binder
will use this to access the flags of the Binder fd to check whether the
non-blocking flag is set, which affects what the Binder ioctl does.
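A minimal sketch of such a flag check, assuming the asm-generic (e.g.
x86) octal values for the constants; in the kernel these come from
`bindings` and differ on some architectures. The helper names are
hypothetical.

```rust
// Assumed values from the asm-generic fcntl definitions.
pub const O_ACCMODE: u32 = 0o3;
pub const O_RDONLY: u32 = 0o0;
pub const O_RDWR: u32 = 0o2;
pub const O_NONBLOCK: u32 = 0o4000;

/// True if the non-blocking bit is set in `flags`.
pub fn is_nonblocking(flags: u32) -> bool {
    (flags & O_NONBLOCK) != 0
}

/// The access mode is a two-bit field, so it is compared after masking
/// rather than tested bit by bit (`O_RDONLY` is zero).
pub fn is_read_only(flags: u32) -> bool {
    (flags & O_ACCMODE) == O_RDONLY
}
```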
This introduces a struct for the EBADF error type, rather than just
using the Error type directly. This has two advantages:
* `File::fget` returns a `Result<ARef<File>, BadFdError>`, which the
compiler will represent as a single pointer, with null being an error.
This is possible because the compiler understands that `BadFdError`
has only one possible value, and it also understands that the
`ARef<File>` smart pointer is guaranteed non-null.
* Additionally, we promise to users of the method that the method can
only fail with EBADF, which means that they can rely on this promise
without having to inspect its implementation.
That said, there are also two disadvantages:
* Defining additional error types involves boilerplate.
* The question mark operator will only utilize the `From` trait once,
which prevents you from using the question mark operator on
`BadFdError` in methods that return some third error type that the
kernel `Error` is convertible into. (However, it works fine in methods
that return `Error`.)
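Both the layout advantage and the question-mark limitation can be
reproduced in userland. The sketch below is a model, not the kernel types:
`Error`, `DriverError` and this `fget` are hypothetical stand-ins, and
`Box<u32>` plays the role of the non-null `ARef<File>`.

```rust
use std::mem::size_of;

/// Field-less error type with exactly one possible value.
#[derive(Debug, PartialEq)]
pub struct BadFdError;

/// Hypothetical stand-in for the kernel `Error` (wraps an errno).
#[derive(Debug, PartialEq)]
pub struct Error(pub i32);

pub const EBADF: i32 = 9;

impl From<BadFdError> for Error {
    fn from(_: BadFdError) -> Error {
        Error(EBADF)
    }
}

/// Hypothetical third error type that `Error` is convertible into.
#[derive(Debug, PartialEq)]
pub struct DriverError(pub Error);

impl From<Error> for DriverError {
    fn from(e: Error) -> DriverError {
        DriverError(e)
    }
}

/// Hypothetical lookup: only fd 3 is "open" in this model.
pub fn fget(fd: u32) -> Result<Box<u32>, BadFdError> {
    if fd == 3 { Ok(Box::new(fd)) } else { Err(BadFdError) }
}

/// Works directly: `?` applies `From<BadFdError> for Error` once.
pub fn returns_error(fd: u32) -> Result<u32, Error> {
    Ok(*fget(fd)?)
}

/// A bare `fget(fd)?` would not compile here, because `?` will not chain
/// `BadFdError -> Error -> DriverError`; the first hop is made explicit.
pub fn returns_driver_error(fd: u32) -> Result<u32, DriverError> {
    let file = fget(fd).map_err(Error::from)?;
    Ok(*file)
}

/// The error variant is stored in the pointer's null niche, so the whole
/// `Result` is the size of one pointer.
pub fn result_is_pointer_sized() -> bool {
    size_of::<Result<Box<u32>, BadFdError>>() == size_of::<usize>()
}
```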
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Co-developed-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
fs/file.c | 7 +
rust/bindings/bindings_helper.h | 2 +
rust/helpers/fs.c | 12 ++
rust/helpers/helpers.c | 1 +
rust/kernel/fs.rs | 8 +
rust/kernel/fs/file.rs | 375 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/lib.rs | 1 +
7 files changed, 406 insertions(+)
diff --git a/fs/file.c b/fs/file.c
index 655338effe9c..fc14209cf3e9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1123,6 +1123,13 @@ EXPORT_SYMBOL(task_lookup_next_fdget_rcu);
*
* The fput_needed flag returned by fget_light should be passed to the
* corresponding fput_light.
+ *
+ * (As an exception to rule 2, you can call filp_close between fget_light and
+ * fput_light provided that you capture a real refcount with get_file before
+ * the call to filp_close, and ensure that this real refcount is fput *after*
+ * the fput_light call.)
+ *
+ * See also the documentation in rust/kernel/fs/file.rs.
*/
static unsigned long __fget_light(unsigned int fd, fmode_t mask)
{
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index ae82e9c941af..4a400a954979 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -12,7 +12,9 @@
#include <linux/blkdev.h>
#include <linux/errname.h>
#include <linux/ethtool.h>
+#include <linux/file.h>
#include <linux/firmware.h>
+#include <linux/fs.h>
#include <linux/jiffies.h>
#include <linux/mdio.h>
#include <linux/phy.h>
diff --git a/rust/helpers/fs.c b/rust/helpers/fs.c
new file mode 100644
index 000000000000..a75c96763372
--- /dev/null
+++ b/rust/helpers/fs.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2024 Google LLC.
+ */
+
+#include <linux/fs.h>
+
+struct file *rust_helper_get_file(struct file *f)
+{
+ return get_file(f);
+}
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 30f40149f3a9..3f2d0d0c8017 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -12,6 +12,7 @@
#include "build_assert.c"
#include "build_bug.c"
#include "err.c"
+#include "fs.c"
#include "kunit.c"
#include "mutex.c"
#include "page.c"
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
new file mode 100644
index 000000000000..0121b38c59e6
--- /dev/null
+++ b/rust/kernel/fs.rs
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Kernel file systems.
+//!
+//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h)
+
+pub mod file;
+pub use self::file::{File, LocalFile};
diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
new file mode 100644
index 000000000000..6adb7a7199ec
--- /dev/null
+++ b/rust/kernel/fs/file.rs
@@ -0,0 +1,375 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Files and file descriptors.
+//!
+//! C headers: [`include/linux/fs.h`](srctree/include/linux/fs.h) and
+//! [`include/linux/file.h`](srctree/include/linux/file.h)
+
+use crate::{
+ bindings,
+ error::{code::*, Error, Result},
+ types::{ARef, AlwaysRefCounted, Opaque},
+};
+use core::ptr;
+
+/// Flags associated with a [`File`].
+pub mod flags {
+ /// File is opened in append mode.
+ pub const O_APPEND: u32 = bindings::O_APPEND;
+
+ /// Signal-driven I/O is enabled.
+ pub const O_ASYNC: u32 = bindings::FASYNC;
+
+ /// Close-on-exec flag is set.
+ pub const O_CLOEXEC: u32 = bindings::O_CLOEXEC;
+
+ /// File was created if it didn't already exist.
+ pub const O_CREAT: u32 = bindings::O_CREAT;
+
+ /// Direct I/O is enabled for this file.
+ pub const O_DIRECT: u32 = bindings::O_DIRECT;
+
+ /// File must be a directory.
+ pub const O_DIRECTORY: u32 = bindings::O_DIRECTORY;
+
+ /// Like [`O_SYNC`] except metadata is not synced.
+ pub const O_DSYNC: u32 = bindings::O_DSYNC;
+
+ /// Ensure that this file is created with the `open(2)` call.
+ pub const O_EXCL: u32 = bindings::O_EXCL;
+
+ /// Large file size enabled (`off64_t` over `off_t`).
+ pub const O_LARGEFILE: u32 = bindings::O_LARGEFILE;
+
+ /// Do not update the file last access time.
+ pub const O_NOATIME: u32 = bindings::O_NOATIME;
+
+ /// File should not be used as process's controlling terminal.
+ pub const O_NOCTTY: u32 = bindings::O_NOCTTY;
+
+ /// If basename of path is a symbolic link, fail open.
+ pub const O_NOFOLLOW: u32 = bindings::O_NOFOLLOW;
+
+ /// File is using nonblocking I/O.
+ pub const O_NONBLOCK: u32 = bindings::O_NONBLOCK;
+
+ /// File is using nonblocking I/O.
+ ///
+ /// This is effectively the same flag as [`O_NONBLOCK`] on all architectures
+ /// except SPARC64.
+ pub const O_NDELAY: u32 = bindings::O_NDELAY;
+
+ /// Used to obtain a path file descriptor.
+ pub const O_PATH: u32 = bindings::O_PATH;
+
+ /// Write operations on this file will flush data and metadata.
+ pub const O_SYNC: u32 = bindings::O_SYNC;
+
+ /// This file is an unnamed temporary regular file.
+ pub const O_TMPFILE: u32 = bindings::O_TMPFILE;
+
+ /// File should be truncated to length 0.
+ pub const O_TRUNC: u32 = bindings::O_TRUNC;
+
+ /// Bitmask for access mode flags.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use kernel::fs::file;
+ /// # fn do_something() {}
+ /// # let flags = 0;
+ /// if (flags & file::flags::O_ACCMODE) == file::flags::O_RDONLY {
+ /// do_something();
+ /// }
+ /// ```
+ pub const O_ACCMODE: u32 = bindings::O_ACCMODE;
+
+ /// File is read only.
+ pub const O_RDONLY: u32 = bindings::O_RDONLY;
+
+ /// File is write only.
+ pub const O_WRONLY: u32 = bindings::O_WRONLY;
+
+ /// File can be both read and written.
+ pub const O_RDWR: u32 = bindings::O_RDWR;
+}
+
+/// Wraps the kernel's `struct file`. Thread safe.
+///
+/// This represents an open file rather than a file on a filesystem. Processes generally reference
+/// open files using file descriptors. However, file descriptors are not the same as files. A file
+/// descriptor is just an integer that corresponds to a file, and a single file may be referenced
+/// by multiple file descriptors.
+///
+/// # Refcounting
+///
+/// Instances of this type are reference-counted. The reference count is incremented by the
+/// `fget`/`get_file` functions and decremented by `fput`. The Rust type `ARef<File>` represents a
+/// pointer that owns a reference count on the file.
+///
+/// Whenever a process opens a file descriptor (fd), it stores a pointer to the file in its fd
+/// table (`struct files_struct`). This pointer owns a reference count to the file, ensuring the
+/// file isn't prematurely deleted while the file descriptor is open. In Rust terminology, the
+/// pointers in `struct files_struct` are `ARef<File>` pointers.
+///
+/// ## Light refcounts
+///
+/// Whenever a process has an fd to a file, it may use something called a "light refcount" as a
+/// performance optimization. Light refcounts are acquired by calling `fdget` and released with
+/// `fdput`. The idea behind light refcounts is that if the fd is not closed between the calls to
+/// `fdget` and `fdput`, then the refcount cannot hit zero during that time, as the `struct
+/// files_struct` holds a reference until the fd is closed. This means that it's safe to access the
+/// file even if `fdget` does not increment the refcount.
+///
+/// The requirement that the fd is not closed during a light refcount applies globally across all
+/// threads - not just on the thread using the light refcount. For this reason, light refcounts are
+/// only used when the `struct files_struct` is not shared with other threads, since this ensures
+/// that other unrelated threads cannot suddenly start using the fd and close it. Therefore,
+/// calling `fdget` on a shared `struct files_struct` creates a normal refcount instead of a light
+/// refcount.
+///
+/// Light reference counts must be released with `fdput` before the system call returns to
+/// userspace. This means that if you wait until the current system call returns to userspace, then
+/// all light refcounts that existed at the time have gone away.
+///
+/// ### The file position
+///
+/// Each `struct file` has a position integer, which is protected by the `f_pos_lock` mutex.
+/// However, if the `struct file` is not shared, then the kernel may avoid taking the lock as a
+/// performance optimization.
+///
+/// The condition for avoiding the `f_pos_lock` mutex is different from the condition for using
+/// `fdget`. With `fdget`, you may avoid incrementing the refcount as long as the current fd table
+/// is not shared; it is okay if there are other fd tables that also reference the same `struct
+/// file`. However, `fdget_pos` can only avoid taking the `f_pos_lock` if the entire `struct file`
+/// is not shared, as different processes with an fd to the same `struct file` share the same
+/// position.
+///
+/// To represent files that are not thread safe due to this optimization, the [`LocalFile`] type is
+/// used.
+///
+/// ## Rust references
+///
+/// The reference type `&File` is similar to light refcounts:
+///
+/// * `&File` references don't own a reference count. They can only exist as long as the reference
+/// count stays positive, and can only be created when there is some mechanism in place to ensure
+/// this.
+///
+/// * The Rust borrow-checker normally ensures this by enforcing that the `ARef<File>` from which
+/// a `&File` is created outlives the `&File`.
+///
+/// * Using the unsafe [`File::from_raw_file`] means that it is up to the caller to ensure that the
+/// `&File` only exists while the reference count is positive.
+///
+/// * You can think of `fdget` as using an fd to look up an `ARef<File>` in the `struct
+/// files_struct` and create an `&File` from it. The "fd cannot be closed" rule is like the Rust
+/// rule "the `ARef<File>` must outlive the `&File`".
+///
+/// # Invariants
+///
+/// * All instances of this type are refcounted using the `f_count` field.
+/// * There must not be any active calls to `fdget_pos` on this file that did not take the
+/// `f_pos_lock` mutex.
+#[repr(transparent)]
+pub struct File {
+ inner: Opaque<bindings::file>,
+}
+
+// SAFETY: This file is known to not have any active `fdget_pos` calls that did not take the
+// `f_pos_lock` mutex, so it is safe to transfer it between threads.
+unsafe impl Send for File {}
+
+// SAFETY: This file is known to not have any active `fdget_pos` calls that did not take the
+// `f_pos_lock` mutex, so it is safe to access its methods from several threads in parallel.
+unsafe impl Sync for File {}
+
+// SAFETY: The type invariants guarantee that `File` is always ref-counted. This implementation
+// makes `ARef<File>` own a normal refcount.
+unsafe impl AlwaysRefCounted for File {
+ #[inline]
+ fn inc_ref(&self) {
+ // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+ unsafe { bindings::get_file(self.as_ptr()) };
+ }
+
+ #[inline]
+ unsafe fn dec_ref(obj: ptr::NonNull<File>) {
+ // SAFETY: To call this method, the caller passes us ownership of a normal refcount, so we
+ // may drop it. The cast is okay since `File` has the same representation as `struct file`.
+ unsafe { bindings::fput(obj.cast().as_ptr()) }
+ }
+}
+
+/// Wraps the kernel's `struct file`. Not thread safe.
+///
+/// This type represents a file that is not known to be safe to transfer across thread boundaries.
+/// To obtain a thread-safe [`File`], use the [`assume_no_fdget_pos`] conversion.
+///
+/// See the documentation for [`File`] for more information.
+///
+/// # Invariants
+///
+/// * All instances of this type are refcounted using the `f_count` field.
+/// * If there is an active call to `fdget_pos` that did not take the `f_pos_lock` mutex, then it
+/// must be on the same thread as this file.
+///
+/// [`assume_no_fdget_pos`]: LocalFile::assume_no_fdget_pos
+pub struct LocalFile {
+ inner: Opaque<bindings::file>,
+}
+
+// SAFETY: The type invariants guarantee that `LocalFile` is always ref-counted. This implementation
+// makes `ARef<LocalFile>` own a normal refcount.
+unsafe impl AlwaysRefCounted for LocalFile {
+ #[inline]
+ fn inc_ref(&self) {
+ // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+ unsafe { bindings::get_file(self.as_ptr()) };
+ }
+
+ #[inline]
+ unsafe fn dec_ref(obj: ptr::NonNull<LocalFile>) {
+ // SAFETY: To call this method, the caller passes us ownership of a normal refcount, so we
+ // may drop it. The cast is okay since `LocalFile` has the same representation as `struct file`.
+ unsafe { bindings::fput(obj.cast().as_ptr()) }
+ }
+}
+
+impl LocalFile {
+ /// Constructs a new `struct file` wrapper from a file descriptor.
+ ///
+ /// The file descriptor belongs to the current process, and there might be active local calls
+ /// to `fdget_pos` on the same file.
+ ///
+ /// To obtain an `ARef<File>`, use the [`assume_no_fdget_pos`] function to convert.
+ ///
+ /// [`assume_no_fdget_pos`]: LocalFile::assume_no_fdget_pos
+ #[inline]
+ pub fn fget(fd: u32) -> Result<ARef<LocalFile>, BadFdError> {
+ // SAFETY: FFI call, there are no requirements on `fd`.
+ let ptr = ptr::NonNull::new(unsafe { bindings::fget(fd) }).ok_or(BadFdError)?;
+
+ // SAFETY: `bindings::fget` created a refcount, and we pass ownership of it to the `ARef`.
+ //
+ // INVARIANT: This file is in the fd table on this thread, so either all `fdget_pos` calls
+ // are on this thread, or the file is shared, in which case `fdget_pos` calls took the
+ // `f_pos_lock` mutex.
+ Ok(unsafe { ARef::from_raw(ptr.cast()) })
+ }
+
+ /// Creates a reference to a [`LocalFile`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// * The caller must ensure that `ptr` points at a valid file and that the file's refcount is
+ /// positive for the duration of 'a.
+ /// * The caller must ensure that if there is an active call to `fdget_pos` that did not take
+ /// the `f_pos_lock` mutex, then that call is on the current thread.
+ #[inline]
+ pub unsafe fn from_raw_file<'a>(ptr: *const bindings::file) -> &'a LocalFile {
+ // SAFETY: The caller guarantees that the pointer is not dangling and stays valid for the
+ // duration of 'a. The cast is okay because `LocalFile` is `repr(transparent)`.
+ //
+ // INVARIANT: The caller guarantees that there are no problematic `fdget_pos` calls.
+ unsafe { &*ptr.cast() }
+ }
+
+ /// Assume that there are no active `fdget_pos` calls that prevent us from sharing this file.
+ ///
+ /// This makes it safe to transfer this file to other threads. No checks are performed, and
+ /// using it incorrectly may lead to a data race on the file position if the file is shared
+ /// with another thread.
+ ///
+ /// This method is intended to be used together with [`LocalFile::fget`] when the caller knows
+ /// statically that there are no `fdget_pos` calls on the current thread. For example, you
+ /// might use it when calling `fget` from an ioctl, since ioctls usually do not touch the file
+ /// position.
+ ///
+ /// # Safety
+ ///
+ /// There must not be any active `fdget_pos` calls on the current thread.
+ #[inline]
+ pub unsafe fn assume_no_fdget_pos(me: ARef<LocalFile>) -> ARef<File> {
+ // INVARIANT: There are no `fdget_pos` calls on the current thread, and by the type
+ // invariants, if there is a `fdget_pos` call on another thread, then it took the
+ // `f_pos_lock` mutex.
+ //
+ // SAFETY: `LocalFile` and `File` have the same layout.
+ unsafe { ARef::from_raw(ARef::into_raw(me).cast()) }
+ }
+
+ /// Returns a raw pointer to the inner C struct.
+ #[inline]
+ pub fn as_ptr(&self) -> *mut bindings::file {
+ self.inner.get()
+ }
+
+ /// Returns the flags associated with the file.
+ ///
+ /// The flags are a combination of the constants in [`flags`].
+ #[inline]
+ pub fn flags(&self) -> u32 {
+ // This `read_volatile` is intended to correspond to a READ_ONCE call.
+ //
+ // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
+ //
+ // FIXME(read_once): Replace with `read_once` when available on the Rust side.
+ unsafe { core::ptr::addr_of!((*self.as_ptr()).f_flags).read_volatile() }
+ }
+}
+
+impl File {
+ /// Creates a reference to a [`File`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// * The caller must ensure that `ptr` points at a valid file and that the file's refcount is
+ /// positive for the duration of 'a.
+ /// * The caller must ensure that if there are active `fdget_pos` calls on this file, then they
+ /// took the `f_pos_lock` mutex.
+ #[inline]
+ pub unsafe fn from_raw_file<'a>(ptr: *const bindings::file) -> &'a File {
+ // SAFETY: The caller guarantees that the pointer is not dangling and stays valid for the
+ // duration of 'a. The cast is okay because `File` is `repr(transparent)`.
+ //
+ // INVARIANT: The caller guarantees that there are no problematic `fdget_pos` calls.
+ unsafe { &*ptr.cast() }
+ }
+}
+
+// Make LocalFile methods available on File.
+impl core::ops::Deref for File {
+ type Target = LocalFile;
+ #[inline]
+ fn deref(&self) -> &LocalFile {
+ // SAFETY: The caller provides a `&File`, and since it is a reference, it must point at a
+ // valid file for the desired duration.
+ //
+ // By the type invariants, there are no `fdget_pos` calls that did not take the
+ // `f_pos_lock` mutex.
+ unsafe { LocalFile::from_raw_file(self as *const File as *const bindings::file) }
+ }
+}
+
+/// Represents the `EBADF` error code.
+///
+/// Used for methods that can only fail with `EBADF`.
+#[derive(Copy, Clone, Eq, PartialEq)]
+pub struct BadFdError;
+
+impl From<BadFdError> for Error {
+ #[inline]
+ fn from(_: BadFdError) -> Error {
+ EBADF
+ }
+}
+
+impl core::fmt::Debug for BadFdError {
+ fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+ f.pad("EBADF")
+ }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index f10b06a78b9d..c7d50f245f58 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -34,6 +34,7 @@
pub mod error;
#[cfg(CONFIG_RUST_FW_LOADER_ABSTRACTIONS)]
pub mod firmware;
+pub mod fs;
pub mod init;
pub mod ioctl;
#[cfg(CONFIG_KUNIT)]
--
2.46.0.662.g92d0881bb0-goog
* [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred`
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
` (2 preceding siblings ...)
2024-09-15 14:31 ` [PATCH v10 3/8] rust: file: add Rust abstraction for `struct file` Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 20:24 ` Kees Cook
2024-09-19 7:57 ` Paul Moore
2024-09-15 14:31 ` [PATCH v10 5/8] rust: security: add abstraction for secctx Alice Ryhl
` (4 subsequent siblings)
8 siblings, 2 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
To: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner
Cc: Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Alice Ryhl,
Kees Cook
From: Wedson Almeida Filho <wedsonaf@gmail.com>
Add a wrapper around `struct cred` called `Credential`, and provide
functionality to get the `Credential` associated with a `File`.
Rust Binder must check the credentials of processes when they attempt to
perform various operations, and these checks usually take a
`&Credential` as parameter. The security_binder_set_context_mgr function
would be one example. This patch is necessary to access these security_*
methods from Rust.
This Rust abstraction makes the following assumptions about the C side:
* `struct cred` is refcounted with `get_cred`/`put_cred`.
* It's okay to transfer a `struct cred` across threads, that is, you do
not need to call `put_cred` on the same thread as where you called
`get_cred`.
* The `euid` field of a `struct cred` never changes after
initialization.
* The `f_cred` field of a `struct file` never changes after
initialization.
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
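For readers who want to see the ownership model outside the kernel, here is
a minimal userspace sketch of the assumptions listed above. It is not the
kernel code: `Arc` stands in for the `get_cred`/`put_cred` refcount, and
`cred_owned`, `cred_refcount` and `euid_on_other_thread` are hypothetical
names invented for this illustration. It shows a borrowed credential tied to
the file's lifetime, and an owned reference whose last use (and release)
happens on another thread.

```rust
use std::sync::Arc;
use std::thread;

/// Userspace stand-in for `struct cred`. `euid` is written once at construction.
pub struct Credential {
    euid: u32,
}

impl Credential {
    /// Reads the immutable `euid`; no synchronization is needed.
    pub fn euid(&self) -> u32 {
        self.euid
    }
}

/// Userspace stand-in for `struct file`; `cred` plays the role of `f_cred`.
pub struct File {
    cred: Arc<Credential>,
}

impl File {
    pub fn new(euid: u32) -> Self {
        File { cred: Arc::new(Credential { euid }) }
    }

    /// Borrowed access: the returned reference cannot outlive `&self`.
    pub fn cred(&self) -> &Credential {
        &self.cred
    }

    /// Owned access: increments the refcount (the analogue of `get_cred`).
    pub fn cred_owned(&self) -> Arc<Credential> {
        Arc::clone(&self.cred)
    }

    /// Current number of owners (hypothetical helper, for illustration only).
    pub fn cred_refcount(&self) -> usize {
        Arc::strong_count(&self.cred)
    }
}

/// Takes a reference on this thread and drops it on a worker thread
/// (the analogue of calling `put_cred` on a different thread).
pub fn euid_on_other_thread(file: &File) -> u32 {
    let owned = file.cred_owned();
    thread::spawn(move || owned.euid())
        .join()
        .expect("worker thread panicked")
}
```

The borrowed accessor mirrors `File::cred` in the diff below: no refcount is
touched, and the borrow checker enforces that the credential is only used
while the file is alive.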
rust/bindings/bindings_helper.h | 1 +
rust/helpers/cred.c | 13 +++++++
rust/helpers/helpers.c | 1 +
rust/kernel/cred.rs | 76 +++++++++++++++++++++++++++++++++++++++++
rust/kernel/fs/file.rs | 13 +++++++
rust/kernel/lib.rs | 1 +
6 files changed, 105 insertions(+)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 4a400a954979..f74247205cb5 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -10,6 +10,7 @@
#include <linux/blk-mq.h>
#include <linux/blk_types.h>
#include <linux/blkdev.h>
+#include <linux/cred.h>
#include <linux/errname.h>
#include <linux/ethtool.h>
#include <linux/file.h>
diff --git a/rust/helpers/cred.c b/rust/helpers/cred.c
new file mode 100644
index 000000000000..fde7ae20cdd1
--- /dev/null
+++ b/rust/helpers/cred.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/cred.h>
+
+const struct cred *rust_helper_get_cred(const struct cred *cred)
+{
+ return get_cred(cred);
+}
+
+void rust_helper_put_cred(const struct cred *cred)
+{
+ put_cred(cred);
+}
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 3f2d0d0c8017..16e5de352dab 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -11,6 +11,7 @@
#include "bug.c"
#include "build_assert.c"
#include "build_bug.c"
+#include "cred.c"
#include "err.c"
#include "fs.c"
#include "kunit.c"
diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
new file mode 100644
index 000000000000..acee04768927
--- /dev/null
+++ b/rust/kernel/cred.rs
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Credentials management.
+//!
+//! C header: [`include/linux/cred.h`](srctree/include/linux/cred.h).
+//!
+//! Reference: <https://www.kernel.org/doc/html/latest/security/credentials.html>
+
+use crate::{
+ bindings,
+ types::{AlwaysRefCounted, Opaque},
+};
+
+/// Wraps the kernel's `struct cred`.
+///
+/// Credentials are used for various security checks in the kernel.
+///
+/// Most fields of credentials are immutable. When things have their credentials changed, that
+/// happens by replacing the credential instead of changing an existing credential. See the [kernel
+/// documentation][ref] for more info on this.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `get_cred` ensures that the
+/// allocation remains valid at least until the matching call to `put_cred`.
+///
+/// [ref]: https://www.kernel.org/doc/html/latest/security/credentials.html
+#[repr(transparent)]
+pub struct Credential(Opaque<bindings::cred>);
+
+// SAFETY:
+// - `Credential::dec_ref` can be called from any thread.
+// - It is okay to send ownership of `Credential` across thread boundaries.
+unsafe impl Send for Credential {}
+
+// SAFETY: It's OK to access `Credential` through shared references from other threads because
+// we're either accessing properties that don't change or that are properly synchronised by C code.
+unsafe impl Sync for Credential {}
+
+impl Credential {
+ /// Creates a reference to a [`Credential`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
+ /// returned [`Credential`] reference.
+ pub unsafe fn from_ptr<'a>(ptr: *const bindings::cred) -> &'a Credential {
+ // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+ // `Credential` type being transparent makes the cast ok.
+ unsafe { &*ptr.cast() }
+ }
+
+ /// Returns the effective UID of the given credential.
+ pub fn euid(&self) -> bindings::kuid_t {
+ // SAFETY: By the type invariant, we know that `self.0` is valid. Furthermore, the `euid`
+ // field of a credential is never changed after initialization, so there is no potential
+ // for data races.
+ unsafe { (*self.0.get()).euid }
+ }
+}
+
+// SAFETY: The type invariants guarantee that `Credential` is always ref-counted.
+unsafe impl AlwaysRefCounted for Credential {
+ fn inc_ref(&self) {
+ // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+ unsafe { bindings::get_cred(self.0.get()) };
+ }
+
+ unsafe fn dec_ref(obj: core::ptr::NonNull<Credential>) {
+ // SAFETY: The safety requirements guarantee that the refcount is nonzero. The cast is okay
+ // because `Credential` has the same representation as `struct cred`.
+ unsafe { bindings::put_cred(obj.cast().as_ptr()) };
+ }
+}
diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 6adb7a7199ec..3c1f51719804 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -9,6 +9,7 @@
use crate::{
bindings,
+ cred::Credential,
error::{code::*, Error, Result},
types::{ARef, AlwaysRefCounted, Opaque},
};
@@ -308,6 +309,18 @@ pub fn as_ptr(&self) -> *mut bindings::file {
self.inner.get()
}
+ /// Returns the credentials of the task that originally opened the file.
+ pub fn cred(&self) -> &Credential {
+ // SAFETY: It's okay to read the `f_cred` field without synchronization because `f_cred` is
+ // never changed after initialization of the file.
+ let ptr = unsafe { (*self.as_ptr()).f_cred };
+
+ // SAFETY: The signature of this function ensures that the caller will only access the
+ // returned credential while the file is still valid, and the C side ensures that the
+ // credential stays valid at least as long as the file.
+ unsafe { Credential::from_ptr(ptr) }
+ }
+
/// Returns the flags associated with the file.
///
/// The flags are a combination of the constants in [`flags`].
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index c7d50f245f58..c537d17c6db9 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -30,6 +30,7 @@
#[cfg(CONFIG_BLOCK)]
pub mod block;
mod build_assert;
+pub mod cred;
pub mod device;
pub mod error;
#[cfg(CONFIG_RUST_FW_LOADER_ABSTRACTIONS)]
--
2.46.0.662.g92d0881bb0-goog
* [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
` (3 preceding siblings ...)
2024-09-15 14:31 ` [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred` Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 20:58 ` Kees Cook
2024-09-19 7:56 ` Paul Moore
2024-09-15 14:31 ` [PATCH v10 6/8] rust: file: add `FileDescriptorReservation` Alice Ryhl
` (3 subsequent siblings)
8 siblings, 2 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
To: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner
Cc: Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Alice Ryhl,
Kees Cook
Add an abstraction for viewing the string representation of a security
context.
This is needed by Rust Binder because it has a feature where a process
can view the string representation of the security context for incoming
transactions. The process can use that to authenticate incoming
transactions, and since the feature is provided by the kernel, the
process can trust that the security context is legitimate.
This abstraction makes the following assumptions about the C side:
* When a call to `security_secid_to_secctx` is successful, it returns a
pointer and length. The pointer references a byte string and is valid
for reading for that many bytes.
* The string may be referenced until `security_release_secctx` is
called.
* If CONFIG_SECURITY is set, then the three methods mentioned in
rust/helpers are available without a helper. (That is, they are not a
#define or `static inline`.)
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
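The pointer-plus-length contract described above can be sketched in
userspace as follows. This is only an illustration under stated assumptions:
the heap allocation stands in for whatever the LSM hands out, the secid
values, the label string and the error value are made up, and the `Drop`
impl plays the role of `security_release_secctx`. The point it shows is that
a null pointer must be special-cased before building a slice, even when the
length is zero.

```rust
use std::ptr;
use std::slice;

/// Userspace stand-in for the kernel's `SecurityCtx`.
pub struct SecurityCtx {
    secdata: *mut u8,
    seclen: usize,
}

impl SecurityCtx {
    /// Hypothetical stand-in for `security_secid_to_secctx`:
    /// secid 0 fails, secid 1 yields an empty (null) context,
    /// anything else yields a made-up label.
    pub fn from_secid(secid: u32) -> Result<Self, i32> {
        let label: &[u8] = match secid {
            0 => return Err(-22), // illustrative error value only
            1 => b"",
            _ => b"u:r:example:s0",
        };
        if label.is_empty() {
            return Ok(SecurityCtx { secdata: ptr::null_mut(), seclen: 0 });
        }
        let boxed: Box<[u8]> = label.to_vec().into_boxed_slice();
        let seclen = boxed.len();
        // Leak the allocation; ownership moves into the struct until `drop`.
        let secdata = Box::into_raw(boxed) as *mut u8;
        Ok(SecurityCtx { secdata, seclen })
    }

    pub fn len(&self) -> usize {
        self.seclen
    }

    pub fn is_empty(&self) -> bool {
        self.seclen == 0
    }

    pub fn as_bytes(&self) -> &[u8] {
        if self.secdata.is_null() {
            // `slice::from_raw_parts` requires a non-null pointer even for length 0.
            return &[];
        }
        // SAFETY: `secdata` came from a live allocation of exactly `seclen` bytes.
        unsafe { slice::from_raw_parts(self.secdata, self.seclen) }
    }
}

impl Drop for SecurityCtx {
    fn drop(&mut self) {
        if !self.secdata.is_null() {
            let raw = ptr::slice_from_raw_parts_mut(self.secdata, self.seclen);
            // SAFETY: `raw` was produced by `Box::into_raw` in `from_secid`
            // and has not been freed yet.
            unsafe { drop(Box::from_raw(raw)) };
        }
    }
}
```

Tying the release to `Drop` means a caller cannot forget it on an early
return, which is the same reason the kernel abstraction is written that way.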
rust/bindings/bindings_helper.h | 1 +
rust/helpers/helpers.c | 1 +
rust/helpers/security.c | 20 +++++++++++
rust/kernel/cred.rs | 8 +++++
rust/kernel/lib.rs | 1 +
rust/kernel/security.rs | 74 +++++++++++++++++++++++++++++++++++++++++
6 files changed, 105 insertions(+)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index f74247205cb5..51ec78c355c0 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -21,6 +21,7 @@
#include <linux/phy.h>
#include <linux/refcount.h>
#include <linux/sched.h>
+#include <linux/security.h>
#include <linux/slab.h>
#include <linux/wait.h>
#include <linux/workqueue.h>
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 16e5de352dab..62022b18caf5 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -19,6 +19,7 @@
#include "page.c"
#include "rbtree.c"
#include "refcount.c"
+#include "security.c"
#include "signal.c"
#include "slab.c"
#include "spinlock.c"
diff --git a/rust/helpers/security.c b/rust/helpers/security.c
new file mode 100644
index 000000000000..239e5b4745fe
--- /dev/null
+++ b/rust/helpers/security.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/security.h>
+
+#ifndef CONFIG_SECURITY
+void rust_helper_security_cred_getsecid(const struct cred *c, u32 *secid)
+{
+ security_cred_getsecid(c, secid);
+}
+
+int rust_helper_security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
+{
+ return security_secid_to_secctx(secid, secdata, seclen);
+}
+
+void rust_helper_security_release_secctx(char *secdata, u32 seclen)
+{
+ security_release_secctx(secdata, seclen);
+}
+#endif
diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
index acee04768927..92659649e932 100644
--- a/rust/kernel/cred.rs
+++ b/rust/kernel/cred.rs
@@ -52,6 +52,14 @@ pub unsafe fn from_ptr<'a>(ptr: *const bindings::cred) -> &'a Credential {
unsafe { &*ptr.cast() }
}
+ /// Get the id for this security context.
+ pub fn get_secid(&self) -> u32 {
+ let mut secid = 0;
+ // SAFETY: The invariants of this type ensure that the pointer is valid.
+ unsafe { bindings::security_cred_getsecid(self.0.get(), &mut secid) };
+ secid
+ }
+
/// Returns the effective UID of the given credential.
pub fn euid(&self) -> bindings::kuid_t {
// SAFETY: By the type invariant, we know that `self.0` is valid. Furthermore, the `euid`
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index c537d17c6db9..e088c94a5a14 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -47,6 +47,7 @@
pub mod prelude;
pub mod print;
pub mod rbtree;
+pub mod security;
mod static_assert;
#[doc(hidden)]
pub mod std_vendor;
diff --git a/rust/kernel/security.rs b/rust/kernel/security.rs
new file mode 100644
index 000000000000..2522868862a1
--- /dev/null
+++ b/rust/kernel/security.rs
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Linux Security Modules (LSM).
+//!
+//! C header: [`include/linux/security.h`](srctree/include/linux/security.h).
+
+use crate::{
+ bindings,
+ error::{to_result, Result},
+};
+
+/// A security context string.
+///
+/// # Invariants
+///
+/// The `secdata` and `seclen` fields correspond to a valid security context as returned by a
+/// successful call to `security_secid_to_secctx`, that has not yet been destroyed by calling
+/// `security_release_secctx`.
+pub struct SecurityCtx {
+ secdata: *mut core::ffi::c_char,
+ seclen: usize,
+}
+
+impl SecurityCtx {
+ /// Get the security context given its id.
+ pub fn from_secid(secid: u32) -> Result<Self> {
+ let mut secdata = core::ptr::null_mut();
+ let mut seclen = 0u32;
+ // SAFETY: Just a C FFI call. The pointers are valid for writes.
+ to_result(unsafe { bindings::security_secid_to_secctx(secid, &mut secdata, &mut seclen) })?;
+
+ // INVARIANT: If the above call did not fail, then we have a valid security context.
+ Ok(Self {
+ secdata,
+ seclen: seclen as usize,
+ })
+ }
+
+ /// Returns whether the security context is empty.
+ pub fn is_empty(&self) -> bool {
+ self.seclen == 0
+ }
+
+ /// Returns the length of this security context.
+ pub fn len(&self) -> usize {
+ self.seclen
+ }
+
+ /// Returns the bytes for this security context.
+ pub fn as_bytes(&self) -> &[u8] {
+ let ptr = self.secdata;
+ if ptr.is_null() {
+ debug_assert_eq!(self.seclen, 0);
+ // We can't pass a null pointer to `slice::from_raw_parts` even if the length is zero.
+ return &[];
+ }
+
+ // SAFETY: The call to `security_secid_to_secctx` guarantees that the pointer is valid for
+ // `seclen` bytes. Furthermore, if the length is zero, then we have ensured that the
+ // pointer is not null.
+ unsafe { core::slice::from_raw_parts(ptr.cast(), self.seclen) }
+ }
+}
+
+impl Drop for SecurityCtx {
+ fn drop(&mut self) {
+ // SAFETY: By the invariant of `Self`, this frees a pointer that came from a successful
+ // call to `security_secid_to_secctx` and has not yet been destroyed by
+ // `security_release_secctx`.
+ unsafe { bindings::security_release_secctx(self.secdata, self.seclen as u32) };
+ }
+}
--
2.46.0.662.g92d0881bb0-goog
* [PATCH v10 6/8] rust: file: add `FileDescriptorReservation`
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
` (4 preceding siblings ...)
2024-09-15 14:31 ` [PATCH v10 5/8] rust: security: add abstraction for secctx Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 18:39 ` Al Viro
2024-09-15 14:31 ` [PATCH v10 7/8] rust: file: add `Kuid` wrapper Alice Ryhl
` (2 subsequent siblings)
8 siblings, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
To: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner
Cc: Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Alice Ryhl,
Kees Cook
From: Wedson Almeida Filho <wedsonaf@gmail.com>
Allow for the creation of a file descriptor in two steps: first, we
reserve a slot for it, then we commit or drop the reservation. The first
step may fail (e.g., the current process ran out of available slots),
but commit and drop never fail (and are mutually exclusive).
This is needed by Rust Binder when fds are sent from one process to
another. It has to be a two-step process to properly handle the case
where multiple fds are sent: The operation must fail or succeed
atomically, which we achieve by first reserving the fds we need, and
only installing the files once we have reserved enough fds to send the
files.
Fd reservations assume that the value of `current` does not change
between the call to get_unused_fd_flags and the call to fd_install (or
put_unused_fd). By not implementing the Send trait, this abstraction
ensures that the `FileDescriptorReservation` cannot be moved into a
different process.
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
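The all-or-nothing behaviour described above can be sketched in userspace
roughly as follows. This is a toy model, not the kernel API: `FdTable`,
`Reservation`, `NoFreeSlot` and `send_files` are hypothetical names, a `Vec`
of slots stands in for the fd table of `current`, and file names stand in
for `struct file` references. The reservation rolls back in `Drop`, commit
consumes it, and a `PhantomData<*mut ()>` field keeps it from being sent to
another thread, playing the role of `NotThreadSafe`.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

#[derive(Debug, PartialEq)]
pub struct NoFreeSlot;

enum Slot {
    Free,
    Reserved,
    Installed(String),
}

/// Toy fd table with a fixed number of slots.
pub struct FdTable {
    slots: RefCell<Vec<Slot>>,
}

/// A reserved slot; dropping it without `install` frees the slot again.
pub struct Reservation<'t> {
    table: &'t FdTable,
    fd: usize,
    // Raw pointers are neither `Send` nor `Sync`, so neither is this type.
    _not_send: PhantomData<*mut ()>,
}

impl FdTable {
    pub fn new(capacity: usize) -> Self {
        FdTable { slots: RefCell::new((0..capacity).map(|_| Slot::Free).collect()) }
    }

    /// Step one: may fail if the table is full.
    pub fn reserve(&self) -> Result<Reservation<'_>, NoFreeSlot> {
        let mut slots = self.slots.borrow_mut();
        let fd = slots
            .iter()
            .position(|s| matches!(s, Slot::Free))
            .ok_or(NoFreeSlot)?;
        slots[fd] = Slot::Reserved;
        Ok(Reservation { table: self, fd, _not_send: PhantomData })
    }

    pub fn free_slots(&self) -> usize {
        self.slots.borrow().iter().filter(|s| matches!(s, Slot::Free)).count()
    }

    pub fn file_at(&self, fd: usize) -> Option<String> {
        let slots = self.slots.borrow();
        match slots.get(fd) {
            Some(Slot::Installed(name)) => Some(name.clone()),
            _ => None,
        }
    }
}

impl Reservation<'_> {
    /// Step two (commit): cannot fail, and consumes the reservation.
    pub fn install(self, file: &str) {
        self.table.slots.borrow_mut()[self.fd] = Slot::Installed(file.to_string());
        // The slot is now owned by the table, so the rollback in `Drop` must not run.
        std::mem::forget(self);
    }
}

impl Drop for Reservation<'_> {
    /// Step two (rollback): cannot fail either.
    fn drop(&mut self) {
        self.table.slots.borrow_mut()[self.fd] = Slot::Free;
    }
}

/// Installs all files or none of them.
pub fn send_files(table: &FdTable, files: &[&str]) -> Result<Vec<usize>, NoFreeSlot> {
    let mut reservations = Vec::with_capacity(files.len());
    for _ in files {
        // On failure, `?` returns early and every earlier reservation is dropped.
        reservations.push(table.reserve()?);
    }
    let mut fds = Vec::with_capacity(files.len());
    for (reservation, file) in reservations.into_iter().zip(files.iter().copied()) {
        fds.push(reservation.fd);
        reservation.install(file);
    }
    Ok(fds)
}
```

Because nothing is installed until every reservation has succeeded, a
failure partway through leaves the table exactly as it was before the call.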
rust/kernel/fs/file.rs | 75 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 74 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 3c1f51719804..e03dbe14d62a 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -11,7 +11,7 @@
bindings,
cred::Credential,
error::{code::*, Error, Result},
- types::{ARef, AlwaysRefCounted, Opaque},
+ types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
};
use core::ptr;
@@ -368,6 +368,79 @@ fn deref(&self) -> &LocalFile {
}
}
+/// A file descriptor reservation.
+///
+/// This allows the creation of a file descriptor in two steps: first, we reserve a slot for it,
+/// then we commit or drop the reservation. The first step may fail (e.g., the current process ran
+/// out of available slots), but commit and drop never fail (and are mutually exclusive).
+///
+/// Dropping the reservation happens in the destructor of this type.
+///
+/// # Invariants
+///
+/// The fd stored in this struct must correspond to a reserved file descriptor of the current task.
+pub struct FileDescriptorReservation {
+ fd: u32,
+ /// Prevent values of this type from being moved to a different task.
+ ///
+ /// The `fd_install` and `put_unused_fd` functions assume that the value of `current` is
+ /// unchanged since the call to `get_unused_fd_flags`. By adding this marker to this type, we
+ /// prevent it from being moved across task boundaries, which ensures that `current` does not
+ /// change while this value exists.
+ _not_send: NotThreadSafe,
+}
+
+impl FileDescriptorReservation {
+ /// Creates a new file descriptor reservation.
+ pub fn get_unused_fd_flags(flags: u32) -> Result<Self> {
+ // SAFETY: FFI call, there are no safety requirements on `flags`.
+ let fd: i32 = unsafe { bindings::get_unused_fd_flags(flags) };
+ if fd < 0 {
+ return Err(Error::from_errno(fd));
+ }
+ Ok(Self {
+ fd: fd as u32,
+ _not_send: NotThreadSafe,
+ })
+ }
+
+ /// Returns the file descriptor number that was reserved.
+ pub fn reserved_fd(&self) -> u32 {
+ self.fd
+ }
+
+ /// Commits the reservation.
+ ///
+ /// The previously reserved file descriptor is bound to `file`. This method consumes the
+ /// [`FileDescriptorReservation`], so it will not be usable after this call.
+ pub fn fd_install(self, file: ARef<File>) {
+ // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`. We have not yet used
+ // the fd, so it is still valid, and `current` still refers to the same task, as this type
+ // cannot be moved across task boundaries.
+ //
+ // Furthermore, the file pointer is guaranteed to own a refcount by its type invariants,
+ // and we take ownership of that refcount by not running the destructor below.
+ // Additionally, the file is known to not have any non-shared `fdget_pos` calls, so even if
+ // this process starts using the file position, this will not result in a data race on the
+ // file position.
+ unsafe { bindings::fd_install(self.fd, file.as_ptr()) };
+
+ // `fd_install` consumes both the file descriptor and the file reference, so we cannot run
+ // the destructors.
+ core::mem::forget(self);
+ core::mem::forget(file);
+ }
+}
+
+impl Drop for FileDescriptorReservation {
+ fn drop(&mut self) {
+ // SAFETY: By the type invariants of this type, `self.fd` was previously returned by
+ // `get_unused_fd_flags`. We have not yet used the fd, so it is still valid, and `current`
+ // still refers to the same task, as this type cannot be moved across task boundaries.
+ unsafe { bindings::put_unused_fd(self.fd) };
+ }
+}
+
/// Represents the `EBADF` error code.
///
/// Used for methods that can only fail with `EBADF`.
--
2.46.0.662.g92d0881bb0-goog
* [PATCH v10 7/8] rust: file: add `Kuid` wrapper
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
` (5 preceding siblings ...)
2024-09-15 14:31 ` [PATCH v10 6/8] rust: file: add `FileDescriptorReservation` Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 22:02 ` Gary Guo
2024-09-15 14:31 ` [PATCH v10 8/8] rust: file: add abstraction for `poll_table` Alice Ryhl
2024-09-27 9:28 ` [PATCH v10 0/8] File abstractions needed by Rust Binder Christian Brauner
8 siblings, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
To: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner
Cc: Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Alice Ryhl,
Kees Cook
Add a wrapper around `kuid_t` called `Kuid`. This allows us to define
various operations on kuids such as equality and current_euid. It also
lets us provide conversions from kuid into userspace values.
Rust Binder needs these operations because it needs to compare kuids for
equality, and it needs to tell userspace about the pid and uid of
incoming transactions.
To read kuids from a `struct task_struct`, you must currently use
various #defines that perform the appropriate field access under an RCU
read lock. Currently, we do not have a Rust wrapper for rcu_read_lock,
which means that for this patch, there are two ways forward:
1. Inline the methods into Rust code, and use __rcu_read_lock directly
rather than the rcu_read_lock wrapper. This gives up lockdep for
these usages of RCU.
2. Wrap the various #defines in helpers and call the helpers from Rust.
This patch uses the second option. One possible disadvantage of the
second option is the possible introduction of speculation gadgets, but
as discussed in [1], the risk appears to be acceptable.
Of course, once a wrapper for rcu_read_lock is available, it is
preferable to use that over either of the two above approaches.
Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1]
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
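As a rough userspace illustration of why a wrapper type helps here, the
sketch below models a kuid as an opaque newtype whose only operations are
equality and translation into a namespace-relative value. The names
`UidRange` and `into_uid_in`, and the single-range mapping, are assumptions
made for this example; the kernel's user namespace mapping is more general,
and the real conversion goes through `from_kuid`.

```rust
/// Opaque kernel-side user id; the raw value is deliberately private.
#[derive(Copy, Clone, Debug)]
pub struct Kuid {
    raw: u32,
}

/// Hypothetical, simplified user namespace: one contiguous id range.
pub struct UidRange {
    /// First kernel-side id covered by the range.
    pub kernel_start: u32,
    /// Id that `kernel_start` maps to inside the namespace.
    pub ns_start: u32,
    /// Number of ids in the range.
    pub count: u32,
}

impl Kuid {
    pub fn from_raw(raw: u32) -> Self {
        Kuid { raw }
    }

    /// Translates the kuid into the given namespace, or `None` if it is unmapped.
    pub fn into_uid_in(self, ns: &UidRange) -> Option<u32> {
        let offset = self.raw.checked_sub(ns.kernel_start)?;
        if offset < ns.count {
            Some(ns.ns_start + offset)
        } else {
            None
        }
    }
}

// Equality goes through one explicit impl (the analogue of `uid_eq`)
// instead of comparing raw integers at every call site.
impl PartialEq for Kuid {
    fn eq(&self, other: &Kuid) -> bool {
        self.raw == other.raw
    }
}

impl Eq for Kuid {}
```

Keeping the raw value private means callers cannot accidentally hand a
kernel-side id to userspace without going through a namespace translation.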
rust/bindings/bindings_helper.h | 1 +
rust/helpers/task.c | 38 ++++++++++++++++++++++++
rust/kernel/cred.rs | 5 ++--
rust/kernel/task.rs | 66 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 108 insertions(+), 2 deletions(-)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 51ec78c355c0..e854ccddecee 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -19,6 +19,7 @@
#include <linux/jiffies.h>
#include <linux/mdio.h>
#include <linux/phy.h>
+#include <linux/pid_namespace.h>
#include <linux/refcount.h>
#include <linux/sched.h>
#include <linux/security.h>
diff --git a/rust/helpers/task.c b/rust/helpers/task.c
index 7ac789232d11..7d66487db831 100644
--- a/rust/helpers/task.c
+++ b/rust/helpers/task.c
@@ -17,3 +17,41 @@ void rust_helper_put_task_struct(struct task_struct *t)
{
put_task_struct(t);
}
+
+kuid_t rust_helper_task_uid(struct task_struct *task)
+{
+ return task_uid(task);
+}
+
+kuid_t rust_helper_task_euid(struct task_struct *task)
+{
+ return task_euid(task);
+}
+
+#ifndef CONFIG_USER_NS
+uid_t rust_helper_from_kuid(struct user_namespace *to, kuid_t uid)
+{
+ return from_kuid(to, uid);
+}
+#endif /* CONFIG_USER_NS */
+
+bool rust_helper_uid_eq(kuid_t left, kuid_t right)
+{
+ return uid_eq(left, right);
+}
+
+kuid_t rust_helper_current_euid(void)
+{
+ return current_euid();
+}
+
+struct user_namespace *rust_helper_current_user_ns(void)
+{
+ return current_user_ns();
+}
+
+pid_t rust_helper_task_tgid_nr_ns(struct task_struct *tsk,
+ struct pid_namespace *ns)
+{
+ return task_tgid_nr_ns(tsk, ns);
+}
diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
index 92659649e932..81d67789b16f 100644
--- a/rust/kernel/cred.rs
+++ b/rust/kernel/cred.rs
@@ -10,6 +10,7 @@
use crate::{
bindings,
+ task::Kuid,
types::{AlwaysRefCounted, Opaque},
};
@@ -61,11 +62,11 @@ pub fn get_secid(&self) -> u32 {
}
/// Returns the effective UID of the given credential.
- pub fn euid(&self) -> bindings::kuid_t {
+ pub fn euid(&self) -> Kuid {
// SAFETY: By the type invariant, we know that `self.0` is valid. Furthermore, the `euid`
// field of a credential is never changed after initialization, so there is no potential
// for data races.
- unsafe { (*self.0.get()).euid }
+ Kuid::from_raw(unsafe { (*self.0.get()).euid })
}
}
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 367b4bbddd9f..1a36a9f19368 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -9,6 +9,7 @@
types::{NotThreadSafe, Opaque},
};
use core::{
+ cmp::{Eq, PartialEq},
ffi::{c_int, c_long, c_uint},
ops::Deref,
ptr,
@@ -96,6 +97,12 @@ unsafe impl Sync for Task {}
/// The type of process identifiers (PIDs).
type Pid = bindings::pid_t;
+/// The type of user identifiers (UIDs).
+#[derive(Copy, Clone)]
+pub struct Kuid {
+ kuid: bindings::kuid_t,
+}
+
impl Task {
/// Returns a raw pointer to the current task.
///
@@ -157,12 +164,31 @@ pub fn pid(&self) -> Pid {
unsafe { *ptr::addr_of!((*self.0.get()).pid) }
}
+ /// Returns the UID of the given task.
+ pub fn uid(&self) -> Kuid {
+ // SAFETY: By the type invariant, we know that `self.0` is valid.
+ Kuid::from_raw(unsafe { bindings::task_uid(self.0.get()) })
+ }
+
+ /// Returns the effective UID of the given task.
+ pub fn euid(&self) -> Kuid {
+ // SAFETY: By the type invariant, we know that `self.0` is valid.
+ Kuid::from_raw(unsafe { bindings::task_euid(self.0.get()) })
+ }
+
/// Determines whether the given task has pending signals.
pub fn signal_pending(&self) -> bool {
// SAFETY: By the type invariant, we know that `self.0` is valid.
unsafe { bindings::signal_pending(self.0.get()) != 0 }
}
+ /// Returns the given task's pid in the current pid namespace.
+ pub fn pid_in_current_ns(&self) -> Pid {
+ // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
+ // pointer as the namespace is correct for using the current namespace.
+ unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
+ }
+
/// Wakes up the task.
pub fn wake_up(&self) {
// SAFETY: By the type invariant, we know that `self.0.get()` is non-null and valid.
@@ -184,3 +210,43 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
unsafe { bindings::put_task_struct(obj.cast().as_ptr()) }
}
}
+
+impl Kuid {
+ /// Get the current euid.
+ #[inline]
+ pub fn current_euid() -> Kuid {
+ // SAFETY: Just an FFI call.
+ Self::from_raw(unsafe { bindings::current_euid() })
+ }
+
+ /// Create a `Kuid` given the raw C type.
+ #[inline]
+ pub fn from_raw(kuid: bindings::kuid_t) -> Self {
+ Self { kuid }
+ }
+
+ /// Turn this kuid into the raw C type.
+ #[inline]
+ pub fn into_raw(self) -> bindings::kuid_t {
+ self.kuid
+ }
+
+ /// Converts this kernel UID into a userspace UID.
+ ///
+ /// Uses the namespace of the current task.
+ #[inline]
+ pub fn into_uid_in_current_ns(self) -> bindings::uid_t {
+ // SAFETY: Just an FFI call.
+ unsafe { bindings::from_kuid(bindings::current_user_ns(), self.kuid) }
+ }
+}
+
+impl PartialEq for Kuid {
+ #[inline]
+ fn eq(&self, other: &Kuid) -> bool {
+ // SAFETY: Just an FFI call.
+ unsafe { bindings::uid_eq(self.kuid, other.kuid) }
+ }
+}
+
+impl Eq for Kuid {}
--
2.46.0.662.g92d0881bb0-goog
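
For readers who want to see what `Kuid::into_uid_in_current_ns` computes, here
is a minimal userspace sketch of the idea behind `from_kuid`. It assumes a
namespace is described by uid_map-style extents. `UidExtent`, `UserNs`,
`INVALID_UID` and `into_uid_in` are hypothetical names for illustration only;
they are not the kernel bindings used in the patch above.

```rust
/// Kernel-internal UID, a stand-in for `kuid_t`. Wrapping the integer keeps it
/// from being mixed up with a namespace-local `uid_t`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Kuid(u32);

/// One extent of a uid_map-style table (hypothetical type): `count` IDs starting
/// at `first` inside the namespace correspond to kernel IDs starting at
/// `lower_first`. Extents are assumed well-formed (no overflow, no overlap).
pub struct UidExtent {
    pub first: u32,
    pub lower_first: u32,
    pub count: u32,
}

/// Hypothetical stand-in for `struct user_namespace`.
pub struct UserNs {
    pub extents: Vec<UidExtent>,
}

/// Reported when a kuid has no mapping in the namespace, like `(uid_t)-1` in C.
pub const INVALID_UID: u32 = u32::MAX;

impl Kuid {
    pub fn from_raw(raw: u32) -> Self {
        Kuid(raw)
    }

    pub fn into_raw(self) -> u32 {
        self.0
    }

    /// Model of `from_kuid(ns, kuid)`: translate into `ns`, or `INVALID_UID`.
    pub fn into_uid_in(self, ns: &UserNs) -> u32 {
        for e in &ns.extents {
            // `checked_sub` is `None` when the kuid lies below this extent.
            if let Some(offset) = self.0.checked_sub(e.lower_first) {
                if offset < e.count {
                    return e.first + offset;
                }
            }
        }
        INVALID_UID
    }
}
```

The newtype is the point of the wrapper: a `Kuid` cannot be handed to code that
expects a namespace-local uid without going through an explicit conversion.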
* [PATCH v10 8/8] rust: file: add abstraction for `poll_table`
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
` (6 preceding siblings ...)
2024-09-15 14:31 ` [PATCH v10 7/8] rust: file: add `Kuid` wrapper Alice Ryhl
@ 2024-09-15 14:31 ` Alice Ryhl
2024-09-15 22:24 ` Gary Guo
2024-09-27 9:28 ` [PATCH v10 0/8] File abstractions needed by Rust Binder Christian Brauner
8 siblings, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 14:31 UTC (permalink / raw)
To: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner
Cc: Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Alice Ryhl,
Kees Cook
The existing `CondVar` abstraction is a wrapper around
`wait_queue_head`, but it does not support all use-cases of the C
`wait_queue_head` type. To be specific, a `CondVar` cannot be registered
with a `struct poll_table`. This limitation has the advantage that you
do not need to call `synchronize_rcu` when destroying a `CondVar`.
However, we need the ability to register a `poll_table` with a
`wait_queue_head` in Rust Binder. To enable this, introduce a type
called `PollCondVar`, which is like `CondVar` except that you can
register a `poll_table`. We also introduce `PollTable`, which is a safe
wrapper around `poll_table` that is intended to be used with
`PollCondVar`.
The destructor of `PollCondVar` unconditionally calls `synchronize_rcu`
to ensure that the removal of epoll waiters has fully completed before
the `wait_queue_head` is destroyed.
That said, `synchronize_rcu` is rather expensive and is not needed in
all cases: If we have never registered a `poll_table` with the
`wait_queue_head`, then we don't need to call `synchronize_rcu`. (And
this is a common case in Binder - not all processes use Binder with
epoll.) The current implementation does not account for this, but if we
find that it is necessary to improve this, a future patch could store a
boolean next to the `wait_queue_head` to keep track of whether a
`poll_table` has ever been registered.
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/bindings/bindings_helper.h | 1 +
rust/kernel/sync.rs | 1 +
rust/kernel/sync/poll.rs | 121 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 123 insertions(+)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index e854ccddecee..ca13659ded4c 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -20,6 +20,7 @@
#include <linux/mdio.h>
#include <linux/phy.h>
#include <linux/pid_namespace.h>
+#include <linux/poll.h>
#include <linux/refcount.h>
#include <linux/sched.h>
#include <linux/security.h>
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index 0ab20975a3b5..bae4a5179c72 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -11,6 +11,7 @@
mod condvar;
pub mod lock;
mod locked_by;
+pub mod poll;
pub use arc::{Arc, ArcBorrow, UniqueArc};
pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs
new file mode 100644
index 000000000000..d5f17153b424
--- /dev/null
+++ b/rust/kernel/sync/poll.rs
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Utilities for working with `struct poll_table`.
+
+use crate::{
+ bindings,
+ fs::File,
+ prelude::*,
+ sync::{CondVar, LockClassKey},
+ types::Opaque,
+};
+use core::ops::Deref;
+
+/// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class.
+#[macro_export]
+macro_rules! new_poll_condvar {
+ ($($name:literal)?) => {
+ $crate::sync::poll::PollCondVar::new(
+ $crate::optional_name!($($name)?), $crate::static_lock_class!()
+ )
+ };
+}
+
+/// Wraps the kernel's `struct poll_table`.
+///
+/// # Invariants
+///
+/// This struct contains a valid `struct poll_table`.
+///
+/// For a `struct poll_table` to be valid, its `_qproc` function must follow the safety
+/// requirements of `_qproc` functions:
+///
+/// * The `_qproc` function is given permission to enqueue a waiter to the provided `poll_table`
+/// during the call. Once the waiter is removed and an rcu grace period has passed, it must no
+/// longer access the `wait_queue_head`.
+#[repr(transparent)]
+pub struct PollTable(Opaque<bindings::poll_table>);
+
+impl PollTable {
+ /// Creates a reference to a [`PollTable`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that for the duration of 'a, the pointer will point at a valid poll
+ /// table (as defined in the type invariants).
+ ///
+ /// The caller must also ensure that the `poll_table` is only accessed via the returned
+ /// reference for the duration of 'a.
+ pub unsafe fn from_ptr<'a>(ptr: *mut bindings::poll_table) -> &'a mut PollTable {
+ // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+ // `PollTable` type being transparent makes the cast ok.
+ unsafe { &mut *ptr.cast() }
+ }
+
+ fn get_qproc(&self) -> bindings::poll_queue_proc {
+ let ptr = self.0.get();
+ // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
+ // field is not modified concurrently with this call since we have an immutable reference.
+ unsafe { (*ptr)._qproc }
+ }
+
+ /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
+ /// using the condition variable.
+ pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
+ if let Some(qproc) = self.get_qproc() {
+ // SAFETY: The pointers to `file` and `self` need to be valid for the duration of this
+ // call to `qproc`, which they are because they are references.
+ //
+ // The `cv.wait_queue_head` pointer must be valid until an rcu grace period after the
+ // waiter is removed. The `PollCondVar` is pinned, so before `cv.wait_queue_head` can
+ // be destroyed, the destructor must run. That destructor first removes all waiters,
+ // and then waits for an rcu grace period. Therefore, `cv.wait_queue_head` is valid for
+ // long enough.
+ unsafe { qproc(file.as_ptr() as _, cv.wait_queue_head.get(), self.0.get()) };
+ }
+ }
+}
+
+/// A wrapper around [`CondVar`] that makes it usable with [`PollTable`].
+///
+/// [`CondVar`]: crate::sync::CondVar
+#[pin_data(PinnedDrop)]
+pub struct PollCondVar {
+ #[pin]
+ inner: CondVar,
+}
+
+impl PollCondVar {
+ /// Constructs a new condvar initialiser.
+ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self> {
+ pin_init!(Self {
+ inner <- CondVar::new(name, key),
+ })
+ }
+}
+
+// Make the `CondVar` methods callable on `PollCondVar`.
+impl Deref for PollCondVar {
+ type Target = CondVar;
+
+ fn deref(&self) -> &CondVar {
+ &self.inner
+ }
+}
+
+#[pinned_drop]
+impl PinnedDrop for PollCondVar {
+ fn drop(self: Pin<&mut Self>) {
+ // Clear anything registered using `register_wait`.
+ //
+ // SAFETY: The pointer points at a valid `wait_queue_head`.
+ unsafe { bindings::__wake_up_pollfree(self.inner.wait_queue_head.get()) };
+
+ // Wait for epoll items to be properly removed.
+ //
+ // SAFETY: Just an FFI call.
+ unsafe { bindings::synchronize_rcu() };
+ }
+}
--
2.46.0.662.g92d0881bb0-goog
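
The commit message above mentions that a future patch could store a boolean
next to the `wait_queue_head` so that the destructor only pays for
`synchronize_rcu` when a `poll_table` was ever registered. A minimal userspace
sketch of that idea follows. `TrackedPollCondVar` and the `grace_periods`
counter are hypothetical, the counter merely stands in for the expensive call,
and the memory ordering shown is an assumption rather than a reviewed design.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `PollCondVar`. `grace_periods` counts simulated
/// `synchronize_rcu` calls so that the effect of the flag is observable.
pub struct TrackedPollCondVar {
    ever_registered: AtomicBool,
    grace_periods: Arc<AtomicUsize>,
}

impl TrackedPollCondVar {
    pub fn new(grace_periods: Arc<AtomicUsize>) -> Self {
        Self {
            ever_registered: AtomicBool::new(false),
            grace_periods,
        }
    }

    /// Stand-in for `PollTable::register_wait`: record that a waiter may exist.
    /// In real code the flag would have to be set before the waiter is enqueued.
    pub fn register_wait(&self) {
        self.ever_registered.store(true, Ordering::Release);
    }
}

impl Drop for TrackedPollCondVar {
    fn drop(&mut self) {
        // `&mut self` gives exclusive access, so a plain read of the flag is enough.
        if *self.ever_registered.get_mut() {
            // Real code would call `__wake_up_pollfree` and then `synchronize_rcu`.
            self.grace_periods.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

The flag only ever goes from false to true, so a condvar that was registered
once keeps paying the cost on drop even if all waiters are long gone.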
* Re: [PATCH v10 1/8] rust: types: add `NotThreadSafe`
2024-09-15 14:31 ` [PATCH v10 1/8] rust: types: add `NotThreadSafe` Alice Ryhl
@ 2024-09-15 15:38 ` Gary Guo
2024-09-27 11:21 ` Miguel Ojeda
2024-09-24 19:45 ` Serge E. Hallyn
1 sibling, 1 reply; 52+ messages in thread
From: Gary Guo @ 2024-09-15 15:38 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, 15 Sep 2024 14:31:27 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> This introduces a new marker type for types that shouldn't be thread
> safe. By adding a field of this type to a struct, it becomes non-Send
> and non-Sync, which means that it cannot be accessed in any way from
> threads other than the one it was created on.
>
> This is useful for APIs that require globals such as `current` to remain
> constant while the value exists.
>
> We update two existing users in the Kernel to use this helper:
>
> * `Task::current()` - moving the return type of this value to a
> different thread would not be safe as you can no longer be guaranteed
> that the `current` pointer remains valid.
> * Lock guards. Mutexes and spinlocks should be unlocked on the same
> thread as where they were locked, so we enforce this using the Send
> trait.
>
> There are also additional users in later patches of this patchset. See
> [1] and [2] for the discussion that led to the introduction of this
> patch.
>
> Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [1]
> Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [2]
> Suggested-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Björn Roy Baron <bjorn3_gh@protonmail.com>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Miguel, can we apply this patch now, without having it wait on the rest
of the file abstractions, since it'll be useful to others?
Best,
Gary
> ---
> rust/kernel/sync/lock.rs | 13 +++++++++----
> rust/kernel/task.rs | 10 ++++++----
> rust/kernel/types.rs | 21 +++++++++++++++++++++
> 3 files changed, 36 insertions(+), 8 deletions(-)
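
As a rough illustration of why this marker is useful on its own, the sketch
below shows the mechanism in plain userspace Rust: a zero-sized field of type
`PhantomData<*mut ()>` removes `Send` and `Sync` from the containing struct.
`ThreadBound`, `Fallback` and `Probe` are hypothetical names; the probe is only
a compile-time trick to make the effect checkable and is not part of the patch.

```rust
use std::marker::PhantomData;

/// Zero-sized marker: raw pointers are neither `Send` nor `Sync`, so any struct
/// containing a field of this type loses both auto traits.
pub type NotThreadSafe = PhantomData<*mut ()>;

/// Hypothetical value that must stay on the thread that created it.
pub struct ThreadBound {
    pub value: u32,
    _not_send: NotThreadSafe,
}

impl ThreadBound {
    pub fn new(value: u32) -> Self {
        Self {
            value,
            _not_send: PhantomData,
        }
    }
}

/// Fallback answer, used when the inherent constant below does not apply.
pub trait Fallback {
    const IS_SEND: bool = false;
}
impl<T: ?Sized> Fallback for T {}

/// `<Probe<T>>::IS_SEND` resolves to the inherent constant only if `T: Send`.
pub struct Probe<T: ?Sized>(PhantomData<T>);

impl<T: ?Sized + Send> Probe<T> {
    pub const IS_SEND: bool = true;
}
```

Because the marker is zero-sized, adding it does not change the layout of the
struct that carries it.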
* Re: [PATCH v10 6/8] rust: file: add `FileDescriptorReservation`
2024-09-15 14:31 ` [PATCH v10 6/8] rust: file: add `FileDescriptorReservation` Alice Ryhl
@ 2024-09-15 18:39 ` Al Viro
2024-09-15 19:34 ` Al Viro
2024-09-15 20:13 ` Alice Ryhl
0 siblings, 2 replies; 52+ messages in thread
From: Al Viro @ 2024-09-15 18:39 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Greg Kroah-Hartman, Arve Hjønnevåg,
Todd Kjos, Martijn Coenen, Joel Fernandes, Carlos Llamas,
Suren Baghdasaryan, Dan Williams, Matthew Wilcox, Thomas Gleixner,
Daniel Xu, Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 02:31:32PM +0000, Alice Ryhl wrote:
> +impl Drop for FileDescriptorReservation {
> + fn drop(&mut self) {
> + // SAFETY: By the type invariants of this type, `self.fd` was previously returned by
> + // `get_unused_fd_flags`. We have not yet used the fd, so it is still valid, and `current`
> + // still refers to the same task, as this type cannot be moved across task boundaries.
> + unsafe { bindings::put_unused_fd(self.fd) };
> + }
> +}
FWIW, it's a bit more delicate. The real rules for API users are
1) anything you get from get_unused_fd_flags() (well, alloc_fd(),
internally) must be passed either to put_unused_fd() or fd_install() before
you return from syscall. That should be done by the same thread and
all calls of put_unused_fd() or fd_install() should be paired with
some get_unused_fd_flags() in that manner (i.e. by the same thread,
within the same syscall, etc.)
2) calling thread MUST NOT unshare descriptor table while it has
any reserved descriptors. I.e.
	fd = get_unused_fd();
	unshare_files();
	fd_install(fd, file);
is a bug. Reservations are discarded by that. Getting rid of that
constraint would require tracking the sets of reserved descriptors
separately for each thread that happens to share the descriptor table.
Conceptually they *are* per-thread - the same thread that has done
reservation must either discard it or use it. However, it's easier to
keep the "it's reserved by some thread" represented in descriptor table
itself (bit set in ->open_fds bitmap, file reference in ->fd[] array is
NULL) than try and keep track of who's reserved what. The constraint is
basically "all reservations can stay with the old copy", i.e. "caller has
no reservations of its own to transfer into the new private copy it gets".
It's not particularly onerous[*] and it simplifies things
quite a bit. However, if we are documenting things, it needs to be
put explicitly. With respect to Rust, if you do e.g. binfmt-in-rust
support it will immediately become an issue - begin_new_exec() is calling
unshare_files(), so the example above can become an issue.
Internally (in fs/file.c, that is) we have additional safety
rule - anything that might be given an arbitrary descriptor (e.g.
do_dup2() destination can come directly from dup2(2) argument,
file_close_fd_locked() victim can come directly from close(2) one,
etc.) must leave reserved descriptors alone. Not an issue API users
need to watch out for, though.
[*] unsharing the descriptor table is done by
+ close_range(2), which has no reason to allocate any descriptors
and is only called by userland.
+ unshare(2), which has no reason to allocate any descriptors
and is only called by userland.
+ a place in early init that calls ksys_unshare() while arranging
the environment for /linuxrc from initrd image to be run. Again, no
reserved descriptors there.
+ coredumping thread in the beginning of do_coredump().
The caller is at the point of signal delivery, which means that it had
already left whatever syscall it might have been in. Which means
that all reservations must have been undone by that point.
+ execve() at the point of no return (in begin_new_exec()).
That's the only place where violation of that constraint on some later
changes is plausible. That one needs to be watched out for.
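
The two rules above can be modelled in a few lines of userspace Rust. This is
only a sketch under stated assumptions: `FdTable`, `Slot`, `Reservation`,
`reserve` and `unshare` are hypothetical names, files are represented by
strings, and the table lives behind an `Rc` so that a reservation cannot leave
its thread. It is not the abstraction from the patch.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Clone, Debug, PartialEq)]
pub enum Slot {
    Free,
    Reserved,
    Open(String),
}

#[derive(Default)]
pub struct FdTable {
    pub slots: Vec<Slot>,
}

/// A reserved descriptor. Holding an `Rc` makes this type `!Send`, which models
/// "the same thread must either use or discard the reservation".
pub struct Reservation {
    table: Rc<RefCell<FdTable>>,
    fd: usize,
}

/// Analogue of `get_unused_fd_flags()`: claim the lowest free slot.
pub fn reserve(table: &Rc<RefCell<FdTable>>) -> Reservation {
    let mut t = table.borrow_mut();
    let free = t.slots.iter().position(|s| *s == Slot::Free);
    let fd = match free {
        Some(fd) => fd,
        None => {
            t.slots.push(Slot::Free);
            t.slots.len() - 1
        }
    };
    t.slots[fd] = Slot::Reserved;
    Reservation { table: Rc::clone(table), fd }
}

impl Reservation {
    pub fn fd(&self) -> usize {
        self.fd
    }

    /// Analogue of `fd_install()`: consumes the reservation.
    pub fn install(self, file: &str) -> usize {
        let fd = self.fd;
        self.table.borrow_mut().slots[fd] = Slot::Open(file.to_string());
        fd // `self` is dropped here; `Drop` sees an open slot and leaves it alone
    }
}

impl Drop for Reservation {
    /// Analogue of `put_unused_fd()` for reservations that were never installed.
    fn drop(&mut self) {
        let mut t = self.table.borrow_mut();
        if t.slots[self.fd] == Slot::Reserved {
            t.slots[self.fd] = Slot::Free;
        }
    }
}

/// Analogue of `unshare_files()`: the private copy keeps open files, but
/// reservations stay with the old table, so they show up as free in the copy.
pub fn unshare(table: &Rc<RefCell<FdTable>>) -> Rc<RefCell<FdTable>> {
    let slots: Vec<Slot> = table
        .borrow()
        .slots
        .iter()
        .map(|s| match s {
            Slot::Open(f) => Slot::Open(f.clone()),
            _ => Slot::Free,
        })
        .collect();
    Rc::new(RefCell::new(FdTable { slots }))
}
```

In this model a reservation taken before `unshare` still points at the old
table, which is the situation rule 2 forbids: installing through it would not
affect the table the thread is now using.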
* Re: [PATCH v10 6/8] rust: file: add `FileDescriptorReservation`
2024-09-15 18:39 ` Al Viro
@ 2024-09-15 19:34 ` Al Viro
2024-09-16 4:18 ` Al Viro
2024-09-15 20:13 ` Alice Ryhl
1 sibling, 1 reply; 52+ messages in thread
From: Al Viro @ 2024-09-15 19:34 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Greg Kroah-Hartman, Arve Hjønnevåg,
Todd Kjos, Martijn Coenen, Joel Fernandes, Carlos Llamas,
Suren Baghdasaryan, Dan Williams, Matthew Wilcox, Thomas Gleixner,
Daniel Xu, Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 07:39:05PM +0100, Al Viro wrote:
> 2) calling thread MUST NOT unshare descriptor table while it has
> any reserved descriptors. I.e.
> fd = get_unused_fd();
> unshare_files();
> fd_install(fd, file);
> is a bug. Reservations are discarded by that. Getting rid of that
> constraint would require tracking the sets of reserved descriptors
> separately for each thread that happens to share the descriptor table.
> Conceptually they *are* per-thread - the same thread that has done
> reservation must either discard it or use it. However, it's easier to
> keep the "it's reserved by some thread" represented in descriptor table
> itself (bit set in ->open_fds bitmap, file reference in ->fd[] array is
> NULL) than try and keep track of who's reserved what. The constraint is
> basically "all reservations can stay with the old copy", i.e. "caller has
> no reservations of its own to transfer into the new private copy it gets".
FWIW, I toyed with the idea of having reservations kept per-thread;
it is possible and it simplifies some things, but I hadn't been able to
find a way to do that without buggering syscall latency for open() et al.
It would keep track of the set of reservations in task_struct (count,
two-element array for the first two + page pointer for spillovers,
for the rare threads that need more than two reserved simultaneously).
Representation in fdtable:
state           open_fds bit    value in ->fd[] array
free            clear           0UL
reserved        set             0UL
uncommitted     set             1UL|(unsigned long)file
open            set             (unsigned long)file
with file lookup treating any odd value as 0 (i.e. as reserved)
fd_install() switching reserved to uncommitted *AND* separate
"commit" operation that does this:
	if current->reservation_count == 0
		return
	if failure
		for each descriptor in our reserved set
			v = fdtable->fd[descriptor]
			if (v) {
				fdtable->fd[descriptor] = 0;
				fput((struct file *)(v & ~1));
			}
			clear bit in fdtable->open_fds[]
	else
		for each descriptor in our reserved set
			v = fdtable->fd[descriptor]
			if (v)
				fdtable->fd[descriptor] = v & ~1;
			else
				BUG
	current->reservation_count = 0
That "commit" thing would be called on return from syscall
for userland threads and would be called explicitly in
kernel threads that have a descriptor table and work with
it.
The benefit would be that fd_install() would *NOT* have to be done
after the last possible failure exit - something that installs
a lot of files wouldn't have to play convoluted games on cleanup.
Simply returning an error would do the right thing.
Two stores into ->fd[] instead of one is not a big deal;
however, anything like task_work_add() to arrange the call
of "commit" ends up being bloody awful.
We could have it called from syscall glue directly, but
that means touching assembler for quite a few architectures,
and I hadn't gotten around to that. Can be done, but it's
not a pleasant work...
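
To make the four-state encoding above concrete, here is a small userspace
sketch of a single descriptor slot. `Slot` and `State` are hypothetical names,
a file pointer is modelled as a nonzero even `usize`, and `commit` covers only
the per-slot part of the operation (no `fput`, no per-thread bookkeeping).

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Free,
    Reserved,
    Uncommitted(usize),
    Open(usize),
}

/// One descriptor slot: the `open_fds` bit plus the raw word stored in `->fd[]`.
#[derive(Clone, Copy, Default)]
pub struct Slot {
    pub bit: bool,
    pub word: usize,
}

impl Slot {
    /// Decode the (bit, word) pair according to the table above.
    pub fn decode(self) -> State {
        match (self.bit, self.word) {
            (false, _) => State::Free,
            (true, 0) => State::Reserved,
            (true, w) if w & 1 == 1 => State::Uncommitted(w & !1),
            (true, w) => State::Open(w),
        }
    }

    /// File lookup: any odd word is treated as 0, i.e. as "no file here yet".
    pub fn lookup(self) -> Option<usize> {
        if self.bit && self.word != 0 && self.word & 1 == 0 {
            Some(self.word)
        } else {
            None
        }
    }

    /// `fd_install()` in this scheme: reserved -> uncommitted.
    pub fn install(&mut self, file: usize) {
        assert!(self.bit && self.word == 0, "slot must be reserved");
        assert!(file != 0 && file & 1 == 0, "file pointers are nonzero and aligned");
        self.word = file | 1;
    }

    /// Per-slot part of "commit"; `failure` selects the rollback branch.
    pub fn commit(&mut self, failure: bool) {
        if failure {
            // Roll back: forget the file (real code would fput it) and free the fd.
            self.word = 0;
            self.bit = false;
        } else {
            assert!(self.word != 0, "BUG: reserved but never installed");
            self.word &= !1;
        }
    }
}
```

The low bit works as a tag only because file pointers are aligned, so a real
pointer never has it set.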
* Re: [PATCH v10 6/8] rust: file: add `FileDescriptorReservation`
2024-09-15 18:39 ` Al Viro
2024-09-15 19:34 ` Al Viro
@ 2024-09-15 20:13 ` Alice Ryhl
2024-09-15 22:01 ` Al Viro
1 sibling, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 20:13 UTC (permalink / raw)
To: Al Viro
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Greg Kroah-Hartman, Arve Hjønnevåg,
Todd Kjos, Martijn Coenen, Joel Fernandes, Carlos Llamas,
Suren Baghdasaryan, Dan Williams, Matthew Wilcox, Thomas Gleixner,
Daniel Xu, Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 8:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Sep 15, 2024 at 02:31:32PM +0000, Alice Ryhl wrote:
>
> > +impl Drop for FileDescriptorReservation {
> > + fn drop(&mut self) {
> > + // SAFETY: By the type invariants of this type, `self.fd` was previously returned by
> > + // `get_unused_fd_flags`. We have not yet used the fd, so it is still valid, and `current`
> > + // still refers to the same task, as this type cannot be moved across task boundaries.
> > + unsafe { bindings::put_unused_fd(self.fd) };
> > + }
> > +}
>
> FWIW, it's a bit more delicate. The real rules for API users are
>
> 1) anything you get from get_unused_fd_flags() (well, alloc_fd(),
> internally) must be passed either to put_unused_fd() or fd_install() before
> you return from syscall. That should be done by the same thread and
> all calls of put_unused_fd() or fd_install() should be paired with
> some get_unused_fd_flags() in that manner (i.e. by the same thread,
> within the same syscall, etc.)
Ok, we have to use it before returning from the syscall. That's fine.
What happens if I call `get_unused_fd_flags`, and then never call
`put_unused_fd`? Assume that I don't try to use the fd in the future,
and that I just forgot to clean up after myself.
> 2) calling thread MUST NOT unshare descriptor table while it has
> any reserved descriptors. I.e.
> fd = get_unused_fd();
> unshare_files();
> fd_install(fd, file);
> is a bug. Reservations are discarded by that. Getting rid of that
> constraint would require tracking the sets of reserved descriptors
> separately for each thread that happens to share the descriptor table.
> Conceptually they *are* per-thread - the same thread that has done
> reservation must either discard it or use it. However, it's easier to
> keep the "it's reserved by some thread" represented in descriptor table
> itself (bit set in ->open_fds bitmap, file reference in ->fd[] array is
> NULL) than try and keep track of who's reserved what. The constraint is
> basically "all reservations can stay with the old copy", i.e. "caller has
> no reservations of its own to transfer into the new private copy it gets".
> It's not particularly onerous[*] and it simplifies things
> quite a bit. However, if we are documenting things, it needs to be
> put explicitly. With respect to Rust, if you do e.g. binfmt-in-rust
> support it will immediately become an issue - begin_new_exec() is calling
> unshare_files(), so the example above can become an issue.
>
> Internally (in fs/file.c, that is) we have additional safety
> rule - anything that might be given an arbitrary descriptor (e.g.
> do_dup2() destination can come directly from dup2(2) argument,
> file_close_fd_locked() victim can come directly from close(2) one,
> etc.) must leave reserved descriptors alone. Not an issue API users
> need to watch out for, though.
>
> [*] unsharing the descriptor table is done by
> + close_range(2), which has no reason to allocate any descriptors
> and is only called by userland.
> + unshare(2), which has no reason to allocate any descriptors
> and is only called by userland.
> + a place in early init that calls ksys_unshare() while arranging
> the environment for /linuxrc from initrd image to be run. Again, no
> reserved descriptors there.
> + coredumping thread in the beginning of do_coredump().
> The caller is at the point of signal delivery, which means that it had
> already left whatever syscall it might have been in. Which means
> that all reservations must have been undone by that point.
> + execve() at the point of no return (in begin_new_exec()).
> That's the only place where violation of that constraint on some later
> changes is plausible. That one needs to be watched out for.
Thanks for going through that. From a Rust perspective, it sounds
easiest to just declare that execve() is an unsafe operation, and that
one of the conditions for using it is that you don't hold any fd
reservations. Trying to encode this in the fd reservation logic seems
too onerous, and I'm guessing execve is not used particularly often
anyway.
Alice
* Re: [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred`
2024-09-15 14:31 ` [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred` Alice Ryhl
@ 2024-09-15 20:24 ` Kees Cook
2024-09-15 20:55 ` Alice Ryhl
2024-09-19 7:57 ` Paul Moore
1 sibling, 1 reply; 52+ messages in thread
From: Kees Cook @ 2024-09-15 20:24 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel
On Sun, Sep 15, 2024 at 02:31:30PM +0000, Alice Ryhl wrote:
> From: Wedson Almeida Filho <wedsonaf@gmail.com>
>
> Add a wrapper around `struct cred` called `Credential`, and provide
> functionality to get the `Credential` associated with a `File`.
>
> Rust Binder must check the credentials of processes when they attempt to
> perform various operations, and these checks usually take a
> `&Credential` as parameter. The security_binder_set_context_mgr function
> would be one example. This patch is necessary to access these security_*
> methods from Rust.
>
> This Rust abstraction makes the following assumptions about the C side:
> * `struct cred` is refcounted with `get_cred`/`put_cred`.
Yes
> * It's okay to transfer a `struct cred` across threads, that is, you do
> not need to call `put_cred` on the same thread as where you called
> `get_cred`.
Yes
> * The `euid` field of a `struct cred` never changes after
> initialization.
"after initialization", yes. The bprm cred during exec is special in
that it gets updated (bprm_fill_uid) before it is installed into current
via commit_creds() in begin_new_exec() (the point of no return for
exec).
> * The `f_cred` field of a `struct file` never changes after
> initialization.
Yes.
>
> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
* Re: [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred`
2024-09-15 20:24 ` Kees Cook
@ 2024-09-15 20:55 ` Alice Ryhl
0 siblings, 0 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 20:55 UTC (permalink / raw)
To: Kees Cook
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel
On Sun, Sep 15, 2024 at 10:24 PM Kees Cook <kees@kernel.org> wrote:
>
> On Sun, Sep 15, 2024 at 02:31:30PM +0000, Alice Ryhl wrote:
> > From: Wedson Almeida Filho <wedsonaf@gmail.com>
> >
> > Add a wrapper around `struct cred` called `Credential`, and provide
> > functionality to get the `Credential` associated with a `File`.
> >
> > Rust Binder must check the credentials of processes when they attempt to
> > perform various operations, and these checks usually take a
> > `&Credential` as parameter. The security_binder_set_context_mgr function
> > would be one example. This patch is necessary to access these security_*
> > methods from Rust.
> >
> > This Rust abstraction makes the following assumptions about the C side:
> > * `struct cred` is refcounted with `get_cred`/`put_cred`.
>
> Yes
>
> > * It's okay to transfer a `struct cred` across threads, that is, you do
> > not need to call `put_cred` on the same thread as where you called
> > `get_cred`.
>
> Yes
>
> > * The `euid` field of a `struct cred` never changes after
> > initialization.
>
> "after initialization", yes. The bprm cred during exec is special in
> that it gets updated (bprm_fill_uid) before it is installed into current
> via commit_creds() in begin_new_exec() (the point of no return for
> exec).
I think it will be pretty normal to need different Rust types for pre-
and post-initialization of a C value. When a value is not yet fully
initialized, you usually get some extra powers (modify otherwise
immutable fields), but probably also lose some powers (you can't share
it with other threads yet). I can document that this type should not
be used with the bprm cred during exec.
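
The pre-/post-initialization split can be sketched in plain userspace
Rust. This is only an illustration under stated assumptions, not the
kernel API: `CredBuilder` and `SharedCred` are hypothetical names, and a
bare `u32` stands in for the euid. The builder may modify the field but
cannot cross threads; the committed value can be shared but is read-only.

```rust
use std::marker::PhantomData;
use std::sync::Arc;

/// Hypothetical stand-in for a credential that is still being set up
/// (think of the bprm cred during exec). The raw-pointer marker makes
/// the type `!Send` and `!Sync`, so it cannot be shared across threads.
pub struct CredBuilder {
    euid: u32,
    _not_thread_safe: PhantomData<*mut ()>,
}

/// Hypothetical stand-in for a fully initialized credential. There is
/// no setter, so `euid` never changes after `commit`.
pub struct SharedCred {
    euid: u32,
}

impl CredBuilder {
    pub fn new(euid: u32) -> Self {
        Self { euid, _not_thread_safe: PhantomData }
    }

    /// Extra power: only available before initialization completes.
    pub fn set_euid(&mut self, euid: u32) {
        self.euid = euid;
    }

    /// Consumes the builder, so no mutable handle survives.
    pub fn commit(self) -> Arc<SharedCred> {
        Arc::new(SharedCred { euid: self.euid })
    }
}

impl SharedCred {
    pub fn euid(&self) -> u32 {
        self.euid
    }
}
```

With this shape, calling `set_euid` after `commit` or moving a
`CredBuilder` into another thread is rejected at compile time.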
> > * The `f_cred` field of a `struct file` never changes after
> > initialization.
>
> Yes.
>
> >
> > Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> > Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> > Reviewed-by: Trevor Gross <tmgross@umich.edu>
> > Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> > Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> > Reviewed-by: Gary Guo <gary@garyguo.net>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> Reviewed-by: Kees Cook <kees@kernel.org>
Thanks for the review!
Alice
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-15 14:31 ` [PATCH v10 5/8] rust: security: add abstraction for secctx Alice Ryhl
@ 2024-09-15 20:58 ` Kees Cook
2024-09-15 21:07 ` Alice Ryhl
2024-09-19 7:56 ` Paul Moore
1 sibling, 1 reply; 52+ messages in thread
From: Kees Cook @ 2024-09-15 20:58 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel
On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
> Add an abstraction for viewing the string representation of a security
> context.
Hm, this may collide with "LSM: Move away from secids", which is going to
happen.
https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
This series has not yet landed, but in the future, the API changes should
be something like this, though the "lsmblob" name is likely to change to
"lsmprop"?
security_cred_getsecid() -> security_cred_getlsmblob()
security_secid_to_secctx() -> security_lsmblob_to_secctx()
> This is needed by Rust Binder because it has a feature where a process
> can view the string representation of the security context for incoming
> transactions. The process can use that to authenticate incoming
> transactions, and since the feature is provided by the kernel, the
> process can trust that the security context is legitimate.
>
> This abstraction makes the following assumptions about the C side:
> * When a call to `security_secid_to_secctx` is successful, it returns a
> pointer and length. The pointer references a byte string and is valid
> for reading for that many bytes.
Yes. (len includes trailing C-String NUL character.)
> * The string may be referenced until `security_release_secctx` is
> called.
Yes.
> * If CONFIG_SECURITY is set, then the three methods mentioned in
> rust/helpers are available without a helper. (That is, they are not a
> #define or `static inline`.)
Yes.
>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-15 20:58 ` Kees Cook
@ 2024-09-15 21:07 ` Alice Ryhl
2024-09-16 15:40 ` Casey Schaufler
0 siblings, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-15 21:07 UTC (permalink / raw)
To: Kees Cook
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel
On Sun, Sep 15, 2024 at 10:58 PM Kees Cook <kees@kernel.org> wrote:
>
> On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
> > Add an abstraction for viewing the string representation of a security
> > context.
>
> Hm, this may collide with "LSM: Move away from secids", which is going to
> happen.
> https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
>
> This series has not yet landed, but in the future, the API changes should
> be something like this, though the "lsmblob" name is likely to change to
> "lsmprop"?
> security_cred_getsecid() -> security_cred_getlsmblob()
> security_secid_to_secctx() -> security_lsmblob_to_secctx()
Thanks for the heads up. I'll make sure to look into how this
interacts with those changes.
> > This is needed by Rust Binder because it has a feature where a process
> > can view the string representation of the security context for incoming
> > transactions. The process can use that to authenticate incoming
> > transactions, and since the feature is provided by the kernel, the
> > process can trust that the security context is legitimate.
> >
> > This abstraction makes the following assumptions about the C side:
> > * When a call to `security_secid_to_secctx` is successful, it returns a
> > pointer and length. The pointer references a byte string and is valid
> > for reading for that many bytes.
>
> Yes. (len includes trailing C-String NUL character.)
I suppose the NUL character implies that this API always returns a
non-zero length? I could simplify the patch a little bit by not
handling empty strings.
It looks like the CONFIG_SECURITY=n case returns -EOPNOTSUPP, so we
don't get an empty string from that case, at least.
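
A small userspace sketch of what "len includes the trailing NUL" implies
for the Rust side. The helper name `secctx_bytes` and the sample context
string are made up for illustration, and the sketch assumes every LSM
follows the convention Kees describes. Under that assumption even an
empty context has length 1, so a zero-length buffer can be treated as
malformed.

```rust
use std::ffi::CStr;

/// Interprets `buf` as a secctx-style buffer whose length counts the
/// trailing NUL, and returns the string bytes without that NUL.
///
/// Returns `None` if the buffer is empty, does not end in NUL, or
/// contains an interior NUL.
pub fn secctx_bytes(buf: &[u8]) -> Option<&[u8]> {
    CStr::from_bytes_with_nul(buf).ok().map(|s| s.to_bytes())
}
```

For example, `secctx_bytes(b"u:r:example:s0\0")` yields the 14 bytes of
the label, while `secctx_bytes(b"")` yields `None`.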
> > * The string may be referenced until `security_release_secctx` is
> > called.
>
> Yes.
>
> > * If CONFIG_SECURITY is set, then the three methods mentioned in
> > rust/helpers are available without a helper. (That is, they are not a
> > #define or `static inline`.)
>
> Yes.
>
> >
> > Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> > Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> > Reviewed-by: Trevor Gross <tmgross@umich.edu>
> > Reviewed-by: Gary Guo <gary@garyguo.net>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> Reviewed-by: Kees Cook <kees@kernel.org>
Thanks for the review!
Alice
* Re: [PATCH v10 3/8] rust: file: add Rust abstraction for `struct file`
2024-09-15 14:31 ` [PATCH v10 3/8] rust: file: add Rust abstraction for `struct file` Alice Ryhl
@ 2024-09-15 21:51 ` Gary Guo
0 siblings, 0 replies; 52+ messages in thread
From: Gary Guo @ 2024-09-15 21:51 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, 15 Sep 2024 14:31:29 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> From: Wedson Almeida Filho <wedsonaf@gmail.com>
>
> This abstraction makes it possible to manipulate the open files for a
> process. The new `File` struct wraps the C `struct file`. When accessing
> it using the smart pointer `ARef<File>`, the pointer will own a
> reference count to the file. When accessing it as `&File`, then the
> reference does not own a refcount, but the borrow checker will ensure
> that the reference count does not hit zero while the `&File` is live.
>
> Since this is intended to manipulate the open files of a process, we
> introduce an `fget` constructor that corresponds to the C `fget`
> method. In future patches, it will become possible to create a new fd in
> a process and bind it to a `File`. Rust Binder will use these to send
> fds from one process to another.
>
> We also provide a method for accessing the file's flags. Rust Binder
> will use this to access the flags of the Binder fd to check whether the
> non-blocking flag is set, which affects what the Binder ioctl does.
>
> This introduces a struct for the EBADF error type, rather than just
> using the Error type directly. This has two advantages:
> * `File::fget` returns a `Result<ARef<File>, BadFdError>`, which the
> compiler will represent as a single pointer, with null being an error.
> This is possible because the compiler understands that `BadFdError`
> has only one possible value, and it also understands that the
> `ARef<File>` smart pointer is guaranteed non-null.
> * Additionally, we promise to users of the method that the method can
> only fail with EBADF, which means that they can rely on this promise
> without having to inspect its implementation.
> That said, there are also two disadvantages:
> * Defining additional error types involves boilerplate.
> * The question mark operator will only utilize the `From` trait once,
> which prevents you from using the question mark operator on
> `BadFdError` in methods that return some third error type that the
> kernel `Error` is convertible into. (However, it works fine in methods
> that return `Error`.)
>
> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> Co-developed-by: Daniel Xu <dxu@dxuuu.xyz>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
> ---
> fs/file.c | 7 +
> rust/bindings/bindings_helper.h | 2 +
> rust/helpers/fs.c | 12 ++
> rust/helpers/helpers.c | 1 +
> rust/kernel/fs.rs | 8 +
> rust/kernel/fs/file.rs | 375 ++++++++++++++++++++++++++++++++++++++++
> rust/kernel/lib.rs | 1 +
> 7 files changed, 406 insertions(+)
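
The two claims about the dedicated error type can be checked in a small
userspace sketch. All names here (`BadFd`, `Errno`, `lookup`,
`first_byte`) are hypothetical stand-ins, with `NonNull<u8>` playing the
role of `ARef<File>`; none of this is the kernel code.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical single-valued error, playing the role of `BadFdError`.
#[derive(Debug, PartialEq)]
pub struct BadFd;

/// Hypothetical general error, playing the role of the kernel `Error`.
#[derive(Debug, PartialEq)]
pub struct Errno(pub i32);

/// Value of EBADF on Linux.
pub const EBADF: i32 = 9;

impl From<BadFd> for Errno {
    fn from(_: BadFd) -> Self {
        Errno(EBADF)
    }
}

/// Stand-in for `File::fget`: the only possible failure is `BadFd`.
pub fn lookup(table: &[u8], fd: usize) -> Result<NonNull<u8>, BadFd> {
    table.get(fd).map(|slot| NonNull::from(slot)).ok_or(BadFd)
}

/// `?` applies `From` once, so `BadFd` converts into `Errno` here.
pub fn first_byte(table: &[u8], fd: usize) -> Result<u8, Errno> {
    let ptr = lookup(table, fd)?;
    // SAFETY: `ptr` was created from a live reference into `table`.
    Ok(unsafe { *ptr.as_ptr() })
}

/// True when the `Result` occupies exactly one pointer, with null
/// serving as the error value.
pub fn result_is_pointer_sized() -> bool {
    size_of::<Result<NonNull<u8>, BadFd>>() == size_of::<*const u8>()
}
```

A function returning some third error type that only implements
`From<Errno>` could not use `?` on `lookup` directly, which is the
second disadvantage listed in the commit message.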
* Re: [PATCH v10 6/8] rust: file: add `FileDescriptorReservation`
2024-09-15 20:13 ` Alice Ryhl
@ 2024-09-15 22:01 ` Al Viro
2024-09-15 22:05 ` Al Viro
0 siblings, 1 reply; 52+ messages in thread
From: Al Viro @ 2024-09-15 22:01 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Greg Kroah-Hartman, Arve Hjønnevåg,
Todd Kjos, Martijn Coenen, Joel Fernandes, Carlos Llamas,
Suren Baghdasaryan, Dan Williams, Matthew Wilcox, Thomas Gleixner,
Daniel Xu, Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
> What happens if I call `get_unused_fd_flags`, and then never call
> `put_unused_fd`? Assume that I don't try to use the fd in the future,
> and that I just forgot to clean up after myself.
Moderate amount of bogosities while the thread exists. For one thing,
the descriptor is leaked - for open() et al. it will look like it's in use.
For close() it will look like it's _not_ open (i.e. trying to close it
from userland will quietly do nothing). Trying to overwrite it with
dup2() will keep failing with -EBUSY.
Kernel-side it definitely violates assertions, but currently nothing
will break. Might or might not remain true in the future. Doing that
again and again would lead to inflated descriptor table, but then
so would dup2(0, something_large).
IOW, it would be a bug, but it's probably not going to be high impact
security hole.
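
The leak scenario can be modelled with a toy descriptor table in
userspace Rust. This is a sketch under loud assumptions: `Table` and
`Reservation` are hypothetical, a `Vec<bool>` stands in for the real
table, and the methods only mimic the roles of `get_unused_fd_flags`,
`fd_install` and `put_unused_fd`. Dropping the guard releases the slot;
forgetting it leaves the slot looking in use.

```rust
use std::cell::RefCell;

/// Toy descriptor table: `true` marks a slot as reserved or in use.
pub struct Table {
    slots: RefCell<Vec<bool>>,
}

/// Guard for a reserved slot that has no file installed yet.
pub struct Reservation<'a> {
    table: &'a Table,
    fd: usize,
}

impl Table {
    pub fn new(size: usize) -> Self {
        Table { slots: RefCell::new(vec![false; size]) }
    }

    /// Mimics `get_unused_fd_flags`: claims the lowest free slot.
    pub fn reserve(&self) -> Option<Reservation<'_>> {
        let mut slots = self.slots.borrow_mut();
        let fd = slots.iter().position(|used| !*used)?;
        slots[fd] = true;
        Some(Reservation { table: self, fd })
    }

    pub fn is_used(&self, fd: usize) -> bool {
        self.slots.borrow()[fd]
    }
}

impl Reservation<'_> {
    pub fn fd(&self) -> usize {
        self.fd
    }

    /// Mimics `fd_install`: the slot stays in use and the guard is
    /// consumed without running its destructor.
    pub fn install(self) -> usize {
        let fd = self.fd;
        std::mem::forget(self);
        fd
    }
}

impl Drop for Reservation<'_> {
    /// Mimics `put_unused_fd`: the slot becomes free again.
    fn drop(&mut self) {
        self.table.slots.borrow_mut()[self.fd] = false;
    }
}
```

Calling `std::mem::forget` on a `Reservation` reproduces the forgotten
cleanup: the slot is never handed out again, yet nothing was installed
in it.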
> > + execve() at the point of no return (in begin_new_exec()).
execve(2), sorry.
> > That's the only place where violation of that constraint on some later
> > changes is plausible. That one needs to be watched out for.
> Thanks for going through that.
I'm in the middle of writing documentation on the descriptor table and
struct file handling right now anyway...
> From a Rust perspective, it sounds
> easiest to just declare that execve() is an unsafe operation, and that
> one of the conditions for using it is that you don't hold any fd
> reservations. Trying to encode this in the fd reservation logic seems
> too onerous, and I'm guessing execve is not used particularly often
> anyway.
Sorry, bad editing on my part - I should've clearly marked execve() as a
syscall. It's not that it's an unsafe operation - it's only called from
userland anyway, so it's not going to happen in scope of any reserved
descriptor.
The problem is different:
* userland calls execve("/usr/bin/whatever", argv, envp)
* that proceeds through several wrappers to do_execveat_common().
There are several syscalls in that family and they all converge
to do_execveat_common() - wrappers just deal with difference
in calling conventions.
* do_execveat_common() sets up exec context (struct linux_binprm).
That opens the binary to be executed, creates memory context
to be used, calculates argc, sets up argv and envp on what will
become the userland stack for the new program, etc. - basically,
all the work for marshalling the data from caller's memory.
Then it calls bprm_execve(), which is where the rest of the work
will be done.
* bprm_execve() eventually calls exec_binprm(). That calls
search_binary_handler(), which is where we finally get a look
at the binary we are trying to load. search_binary_handler()
goes through the known binary formats (ELF, script, etc.)
and tries to offer the exec context to ->load_binary() of
each.
* ->load_binary() instance looks at the binary (starting with the
first 256 bytes read for us by prepare_binprm() called in the
beginning of search_binary_handler()). If it doesn't have the
right magic values, ->load_binary() just returns -ENOEXEC,
so that the next format would be tried. If it *does* look like
something this format is going to deal with, more sanity checks
are done, things are set up, etc. - details depend upon the
binary format in question. See load_elf_binary() for some taste
of that. Eventually it decides to actually discard the old
memory and switch to new binary. Up to that point it can
return an error - -ENOEXEC for soft ones ("not mine, after all,
try other formats"), something like -EINVAL/-ENOMEM/-EPERM/-EIO/etc.
for hard ones ("fail execve(2) with that error"). _After_ that
point we have nothing to return to; old binary is not mapped
anymore, userland stack frame is gone, etc. Any errors past
that point are treated as "kill me".
At the point of no return we call begin_new_exec(). That
kills all other threads, unshares descriptor table in case it
had been shared wider than the thread group, switches memory
context, etc.
Once begin_new_exec() is done, we can safely map whatever we
want to map, handle relocations, etc. Among other things,
we modify the userland register values saved on syscall entry,
so that once we return from syscall we'll end up with the
right register state at the right userland entry point, etc.
If everything goes fine, ->load_binary() returns 0 into
search_binary_handler() and we are pretty much done - some
cleanups on the way out and off to the loaded binary.
Alternatively, we may decide to mangle the exec context -
that's what #! handling does (see load_script() - basically
it opens the interpreter and massages the arguments, so
that something like debian/rules build-arch turns into
/usr/bin/make -f debian/rules build-arch and tells the
caller to deal with that; this is what the loop in
search_binary_handler() is about).
There's not a lot of binary formats (5 of those currently -
all in fs/binmt_*.c), but there's nothing to prohibit more
of them. If somebody decides to add the infrastructure for
writing those in Rust, begin_new_exec() wrapper will need
to be documented as "never call that in scope of reserved
descriptor". Maybe by marking that wrapper unsafe and
telling the users about the restriction wrt descriptor
reservations, maybe by somehow telling the compiler to
watch out for that - or maybe the constraint will be gone
by that time.
In any case, the underlying constraint ("a thread with
reserved descriptors should not try to get a private
descriptor table until all those descriptors are disposed
of one way or another") needs to be documented.
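
The soft/hard error convention in the walkthrough above can be sketched
as a toy model in userspace Rust. This is not the kernel code: the
`Loader` type, the two loaders and the 16-byte length check are
simplifications invented for illustration. Only the control flow (try
the next format on ENOEXEC, stop on anything else) follows the
description.

```rust
/// Linux errno values used by the sketch.
pub const ENOEXEC: i32 = 8;
pub const EINVAL: i32 = 22;

/// Toy `->load_binary()`: returns the format name or an errno.
pub type Loader = fn(&[u8]) -> Result<&'static str, i32>;

pub fn load_elf(image: &[u8]) -> Result<&'static str, i32> {
    if !image.starts_with(b"\x7fELF") {
        // Soft failure: not ours, let the next format try.
        return Err(ENOEXEC);
    }
    if image.len() < 16 {
        // Hard failure: the magic matched but the image is unusable.
        return Err(EINVAL);
    }
    Ok("elf")
}

pub fn load_script(image: &[u8]) -> Result<&'static str, i32> {
    if image.starts_with(b"#!") {
        Ok("script")
    } else {
        Err(ENOEXEC)
    }
}

/// Toy `search_binary_handler()`: offers the image to each loader in
/// turn, and fails with ENOEXEC if no loader claims it.
pub fn search_handlers(loaders: &[Loader], image: &[u8]) -> Result<&'static str, i32> {
    for load in loaders {
        match load(image) {
            Err(ENOEXEC) => continue,
            other => return other,
        }
    }
    Err(ENOEXEC)
}
```

The model stops before the point of no return; it has no counterpart to
begin_new_exec(), which is where the descriptor reservation constraint
applies.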
* Re: [PATCH v10 7/8] rust: file: add `Kuid` wrapper
2024-09-15 14:31 ` [PATCH v10 7/8] rust: file: add `Kuid` wrapper Alice Ryhl
@ 2024-09-15 22:02 ` Gary Guo
2024-09-23 9:13 ` Alice Ryhl
0 siblings, 1 reply; 52+ messages in thread
From: Gary Guo @ 2024-09-15 22:02 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, 15 Sep 2024 14:31:33 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define
> various operations on kuids such as equality and current_euid. It also
> lets us provide conversions from kuid into userspace values.
>
> Rust Binder needs these operations because it needs to compare kuids for
> equality, and it needs to tell userspace about the pid and uid of
> incoming transactions.
>
> To read kuids from a `struct task_struct`, you must currently use
> various #defines that perform the appropriate field access under an RCU
> read lock. Currently, we do not have a Rust wrapper for rcu_read_lock,
> which means that for this patch, there are two ways forward:
>
> 1. Inline the methods into Rust code, and use __rcu_read_lock directly
> rather than the rcu_read_lock wrapper. This gives up lockdep for
> these usages of RCU.
>
> 2. Wrap the various #defines in helpers and call the helpers from Rust.
>
> This patch uses the second option. One possible disadvantage of the
> second option is the possible introduction of speculation gadgets, but
> as discussed in [1], the risk appears to be acceptable.
>
> Of course, once a wrapper for rcu_read_lock is available, it is
> preferable to use that over either of the two above approaches.
>
> Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1]
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/bindings/bindings_helper.h | 1 +
> rust/helpers/task.c | 38 ++++++++++++++++++++++++
> rust/kernel/cred.rs | 5 ++--
> rust/kernel/task.rs | 66 +++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 108 insertions(+), 2 deletions(-)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 51ec78c355c0..e854ccddecee 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -19,6 +19,7 @@
> #include <linux/jiffies.h>
> #include <linux/mdio.h>
> #include <linux/phy.h>
> +#include <linux/pid_namespace.h>
> #include <linux/refcount.h>
> #include <linux/sched.h>
> #include <linux/security.h>
> diff --git a/rust/helpers/task.c b/rust/helpers/task.c
> index 7ac789232d11..7d66487db831 100644
> --- a/rust/helpers/task.c
> +++ b/rust/helpers/task.c
> @@ -17,3 +17,41 @@ void rust_helper_put_task_struct(struct task_struct *t)
> {
> put_task_struct(t);
> }
> +
> +kuid_t rust_helper_task_uid(struct task_struct *task)
> +{
> + return task_uid(task);
> +}
> +
> +kuid_t rust_helper_task_euid(struct task_struct *task)
> +{
> + return task_euid(task);
> +}
> +
> +#ifndef CONFIG_USER_NS
> +uid_t rust_helper_from_kuid(struct user_namespace *to, kuid_t uid)
> +{
> + return from_kuid(to, uid);
> +}
> +#endif /* CONFIG_USER_NS */
nit: it's fine to omit this `ifndef`, see what we do for `errname`.
> +
> +bool rust_helper_uid_eq(kuid_t left, kuid_t right)
> +{
> + return uid_eq(left, right);
> +}
> +
> +kuid_t rust_helper_current_euid(void)
> +{
> + return current_euid();
> +}
> +
> +struct user_namespace *rust_helper_current_user_ns(void)
> +{
> + return current_user_ns();
> +}
> +
> +pid_t rust_helper_task_tgid_nr_ns(struct task_struct *tsk,
> + struct pid_namespace *ns)
> +{
> + return task_tgid_nr_ns(tsk, ns);
> +}
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 367b4bbddd9f..1a36a9f19368 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -9,6 +9,7 @@
> types::{NotThreadSafe, Opaque},
> };
> use core::{
> + cmp::{Eq, PartialEq},
> ffi::{c_int, c_long, c_uint},
> ops::Deref,
> ptr,
> @@ -96,6 +97,12 @@ unsafe impl Sync for Task {}
> /// The type of process identifiers (PIDs).
> type Pid = bindings::pid_t;
>
> +/// The type of user identifiers (UIDs).
> +#[derive(Copy, Clone)]
> +pub struct Kuid {
> + kuid: bindings::kuid_t,
> +}
> +
> impl Task {
> /// Returns a raw pointer to the current task.
> ///
> @@ -157,12 +164,31 @@ pub fn pid(&self) -> Pid {
> unsafe { *ptr::addr_of!((*self.0.get()).pid) }
> }
>
> + /// Returns the UID of the given task.
> + pub fn uid(&self) -> Kuid {
> + // SAFETY: By the type invariant, we know that `self.0` is valid.
> + Kuid::from_raw(unsafe { bindings::task_uid(self.0.get()) })
> + }
> +
> + /// Returns the effective UID of the given task.
> + pub fn euid(&self) -> Kuid {
> + // SAFETY: By the type invariant, we know that `self.0` is valid.
> + Kuid::from_raw(unsafe { bindings::task_euid(self.0.get()) })
> + }
> +
> /// Determines whether the given task has pending signals.
> pub fn signal_pending(&self) -> bool {
> // SAFETY: By the type invariant, we know that `self.0` is valid.
> unsafe { bindings::signal_pending(self.0.get()) != 0 }
> }
>
> + /// Returns the given task's pid in the current pid namespace.
> + pub fn pid_in_current_ns(&self) -> Pid {
> + // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
> + // pointer as the namespace is correct for using the current namespace.
> + unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
Do we want to rely on the behaviour of `task_tgid_nr_ns` with null
pointer as namespace, or use `task_tgid_vnr`?
Best,
Gary
> + }
> +
> /// Wakes up the task.
> pub fn wake_up(&self) {
> // SAFETY: By the type invariant, we know that `self.0.get()` is non-null and valid.
* Re: [PATCH v10 6/8] rust: file: add `FileDescriptorReservation`
2024-09-15 22:01 ` Al Viro
@ 2024-09-15 22:05 ` Al Viro
0 siblings, 0 replies; 52+ messages in thread
From: Al Viro @ 2024-09-15 22:05 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Greg Kroah-Hartman, Arve Hjønnevåg,
Todd Kjos, Martijn Coenen, Joel Fernandes, Carlos Llamas,
Suren Baghdasaryan, Dan Williams, Matthew Wilcox, Thomas Gleixner,
Daniel Xu, Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 11:01:26PM +0100, Al Viro wrote:
> There's not a lot of binary formats (5 of those currently -
> all in fs/binmt_*.c), but there's nothing to prohibit more
binfmt_*.c, sorry.
> of them. If somebody decides to add the infrastructure for
> writing those in Rust, begin_new_exec() wrapper will need
> to be documented as "never call that in scope of reserved
> descriptor". Maybe by marking that wrapper unsafe and
> telling the users about the restriction wrt descriptor
> reservations, maybe by somehow telling the compiler to
> watch out for that - or maybe the constraint will be gone
> by that time.
>
> In any case, the underlying constraint ("a thread with
> reserved descriptors should not try to get a private
> descriptor table until all those descriptors are disposed
> of one way or another") needs to be documented.
>
* Re: [PATCH v10 8/8] rust: file: add abstraction for `poll_table`
2024-09-15 14:31 ` [PATCH v10 8/8] rust: file: add abstraction for `poll_table` Alice Ryhl
@ 2024-09-15 22:24 ` Gary Guo
2024-09-23 9:10 ` Alice Ryhl
0 siblings, 1 reply; 52+ messages in thread
From: Gary Guo @ 2024-09-15 22:24 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, 15 Sep 2024 14:31:34 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> The existing `CondVar` abstraction is a wrapper around
> `wait_queue_head`, but it does not support all use-cases of the C
> `wait_queue_head` type. To be specific, a `CondVar` cannot be registered
> with a `struct poll_table`. This limitation has the advantage that you
> do not need to call `synchronize_rcu` when destroying a `CondVar`.
>
> However, we need the ability to register a `poll_table` with a
> `wait_queue_head` in Rust Binder. To enable this, introduce a type
> called `PollCondVar`, which is like `CondVar` except that you can
> register a `poll_table`. We also introduce `PollTable`, which is a safe
> wrapper around `poll_table` that is intended to be used with
> `PollCondVar`.
>
> The destructor of `PollCondVar` unconditionally calls `synchronize_rcu`
> to ensure that the removal of epoll waiters has fully completed before
> the `wait_queue_head` is destroyed.
>
> That said, `synchronize_rcu` is rather expensive and is not needed in
> all cases: If we have never registered a `poll_table` with the
> `wait_queue_head`, then we don't need to call `synchronize_rcu`. (And
> this is a common case in Binder - not all processes use Binder with
> epoll.) The current implementation does not account for this, but if we
> find that it is necessary to improve this, a future patch could store a
> boolean next to the `wait_queue_head` to keep track of whether a
> `poll_table` has ever been registered.
>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/bindings/bindings_helper.h | 1 +
> rust/kernel/sync.rs | 1 +
> rust/kernel/sync/poll.rs | 121 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 123 insertions(+)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index e854ccddecee..ca13659ded4c 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -20,6 +20,7 @@
> #include <linux/mdio.h>
> #include <linux/phy.h>
> #include <linux/pid_namespace.h>
> +#include <linux/poll.h>
> #include <linux/refcount.h>
> #include <linux/sched.h>
> #include <linux/security.h>
> diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
> index 0ab20975a3b5..bae4a5179c72 100644
> --- a/rust/kernel/sync.rs
> +++ b/rust/kernel/sync.rs
> @@ -11,6 +11,7 @@
> mod condvar;
> pub mod lock;
> mod locked_by;
> +pub mod poll;
>
> pub use arc::{Arc, ArcBorrow, UniqueArc};
> pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
> diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs
> new file mode 100644
> index 000000000000..d5f17153b424
> --- /dev/null
> +++ b/rust/kernel/sync/poll.rs
> @@ -0,0 +1,121 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +// Copyright (C) 2024 Google LLC.
> +
> +//! Utilities for working with `struct poll_table`.
> +
> +use crate::{
> + bindings,
> + fs::File,
> + prelude::*,
> + sync::{CondVar, LockClassKey},
> + types::Opaque,
> +};
> +use core::ops::Deref;
> +
> +/// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class.
> +#[macro_export]
> +macro_rules! new_poll_condvar {
> + ($($name:literal)?) => {
> + $crate::sync::poll::PollCondVar::new(
> + $crate::optional_name!($($name)?), $crate::static_lock_class!()
> + )
> + };
> +}
> +
> +/// Wraps the kernel's `struct poll_table`.
> +///
> +/// # Invariants
> +///
> +/// This struct contains a valid `struct poll_table`.
> +///
> +/// For a `struct poll_table` to be valid, its `_qproc` function must follow the safety
> +/// requirements of `_qproc` functions:
> +///
> +/// * The `_qproc` function is given permission to enqueue a waiter to the provided `poll_table`
> +/// during the call. Once the waiter is removed and an rcu grace period has passed, it must no
> +/// longer access the `wait_queue_head`.
> +#[repr(transparent)]
> +pub struct PollTable(Opaque<bindings::poll_table>);
> +
> +impl PollTable {
> + /// Creates a reference to a [`PollTable`] from a valid pointer.
> + ///
> + /// # Safety
> + ///
> + /// The caller must ensure that for the duration of 'a, the pointer will point at a valid poll
> + /// table (as defined in the type invariants).
> + ///
> + /// The caller must also ensure that the `poll_table` is only accessed via the returned
> + /// reference for the duration of 'a.
> + pub unsafe fn from_ptr<'a>(ptr: *mut bindings::poll_table) -> &'a mut PollTable {
> + // SAFETY: The safety requirements guarantee the validity of the dereference, while the
> + // `PollTable` type being transparent makes the cast ok.
> + unsafe { &mut *ptr.cast() }
> + }
> +
> + fn get_qproc(&self) -> bindings::poll_queue_proc {
> + let ptr = self.0.get();
> + // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
> + // field is not modified concurrently with this call since we have an immutable reference.
> + unsafe { (*ptr)._qproc }
> + }
> +
> + /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
> + /// using the condition variable.
> + pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
> + if let Some(qproc) = self.get_qproc() {
> + // SAFETY: The pointers to `file` and `self` need to be valid for the duration of this
> + // call to `qproc`, which they are because they are references.
> + //
> + // The `cv.wait_queue_head` pointer must be valid until an rcu grace period after the
> + // waiter is removed. The `PollCondVar` is pinned, so before `cv.wait_queue_head` can
> + // be destroyed, the destructor must run. That destructor first removes all waiters,
> + // and then waits for an rcu grace period. Therefore, `cv.wait_queue_head` is valid for
> + // long enough.
> + unsafe { qproc(file.as_ptr() as _, cv.wait_queue_head.get(), self.0.get()) };
> + }
Should this be calling `poll_wait` instead?
> + }
> +}
> +
> +/// A wrapper around [`CondVar`] that makes it usable with [`PollTable`].
> +///
> +/// [`CondVar`]: crate::sync::CondVar
> +#[pin_data(PinnedDrop)]
> +pub struct PollCondVar {
> + #[pin]
> + inner: CondVar,
> +}
> +
> +impl PollCondVar {
> + /// Constructs a new condvar initialiser.
> + pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self> {
> + pin_init!(Self {
> + inner <- CondVar::new(name, key),
> + })
> + }
> +}
> +
> +// Make the `CondVar` methods callable on `PollCondVar`.
> +impl Deref for PollCondVar {
> + type Target = CondVar;
> +
> + fn deref(&self) -> &CondVar {
> + &self.inner
> + }
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for PollCondVar {
> + fn drop(self: Pin<&mut Self>) {
> + // Clear anything registered using `register_wait`.
> + //
> + // SAFETY: The pointer points at a valid `wait_queue_head`.
> + unsafe { bindings::__wake_up_pollfree(self.inner.wait_queue_head.get()) };
Should this use `wake_up_pollfree` (without the leading __)?
> +
> + // Wait for epoll items to be properly removed.
> + //
> + // SAFETY: Just an FFI call.
> + unsafe { bindings::synchronize_rcu() };
> + }
> +}
>
* Re: [PATCH v10 6/8] rust: file: add `FileDescriptorReservation`
2024-09-15 19:34 ` Al Viro
@ 2024-09-16 4:18 ` Al Viro
0 siblings, 0 replies; 52+ messages in thread
From: Al Viro @ 2024-09-16 4:18 UTC (permalink / raw)
To: linux-fsdevel
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Greg Kroah-Hartman, Arve Hjønnevåg,
Todd Kjos, Martijn Coenen, Joel Fernandes, Carlos Llamas,
Suren Baghdasaryan, Dan Williams, Matthew Wilcox, Thomas Gleixner,
Daniel Xu, Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, Kees Cook, Alice Ryhl
On Sun, Sep 15, 2024 at 08:34:43PM +0100, Al Viro wrote:
> FWIW, I toyed with the idea of having reservations kept per-thread;
> it is possible and it simplifies some things, but I hadn't been able to
> find a way to do that without buggering syscall latency for open() et.al.
Hmm... How about the following:
* add an equivalent of array of pairs (fd, file) to task_struct;
representation could be e.g. (a _VERY_ preliminary variant)
        unsigned fd_count;
        int fds[2];
        struct file *fp[2];
        void *spillover;
with 'spillover' being a separately allocated array of pairs to deal with
the moments when we have more than 2 simultaneously reserved descriptors.
Initially NULL, allocated the first time we need more than 2. Always empty
outside of syscall.
* inline primitives:
        count_reserved_fds()
        reserved_descriptor(index)
        reserved_file(index)
* int reserve_fd(flags)
        returns -E... or index.
        slot = current->fd_count
        if (unlikely(slot == 2) && !current->spillover) {
                allocate spillover
                if failed
                        return -ENOMEM
                set current->spillover
        }
        if slot is maximal allowed (2 + how much fits into allocated part?)
                return -E<something>
        fd = get_unused_fd_flags(flags);
        if (unlikely(fd < 0))
                return fd;
        if (likely(slot < 2)) {
                current->fds[slot] = fd;
                current->fp[slot] = NULL;
        } else {
                store (fd, NULL) into element #(slot - 2) of current->spillover
        }
        current->fd_count = slot + 1;
        return slot;
* void install_file(index, file)
        if (likely(index < 2))
                current->fp[index] = file;
        else
                store file to element #(index - 2) of current->spillover
* void __commit_reservations(unsigned count, bool failed)
        // count == current->fd_count
        while (count--) {
                fd = reserved_descriptor(count);
                file = reserved_file(count);
                if (!file)
                        put_unused_fd(fd);
                else if (!failed)
                        fd_install(fd, file);
                else {
                        put_unused_fd(fd);
                        fput(file);
                }
        }
        current->fd_count = 0;
* static inline void commit_fd_reservations(bool failed)
        called in syscall glue, right after the syscall returns
        unsigned slots = current->fd_count;
        if (unlikely(slots))
                __commit_reservations(slots, failed);
Then we can (in addition to the current use of get_unused_fd_flags() et al. -
that still works) do e.g. things like
        for (i = 0; i < 69; i++) {
                index = reserve_fd(O_CLOEXEC);
                if (unlikely(index < 0))
                        return index;
                file = some_driver_shite(some_shite, i);
                if (IS_ERR(file))
                        return PTR_ERR(file);
                install_file(index, file); // consumed file
                ioctl_result.some_array[i] = reserved_descriptor(index);
                ....
        }
        ...
        if (copy_to_user(arg, &ioctl_result, sizeof(ioctl_result)))
                return -EFAULT;
        ...
        return 0;
and have it DTRT on all failures, no matter how many files we have added,
etc. - on syscall return we will either commit all reservations
(on success) or release all reserved descriptors and drop all files we
had planned to put into descriptor table. Getting that right manually
is doable (drm has some examples), but it's _not_ pleasant.
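To make the commit/rollback semantics concrete, here is a rough userspace
model of the scheme, written in Rust. It is only a sketch under loud
assumptions: `Task`, `File`, `installed`, `released` and `dropped` are
hypothetical stand-ins for task_struct, struct file, the descriptor table,
put_unused_fd() and fput(); a plain Vec replaces the fds[]/fp[]/spillover
split; and descriptor numbers are simply handed out in increasing order
without reuse.

```rust
/// Hypothetical stand-in for `struct file`.
#[derive(Debug, PartialEq)]
struct File(&'static str);

/// Hypothetical stand-in for the per-task state plus the descriptor table.
#[derive(Default)]
struct Task {
    next_fd: i32,                       // models get_unused_fd_flags(); never reuses fds
    reserved: Vec<(i32, Option<File>)>, // pending (fd, file) pairs
    installed: Vec<(i32, File)>,        // models fd_install() into the table
    released: Vec<i32>,                 // models put_unused_fd()
    dropped: usize,                     // counts files released as by fput()
}

impl Task {
    /// Reserve a descriptor and return the index of its slot.
    fn reserve_fd(&mut self) -> usize {
        let fd = self.next_fd;
        self.next_fd += 1;
        self.reserved.push((fd, None));
        self.reserved.len() - 1
    }

    /// Attach a file to a reserved slot; the reservation now owns the file.
    fn install_file(&mut self, index: usize, file: File) {
        self.reserved[index].1 = Some(file);
    }

    /// Descriptor number held by a reserved slot.
    fn reserved_descriptor(&self, index: usize) -> i32 {
        self.reserved[index].0
    }

    /// What the syscall exit glue would call: commit everything on success,
    /// undo everything on failure. Slots are processed last-to-first.
    fn commit_fd_reservations(&mut self, failed: bool) {
        while let Some((fd, file)) = self.reserved.pop() {
            match file {
                // Slot never got a file: just give the descriptor back.
                None => self.released.push(fd),
                // Success: publish the file in the descriptor table.
                Some(f) if !failed => self.installed.push((fd, f)),
                // Failure: give the descriptor back and drop the file.
                Some(f) => {
                    self.released.push(fd);
                    self.dropped += 1;
                    drop(f);
                }
            }
        }
    }
}
```

The point of the model is that the caller never writes cleanup code: it
reserves, attaches files, and returns, and the single commit call at the
end decides the fate of every slot based on whether the syscall failed.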
The win here is in simpler cleanup code. And it can coexist with the
current API just fine. The PITA is in the need to add the call
of commit_fd_reservations() in syscall exit glue and have that done
on all architectures ;-/
FWIW, I suspect that it won't be slower than the current API, even
if used on hot paths. pipe(2) would be an interesting testcase
for that - converting it is easy, and there are plenty of loads
where latency of pipe(2) would be visible.
Comments?
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-15 21:07 ` Alice Ryhl
@ 2024-09-16 15:40 ` Casey Schaufler
2024-09-17 13:18 ` Paul Moore
2024-09-22 15:08 ` Alice Ryhl
0 siblings, 2 replies; 52+ messages in thread
From: Casey Schaufler @ 2024-09-16 15:40 UTC (permalink / raw)
To: Alice Ryhl, Kees Cook
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel,
Casey Schaufler
On 9/15/2024 2:07 PM, Alice Ryhl wrote:
> On Sun, Sep 15, 2024 at 10:58 PM Kees Cook <kees@kernel.org> wrote:
>> On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
>>> Add an abstraction for viewing the string representation of a security
>>> context.
>> Hm, this may collide with "LSM: Move away from secids" is going to happen.
>> https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
>>
>> This series is not yet landed, but in the future, the API changes should
>> be something like this, though the "lsmblob" name is likely to change to
>> "lsmprop"?
>> security_cred_getsecid() -> security_cred_getlsmblob()
>> security_secid_to_secctx() -> security_lsmblob_to_secctx()
The referenced patch set does not change security_cred_getsecid()
nor remove security_secid_to_secctx(). There remain networking interfaces
that are unlikely to ever be allowed to move away from secids. It will
be necessary to either retain some of the secid interfaces or introduce
scaffolding around the lsm_prop structure.
Binder is currently only supported in SELinux, so this isn't a real issue
today. The BPF LSM could conceivably support binder, but only in cases where
SELinux isn't enabled. Should there be additional LSMs that support binder
the hooks would have to be changed to use lsm_prop interfaces, but I have
not included that *yet*.
> Thanks for the heads up. I'll make sure to look into how this
> interacts with those changes.
There will be a follow-on patch set as well that replaces the LSMs' use
of string/length pairs with a structure. This becomes necessary in cases
where more than one active LSM uses secids and security contexts. This
will affect binder.
>
>>> This is needed by Rust Binder because it has a feature where a process
>>> can view the string representation of the security context for incoming
>>> transactions. The process can use that to authenticate incoming
>>> transactions, and since the feature is provided by the kernel, the
>>> process can trust that the security context is legitimate.
>>>
>>> This abstraction makes the following assumptions about the C side:
>>> * When a call to `security_secid_to_secctx` is successful, it returns a
>>> pointer and length. The pointer references a byte string and is valid
>>> for reading for that many bytes.
>> Yes. (len includes trailing C-String NUL character.)
> I suppose the NUL character implies that this API always returns a
> non-zero length? I could simplify the patch a little bit by not
> handling empty strings.
>
> It looks like the CONFIG_SECURITY=n case returns -EOPNOTSUPP, so we
> don't get an empty string from that case, at least.
>
>>> * The string may be referenced until `security_release_secctx` is
>>> called.
>> Yes.
>>
>>> * If CONFIG_SECURITY is set, then the three methods mentioned in
>>> rust/helpers are available without a helper. (That is, they are not a
>>> #define or `static inline`.)
>> Yes.
>>
>>> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
>>> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
>>> Reviewed-by: Trevor Gross <tmgross@umich.edu>
>>> Reviewed-by: Gary Guo <gary@garyguo.net>
>>> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>> Reviewed-by: Kees Cook <kees@kernel.org>
> Thanks for the review!
>
> Alice
>
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-16 15:40 ` Casey Schaufler
@ 2024-09-17 13:18 ` Paul Moore
2024-09-22 15:01 ` Alice Ryhl
2024-09-22 15:08 ` Alice Ryhl
1 sibling, 1 reply; 52+ messages in thread
From: Paul Moore @ 2024-09-17 13:18 UTC (permalink / raw)
To: Alice Ryhl
Cc: Casey Schaufler, Kees Cook, James Morris, Serge E. Hallyn,
Miguel Ojeda, Christian Brauner, Alex Gaynor,
Wedson Almeida Filho, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel
On Mon, Sep 16, 2024 at 11:40 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 9/15/2024 2:07 PM, Alice Ryhl wrote:
> > On Sun, Sep 15, 2024 at 10:58 PM Kees Cook <kees@kernel.org> wrote:
> >> On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
> >>> Add an abstraction for viewing the string representation of a security
> >>> context.
> >> Hm, this may collide with "LSM: Move away from secids" is going to happen.
> >> https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
> >>
> >> This series is not yet landed, but in the future, the API changes should
> >> be something like this, though the "lsmblob" name is likely to change to
> >> "lsmprop"?
> >> security_cred_getsecid() -> security_cred_getlsmblob()
> >> security_secid_to_secctx() -> security_lsmblob_to_secctx()
>
> The referenced patch set does not change security_cred_getsecid()
> nor remove security_secid_to_secctx(). There remain networking interfaces
> that are unlikely to ever be allowed to move away from secids. It will
> be necessary to either retain some of the secid interfaces or introduce
> scaffolding around the lsm_prop structure ...
First, thanks for CC'ing the LSM list Alice, I appreciate it.
As Kees and Casey already pointed out, there are relevant LSM changes
that are nearing inclusion which might be relevant to the Rust
abstractions. I don't think there is going to be anything too
painful, but I must admit that my Rust knowledge has sadly not
progressed much beyond the most basic "hello world" example.
This brings up the point I really want to discuss: what portions of
the LSM framework are currently accessible to Rust, and what do we
(the LSM devs) need to do to preserve the Rust LSM interfaces when the
LSM framework is modified? While the LSM framework does not change
often, we do modify both the LSM hooks (the security_XXX() calls that
serve as the LSM interface/API) and the LSM callbacks (the individual
LSM hook implementations) on occasion as they are intentionally not
part of any sort of stable API. In a perfect world we/I would have a
good enough understanding of the Rust kernel abstractions and would
submit patches to update the Rust code as appropriate, but that isn't
the current situation and I want to make sure the LSM framework and
the Rust interfaces don't fall out of sync. Do you watch the LSM list
or linux-next for patches that could affect the Rust abstractions? Is
there something else you would recommend?
--
paul-moore.com
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-15 14:31 ` [PATCH v10 5/8] rust: security: add abstraction for secctx Alice Ryhl
2024-09-15 20:58 ` Kees Cook
@ 2024-09-19 7:56 ` Paul Moore
1 sibling, 0 replies; 52+ messages in thread
From: Paul Moore @ 2024-09-19 7:56 UTC (permalink / raw)
To: Alice Ryhl
Cc: James Morris, Serge E. Hallyn, Miguel Ojeda, Christian Brauner,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 10:31 AM Alice Ryhl <aliceryhl@google.com> wrote:
>
> Add an abstraction for viewing the string representation of a security
> context.
>
> This is needed by Rust Binder because it has a feature where a process
> can view the string representation of the security context for incoming
> transactions. The process can use that to authenticate incoming
> transactions, and since the feature is provided by the kernel, the
> process can trust that the security context is legitimate.
>
> This abstraction makes the following assumptions about the C side:
> * When a call to `security_secid_to_secctx` is successful, it returns a
> pointer and length. The pointer references a byte string and is valid
> for reading for that many bytes.
> * The string may be referenced until `security_release_secctx` is
> called.
> * If CONFIG_SECURITY is set, then the three methods mentioned in
> rust/helpers are available without a helper. (That is, they are not a
> #define or `static inline`.)
>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/bindings/bindings_helper.h | 1 +
> rust/helpers/helpers.c | 1 +
> rust/helpers/security.c | 20 +++++++++++
> rust/kernel/cred.rs | 8 +++++
> rust/kernel/lib.rs | 1 +
> rust/kernel/security.rs | 74 +++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 105 insertions(+)
I doubt my ACK is strictly necessary here since the Rust bindings
aren't actually modifying anything in the LSM, but just in case ...
Acked-by: Paul Moore <paul@paul-moore.com>
--
paul-moore.com
* Re: [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred`
2024-09-15 14:31 ` [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred` Alice Ryhl
2024-09-15 20:24 ` Kees Cook
@ 2024-09-19 7:57 ` Paul Moore
1 sibling, 0 replies; 52+ messages in thread
From: Paul Moore @ 2024-09-19 7:57 UTC (permalink / raw)
To: Alice Ryhl
Cc: James Morris, Serge E. Hallyn, Miguel Ojeda, Christian Brauner,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 10:31 AM Alice Ryhl <aliceryhl@google.com> wrote:
>
> From: Wedson Almeida Filho <wedsonaf@gmail.com>
>
> Add a wrapper around `struct cred` called `Credential`, and provide
> functionality to get the `Credential` associated with a `File`.
>
> Rust Binder must check the credentials of processes when they attempt to
> perform various operations, and these checks usually take a
> `&Credential` as parameter. The security_binder_set_context_mgr function
> would be one example. This patch is necessary to access these security_*
> methods from Rust.
>
> This Rust abstraction makes the following assumptions about the C side:
> * `struct cred` is refcounted with `get_cred`/`put_cred`.
> * It's okay to transfer a `struct cred` across threads, that is, you do
> not need to call `put_cred` on the same thread as where you called
> `get_cred`.
> * The `euid` field of a `struct cred` never changes after
> initialization.
> * The `f_cred` field of a `struct file` never changes after
> initialization.
>
> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/bindings/bindings_helper.h | 1 +
> rust/helpers/cred.c | 13 +++++++
> rust/helpers/helpers.c | 1 +
> rust/kernel/cred.rs | 76 +++++++++++++++++++++++++++++++++++++++++
> rust/kernel/fs/file.rs | 13 +++++++
> rust/kernel/lib.rs | 1 +
> 6 files changed, 105 insertions(+)
Reviewed-by: Paul Moore <paul@paul-moore.com>
--
paul-moore.com
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-17 13:18 ` Paul Moore
@ 2024-09-22 15:01 ` Alice Ryhl
0 siblings, 0 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-22 15:01 UTC (permalink / raw)
To: paul
Cc: a.hindborg, alex.gaynor, aliceryhl, arve, benno.lossin, bjorn3_gh,
boqun.feng, brauner, casey, cmllamas, dan.j.williams, dxu, gary,
gregkh, jmorris, joel, kees, linux-fsdevel, linux-kernel,
linux-security-module, maco, ojeda, peterz, rust-for-linux, serge,
surenb, tglx, tkjos, tmgross, viro, wedsonaf, willy, yakoyoku
On Tue, Sep 17, 2024 at 3:18 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, Sep 16, 2024 at 11:40 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> > On 9/15/2024 2:07 PM, Alice Ryhl wrote:
> > > On Sun, Sep 15, 2024 at 10:58 PM Kees Cook <kees@kernel.org> wrote:
> > >> On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
> > >>> Add an abstraction for viewing the string representation of a security
> > >>> context.
> > >> Hm, this may collide with "LSM: Move away from secids" is going to happen.
> > >> https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
> > >>
> > >> This series is not yet landed, but in the future, the API changes should
> > >> be something like this, though the "lsmblob" name is likely to change to
> > >> "lsmprop"?
> > >> security_cred_getsecid() -> security_cred_getlsmblob()
> > >> security_secid_to_secctx() -> security_lsmblob_to_secctx()
> >
> > The referenced patch set does not change security_cred_getsecid()
> > nor remove security_secid_to_secctx(). There remain networking interfaces
> > that are unlikely to ever be allowed to move away from secids. It will
> > be necessary to either retain some of the secid interfaces or introduce
> > scaffolding around the lsm_prop structure ...
>
> First, thanks for CC'ing the LSM list Alice, I appreciate it.
>
> As Kees and Casey already pointed out, there are relevant LSM changes
> that are nearing inclusion which might be relevant to the Rust
> abstractions. I don't think there is going to be anything too
> painful, but I must admit that my Rust knowledge has sadly not
> progressed much beyond the most basic "hello world" example.
We discussed this email in-person at Plumbers. I'll outline what we
discussed here.
> This brings up the point I really want to discuss: what portions of
> the LSM framework are currently accessible to Rust,
It's relatively limited. I'm adding a way to access the secctx as a
string, and a way to manipulate `struct cred`. Basically it just lets
you take and drop refcounts on the credential and pass a credential to
functions.
Other than what is in this patch series, Binder also needs a few other
methods. Here are the signatures:
fn binder_set_context_mgr(mgr: &Credential) -> Result;
fn binder_transaction(from: &Credential, to: &Credential) -> Result;
fn binder_transfer_binder(from: &Credential, to: &Credential) -> Result;
fn binder_transfer_file(from: &Credential, to: &Credential, file: &File) -> Result;
These methods just call into the equivalent C functions. The `Result`
return type can hold either an "Ok", which indicates success, or an "Err",
which indicates an error. In the latter case, it will hold whichever
errno the C API returns.
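As a rough illustration of that mapping, here is a self-contained userspace
sketch. Everything in it is a stand-in chosen for the example: `Errno`,
`to_result` and `mock_c_hook` are hypothetical names defined right here (the
mock takes a bool instead of credentials), not the actual kernel crate API or
the real security_binder_* hooks.

```rust
/// Hypothetical error type holding a positive errno value.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Errno(i32);

/// Errno value used by the mock below.
const EPERM: i32 = 1;

/// Convert a C-style return value (0 on success, negative errno on failure)
/// into a `Result`.
fn to_result(ret: i32) -> Result<(), Errno> {
    if ret < 0 {
        Err(Errno(-ret))
    } else {
        Ok(())
    }
}

/// Mock standing in for a C hook that returns 0 or a negative errno.
fn mock_c_hook(allowed: bool) -> i32 {
    if allowed {
        0
    } else {
        -EPERM
    }
}

/// Thin wrapper in the style described above: call the C side and
/// translate its return value.
fn binder_set_context_mgr(allowed: bool) -> Result<(), Errno> {
    to_result(mock_c_hook(allowed))
}
```

A caller can then use `?` to propagate the errno unchanged, which is why the
wrappers themselves stay this thin.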
> and what do we
> (the LSM devs) need to do to preserve the Rust LSM interfaces when the
> LSM framework is modified? While the LSM framework does not change
> often, we do modify both the LSM hooks (the security_XXX() calls that
> serve as the LSM interface/API) and the LSM callbacks (the individual
> LSM hook implementations) on occasion as they are intentionally not
> part of any sort of stable API.
That's fine. None of the Rust APIs are stable either.
Rust uses the bindgen tool to convert C headers into Rust declarations,
so changes to the C API will result in a build failure. This makes it
easy to discover issues.
> In a perfect world we/I would have a
> good enough understanding of the Rust kernel abstractions and would
> submit patches to update the Rust code as appropriate, but that isn't
> the current situation and I want to make sure the LSM framework and
> the Rust interfaces don't fall out of sync. Do you watch the LSM list
> or linux-next for patches that could affect the Rust abstractions? Is
> there something else you would recommend?
Ideally, you would add a CONFIG_RUST build to your CI setup so that you
catch issues early. Of course, if something slips through, then we run
build tests on linux-next too, so anything that falls through the cracks
should get caught by that.
If anything needs Rust changes, you can CC the rust-for-linux list and
me, and we will take a look. Same applies to review of Rust code.
Alice
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-16 15:40 ` Casey Schaufler
2024-09-17 13:18 ` Paul Moore
@ 2024-09-22 15:08 ` Alice Ryhl
2024-09-22 16:50 ` Casey Schaufler
1 sibling, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-22 15:08 UTC (permalink / raw)
To: Casey Schaufler
Cc: Kees Cook, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda, Christian Brauner, Alex Gaynor,
Wedson Almeida Filho, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel
On Mon, Sep 16, 2024 at 5:40 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 9/15/2024 2:07 PM, Alice Ryhl wrote:
> > On Sun, Sep 15, 2024 at 10:58 PM Kees Cook <kees@kernel.org> wrote:
> >> On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
> >>> Add an abstraction for viewing the string representation of a security
> >>> context.
> >> Hm, this may collide with "LSM: Move away from secids" is going to happen.
> >> https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
> >>
> >> This series is not yet landed, but in the future, the API changes should
> >> be something like this, though the "lsmblob" name is likely to change to
> >> "lsmprop"?
> >> security_cred_getsecid() -> security_cred_getlsmblob()
> >> security_secid_to_secctx() -> security_lsmblob_to_secctx()
>
> The referenced patch set does not change security_cred_getsecid()
> nor remove security_secid_to_secctx(). There remain networking interfaces
> that are unlikely to ever be allowed to move away from secids. It will
> be necessary to either retain some of the secid interfaces or introduce
> scaffolding around the lsm_prop structure.
>
> Binder is currently only supported in SELinux, so this isn't a real issue
> today. The BPF LSM could conceivably support binder, but only in cases where
> SELinux isn't enabled. Should there be additional LSMs that support binder
> the hooks would have to be changed to use lsm_prop interfaces, but I have
> not included that *yet*.
>
> > Thanks for the heads up. I'll make sure to look into how this
> > interacts with those changes.
>
> There will be a follow on patch set as well that replaces the LSMs use
> of string/length pairs with a structure. This becomes necessary in cases
> where more than one active LSM uses secids and security contexts. This
> will affect binder.
When are these things expected to land? If this patch series gets
merged in the same kernel cycle as those changes, it'll probably need
special handling.
Alice
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-22 15:08 ` Alice Ryhl
@ 2024-09-22 16:50 ` Casey Schaufler
2024-09-22 17:04 ` Alice Ryhl
0 siblings, 1 reply; 52+ messages in thread
From: Casey Schaufler @ 2024-09-22 16:50 UTC (permalink / raw)
To: Alice Ryhl
Cc: Kees Cook, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda, Christian Brauner, Alex Gaynor,
Wedson Almeida Filho, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel,
Casey Schaufler
On 9/22/2024 8:08 AM, Alice Ryhl wrote:
> On Mon, Sep 16, 2024 at 5:40 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 9/15/2024 2:07 PM, Alice Ryhl wrote:
>>> On Sun, Sep 15, 2024 at 10:58 PM Kees Cook <kees@kernel.org> wrote:
>>>> On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
>>>>> Add an abstraction for viewing the string representation of a security
>>>>> context.
>>>> Hm, this may collide with "LSM: Move away from secids" is going to happen.
>>>> https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
>>>>
>>>> This series is not yet landed, but in the future, the API changes should
>>>> be something like this, though the "lsmblob" name is likely to change to
>>>> "lsmprop"?
>>>> security_cred_getsecid() -> security_cred_getlsmblob()
>>>> security_secid_to_secctx() -> security_lsmblob_to_secctx()
>> The referenced patch set does not change security_cred_getsecid()
>> nor remove security_secid_to_secctx(). There remain networking interfaces
>> that are unlikely to ever be allowed to move away from secids. It will
>> be necessary to either retain some of the secid interfaces or introduce
>> scaffolding around the lsm_prop structure.
>>
>> Binder is currently only supported in SELinux, so this isn't a real issue
>> today. The BPF LSM could conceivably support binder, but only in cases where
>> SELinux isn't enabled. Should there be additional LSMs that support binder
>> the hooks would have to be changed to use lsm_prop interfaces, but I have
>> not included that *yet*.
>>
>>> Thanks for the heads up. I'll make sure to look into how this
>>> interacts with those changes.
>> There will be a follow on patch set as well that replaces the LSMs use
>> of string/length pairs with a structure. This becomes necessary in cases
>> where more than one active LSM uses secids and security contexts. This
>> will affect binder.
> When are these things expected to land?
I would like them to land in 6.14, but history would lead me to think
it will be later than that. A lot will depend on how well the large set
of LSM changes that went into 6.12 are received.
> If this patch series gets
> merged in the same kernel cycle as those changes, it'll probably need
> special handling.
Yes, this is the fundamental downside of the tree merge development model.
> Alice
* Re: [PATCH v10 5/8] rust: security: add abstraction for secctx
2024-09-22 16:50 ` Casey Schaufler
@ 2024-09-22 17:04 ` Alice Ryhl
0 siblings, 0 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-22 17:04 UTC (permalink / raw)
To: Casey Schaufler
Cc: Kees Cook, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda, Christian Brauner, Alex Gaynor,
Wedson Almeida Filho, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel
On Sun, Sep 22, 2024 at 6:50 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 9/22/2024 8:08 AM, Alice Ryhl wrote:
> > On Mon, Sep 16, 2024 at 5:40 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 9/15/2024 2:07 PM, Alice Ryhl wrote:
> >>> On Sun, Sep 15, 2024 at 10:58 PM Kees Cook <kees@kernel.org> wrote:
> >>>> On Sun, Sep 15, 2024 at 02:31:31PM +0000, Alice Ryhl wrote:
> >>>>> Add an abstraction for viewing the string representation of a security
> >>>>> context.
> >>>> Hm, this may collide with "LSM: Move away from secids" is going to happen.
> >>>> https://lore.kernel.org/all/20240830003411.16818-1-casey@schaufler-ca.com/
> >>>>
> >>>> This series is not yet landed, but in the future, the API changes should
> >>>> be something like this, though the "lsmblob" name is likely to change to
> >>>> "lsmprop"?
> >>>> security_cred_getsecid() -> security_cred_getlsmblob()
> >>>> security_secid_to_secctx() -> security_lsmblob_to_secctx()
> >> The referenced patch set does not change security_cred_getsecid()
> >> nor remove security_secid_to_secctx(). There remain networking interfaces
> >> that are unlikely to ever be allowed to move away from secids. It will
> >> be necessary to either retain some of the secid interfaces or introduce
> >> scaffolding around the lsm_prop structure.
> >>
> >> Binder is currently only supported in SELinux, so this isn't a real issue
> >> today. The BPF LSM could conceivably support binder, but only in cases where
> >> SELinux isn't enabled. Should there be additional LSMs that support binder
> >> the hooks would have to be changed to use lsm_prop interfaces, but I have
> >> not included that *yet*.
> >>
> >>> Thanks for the heads up. I'll make sure to look into how this
> >>> interacts with those changes.
> >> There will be a follow on patch set as well that replaces the LSMs use
> >> of string/length pairs with a structure. This becomes necessary in cases
> >> where more than one active LSM uses secids and security contexts. This
> >> will affect binder.
> > When are these things expected to land?
>
> I would like them to land in 6.14, but history would lead me to think
> it will be later than that. A lot will depend on how well the large set
> of LSM changes that went into 6.12 are received.
>
> > If this patch series gets
> > merged in the same kernel cycle as those changes, it'll probably need
> > special handling.
>
> Yes, this is the fundamental downside of the tree merge development model.
Okay. I'm hoping to land this series in 6.13 so hopefully we won't
need to do anything special.
Alice
* Re: [PATCH v10 8/8] rust: file: add abstraction for `poll_table`
2024-09-15 22:24 ` Gary Guo
@ 2024-09-23 9:10 ` Alice Ryhl
0 siblings, 0 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-23 9:10 UTC (permalink / raw)
To: Gary Guo
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Mon, Sep 16, 2024 at 12:24 AM Gary Guo <gary@garyguo.net> wrote:
>
> On Sun, 15 Sep 2024 14:31:34 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
> > + /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
> > + /// using the condition variable.
> > + pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
> > + if let Some(qproc) = self.get_qproc() {
> > + // SAFETY: The pointers to `file` and `self` need to be valid for the duration of this
> > + // call to `qproc`, which they are because they are references.
> > + //
> > + // The `cv.wait_queue_head` pointer must be valid until an rcu grace period after the
> > + // waiter is removed. The `PollCondVar` is pinned, so before `cv.wait_queue_head` can
> > + // be destroyed, the destructor must run. That destructor first removes all waiters,
> > + // and then waits for an rcu grace period. Therefore, `cv.wait_queue_head` is valid for
> > + // long enough.
> > + unsafe { qproc(file.as_ptr() as _, cv.wait_queue_head.get(), self.0.get()) };
> > + }
>
> Should this be calling `poll_wait` instead?
>
> > +#[pinned_drop]
> > +impl PinnedDrop for PollCondVar {
> > + fn drop(self: Pin<&mut Self>) {
> > + // Clear anything registered using `register_wait`.
> > + //
> > + // SAFETY: The pointer points at a valid `wait_queue_head`.
> > + unsafe { bindings::__wake_up_pollfree(self.inner.wait_queue_head.get()) };
>
> Should this use `wake_up_pollfree` (without the leading __)?
For both cases, that would require a Rust helper. But I suppose we could do it.
Alice
* Re: [PATCH v10 7/8] rust: file: add `Kuid` wrapper
2024-09-15 22:02 ` Gary Guo
@ 2024-09-23 9:13 ` Alice Ryhl
2024-09-26 16:33 ` Christian Brauner
2024-09-26 16:35 ` [PATCH] [RFC] rust: add PidNamespace wrapper Christian Brauner
0 siblings, 2 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-23 9:13 UTC (permalink / raw)
To: Gary Guo
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Mon, Sep 16, 2024 at 12:02 AM Gary Guo <gary@garyguo.net> wrote:
>
> On Sun, 15 Sep 2024 14:31:33 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
> > + /// Returns the given task's pid in the current pid namespace.
> > + pub fn pid_in_current_ns(&self) -> Pid {
> > + // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
> > + // pointer as the namespace is correct for using the current namespace.
> > + unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
>
> Do we want to rely on the behaviour of `task_tgid_nr_ns` with null
> pointer as namespace, or use `task_tgid_vnr`?
Hmm. Looks like C Binder actually does:
trd->sender_pid = task_tgid_nr_ns(sender, task_active_pid_ns(current));
Not sure why I'm using a null pointer here.
Alice
* Re: [PATCH v10 1/8] rust: types: add `NotThreadSafe`
2024-09-15 14:31 ` [PATCH v10 1/8] rust: types: add `NotThreadSafe` Alice Ryhl
2024-09-15 15:38 ` Gary Guo
@ 2024-09-24 19:45 ` Serge E. Hallyn
2024-09-25 11:06 ` Alice Ryhl
1 sibling, 1 reply; 52+ messages in thread
From: Serge E. Hallyn @ 2024-09-24 19:45 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 02:31:27PM +0000, Alice Ryhl wrote:
> This introduces a new marker type for types that shouldn't be thread
> safe. By adding a field of this type to a struct, it becomes non-Send
> and non-Sync, which means that it cannot be accessed in any way from
> threads other than the one it was created on.
>
> This is useful for APIs that require globals such as `current` to remain
> constant while the value exists.
>
> We update two existing users in the Kernel to use this helper:
>
> * `Task::current()` - moving the return type of this value to a
> different thread would not be safe as you can no longer be guaranteed
> that the `current` pointer remains valid.
> * Lock guards. Mutexes and spinlocks should be unlocked on the same
> thread as where they were locked, so we enforce this using the Send
> trait.
Hi,
this sounds useful, however from kernel side when I think thread-safe,
I think must not be used across a sleep. Would something like ThreadLocked
or LockedToThread make sense?
(I could be way off base here...)
thanks,
-serge
> There are also additional users in later patches of this patchset. See
> [1] and [2] for the discussion that led to the introduction of this
> patch.
>
> Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [1]
> Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [2]
> Suggested-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Benno Lossin <benno.lossin@proton.me>
> Reviewed-by: Trevor Gross <tmgross@umich.edu>
> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
> Reviewed-by: Björn Roy Baron <bjorn3_gh@protonmail.com>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/sync/lock.rs | 13 +++++++++----
> rust/kernel/task.rs | 10 ++++++----
> rust/kernel/types.rs | 21 +++++++++++++++++++++
> 3 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
> index f6c34ca4d819..d6e9bab114b8 100644
> --- a/rust/kernel/sync/lock.rs
> +++ b/rust/kernel/sync/lock.rs
> @@ -6,8 +6,13 @@
> //! spinlocks, raw spinlocks) to be provided with minimal effort.
>
> use super::LockClassKey;
> -use crate::{init::PinInit, pin_init, str::CStr, types::Opaque, types::ScopeGuard};
> -use core::{cell::UnsafeCell, marker::PhantomData, marker::PhantomPinned};
> +use crate::{
> + init::PinInit,
> + pin_init,
> + str::CStr,
> + types::{NotThreadSafe, Opaque, ScopeGuard},
> +};
> +use core::{cell::UnsafeCell, marker::PhantomPinned};
> use macros::pin_data;
>
> pub mod mutex;
> @@ -139,7 +144,7 @@ pub fn lock(&self) -> Guard<'_, T, B> {
> pub struct Guard<'a, T: ?Sized, B: Backend> {
> pub(crate) lock: &'a Lock<T, B>,
> pub(crate) state: B::GuardState,
> - _not_send: PhantomData<*mut ()>,
> + _not_send: NotThreadSafe,
> }
>
> // SAFETY: `Guard` is sync when the data protected by the lock is also sync.
> @@ -191,7 +196,7 @@ pub(crate) unsafe fn new(lock: &'a Lock<T, B>, state: B::GuardState) -> Self {
> Self {
> lock,
> state,
> - _not_send: PhantomData,
> + _not_send: NotThreadSafe,
> }
> }
> }
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 55dff7e088bf..278c623de0c6 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -4,10 +4,12 @@
> //!
> //! C header: [`include/linux/sched.h`](srctree/include/linux/sched.h).
>
> -use crate::types::Opaque;
> +use crate::{
> + bindings,
> + types::{NotThreadSafe, Opaque},
> +};
> use core::{
> ffi::{c_int, c_long, c_uint},
> - marker::PhantomData,
> ops::Deref,
> ptr,
> };
> @@ -106,7 +108,7 @@ impl Task {
> pub unsafe fn current() -> impl Deref<Target = Task> {
> struct TaskRef<'a> {
> task: &'a Task,
> - _not_send: PhantomData<*mut ()>,
> + _not_send: NotThreadSafe,
> }
>
> impl Deref for TaskRef<'_> {
> @@ -125,7 +127,7 @@ fn deref(&self) -> &Self::Target {
> // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
> // (where it could potentially outlive the caller).
> task: unsafe { &*ptr.cast() },
> - _not_send: PhantomData,
> + _not_send: NotThreadSafe,
> }
> }
>
> diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
> index 9e7ca066355c..3238ffaab031 100644
> --- a/rust/kernel/types.rs
> +++ b/rust/kernel/types.rs
> @@ -532,3 +532,24 @@ unsafe impl AsBytes for str {}
> // does not have any uninitialized portions either.
> unsafe impl<T: AsBytes> AsBytes for [T] {}
> unsafe impl<T: AsBytes, const N: usize> AsBytes for [T; N] {}
> +
> +/// Zero-sized type to mark types not [`Send`].
> +///
> +/// Add this type as a field to your struct if your type should not be sent to a different task.
> +/// Since [`Send`] is an auto trait, adding a single field that is `!Send` will ensure that the
> +/// whole type is `!Send`.
> +///
> +/// If a type is `!Send` it is impossible to give control over an instance of the type to another
> +/// task. This is useful to include in types that store or reference task-local information. A file
> +/// descriptor is an example of such task-local information.
> +///
> +/// This type also makes the type `!Sync`, which prevents immutable access to the value from
> +/// several threads in parallel.
> +pub type NotThreadSafe = PhantomData<*mut ()>;
> +
> +/// Used to construct instances of type [`NotThreadSafe`] similar to how `PhantomData` is
> +/// constructed.
> +///
> +/// [`NotThreadSafe`]: type@NotThreadSafe
> +#[allow(non_upper_case_globals)]
> +pub const NotThreadSafe: NotThreadSafe = PhantomData;
>
> --
> 2.46.0.662.g92d0881bb0-goog
* Re: [PATCH v10 1/8] rust: types: add `NotThreadSafe`
2024-09-24 19:45 ` Serge E. Hallyn
@ 2024-09-25 11:06 ` Alice Ryhl
2024-09-25 13:59 ` Serge E. Hallyn
0 siblings, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-25 11:06 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Paul Moore, James Morris, Miguel Ojeda, Christian Brauner,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Tue, Sep 24, 2024 at 9:45 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Sun, Sep 15, 2024 at 02:31:27PM +0000, Alice Ryhl wrote:
> > This introduces a new marker type for types that shouldn't be thread
> > safe. By adding a field of this type to a struct, it becomes non-Send
> > and non-Sync, which means that it cannot be accessed in any way from
> > threads other than the one it was created on.
> >
> > This is useful for APIs that require globals such as `current` to remain
> > constant while the value exists.
> >
> > We update two existing users in the Kernel to use this helper:
> >
> > * `Task::current()` - moving the return type of this value to a
> > different thread would not be safe as you can no longer be guaranteed
> > that the `current` pointer remains valid.
> > * Lock guards. Mutexes and spinlocks should be unlocked on the same
> > thread as where they were locked, so we enforce this using the Send
> > trait.
>
> Hi,
>
> this sounds useful, however from kernel side when I think thread-safe,
> I think must not be used across a sleep. Would something like ThreadLocked
> or LockedToThread make sense?
Hmm, those names seem pretty similar to the current name to me?
Alice
* Re: [PATCH v10 1/8] rust: types: add `NotThreadSafe`
2024-09-25 11:06 ` Alice Ryhl
@ 2024-09-25 13:59 ` Serge E. Hallyn
2024-09-27 10:20 ` Gary Guo
0 siblings, 1 reply; 52+ messages in thread
From: Serge E. Hallyn @ 2024-09-25 13:59 UTC (permalink / raw)
To: Alice Ryhl
Cc: Serge E. Hallyn, Paul Moore, James Morris, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Wed, Sep 25, 2024 at 01:06:10PM +0200, Alice Ryhl wrote:
> On Tue, Sep 24, 2024 at 9:45 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >
> > On Sun, Sep 15, 2024 at 02:31:27PM +0000, Alice Ryhl wrote:
> > > This introduces a new marker type for types that shouldn't be thread
> > > safe. By adding a field of this type to a struct, it becomes non-Send
> > > and non-Sync, which means that it cannot be accessed in any way from
> > > threads other than the one it was created on.
> > >
> > > This is useful for APIs that require globals such as `current` to remain
> > > constant while the value exists.
> > >
> > > We update two existing users in the Kernel to use this helper:
> > >
> > > * `Task::current()` - moving the return type of this value to a
> > > different thread would not be safe as you can no longer be guaranteed
> > > that the `current` pointer remains valid.
> > > * Lock guards. Mutexes and spinlocks should be unlocked on the same
> > > thread as where they were locked, so we enforce this using the Send
> > > trait.
> >
> > Hi,
> >
> > this sounds useful, however from kernel side when I think thread-safe,
> > I think must not be used across a sleep. Would something like ThreadLocked
> > or LockedToThread make sense?
>
> Hmm, those names seem pretty similar to the current name to me?
Seems very different to me:
If @foo is not threadsafe, it may be global or be usable by many
threads, but must be locked to one thread during access.
What you're describing here is (iiuc) that @foo must only be used
by one particular thread.
-serge
* Re: [PATCH v10 7/8] rust: file: add `Kuid` wrapper
2024-09-23 9:13 ` Alice Ryhl
@ 2024-09-26 16:33 ` Christian Brauner
2024-09-26 16:35 ` [PATCH] [RFC] rust: add PidNamespace wrapper Christian Brauner
1 sibling, 0 replies; 52+ messages in thread
From: Christian Brauner @ 2024-09-26 16:33 UTC (permalink / raw)
To: Alice Ryhl
Cc: Gary Guo, Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Mon, Sep 23, 2024 at 11:13:56AM GMT, Alice Ryhl wrote:
> On Mon, Sep 16, 2024 at 12:02 AM Gary Guo <gary@garyguo.net> wrote:
> >
> > On Sun, 15 Sep 2024 14:31:33 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> > > + /// Returns the given task's pid in the current pid namespace.
> > > + pub fn pid_in_current_ns(&self) -> Pid {
> > > + // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
> > > + // pointer as the namespace is correct for using the current namespace.
> > > + unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
> >
> > Do we want to rely on the behaviour of `task_tgid_nr_ns` with null
> > pointer as namespace, or use `task_tgid_vnr`?
>
> Hmm. Looks like C Binder actually does:
> trd->sender_pid = task_tgid_nr_ns(sender, task_active_pid_ns(current));
>
> Not sure why I'm using a null pointer here.
Passing a NULL pointer for task_tgid_nr_ns() is fine. Under the hood
it's just __task_pid_nr_ns(task, PIDTYPE_TGID, NULL) which causes
task_active_pid_ns(current) to be called internally. So it's equivalent.
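As a rough userspace illustration of that equivalence (not kernel code), the NULL argument can be modelled as `None`, with the fallback selecting the caller's namespace. The names `PidNs`, `Task`, `tgid_per_level`, and the explicit `current_ns` parameter are invented for this sketch, and returning 0 for a task that is not visible in the namespace is an assumption of the model.

```rust
/// Toy stand-in for a pid namespace, identified only by its nesting level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PidNs {
    pub level: usize,
}

/// Toy stand-in for a task: one tgid per namespace level it is visible in.
pub struct Task {
    pub tgid_per_level: Vec<i32>,
}

/// `None` plays the role of a NULL `ns` argument and selects the caller's
/// ("current") namespace, so both spellings give the same answer.
pub fn task_tgid_nr_ns(task: &Task, ns: Option<&PidNs>, current_ns: &PidNs) -> i32 {
    let ns = ns.unwrap_or(current_ns);
    // Model assumption: a task that is not visible in `ns` reports 0.
    task.tgid_per_level.get(ns.level).copied().unwrap_or(0)
}
```

In this model, passing `None` and passing `Some(current_ns)` are indistinguishable, which is the point being made about the C function.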
In any case, I did add Rust wrappers for struct pid_namespace just to
see how far I would get as task_active_pid_ns() is rather subtle even if
it isn't obvious at first glance. Sending that in a second.
* [PATCH] [RFC] rust: add PidNamespace wrapper
2024-09-23 9:13 ` Alice Ryhl
2024-09-26 16:33 ` Christian Brauner
@ 2024-09-26 16:35 ` Christian Brauner
2024-09-27 12:04 ` Alice Ryhl
2024-10-01 9:43 ` [PATCH v2] rust: add PidNamespace Christian Brauner
1 sibling, 2 replies; 52+ messages in thread
From: Christian Brauner @ 2024-09-26 16:35 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Bjoern Roy Baron, Benno Lossin, Andreas Hindborg, Peter Zijlstra,
Alexander Viro, Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
Ok, so here's my feeble attempt at getting something going for wrapping
struct pid_namespace as struct pid_namespace indirectly came up in the
file abstraction thread.
The lifetime of a pid namespace is intimately tied to the lifetime of a
task. The pid namespace of a task doesn't ever change. An
unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not
change the task's pid namespace, only the pid namespace of children
spawned by the task. This invariant is important to keep in mind.
After a task is reaped it will be detached from its associated struct
pids via __unhash_process(). This will also set task->thread_pid to
NULL.
In order to retrieve the pid namespace of a task task_active_pid_ns()
can be used. The helper works on both current and non-current tasks but
the requirements are slightly different in both cases and it depends on
where the helper is called.
The rules for this are simple but difficult for me to translate into
Rust. If task_active_pid_ns() is called on current then no RCU locking
is needed as current is obviously alive. On the other hand calling
task_active_pid_ns() after release_task() would work but it would mean
task_active_pid_ns() will return NULL.
Calling task_active_pid_ns() on a non-current task, while valid, must be
under RCU or other protection mechanism as the task might be in
release_task() and thus in __unhash_process().
Handling that in a single impl seemed cumbersome but that may just be
my lack of kernel Rust experience.
It would of course be possible to add an always refcounted PidNamespace
impl to Task but that would be pointless refcount bumping for the usual
case where the caller retrieves the pid namespace of current.
Instead I added a macro that gets the active pid namespace of current
and a task_get_pid_ns() impl that returns an Option<ARef<PidNamespace>>.
Returning an Option<ARef<PidNamespace>> forces the caller to make a
conscious decision instead of just silently translating a NULL to
current pid namespace when passed to e.g., task_tgid_nr_ns().
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
rust/helpers/helpers.c | 1 +
rust/helpers/pid_namespace.c | 26 ++++++++++++++
rust/kernel/lib.rs | 1 +
rust/kernel/pid_namespace.rs | 68 ++++++++++++++++++++++++++++++++++++
rust/kernel/task.rs | 56 +++++++++++++++++++++++++----
5 files changed, 146 insertions(+), 6 deletions(-)
create mode 100644 rust/helpers/pid_namespace.c
create mode 100644 rust/kernel/pid_namespace.rs
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 62022b18caf5..d553ad9361ce 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -17,6 +17,7 @@
#include "kunit.c"
#include "mutex.c"
#include "page.c"
+#include "pid_namespace.c"
#include "rbtree.c"
#include "refcount.c"
#include "security.c"
diff --git a/rust/helpers/pid_namespace.c b/rust/helpers/pid_namespace.c
new file mode 100644
index 000000000000..f41482bdec9a
--- /dev/null
+++ b/rust/helpers/pid_namespace.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/pid_namespace.h>
+#include <linux/cleanup.h>
+
+struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
+{
+ return get_pid_ns(ns);
+}
+
+void rust_helper_put_pid_ns(struct pid_namespace *ns)
+{
+ put_pid_ns(ns);
+}
+
+/* Get a reference on a task's pid namespace. */
+struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
+{
+ struct pid_namespace *pid_ns;
+
+ guard(rcu)();
+ pid_ns = task_active_pid_ns(task);
+ if (pid_ns)
+ get_pid_ns(pid_ns);
+ return pid_ns;
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index ff7d88022c57..0e78ec9d06e0 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -44,6 +44,7 @@
#[cfg(CONFIG_NET)]
pub mod net;
pub mod page;
+pub mod pid_namespace;
pub mod prelude;
pub mod print;
pub mod sizes;
diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
new file mode 100644
index 000000000000..cd12c21a68cb
--- /dev/null
+++ b/rust/kernel/pid_namespace.rs
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Pid namespaces.
+//!
+//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
+//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
+
+use crate::{
+ bindings,
+ types::{AlwaysRefCounted, Opaque},
+};
+use core::{
+ ptr,
+};
+
+/// Wraps the kernel's `struct pid_namespace`. Thread safe.
+///
+/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
+/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
+/// code that we get passed from the C side.
+#[repr(transparent)]
+pub struct PidNamespace {
+ inner: Opaque<bindings::pid_namespace>,
+}
+
+impl PidNamespace {
+ /// Returns a raw pointer to the inner C struct.
+ #[inline]
+ pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
+ self.inner.get()
+ }
+
+ /// Creates a reference to a [`PidNamespace`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
+ /// returned [`PidNamespace`] reference.
+ pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
+ // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+ // `PidNamespace` type being transparent makes the cast ok.
+ unsafe { &*ptr.cast() }
+ }
+}
+
+// SAFETY: Instances of `PidNamespace` are always reference-counted.
+unsafe impl AlwaysRefCounted for PidNamespace {
+ #[inline]
+ fn inc_ref(&self) {
+ // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+ unsafe { bindings::get_pid_ns(self.as_ptr()) };
+ }
+
+ #[inline]
+ unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
+ // SAFETY: The safety requirements guarantee that the refcount is non-zero.
+ unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
+ }
+}
+
+// SAFETY:
+// - `PidNamespace::dec_ref` can be called from any thread.
+// - It is okay to send ownership of `PidNamespace` across thread boundaries.
+unsafe impl Send for PidNamespace {}
+
+// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
+// we're either accessing properties that don't change or that are properly synchronised by C code.
+unsafe impl Sync for PidNamespace {}
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 1a36a9f19368..89a431dfac5d 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -6,7 +6,8 @@
use crate::{
bindings,
- types::{NotThreadSafe, Opaque},
+ pid_namespace::PidNamespace,
+ types::{ARef, NotThreadSafe, Opaque},
};
use core::{
cmp::{Eq, PartialEq},
@@ -36,6 +37,37 @@ macro_rules! current {
};
}
+/// Returns the currently running task's pid namespace.
+///
+/// The lifetime of `PidNamespace` is intimately tied to the lifetime of `Task`. The pid namespace
+/// of a `Task` doesn't ever change. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd,
+/// CLONE_NEWPID)` will not change the task's pid namespace. This invariant is important to keep in
+/// mind.
+///
+/// After a task is reaped it will be detached from its associated `struct pid`s via
+/// __unhash_process(). This will specifically set `task->thread_pid` to `NULL`.
+///
+/// In order to retrieve the pid namespace of a task `task_active_pid_ns()` can be used. The rules
+/// for this are simple but difficult for me to translate into Rust. If `task_active_pid_ns()` is
+/// called from `current` then no RCU locking is needed as current is obviously alive. However,
+/// calling `task_active_pid_ns()` on a non-`current` task, while valid, must be under RCU or other
+/// protection as the task might be in __unhash_process().
+///
+/// We could add an always refcounted `PidNamespace` impl to `Task` but that would be pointless
+/// refcount bumping for the usual case where the caller retrieves the pid namespace of `current`.
+///
+/// So I added a macro that gets the active pid namespace of `current` and a `task_get_pid_ns()`
+/// impl that returns an `ARef<PidNamespace>` or `None` if the pid namespace is `NULL`. Returning
+/// an `Option<ARef<PidNamespace>>` forces the caller to make a conscious decision instead of
+/// just silently translating a `NULL` to `current`'s pid namespace.
+#[macro_export]
+macro_rules! current_pid_ns {
+ () => {{
+ let ptr = $crate::task::Task::current_raw();
+ unsafe { $crate::pid_namespace::PidNamespace::from_ptr($crate::bindings::task_active_pid_ns(ptr)) }
+ }};
+}
+
/// Wraps the kernel's `struct task_struct`.
///
/// # Invariants
@@ -182,11 +214,23 @@ pub fn signal_pending(&self) -> bool {
unsafe { bindings::signal_pending(self.0.get()) != 0 }
}
- /// Returns the given task's pid in the current pid namespace.
- pub fn pid_in_current_ns(&self) -> Pid {
- // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
- // pointer as the namespace is correct for using the current namespace.
- unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
+ /// Returns the task's pid namespace with an elevated reference count.
+ pub fn task_get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
+ let ptr = unsafe { bindings::task_get_pid_ns(self.0.get()) };
+ if ptr.is_null() {
+ None
+ } else {
+ // SAFETY: `ptr` is non-null, as checked above, and we own a
+ // reference count via `task_get_pid_ns()`.
+ // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
+ Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
+ }
+ }
+
+ /// Returns the given task's pid in the provided pid namespace.
+ pub fn task_tgid_nr_ns(&self, pidns: &PidNamespace) -> Pid {
+ // SAFETY: We know that `self.0.get()` is valid by the type invariant.
+ unsafe { bindings::task_tgid_nr_ns(self.0.get(), pidns.as_ptr()) }
}
/// Wakes up the task.
--
2.45.2
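The reasoning behind returning `Option<ARef<PidNamespace>>` can be sketched in plain userspace Rust. This is only a model under stated assumptions: `std::sync::Arc` stands in for `ARef`, the `Task` and `PidNamespace` types here are toy types rather than the kernel's, and `release()` is a hypothetical method that loosely mimics what happens once a task has been reaped.

```rust
use std::sync::Arc;

/// Toy stand-in for `struct pid_namespace`.
pub struct PidNamespace {
    pub level: u32,
}

/// Toy stand-in for a task. `None` models a task that has already been
/// reaped, for which the C helper would hand back NULL.
pub struct Task {
    active_pid_ns: Option<Arc<PidNamespace>>,
}

impl Task {
    pub fn new(ns: Arc<PidNamespace>) -> Self {
        Task { active_pid_ns: Some(ns) }
    }

    /// Hypothetical: drops the task's link to its namespace.
    pub fn release(&mut self) {
        self.active_pid_ns = None;
    }

    /// Returns a counted reference, or `None` if the namespace is gone.
    /// The caller has to handle `None` explicitly; nothing is silently
    /// replaced by the caller's own namespace.
    pub fn task_get_pid_ns(&self) -> Option<Arc<PidNamespace>> {
        self.active_pid_ns.clone()
    }
}
```

A reference obtained before `release()` stays usable afterwards because it carries its own count, while a lookup made after `release()` yields `None` and the caller must decide what to do.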
* Re: [PATCH v10 0/8] File abstractions needed by Rust Binder
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
` (7 preceding siblings ...)
2024-09-15 14:31 ` [PATCH v10 8/8] rust: file: add abstraction for `poll_table` Alice Ryhl
@ 2024-09-27 9:28 ` Christian Brauner
8 siblings, 0 replies; 52+ messages in thread
From: Christian Brauner @ 2024-09-27 9:28 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Peter Zijlstra,
Alexander Viro, Greg Kroah-Hartman, Arve Hjønnevåg,
Todd Kjos, Martijn Coenen, Joel Fernandes, Carlos Llamas,
Suren Baghdasaryan, Dan Williams, Matthew Wilcox, Thomas Gleixner,
Daniel Xu, Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook,
Andreas Hindborg, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda
On Sun, 15 Sep 2024 14:31:26 +0000, Alice Ryhl wrote:
> This patchset contains the file abstractions needed by the Rust
> implementation of the Binder driver.
>
> Please see the Rust Binder RFC for usage examples:
> https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-0-08ba9197f637@google.com/
>
> Users of "rust: types: add `NotThreadSafe`":
> [PATCH 5/9] rust: file: add `FileDescriptorReservation`
>
> [...]
Applied to the vfs.rust.file.v6.13 branch of the vfs/vfs.git tree.
Patches in the vfs.rust.file.v6.13 branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.rust.file.v6.13
[1/8] rust: types: add `NotThreadSafe`
https://git.kernel.org/vfs/vfs/c/cf9139a8a2ff
[2/8] rust: task: add `Task::current_raw`
https://git.kernel.org/vfs/vfs/c/16c7a0430f3a
[3/8] rust: file: add Rust abstraction for `struct file`
https://git.kernel.org/vfs/vfs/c/d403edaaee09
[4/8] rust: cred: add Rust abstraction for `struct cred`
https://git.kernel.org/vfs/vfs/c/fa4912bed836
[5/8] rust: security: add abstraction for secctx
https://git.kernel.org/vfs/vfs/c/34f391deba6d
[6/8] rust: file: add `FileDescriptorReservation`
https://git.kernel.org/vfs/vfs/c/054e1b6a797e
[7/8] rust: file: add `Kuid` wrapper
https://git.kernel.org/vfs/vfs/c/a78b176bfdc2
[8/8] rust: file: add abstraction for `poll_table`
https://git.kernel.org/vfs/vfs/c/e0cdb09b7100
* Re: [PATCH v10 1/8] rust: types: add `NotThreadSafe`
2024-09-25 13:59 ` Serge E. Hallyn
@ 2024-09-27 10:20 ` Gary Guo
0 siblings, 0 replies; 52+ messages in thread
From: Gary Guo @ 2024-09-27 10:20 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Alice Ryhl, Paul Moore, James Morris, Miguel Ojeda,
Christian Brauner, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Wed, 25 Sep 2024 08:59:04 -0500
"Serge E. Hallyn" <serge@hallyn.com> wrote:
> On Wed, Sep 25, 2024 at 01:06:10PM +0200, Alice Ryhl wrote:
> > On Tue, Sep 24, 2024 at 9:45 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > >
> > > On Sun, Sep 15, 2024 at 02:31:27PM +0000, Alice Ryhl wrote:
> > > > This introduces a new marker type for types that shouldn't be thread
> > > > safe. By adding a field of this type to a struct, it becomes non-Send
> > > > and non-Sync, which means that it cannot be accessed in any way from
> > > > threads other than the one it was created on.
> > > >
> > > > This is useful for APIs that require globals such as `current` to remain
> > > > constant while the value exists.
> > > >
> > > > We update two existing users in the Kernel to use this helper:
> > > >
> > > > * `Task::current()` - moving the return type of this value to a
> > > > different thread would not be safe as you can no longer be guaranteed
> > > > that the `current` pointer remains valid.
> > > > * Lock guards. Mutexes and spinlocks should be unlocked on the same
> > > > thread as where they were locked, so we enforce this using the Send
> > > > trait.
> > >
> > > Hi,
> > >
> > > this sounds useful, however from kernel side when I think thread-safe,
> > > I think must not be used across a sleep. Would something like ThreadLocked
> > > or LockedToThread make sense?
> >
> > Hmm, those names seem pretty similar to the current name to me?
>
> Seems very different to me:
>
> If @foo is not threadsafe, it may be global or be usable by many
> threads, but must be locked to one thread during access.
>
> What you're describing here is (iiuc) that @foo must only be used
> by one particular thread.
"locked to one thread during access" means it might be `Send` but not
`Sync`.
What Alice has here is something that is neither `Send` nor `Sync`, so I
think `NotThreadSafe` is a good name here because it cancels both
guarantees.
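To make the distinction concrete, here is a minimal userspace sketch, not the kernel code. It assumes the marker is an alias for `PhantomData<*mut ()>`, which is my understanding of how `NotThreadSafe` is defined; `TaskRef`, `Probe`, `Fallback` and the helper functions are hypothetical names used only for illustration.

```rust
use std::marker::PhantomData;

/// Zero-sized marker. Raw pointers are neither `Send` nor `Sync`, so a struct
/// that contains this marker loses both auto traits.
pub type NotThreadSafe = PhantomData<*mut ()>;

/// Hypothetical handle that must stay on the thread that created it.
pub struct TaskRef {
    pub pid: i32,
    _not_thread_safe: NotThreadSafe,
}

impl TaskRef {
    pub fn new(pid: i32) -> Self {
        TaskRef { pid, _not_thread_safe: PhantomData }
    }
}

/// Probe used only to report auto-trait status. The inherent constants are
/// selected when the bound holds; otherwise name resolution falls back to the
/// defaults provided by the `Fallback` trait.
pub struct Probe<T>(PhantomData<T>);

pub trait Fallback {
    const IS_SEND: bool = false;
    const IS_SYNC: bool = false;
}

impl<T> Fallback for Probe<T> {}

impl<T: Send> Probe<T> {
    pub const IS_SEND: bool = true;
}

impl<T: Sync> Probe<T> {
    pub const IS_SYNC: bool = true;
}

/// Reports whether `TaskRef` is `Send`.
pub fn task_ref_is_send() -> bool {
    <Probe<TaskRef>>::IS_SEND
}

/// Reports whether `TaskRef` is `Sync`.
pub fn task_ref_is_sync() -> bool {
    <Probe<TaskRef>>::IS_SYNC
}

/// Control case: a plain integer keeps both auto traits.
pub fn i32_is_send() -> bool {
    <Probe<i32>>::IS_SEND
}
```

A type that is merely "locked to one thread during access" would lose only `Sync`; the marker above removes `Send` as well, which is why moving a `TaskRef` into `std::thread::spawn` would be rejected at compile time.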
Best,
Gary
* Re: [PATCH v10 1/8] rust: types: add `NotThreadSafe`
2024-09-15 15:38 ` Gary Guo
@ 2024-09-27 11:21 ` Miguel Ojeda
0 siblings, 0 replies; 52+ messages in thread
From: Miguel Ojeda @ 2024-09-27 11:21 UTC (permalink / raw)
To: Gary Guo
Cc: Alice Ryhl, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda, Christian Brauner, Alex Gaynor,
Wedson Almeida Filho, Boqun Feng, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Sun, Sep 15, 2024 at 5:38 PM Gary Guo <gary@garyguo.net> wrote:
>
> Miguel, can we apply this patch now without having it wait on the rest
> of file abstractions because it'll be useful to others?
Sorry, I missed replying to this during the conferences.
If we need this for something else that does not go through VFS during
this (same) cycle, then we can figure something out and apply it to
rust-next too.
Cheers,
Miguel
* Re: [PATCH] [RFC] rust: add PidNamespace wrapper
2024-09-26 16:35 ` [PATCH] [RFC] rust: add PidNamespace wrapper Christian Brauner
@ 2024-09-27 12:04 ` Alice Ryhl
2024-09-27 14:21 ` Christian Brauner
2024-10-01 9:43 ` [PATCH v2] rust: add PidNamespace Christian Brauner
1 sibling, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-09-27 12:04 UTC (permalink / raw)
To: Christian Brauner
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Bjoern Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Thu, Sep 26, 2024 at 6:36 PM Christian Brauner <brauner@kernel.org> wrote:
>
> Ok, so here's my feeble attempt at getting something going for wrapping
> struct pid_namespace as struct pid_namespace indirectly came up in the
> file abstraction thread.
This looks great!
> The lifetime of a pid namespace is intimately tied to the lifetime of
> task. The pid namespace of a task doesn't ever change. A
> unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not
> change the task's pid namespace only the pid namespace of children
> spawned by the task. This invariant is important to keep in mind.
>
> After a task is reaped it will be detached from its associated struct
> pids via __unhash_process(). This will also set task->thread_pid to
> NULL.
>
> In order to retrieve the pid namespace of a task task_active_pid_ns()
> can be used. The helper works on both current and non-current tasks but
> the requirements are slightly different in both cases and it depends on
> where the helper is called.
>
> The rules for this are simple but difficult for me to translate into
> Rust. If task_active_pid_ns() is called on current then no RCU locking
> is needed as current is obviously alive. On the other hand calling
> task_active_pid_ns() after release_task() would work but it would mean
> task_active_pid_ns() will return NULL.
>
> Calling task_active_pid_ns() on a non-current task, while valid, must be
> under RCU or other protection mechanism as the task might be
> release_task() and thus in __unhash_process().
Just to confirm, calling task_active_pid_ns() on a non-current task
requires the rcu lock even if you own a refcount on the task?
Alice
* Re: [PATCH] [RFC] rust: add PidNamespace wrapper
2024-09-27 12:04 ` Alice Ryhl
@ 2024-09-27 14:21 ` Christian Brauner
2024-09-27 14:58 ` Alice Ryhl
0 siblings, 1 reply; 52+ messages in thread
From: Christian Brauner @ 2024-09-27 14:21 UTC (permalink / raw)
To: Alice Ryhl
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Bjoern Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Fri, Sep 27, 2024 at 02:04:13PM GMT, Alice Ryhl wrote:
> On Thu, Sep 26, 2024 at 6:36 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > Ok, so here's my feeble attempt at getting something going for wrapping
> > struct pid_namespace as struct pid_namespace indirectly came up in the
> > file abstraction thread.
>
> This looks great!
Thanks!
>
> > The lifetime of a pid namespace is intimately tied to the lifetime of
> > task. The pid namespace of a task doesn't ever change. A
> > unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not
> > change the task's pid namespace only the pid namespace of children
> > spawned by the task. This invariant is important to keep in mind.
> >
> > After a task is reaped it will be detached from its associated struct
> > pids via __unhash_process(). This will also set task->thread_pid to
> > NULL.
> >
> > In order to retrieve the pid namespace of a task task_active_pid_ns()
> > can be used. The helper works on both current and non-current tasks but
> > the requirements are slightly different in both cases and it depends on
> > where the helper is called.
> >
> > The rules for this are simple but difficult for me to translate into
> > Rust. If task_active_pid_ns() is called on current then no RCU locking
> > is needed as current is obviously alive. On the other hand calling
> > task_active_pid_ns() after release_task() would work but it would mean
> > task_active_pid_ns() will return NULL.
> >
> > Calling task_active_pid_ns() on a non-current task, while valid, must be
> > under RCU or other protection mechanism as the task might be
> > release_task() and thus in __unhash_process().
>
> Just to confirm, calling task_active_pid_ns() on a non-current task
> requires the rcu lock even if you own a refcount on the task?
Interesting question. Afaik, yes. task_active_pid_ns() goes via
task->thread_pid which is a shorthand for task->pid_links[PIDTYPE_PID].
This will be NULLed when the task exits and is dead (so usually when
someone has waited on it, ignoring ptrace for sanity reasons, and
autoreaping; the latter amounts to the same thing, just in-kernel):
T1              T2                                                  T3
exit(0);
                wait(T1)
                -> wait_task_zombie()
                   -> release_task()
                      -> __exit_signals()
                         -> __unhash_process()
                            // sets task->thread_pid == NULL        task_active_pid_ns(T1)
                            // task->pid_links[PIDTYPE_PID] == NULL
So having a reference to struct task_struct doesn't prevent
task->thread_pid becoming NULL.
And you touch upon a very interesting point. The lifetime of struct
pid_namespace is actually tied to struct pid much tighter than it is to
struct task_struct. So when a task is released (transitions from zombie
to dead in the common case) the following happens:
release_task()
-> __exit_signals()
   -> thread_pid = get_pid(task->thread_pid)
   -> __unhash_process()
      -> detach_pid(PIDTYPE_PID)
         -> __change_pid()
            {
                    task->thread_pid = NULL;
                    task->pid_links[PIDTYPE_PID] = NULL;
                    free_pid(thread_pid)
            }
   put_pid(thread_pid)
And the free_pid() in __change_pid() does a delayed_put_pid() via
call_rcu().
So afaiu, taking the rcu_read_lock() synchronizes against that
delayed_put_pid() in __change_pid() so the call_rcu() will wait until
everyone who does
rcu_read_lock()
task_active_pid_ns(task)
rcu_read_unlock()
and sees task->thread_pid non-NULL, is done. This way no additional
reference count on struct task_struct or struct pid is needed before
plucking the pid namespace from there. Does that make sense or have I
gotten it all wrong?
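As a rough userspace model of the rule described above (this is a sketch under loud assumptions, not kernel code): a `RwLock` stands in for the RCU read-side critical section, which it is not equivalent to in cost or semantics, and an `Arc` stands in for the reference count. `PidNs`, `Task`, `release` and `get_pid_ns` are hypothetical names. The point is only the shape: look at the link under the read-side guard, accept that it may already be empty, and take a reference before leaving the guard.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for `struct pid_namespace`.
pub struct PidNs {
    pub level: u32,
}

/// Stand-in for a task whose namespace link is cleared on release.
pub struct Task {
    // `None` models `task->thread_pid == NULL` after `release_task()`.
    ns: RwLock<Option<Arc<PidNs>>>,
}

impl Task {
    pub fn new(level: u32) -> Self {
        Task {
            ns: RwLock::new(Some(Arc::new(PidNs { level }))),
        }
    }

    /// Models `release_task()`: detach the namespace link.
    pub fn release(&self) {
        *self.ns.write().unwrap() = None;
    }

    /// Models the lookup for a non-current task: take the read-side guard
    /// (standing in for `rcu_read_lock()`), observe the link, and grab a
    /// reference before the guard is dropped at the end of the function.
    pub fn get_pid_ns(&self) -> Option<Arc<PidNs>> {
        let guard = self.ns.read().unwrap();
        (*guard).clone()
    }
}
```

A reference obtained before `release()` stays usable afterwards, while a lookup made after `release()` yields `None`; holding a reference to the `Task` itself changes neither outcome.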
* Re: [PATCH] [RFC] rust: add PidNamespace wrapper
2024-09-27 14:21 ` Christian Brauner
@ 2024-09-27 14:58 ` Alice Ryhl
0 siblings, 0 replies; 52+ messages in thread
From: Alice Ryhl @ 2024-09-27 14:58 UTC (permalink / raw)
To: Christian Brauner
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Bjoern Roy Baron,
Benno Lossin, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, rust-for-linux, linux-fsdevel, Kees Cook
On Fri, Sep 27, 2024 at 4:21 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Sep 27, 2024 at 02:04:13PM GMT, Alice Ryhl wrote:
> > On Thu, Sep 26, 2024 at 6:36 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > Ok, so here's my feeble attempt at getting something going for wrapping
> > > struct pid_namespace as struct pid_namespace indirectly came up in the
> > > file abstraction thread.
> >
> > This looks great!
>
> Thanks!
>
> >
> > > The lifetime of a pid namespace is intimately tied to the lifetime of
> > > task. The pid namespace of a task doesn't ever change. A
> > > unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not
> > > change the task's pid namespace only the pid namespace of children
> > > spawned by the task. This invariant is important to keep in mind.
> > >
> > > After a task is reaped it will be detached from its associated struct
> > > pids via __unhash_process(). This will also set task->thread_pid to
> > > NULL.
> > >
> > > In order to retrieve the pid namespace of a task task_active_pid_ns()
> > > can be used. The helper works on both current and non-current tasks but
> > > the requirements are slightly different in both cases and it depends on
> > > where the helper is called.
> > >
> > > The rules for this are simple but difficult for me to translate into
> > > Rust. If task_active_pid_ns() is called on current then no RCU locking
> > > is needed as current is obviously alive. On the other hand calling
> > > task_active_pid_ns() after release_task() would work but it would mean
> > > task_active_pid_ns() will return NULL.
> > >
> > > Calling task_active_pid_ns() on a non-current task, while valid, must be
> > > under RCU or other protection mechanism as the task might be
> > > release_task() and thus in __unhash_process().
> >
> > Just to confirm, calling task_active_pid_ns() on a non-current task
> > requires the rcu lock even if you own a refcount on the task?
>
> Interesting question. Afaik, yes. task_active_pid_ns() goes via
> task->thread_pid which is a shorthand for task->pid_links[PIDTYPE_PID].
>
> This will be NULLed when the task exits and is dead (so usually when
> someone has waited on it - ignoring ptrace for sanity reasons and
> autoreaping the latter amounts to the same thing just in-kernel):
>
> T1 T2 T3
> exit(0);
> wait(T1)
> -> wait_task_zombie()
> -> release_task()
> -> __exit_signals()
> -> __unhash_process()
> // sets task->thread_pid == NULL task_active_pid_ns(T1)
> // task->pid_links[PIDTYPE_PID] == NULL
>
> So having a reference to struct task_struct doesn't prevent
> task->thread_pid becoming NULL.
>
> And you touch upon a very interesting point. The lifetime of struct
> pid_namespace is actually tied to struct pid much tighter than it is to
> struct task_struct. So when a task is released (transitions from zombie
> to dead in the common case) the following happens:
>
> release_task()
> -> __exit_signals()
> -> thread_pid = get_pid(task->thread_pid)
> -> __unhash_process()
> -> detach_pid(PIDTYPE_PID)
> -> __change_pid()
> {
> task->thread_pid = NULL;
> task->pid_links[PIDTYPE_PID] = NULL;
> free_pid(thread_pid)
> }
> put_pid(thread_pid)
>
> And the free_pid() in __change_pid() does a delayed_put_pid() via
> call_rcu().
>
> So afaiu, taking the rcu_read_lock() synchronizes against that
> delayed_put_pid() in __change_pid() so the call_rcu() will wait until
> everyone who does
>
> rcu_read_lock()
> task_active_pid_ns(task)
> rcu_read_unlock()
>
> and sees task->thread_pid non-NULL, is done. This way no additional
> reference count on struct task_struct or struct pid is needed before
> plucking the pid namespace from there. Does that make sense or have I
> gotten it all wrong?
Okay. I agree that the code you have is the best we can do; at least
until we get an rcu guard in Rust.
The macro doesn't quite work. You need to do something to constrain
the lifetime used by `PidNamespace::from_ptr`. Right now, there is no
constraint on the lifetime, so the caller can just pick the lifetime
'static which is the lifetime that never ends. We want to constrain it
to a lifetime that ends before the task dies. The easiest is to create
a local variable and use the lifetime of that local variable. That
way, the reference can never escape the current function, and hence,
can't escape the current task.
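A minimal standalone sketch of what I mean, with stand-in types rather than the real bindings. `ScopeAnchor`, `from_ptr_scoped` and `read_level` are hypothetical names; only the contrast between the two signatures matters.

```rust
/// Stand-in for the wrapped C struct.
pub struct PidNamespace {
    pub level: u32,
}

/// Unconstrained version: nothing ties `'a` to anything, so a caller is free
/// to pick `'static`.
///
/// # Safety
///
/// `ptr` must be valid for the whole of `'a`.
pub unsafe fn from_ptr<'a>(ptr: *const PidNamespace) -> &'a PidNamespace {
    // SAFETY: Guaranteed by the caller.
    unsafe { &*ptr }
}

/// Hypothetical zero-sized anchor that the caller keeps in a local variable.
pub struct ScopeAnchor(());

impl ScopeAnchor {
    pub fn new() -> Self {
        ScopeAnchor(())
    }
}

/// Constrained version: the returned reference borrows from `anchor`, so it
/// cannot outlive the local variable that holds the anchor.
///
/// # Safety
///
/// `ptr` must be valid for as long as `anchor` lives.
pub unsafe fn from_ptr_scoped<'a>(
    _anchor: &'a ScopeAnchor,
    ptr: *const PidNamespace,
) -> &'a PidNamespace {
    // SAFETY: Guaranteed by the caller.
    unsafe { &*ptr }
}

/// Example caller: the scoped reference is usable inside the function only.
pub fn read_level(ns: &PidNamespace) -> u32 {
    let anchor = ScopeAnchor::new();
    // SAFETY: `ns` is a live reference, so the pointer is valid while
    // `anchor` lives.
    let scoped = unsafe { from_ptr_scoped(&anchor, ns as *const PidNamespace) };
    // Returning `scoped` itself would not compile, because it borrows the
    // local `anchor`; returning a copied field is fine.
    scoped.level
}
```

With `from_ptr`, a caller could write `let ns: &'static PidNamespace = unsafe { from_ptr(ptr) };` and the compiler would accept it. With `from_ptr_scoped`, the borrow checker rejects any attempt to store the result somewhere that outlives `anchor`.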
More generally, I'm sure there are lots of fields in current where we
can access them without rcu only because we know the current task
isn't going to die on us. I don't think we should have a macro for
every single one. I think we can put together a single macro for
getting a lifetime that ends before returning to userspace, and then
reuse that lifetime for both `current` and `current_pid_ns`, and
possibly also the `DeferredFd` patch.
Alice
* [PATCH v2] rust: add PidNamespace
2024-09-26 16:35 ` [PATCH] [RFC] rust: add PidNamespace wrapper Christian Brauner
2024-09-27 12:04 ` Alice Ryhl
@ 2024-10-01 9:43 ` Christian Brauner
2024-10-01 10:26 ` Alice Ryhl
2024-10-01 19:10 ` Gary Guo
1 sibling, 2 replies; 52+ messages in thread
From: Christian Brauner @ 2024-10-01 9:43 UTC (permalink / raw)
To: Alice Ryhl, rust-for-linux
Cc: Paul Moore, James Morris, Serge E. Hallyn, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Bjoern Roy Baron,
Benno Lossin, Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjonnevag, Todd Kjos, Martijn Coenen, Joel Fernandes,
Carlos Llamas, Suren Baghdasaryan, Dan Williams, Matthew Wilcox,
Thomas Gleixner, Daniel Xu, Martin Rodriguez Reboredo,
Trevor Gross, linux-kernel, linux-security-module, linux-fsdevel,
Kees Cook, Andreas Hindborg, Christian Brauner
The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
The `PidNamespace` of a `Task` doesn't ever change once the `Task` is
alive. An `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)`
will not have an effect on the calling `Task`'s pid namespace. It will
only affect the pid namespace of children created by the calling `Task`.
This invariant guarantees that after having acquired a reference to a
`Task`'s pid namespace it will remain unchanged.
When a task has exited and been reaped `release_task()` will be called.
This will set the `PidNamespace` of the task to `NULL`. So retrieving
the `PidNamespace` of a task that is dead will return `NULL`. Note that
neither holding the RCU lock nor holding a reference count to the
`Task` will prevent `release_task()` being called.
In order to retrieve the `PidNamespace` of a `Task` the
`task_active_pid_ns()` function can be used. There are two cases to
consider:
(1) retrieving the `PidNamespace` of the `current` task
(2) retrieving the `PidNamespace` of a non-`current` task
From system call context retrieving the `PidNamespace` for case (1) is
always safe and requires neither RCU locking nor a reference count to be
held. Retrieving the `PidNamespace` after `release_task()` for current
will return `NULL` but no codepath like that is exposed to Rust.
Retrieving the `PidNamespace` from system call context for (2) requires
RCU protection. Accessing `PidNamespace` outside of RCU protection
requires a reference count that must've been acquired while holding the
RCU lock. Note that accessing a non-`current` task means `NULL` can be
returned as the non-`current` task could have already passed through
`release_task()`.
To retrieve (1) the `current_pid_ns!()` macro should be used, which
ensures that the returned `PidNamespace` cannot outlive the calling
scope. The associated `current_pid_ns()` function should not be called
directly as it could be abused to create an unbounded lifetime for
`PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the
common case of accessing `current`'s `PidNamespace` without RCU
protection and without having to acquire a reference count.
For (2) the `task_get_pid_ns()` method must be used. This will always
acquire a reference on `PidNamespace` and will return an `Option` to
force the caller to explicitly handle the case where `PidNamespace` is
`None`, something that tends to be forgotten when doing the equivalent
operation in `C`. Missing RCU primitives make it difficult to perform
operations that are otherwise safe without holding a reference count as
long as RCU protection is guaranteed. This is not important currently,
but we do want it in the future.
Note for (2) the required RCU protection around calling
`task_active_pid_ns()` synchronizes against putting the last reference
of the associated `struct pid` of `task->thread_pid`. The `struct pid`
stored in that field is used to retrieve the `PidNamespace` of the
caller. When `release_task()` is called `task->thread_pid` will be
`NULL`ed and `put_pid()` on said `struct pid` will be delayed in
`free_pid()` via `call_rcu()` allowing everyone with an RCU protected
access to the `struct pid` acquired from `task->thread_pid` to finish.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
rust/helpers/helpers.c | 1 +
rust/helpers/pid_namespace.c | 26 ++++++++++
rust/kernel/lib.rs | 1 +
rust/kernel/pid_namespace.rs | 70 +++++++++++++++++++++++++
rust/kernel/task.rs | 119 ++++++++++++++++++++++++++++++++++++++++---
5 files changed, 211 insertions(+), 6 deletions(-)
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 62022b18caf5ec17231fd0e7be1234592d1146e3..d553ad9361ce17950d505c3b372a568730020e2f 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -17,6 +17,7 @@
#include "kunit.c"
#include "mutex.c"
#include "page.c"
+#include "pid_namespace.c"
#include "rbtree.c"
#include "refcount.c"
#include "security.c"
diff --git a/rust/helpers/pid_namespace.c b/rust/helpers/pid_namespace.c
new file mode 100644
index 0000000000000000000000000000000000000000..f41482bdec9a7c4e84b81ec141027fbd65251230
--- /dev/null
+++ b/rust/helpers/pid_namespace.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/pid_namespace.h>
+#include <linux/cleanup.h>
+
+struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
+{
+ return get_pid_ns(ns);
+}
+
+void rust_helper_put_pid_ns(struct pid_namespace *ns)
+{
+ put_pid_ns(ns);
+}
+
+/* Get a reference on a task's pid namespace. */
+struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
+{
+ struct pid_namespace *pid_ns;
+
+ guard(rcu)();
+ pid_ns = task_active_pid_ns(task);
+ if (pid_ns)
+ get_pid_ns(pid_ns);
+ return pid_ns;
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index ff7d88022c57ca232dc028066dfa062f3fc84d1c..0e78ec9d06e0199dfafc40988a2ae86cd5df949c 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -44,6 +44,7 @@
#[cfg(CONFIG_NET)]
pub mod net;
pub mod page;
+pub mod pid_namespace;
pub mod prelude;
pub mod print;
pub mod sizes;
diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
new file mode 100644
index 0000000000000000000000000000000000000000..9a0509e802b4939ad853a802ee6d069a5f00c9df
--- /dev/null
+++ b/rust/kernel/pid_namespace.rs
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
+
+//! Pid namespaces.
+//!
+//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
+//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
+
+use crate::{
+ bindings,
+ types::{AlwaysRefCounted, Opaque},
+};
+use core::{
+ ptr,
+};
+
+/// Wraps the kernel's `struct pid_namespace`. Thread safe.
+///
+/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
+/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
+/// code that we get passed from the C side.
+#[repr(transparent)]
+pub struct PidNamespace {
+ inner: Opaque<bindings::pid_namespace>,
+}
+
+impl PidNamespace {
+ /// Returns a raw pointer to the inner C struct.
+ #[inline]
+ pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
+ self.inner.get()
+ }
+
+ /// Creates a reference to a [`PidNamespace`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
+ /// returned [`PidNamespace`] reference.
+ pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
+ // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+ // `PidNamespace` type being transparent makes the cast ok.
+ unsafe { &*ptr.cast() }
+ }
+}
+
+// SAFETY: Instances of `PidNamespace` are always reference-counted.
+unsafe impl AlwaysRefCounted for PidNamespace {
+ #[inline]
+ fn inc_ref(&self) {
+ // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+ unsafe { bindings::get_pid_ns(self.as_ptr()) };
+ }
+
+ #[inline]
+ unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
+ // SAFETY: The safety requirements guarantee that the refcount is non-zero.
+ unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
+ }
+}
+
+// SAFETY:
+// - `PidNamespace::dec_ref` can be called from any thread.
+// - It is okay to send ownership of `PidNamespace` across thread boundaries.
+unsafe impl Send for PidNamespace {}
+
+// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
+// we're either accessing properties that don't change or that are properly synchronised by C code.
+unsafe impl Sync for PidNamespace {}
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 1a36a9f193685393e7211793b6e6dd7576af8bfd..92603cdb543d9617f1f7d092edb87ccb66c9f0c1 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -6,7 +6,8 @@
use crate::{
bindings,
- types::{NotThreadSafe, Opaque},
+ pid_namespace::PidNamespace,
+ types::{ARef, NotThreadSafe, Opaque},
};
use core::{
cmp::{Eq, PartialEq},
@@ -36,6 +37,65 @@ macro_rules! current {
};
}
+/// Returns the currently running task's pid namespace.
+///
+/// The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
+///
+/// The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. An
+/// `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the
+/// calling `Task`'s pid namespace. It will only affect the pid namespace of children created by
+/// the calling `Task`. This invariant guarantees that after having acquired a reference to a
+/// `Task`'s pid namespace it will remain unchanged.
+///
+/// When a task has exited and been reaped `release_task()` will be called. This will set the
+/// `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead
+/// will return `NULL`. Note that neither holding the RCU lock nor holding a reference count to
+/// the `Task` will prevent `release_task()` being called.
+///
+/// In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be
+/// used. There are two cases to consider:
+///
+/// (1) retrieving the `PidNamespace` of the `current` task
+/// (2) retrieving the `PidNamespace` of a non-`current` task
+///
+/// From system call context retrieving the `PidNamespace` for case (1) is always safe and requires
+/// neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after
+/// `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust.
+///
+/// Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
+/// Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been
+/// acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can
+/// be returned as the non-`current` task could have already passed through `release_task()`.
+///
+/// To retrieve (1) the `current_pid_ns!()` macro should be used, which ensures that the returned
+/// `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function
+/// should not be called directly as it could be abused to create an unbounded lifetime for
+/// `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of
+/// accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a
+/// reference count.
+///
+/// For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on
+/// `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case
+/// where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent
+/// operation in `C`. Missing RCU primitives make it difficult to perform operations that are
+/// otherwise safe without holding a reference count as long as RCU protection is guaranteed.
+/// This is not important currently, but we do want it in the future.
+///
+/// Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes
+/// against putting the last reference of the associated `struct pid` of `task->thread_pid`.
+/// The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller.
+/// When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said
+/// `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU
+/// protected access to the `struct pid` acquired from `task->thread_pid` to finish.
+#[macro_export]
+macro_rules! current_pid_ns {
+ () => {
+ // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
+ // the caller.
+ unsafe { &*$crate::task::Task::current_pid_ns() }
+ };
+}
+
/// Wraps the kernel's `struct task_struct`.
///
/// # Invariants
@@ -145,6 +205,41 @@ fn deref(&self) -> &Self::Target {
}
}
+ /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
+ ///
+ /// This function can be used to create an unbounded lifetime by e.g., storing the returned
+ /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
+ /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
+ /// safe.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that the returned object doesn't outlive the current task/thread.
+ pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
+ struct PidNamespaceRef<'a> {
+ task: &'a PidNamespace,
+ _not_send: NotThreadSafe,
+ }
+
+ impl Deref for PidNamespaceRef<'_> {
+ type Target = PidNamespace;
+
+ fn deref(&self) -> &Self::Target {
+ self.task
+ }
+ }
+
+ let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
+ PidNamespaceRef {
+ // SAFETY: If the current thread is still running, the current task and its associated
+ // pid namespace are valid. Given that `PidNamespaceRef` is not `Send`, we know it
+ // cannot be transferred to another thread (where it could potentially outlive the
+ // current `Task`).
+ task: unsafe { &*pidns.cast() },
+ _not_send: NotThreadSafe,
+ }
+ }
+
/// Returns the group leader of the given task.
pub fn group_leader(&self) -> &Task {
// SAFETY: By the type invariant, we know that `self.0` is a valid task. Valid tasks always
@@ -182,11 +277,23 @@ pub fn signal_pending(&self) -> bool {
unsafe { bindings::signal_pending(self.0.get()) != 0 }
}
- /// Returns the given task's pid in the current pid namespace.
- pub fn pid_in_current_ns(&self) -> Pid {
- // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
- // pointer as the namespace is correct for using the current namespace.
- unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
+ /// Returns task's pid namespace with elevated reference count
+ pub fn task_get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
+ let ptr = unsafe { bindings::task_get_pid_ns(self.0.get()) };
+ if ptr.is_null() {
+ None
+ } else {
+ // SAFETY: `ptr` is valid by the safety requirements of this function. And we own a
+ // reference count via `task_get_pid_ns()`.
+ // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
+ Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
+ }
+ }
+
+ /// Returns the given task's pid in the provided pid namespace.
+ pub fn task_tgid_nr_ns(&self, pidns: &PidNamespace) -> Pid {
+ // SAFETY: We know that `self.0.get()` is valid by the type invariant.
+ unsafe { bindings::task_tgid_nr_ns(self.0.get(), pidns.as_ptr()) }
}
/// Wakes up the task.
---
base-commit: e9980e40804730de33c1563d9ac74d5b51591ec0
change-id: 20241001-brauner-rust-pid_namespace-52b0c92c8359
* Re: [PATCH v2] rust: add PidNamespace
2024-10-01 9:43 ` [PATCH v2] rust: add PidNamespace Christian Brauner
@ 2024-10-01 10:26 ` Alice Ryhl
2024-10-01 14:17 ` Christian Brauner
2024-10-01 19:10 ` Gary Guo
1 sibling, 1 reply; 52+ messages in thread
From: Alice Ryhl @ 2024-10-01 10:26 UTC (permalink / raw)
To: Christian Brauner
Cc: rust-for-linux, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Bjoern Roy Baron, Benno Lossin, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, linux-fsdevel, Kees Cook, Andreas Hindborg
On Tue, Oct 1, 2024 at 11:44 AM Christian Brauner <brauner@kernel.org> wrote:
>
> The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
>
> The `PidNamespace` of a `Task` doesn't ever change once the `Task` is
> alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)`
> will not have an effect on the calling `Task`'s pid namespace. It will
> only affect the pid namespace of children created by the calling `Task`.
> This invariant guarantees that after having acquired a reference to a
> `Task`'s pid namespace it will remain unchanged.
>
> When a task has exited and been reaped `release_task()` will be called.
> This will set the `PidNamespace` of the task to `NULL`. So retrieving
> the `PidNamespace` of a task that is dead will return `NULL`. Note that
> neither holding the RCU lock nor holding a reference count on the
> `Task` will prevent `release_task()` being called.
>
> In order to retrieve the `PidNamespace` of a `Task` the
> `task_active_pid_ns()` function can be used. There are two cases to
> consider:
>
> (1) retrieving the `PidNamespace` of the `current` task
> (2) retrieving the `PidNamespace` of a non-`current` task
>
> From system call context retrieving the `PidNamespace` for case (1) is
> always safe and requires neither RCU locking nor a reference count to be
> held. Retrieving the `PidNamespace` after `release_task()` for current
> will return `NULL` but no codepath like that is exposed to Rust.
>
> Retrieving the `PidNamespace` from system call context for (2) requires
> RCU protection. Accessing `PidNamespace` outside of RCU protection
> requires a reference count that must've been acquired while holding the
> RCU lock. Note that accessing a non-`current` task means `NULL` can be
> returned as the non-`current` task could have already passed through
> `release_task()`.
>
> To retrieve (1) the `current_pid_ns!()` macro should be used, which
> ensures that the returned `PidNamespace` cannot outlive the calling
> scope. The associated `current_pid_ns()` function should not be called
> directly as it could be abused to create an unbounded lifetime for
> `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the
> common case of accessing `current`'s `PidNamespace` without RCU
> protection and without having to acquire a reference count.
>
> For (2) the `task_get_pid_ns()` method must be used. This will always
> acquire a reference on `PidNamespace` and will return an `Option` to
> force the caller to explicitly handle the case where `PidNamespace` is
> `None`, something that tends to be forgotten when doing the equivalent
> operation in `C`. Missing RCU primitives make it difficult to perform
> operations that are otherwise safe without holding a reference count as
> long as RCU protection is guaranteed. But it is not important currently.
> But we do want it in the future.
>
> Note for (2) the required RCU protection around calling
> `task_active_pid_ns()` synchronizes against putting the last reference
> of the associated `struct pid` of `task->thread_pid`. The `struct pid`
> stored in that field is used to retrieve the `PidNamespace` of the
> caller. When `release_task()` is called `task->thread_pid` will be
> `NULL`ed and `put_pid()` on said `struct pid` will be delayed in
> `free_pid()` via `call_rcu()` allowing everyone with an RCU protected
> access to the `struct pid` acquired from `task->thread_pid` to finish.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
Overall looks good to me, but a few comments below.
Also, I think it would be fine to send the next version without it
being a reply to the file bindings thread.
> rust/helpers/helpers.c | 1 +
> rust/helpers/pid_namespace.c | 26 ++++++++++
> rust/kernel/lib.rs | 1 +
> rust/kernel/pid_namespace.rs | 70 +++++++++++++++++++++++++
> rust/kernel/task.rs | 119 ++++++++++++++++++++++++++++++++++++++++---
> 5 files changed, 211 insertions(+), 6 deletions(-)
>
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 62022b18caf5ec17231fd0e7be1234592d1146e3..d553ad9361ce17950d505c3b372a568730020e2f 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -17,6 +17,7 @@
> #include "kunit.c"
> #include "mutex.c"
> #include "page.c"
> +#include "pid_namespace.c"
> #include "rbtree.c"
> #include "refcount.c"
> #include "security.c"
> diff --git a/rust/helpers/pid_namespace.c b/rust/helpers/pid_namespace.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f41482bdec9a7c4e84b81ec141027fbd65251230
> --- /dev/null
> +++ b/rust/helpers/pid_namespace.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/pid_namespace.h>
> +#include <linux/cleanup.h>
> +
> +struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
> +{
> + return get_pid_ns(ns);
> +}
> +
> +void rust_helper_put_pid_ns(struct pid_namespace *ns)
> +{
> + put_pid_ns(ns);
> +}
> +
> +/* Get a reference on a task's pid namespace. */
> +struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
> +{
> + struct pid_namespace *pid_ns;
> +
> + guard(rcu)();
> + pid_ns = task_active_pid_ns(task);
> + if (pid_ns)
> + get_pid_ns(pid_ns);
> + return pid_ns;
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index ff7d88022c57ca232dc028066dfa062f3fc84d1c..0e78ec9d06e0199dfafc40988a2ae86cd5df949c 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -44,6 +44,7 @@
> #[cfg(CONFIG_NET)]
> pub mod net;
> pub mod page;
> +pub mod pid_namespace;
> pub mod prelude;
> pub mod print;
> pub mod sizes;
> diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..9a0509e802b4939ad853a802ee6d069a5f00c9df
> --- /dev/null
> +++ b/rust/kernel/pid_namespace.rs
> @@ -0,0 +1,70 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
> +
> +//! Pid namespaces.
> +//!
> +//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
> +//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
> +
> +use crate::{
> + bindings,
> + types::{AlwaysRefCounted, Opaque},
> +};
> +use core::{
> + ptr,
> +};
This doesn't pass the rustfmt check.
$ rustfmt --check rust/kernel/pid_namespace.rs
Diff in /home/aliceryhl/rust-for-linux/rust/kernel/pid_namespace.rs:11:
bindings,
types::{AlwaysRefCounted, Opaque},
};
-use core::{
- ptr,
-};
+use core::ptr;
/// Wraps the kernel's `struct pid_namespace`. Thread safe.
///
> + /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
> + ///
> + /// This function can be used to create an unbounded lifetime by e.g., storing the returned
> + /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
> + /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
> + /// safe.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> + pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
> + struct PidNamespaceRef<'a> {
> + task: &'a PidNamespace,
> + _not_send: NotThreadSafe,
> + }
> +
> + impl Deref for PidNamespaceRef<'_> {
> + type Target = PidNamespace;
> +
> + fn deref(&self) -> &Self::Target {
> + self.task
> + }
> + }
> +
> + let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
> + PidNamespaceRef {
> + // SAFETY: If the current thread is still running, the current task and its associated
> + // pid namespace are valid. Given that `PidNamespaceRef` is not `Send`, we know it
> + // cannot be transferred to another thread (where it could potentially outlive the
> + // current `Task`).
> + task: unsafe { &*pidns.cast() },
This could use `PidNamespace::from_ptr` instead of the cast.
Also, the safety comment about it not being Send seems incomplete. The
real reason it's okay is that the caller must ensure that the
PidNamespaceRef doesn't outlive the current task/thread.
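To illustrate the lifetime argument, here is a minimal userspace sketch of
the same pattern, assuming plain `std` instead of the kernel crate. The
names `Namespace`, `INIT_NS`, `NsRef`, `current_ns` and `current_level` are
hypothetical stand-ins and not part of the patch. The function returns an
opaque `impl Deref`, and the macro borrows through a temporary, so the
resulting reference is tied to the caller's scope. The not-`Send` marker
only stops the guard from moving to another thread; the unsafe contract on
the function is what covers the remaining cases.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical stand-in for a kernel object owned by the current task.
pub struct Namespace {
    pub level: u32,
}

/// Hypothetical object standing in for "the current task's namespace".
static INIT_NS: Namespace = Namespace { level: 0 };

/// Returns an opaque guard that dereferences to the namespace.
///
/// # Safety
///
/// Callers must not let the returned guard outlive the current thread.
pub unsafe fn current_ns() -> impl Deref<Target = Namespace> {
    struct NsRef<'a> {
        ns: &'a Namespace,
        // A raw-pointer marker makes the guard neither `Send` nor `Sync`.
        _not_send: PhantomData<*mut ()>,
    }

    impl Deref for NsRef<'_> {
        type Target = Namespace;

        fn deref(&self) -> &Namespace {
            self.ns
        }
    }

    NsRef {
        ns: &INIT_NS,
        _not_send: PhantomData,
    }
}

/// Borrows through a temporary guard, so the reference cannot escape the
/// scope in which the macro is expanded.
macro_rules! current_ns {
    () => {
        // SAFETY (sketch): the temporary guard lives only as long as the
        // enclosing scope, which cannot outlive the current thread.
        unsafe { &*current_ns() }
    };
}

/// Example caller: the borrow ends when this function returns.
pub fn current_level() -> u32 {
    let ns: &Namespace = current_ns!();
    ns.level
}
```

Because the guard type is opaque, a caller of the macro only ever sees a
`&Namespace` whose lifetime is bounded by the hidden temporary; trying to
store it in a `static` would be rejected by the borrow checker.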
> + /// Returns the given task's pid in the provided pid namespace.
> + pub fn task_tgid_nr_ns(&self, pidns: &PidNamespace) -> Pid {
> + // SAFETY: We know that `self.0.get()` is valid by the type invariant.
> + unsafe { bindings::task_tgid_nr_ns(self.0.get(), pidns.as_ptr()) }
> }
The underlying C function accepts null pointers for the namespace. We
could do the same by accepting `pidns: Option<&PidNamespace>`.
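A minimal sketch of that mapping, assuming userspace stand-ins rather than
the real bindings: `RawPidNs` plays the role of `bindings::pid_namespace`,
and `PidNamespace::new` and `ns_arg` are hypothetical helpers introduced
only for this illustration. `None` becomes the null pointer that the C
side would receive.

```rust
use std::cell::UnsafeCell;
use std::ptr;

/// Hypothetical stand-in for `bindings::pid_namespace`.
#[repr(C)]
pub struct RawPidNs {
    pub level: u32,
}

/// Hypothetical stand-in for the patch's transparent wrapper type.
#[repr(transparent)]
pub struct PidNamespace {
    inner: UnsafeCell<RawPidNs>,
}

impl PidNamespace {
    /// Constructor that exists only so the sketch can be exercised.
    pub fn new(level: u32) -> Self {
        Self {
            inner: UnsafeCell::new(RawPidNs { level }),
        }
    }

    /// Returns a raw pointer to the inner struct, as in the patch.
    pub fn as_ptr(&self) -> *mut RawPidNs {
        self.inner.get()
    }
}

/// Converts the optional namespace into the pointer argument for C:
/// `Some(ns)` yields the wrapped pointer, `None` yields null.
pub fn ns_arg(pidns: Option<&PidNamespace>) -> *mut RawPidNs {
    match pidns {
        Some(ns) => ns.as_ptr(),
        None => ptr::null_mut(),
    }
}
```

With a signature like this, callers that want the C function's null
behaviour pass `None`, and callers that have a namespace pass `Some(&ns)`.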
Alice
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2] rust: add PidNamespace
2024-10-01 10:26 ` Alice Ryhl
@ 2024-10-01 14:17 ` Christian Brauner
2024-10-01 15:45 ` Miguel Ojeda
0 siblings, 1 reply; 52+ messages in thread
From: Christian Brauner @ 2024-10-01 14:17 UTC (permalink / raw)
To: Alice Ryhl
Cc: rust-for-linux, Paul Moore, James Morris, Serge E. Hallyn,
Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
Bjoern Roy Baron, Benno Lossin, Peter Zijlstra, Alexander Viro,
Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
Joel Fernandes, Carlos Llamas, Suren Baghdasaryan, Dan Williams,
Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, linux-fsdevel, Kees Cook, Andreas Hindborg
On Tue, Oct 01, 2024 at 12:26:27PM GMT, Alice Ryhl wrote:
> On Tue, Oct 1, 2024 at 11:44 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> >
> > The `PidNamespace` of a `Task` doesn't ever change once the `Task` is
> > alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)`
> > will not have an effect on the calling `Task`'s pid namespace. It will
> > only affect the pid namespace of children created by the calling `Task`.
> > This invariant guarantees that after having acquired a reference to a
> > `Task`'s pid namespace it will remain unchanged.
> >
> > When a task has exited and been reaped `release_task()` will be called.
> > This will set the `PidNamespace` of the task to `NULL`. So retrieving
> > the `PidNamespace` of a task that is dead will return `NULL`. Note that
> > neither holding the RCU lock nor holding a reference count on the
> > `Task` will prevent `release_task()` being called.
> >
> > In order to retrieve the `PidNamespace` of a `Task` the
> > `task_active_pid_ns()` function can be used. There are two cases to
> > consider:
> >
> > (1) retrieving the `PidNamespace` of the `current` task
> > (2) retrieving the `PidNamespace` of a non-`current` task
> >
> > From system call context retrieving the `PidNamespace` for case (1) is
> > always safe and requires neither RCU locking nor a reference count to be
> > held. Retrieving the `PidNamespace` after `release_task()` for current
> > will return `NULL` but no codepath like that is exposed to Rust.
> >
> > Retrieving the `PidNamespace` from system call context for (2) requires
> > RCU protection. Accessing `PidNamespace` outside of RCU protection
> > requires a reference count that must've been acquired while holding the
> > RCU lock. Note that accessing a non-`current` task means `NULL` can be
> > returned as the non-`current` task could have already passed through
> > `release_task()`.
> >
> > To retrieve (1) the `current_pid_ns!()` macro should be used, which
> > ensures that the returned `PidNamespace` cannot outlive the calling
> > scope. The associated `current_pid_ns()` function should not be called
> > directly as it could be abused to create an unbounded lifetime for
> > `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the
> > common case of accessing `current`'s `PidNamespace` without RCU
> > protection and without having to acquire a reference count.
> >
> > For (2) the `task_get_pid_ns()` method must be used. This will always
> > acquire a reference on `PidNamespace` and will return an `Option` to
> > force the caller to explicitly handle the case where `PidNamespace` is
> > `None`, something that tends to be forgotten when doing the equivalent
> > operation in `C`. Missing RCU primitives make it difficult to perform
> > operations that are otherwise safe without holding a reference count as
> > long as RCU protection is guaranteed. But it is not important currently.
> > But we do want it in the future.
> >
> > Note for (2) the required RCU protection around calling
> > `task_active_pid_ns()` synchronizes against putting the last reference
> > of the associated `struct pid` of `task->thread_pid`. The `struct pid`
> > stored in that field is used to retrieve the `PidNamespace` of the
> > caller. When `release_task()` is called `task->thread_pid` will be
> > `NULL`ed and `put_pid()` on said `struct pid` will be delayed in
> > `free_pid()` via `call_rcu()` allowing everyone with an RCU protected
> > access to the `struct pid` acquired from `task->thread_pid` to finish.
> >
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
>
> Overall looks good to me, but a few comments below.
>
> Also, I think it would be fine to send the next version without it
> being a reply to the file bindings thread.
>
> > rust/helpers/helpers.c | 1 +
> > rust/helpers/pid_namespace.c | 26 ++++++++++
> > rust/kernel/lib.rs | 1 +
> > rust/kernel/pid_namespace.rs | 70 +++++++++++++++++++++++++
> > rust/kernel/task.rs | 119 ++++++++++++++++++++++++++++++++++++++++---
> > 5 files changed, 211 insertions(+), 6 deletions(-)
> >
> > diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> > index 62022b18caf5ec17231fd0e7be1234592d1146e3..d553ad9361ce17950d505c3b372a568730020e2f 100644
> > --- a/rust/helpers/helpers.c
> > +++ b/rust/helpers/helpers.c
> > @@ -17,6 +17,7 @@
> > #include "kunit.c"
> > #include "mutex.c"
> > #include "page.c"
> > +#include "pid_namespace.c"
> > #include "rbtree.c"
> > #include "refcount.c"
> > #include "security.c"
> > diff --git a/rust/helpers/pid_namespace.c b/rust/helpers/pid_namespace.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f41482bdec9a7c4e84b81ec141027fbd65251230
> > --- /dev/null
> > +++ b/rust/helpers/pid_namespace.c
> > @@ -0,0 +1,26 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/pid_namespace.h>
> > +#include <linux/cleanup.h>
> > +
> > +struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
> > +{
> > + return get_pid_ns(ns);
> > +}
> > +
> > +void rust_helper_put_pid_ns(struct pid_namespace *ns)
> > +{
> > + put_pid_ns(ns);
> > +}
> > +
> > +/* Get a reference on a task's pid namespace. */
> > +struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
> > +{
> > + struct pid_namespace *pid_ns;
> > +
> > + guard(rcu)();
> > + pid_ns = task_active_pid_ns(task);
> > + if (pid_ns)
> > + get_pid_ns(pid_ns);
> > + return pid_ns;
> > +}
> > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> > index ff7d88022c57ca232dc028066dfa062f3fc84d1c..0e78ec9d06e0199dfafc40988a2ae86cd5df949c 100644
> > --- a/rust/kernel/lib.rs
> > +++ b/rust/kernel/lib.rs
> > @@ -44,6 +44,7 @@
> > #[cfg(CONFIG_NET)]
> > pub mod net;
> > pub mod page;
> > +pub mod pid_namespace;
> > pub mod prelude;
> > pub mod print;
> > pub mod sizes;
> > diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9a0509e802b4939ad853a802ee6d069a5f00c9df
> > --- /dev/null
> > +++ b/rust/kernel/pid_namespace.rs
> > @@ -0,0 +1,70 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
> > +
> > +//! Pid namespaces.
> > +//!
> > +//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
> > +//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
> > +
> > +use crate::{
> > + bindings,
> > + types::{AlwaysRefCounted, Opaque},
> > +};
> > +use core::{
> > + ptr,
> > +};
>
> This doesn't pass the rustfmt check.
Ok. Why does it pass the build then? Seems like it should just fail the build.
>
> $ rustfmt --check rust/kernel/pid_namespace.rs
> Diff in /home/aliceryhl/rust-for-linux/rust/kernel/pid_namespace.rs:11:
> bindings,
> types::{AlwaysRefCounted, Opaque},
> };
> -use core::{
> - ptr,
> -};
> +use core::ptr;
>
> /// Wraps the kernel's `struct pid_namespace`. Thread safe.
> ///
>
> > + /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
> > + ///
> > + /// This function can be used to create an unbounded lifetime by e.g., storing the returned
> > + /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
> > + /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
> > + /// safe.
> > + ///
> > + /// # Safety
> > + ///
> > + /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> > + pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
> > + struct PidNamespaceRef<'a> {
> > + task: &'a PidNamespace,
> > + _not_send: NotThreadSafe,
> > + }
> > +
> > + impl Deref for PidNamespaceRef<'_> {
> > + type Target = PidNamespace;
> > +
> > + fn deref(&self) -> &Self::Target {
> > + self.task
> > + }
> > + }
> > +
> > + let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
> > + PidNamespaceRef {
> > + // SAFETY: If the current thread is still running, the current task and its associated
> > + // pid namespace are valid. Given that `PidNamespaceRef` is not `Send`, we know it
> > + // cannot be transferred to another thread (where it could potentially outlive the
> > + // current `Task`).
> > + task: unsafe { &*pidns.cast() },
>
> This could use `PidNamespace::from_ptr` instead of the cast.
Ok.
> Also, the safety comment about it not being Send seems incomplete. The
> real reason it's okay is that the caller must ensure that the
> PidNamespaceRef doesn't outlive the current task/thread.
Right, but that's already documented at the top of the function.
>
> > + /// Returns the given task's pid in the provided pid namespace.
> > + pub fn task_tgid_nr_ns(&self, pidns: &PidNamespace) -> Pid {
> > + // SAFETY: We know that `self.0.get()` is valid by the type invariant.
> > + unsafe { bindings::task_tgid_nr_ns(self.0.get(), pidns.as_ptr()) }
> > }
>
> The underlying C function accepts null pointers for the namespace. We
> could do the same by accepting `pidns: Option<&PidNamespace>`.
Seems fine.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2] rust: add PidNamespace
2024-10-01 14:17 ` Christian Brauner
@ 2024-10-01 15:45 ` Miguel Ojeda
2024-10-02 10:14 ` Christian Brauner
0 siblings, 1 reply; 52+ messages in thread
From: Miguel Ojeda @ 2024-10-01 15:45 UTC (permalink / raw)
To: Christian Brauner, Stephen Rothwell
Cc: Alice Ryhl, rust-for-linux, Paul Moore, James Morris,
Serge E. Hallyn, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
Boqun Feng, Bjoern Roy Baron, Benno Lossin, Peter Zijlstra,
Alexander Viro, Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, linux-fsdevel, Kees Cook, Andreas Hindborg
On Tue, Oct 1, 2024 at 4:17 PM Christian Brauner <brauner@kernel.org> wrote:
>
> Ok. Why does it pass the build then? Seems like it should just fail the build.
It is part of `make rustfmt` / `make rustfmtcheck`.
I would be happy to make it part of the normal build if people agree
-- though it could be annoying in some cases, e.g. iterating small
changes while developing.
If we do that, it would be nice if -next does it too, but I think
Stephen is already building Rust for x86_64 allmodconfig (Cc'd).
Cheers,
Miguel
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2] rust: add PidNamespace
2024-10-01 9:43 ` [PATCH v2] rust: add PidNamespace Christian Brauner
2024-10-01 10:26 ` Alice Ryhl
@ 2024-10-01 19:10 ` Gary Guo
2024-10-02 11:05 ` Christian Brauner
1 sibling, 1 reply; 52+ messages in thread
From: Gary Guo @ 2024-10-01 19:10 UTC (permalink / raw)
To: Christian Brauner
Cc: Alice Ryhl, rust-for-linux, Paul Moore, James Morris,
Serge E. Hallyn, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
Boqun Feng, Bjoern Roy Baron, Benno Lossin, Peter Zijlstra,
Alexander Viro, Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, linux-fsdevel, Kees Cook, Andreas Hindborg
On Tue, 01 Oct 2024 11:43:42 +0200
Christian Brauner <brauner@kernel.org> wrote:
> The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
>
> The `PidNamespace` of a `Task` doesn't ever change once the `Task` is
> alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)`
> will not have an effect on the calling `Task`'s pid namespace. It will
> only affect the pid namespace of children created by the calling `Task`.
> This invariant guarantees that after having acquired a reference to a
> `Task`'s pid namespace it will remain unchanged.
>
> When a task has exited and been reaped `release_task()` will be called.
> This will set the `PidNamespace` of the task to `NULL`. So retrieving
> the `PidNamespace` of a task that is dead will return `NULL`. Note that
> neither holding the RCU lock nor holding a reference count on the
> `Task` will prevent `release_task()` being called.
>
> In order to retrieve the `PidNamespace` of a `Task` the
> `task_active_pid_ns()` function can be used. There are two cases to
> consider:
>
> (1) retrieving the `PidNamespace` of the `current` task
> (2) retrieving the `PidNamespace` of a non-`current` task
>
> From system call context retrieving the `PidNamespace` for case (1) is
> always safe and requires neither RCU locking nor a reference count to be
> held. Retrieving the `PidNamespace` after `release_task()` for current
> will return `NULL` but no codepath like that is exposed to Rust.
>
> Retrieving the `PidNamespace` from system call context for (2) requires
> RCU protection. Accessing `PidNamespace` outside of RCU protection
> requires a reference count that must've been acquired while holding the
> RCU lock. Note that accessing a non-`current` task means `NULL` can be
> returned as the non-`current` task could have already passed through
> `release_task()`.
>
> To retrieve (1) the `current_pid_ns!()` macro should be used, which
> ensures that the returned `PidNamespace` cannot outlive the calling
> scope. The associated `current_pid_ns()` function should not be called
> directly as it could be abused to create an unbounded lifetime for
> `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the
> common case of accessing `current`'s `PidNamespace` without RCU
> protection and without having to acquire a reference count.
>
> For (2) the `task_get_pid_ns()` method must be used. This will always
> acquire a reference on `PidNamespace` and will return an `Option` to
> force the caller to explicitly handle the case where `PidNamespace` is
> `None`, something that tends to be forgotten when doing the equivalent
> operation in `C`. Missing RCU primitives make it difficult to perform
> operations that are otherwise safe without holding a reference count as
> long as RCU protection is guaranteed. But it is not important currently.
> But we do want it in the future.
>
> Note for (2) the required RCU protection around calling
> `task_active_pid_ns()` synchronizes against putting the last reference
> of the associated `struct pid` of `task->thread_pid`. The `struct pid`
> stored in that field is used to retrieve the `PidNamespace` of the
> caller. When `release_task()` is called `task->thread_pid` will be
> `NULL`ed and `put_pid()` on said `struct pid` will be delayed in
> `free_pid()` via `call_rcu()` allowing everyone with an RCU protected
> access to the `struct pid` acquired from `task->thread_pid` to finish.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
> rust/helpers/helpers.c | 1 +
> rust/helpers/pid_namespace.c | 26 ++++++++++
> rust/kernel/lib.rs | 1 +
> rust/kernel/pid_namespace.rs | 70 +++++++++++++++++++++++++
> rust/kernel/task.rs | 119 ++++++++++++++++++++++++++++++++++++++++---
> 5 files changed, 211 insertions(+), 6 deletions(-)
>
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 62022b18caf5ec17231fd0e7be1234592d1146e3..d553ad9361ce17950d505c3b372a568730020e2f 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -17,6 +17,7 @@
> #include "kunit.c"
> #include "mutex.c"
> #include "page.c"
> +#include "pid_namespace.c"
> #include "rbtree.c"
> #include "refcount.c"
> #include "security.c"
> diff --git a/rust/helpers/pid_namespace.c b/rust/helpers/pid_namespace.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f41482bdec9a7c4e84b81ec141027fbd65251230
> --- /dev/null
> +++ b/rust/helpers/pid_namespace.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/pid_namespace.h>
> +#include <linux/cleanup.h>
> +
> +struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
> +{
> + return get_pid_ns(ns);
> +}
> +
> +void rust_helper_put_pid_ns(struct pid_namespace *ns)
> +{
> + put_pid_ns(ns);
> +}
> +
> +/* Get a reference on a task's pid namespace. */
> +struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
> +{
> + struct pid_namespace *pid_ns;
> +
> + guard(rcu)();
> + pid_ns = task_active_pid_ns(task);
> + if (pid_ns)
> + get_pid_ns(pid_ns);
> + return pid_ns;
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index ff7d88022c57ca232dc028066dfa062f3fc84d1c..0e78ec9d06e0199dfafc40988a2ae86cd5df949c 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -44,6 +44,7 @@
> #[cfg(CONFIG_NET)]
> pub mod net;
> pub mod page;
> +pub mod pid_namespace;
> pub mod prelude;
> pub mod print;
> pub mod sizes;
> diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..9a0509e802b4939ad853a802ee6d069a5f00c9df
> --- /dev/null
> +++ b/rust/kernel/pid_namespace.rs
> @@ -0,0 +1,70 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
> +
> +//! Pid namespaces.
> +//!
> +//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
> +//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
> +
> +use crate::{
> + bindings,
> + types::{AlwaysRefCounted, Opaque},
> +};
> +use core::{
> + ptr,
> +};
> +
> +/// Wraps the kernel's `struct pid_namespace`. Thread safe.
> +///
> +/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
> +/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
> +/// code that we get passed from the C side.
> +#[repr(transparent)]
> +pub struct PidNamespace {
> + inner: Opaque<bindings::pid_namespace>,
> +}
> +
> +impl PidNamespace {
> + /// Returns a raw pointer to the inner C struct.
> + #[inline]
> + pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
> + self.inner.get()
> + }
> +
> + /// Creates a reference to a [`PidNamespace`] from a valid pointer.
> + ///
> + /// # Safety
> + ///
> + /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
> + /// returned [`PidNamespace`] reference.
> + pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
> + // SAFETY: The safety requirements guarantee the validity of the dereference, while the
> + // `PidNamespace` type being transparent makes the cast ok.
> + unsafe { &*ptr.cast() }
> + }
> +}
> +
> +// SAFETY: Instances of `PidNamespace` are always reference-counted.
> +unsafe impl AlwaysRefCounted for PidNamespace {
> + #[inline]
> + fn inc_ref(&self) {
> + // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> + unsafe { bindings::get_pid_ns(self.as_ptr()) };
> + }
> +
> + #[inline]
> + unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
> + // SAFETY: The safety requirements guarantee that the refcount is non-zero.
> + unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
> + }
> +}
> +
> +// SAFETY:
> +// - `PidNamespace::dec_ref` can be called from any thread.
> +// - It is okay to send ownership of `PidNamespace` across thread boundaries.
> +unsafe impl Send for PidNamespace {}
> +
> +// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
> +// we're either accessing properties that don't change or that are properly synchronised by C code.
> +unsafe impl Sync for PidNamespace {}
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 1a36a9f193685393e7211793b6e6dd7576af8bfd..92603cdb543d9617f1f7d092edb87ccb66c9f0c1 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -6,7 +6,8 @@
>
> use crate::{
> bindings,
> - types::{NotThreadSafe, Opaque},
> + pid_namespace::PidNamespace,
> + types::{ARef, NotThreadSafe, Opaque},
> };
> use core::{
> cmp::{Eq, PartialEq},
> @@ -36,6 +37,65 @@ macro_rules! current {
> };
> }
>
> +/// Returns the currently running task's pid namespace.
> +///
> +/// The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> +///
> +/// The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> +/// `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the
> +/// calling `Task`'s pid namespace. It will only affect the pid namespace of children created by
> +/// the calling `Task`. This invariant guarantees that after having acquired a reference to a
> +/// `Task`'s pid namespace it will remain unchanged.
> +///
> +/// When a task has exited and been reaped `release_task()` will be called. This will set the
> +/// `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead
> +/// will return `NULL`. Note that neither holding the RCU lock nor holding a reference count on
> +/// the `Task` will prevent `release_task()` being called.
> +///
> +/// In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be
> +/// used. There are two cases to consider:
> +///
> +/// (1) retrieving the `PidNamespace` of the `current` task
> +/// (2) retrieving the `PidNamespace` of a non-`current` task
> +///
> +/// From system call context retrieving the `PidNamespace` for case (1) is always safe and requires
> +/// neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after
> +/// `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust.
> +///
> +/// Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> +/// Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been
> +/// acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can
> +/// be returned as the non-`current` task could have already passed through `release_task()`.
> +///
> +/// To retrieve (1) the `current_pid_ns!()` macro should be used, which ensures that the returned
> +/// `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function
> +/// should not be called directly as it could be abused to create an unbounded lifetime for
> +/// `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of
> +/// accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a
> +/// reference count.
> +///
> +/// For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on
> +/// `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case
> +/// where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent
> +/// operation in `C`. Missing RCU primitives make it difficult to perform operations that are
> +/// otherwise safe without holding a reference count as long as RCU protection is guaranteed. But
> +/// it is not important currently. But we do want it in the future.
> +///
> +/// Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes
> +/// against putting the last reference of the associated `struct pid` of `task->thread_pid`.
> +/// The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller.
> +/// When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said
> +/// `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU
> +/// protected access to the `struct pid` acquired from `task->thread_pid` to finish.
Is this comment in the wrong place? The macro here only retrieves the
`current` task's namespace. Perhaps move the comment to `task_get_pid_ns`,
as a normal comment, since this is an implementation detail and not
something for users to worry about (yet)?
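For reference, the scoping trick that `current_pid_ns!()` relies on can be modelled in user space. The following is a minimal sketch, not the kernel code: `PidNs`, `PidNsRef`, `current_ns_guard()` and `current_ns!()` are hypothetical stand-ins, and a `PhantomData<*mut ()>` marker is assumed to play the role of `NotThreadSafe`.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical user-space stand-in for the kernel's `PidNamespace`.
pub struct PidNs {
    pub level: u32,
}

/// Stand-in for the namespace of the running task (assumption of this sketch).
static INIT_NS: PidNs = PidNs { level: 0 };

/// Guard handed out by `current_ns_guard()`. The raw-pointer marker makes it
/// neither `Send` nor `Sync`, which is the role `NotThreadSafe` plays in the patch.
pub struct PidNsRef<'a> {
    ns: &'a PidNs,
    _not_send: PhantomData<*mut ()>,
}

impl Deref for PidNsRef<'_> {
    type Target = PidNs;

    fn deref(&self) -> &PidNs {
        self.ns
    }
}

/// # Safety
///
/// The caller must not let the returned guard outlive the current thread.
pub unsafe fn current_ns_guard() -> impl Deref<Target = PidNs> {
    PidNsRef { ns: &INIT_NS, _not_send: PhantomData }
}

/// Borrows through a temporary guard, so the resulting `&PidNs` is tied to
/// the scope in which the macro is expanded.
macro_rules! current_ns {
    () => {
        // SAFETY: The guard is a temporary owned by the expansion site, so it
        // cannot be stored anywhere that outlives the calling scope.
        unsafe { &*current_ns_guard() }
    };
}

/// Example caller: the reference is only usable inside this function body.
pub fn current_level() -> u32 {
    let ns: &PidNs = current_ns!();
    ns.level
}
```

Because the opaque `impl Deref` return type hides the inner reference, the borrow produced by `&*` is tied to the temporary guard. In this model, trying to return `current_ns!()` from a function as a `&'static PidNs` should be rejected by the borrow checker.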
> +#[macro_export]
> +macro_rules! current_pid_ns {
> + () => {
> + // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
> + // the caller.
> + unsafe { &*$crate::task::Task::current_pid_ns() }
> + };
> +}
> +
> /// Wraps the kernel's `struct task_struct`.
> ///
> /// # Invariants
> @@ -145,6 +205,41 @@ fn deref(&self) -> &Self::Target {
> }
> }
>
> + /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
> + ///
> + /// This function can be used to create an unbounded lifetime by e.g., storing the returned
> + /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
> + /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
> + /// safe.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> + pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
> + struct PidNamespaceRef<'a> {
> + task: &'a PidNamespace,
> + _not_send: NotThreadSafe,
> + }
> +
> + impl Deref for PidNamespaceRef<'_> {
> + type Target = PidNamespace;
> +
> + fn deref(&self) -> &Self::Target {
> + self.task
> + }
> + }
> +
> + let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
> + PidNamespaceRef {
> + // SAFETY: If the current thread is still running, the current task and its associated
> + // pid namespace are valid. Given that `PidNamespaceRef` is not `Send`, we know it
> + // cannot be transferred to another thread (where it could potentially outlive the
> + // current `Task`).
> + task: unsafe { &*pidns.cast() },
> + _not_send: NotThreadSafe,
> + }
> + }
> +
> /// Returns the group leader of the given task.
> pub fn group_leader(&self) -> &Task {
> // SAFETY: By the type invariant, we know that `self.0` is a valid task. Valid tasks always
> @@ -182,11 +277,23 @@ pub fn signal_pending(&self) -> bool {
> unsafe { bindings::signal_pending(self.0.get()) != 0 }
> }
>
> - /// Returns the given task's pid in the current pid namespace.
> - pub fn pid_in_current_ns(&self) -> Pid {
> - // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
> - // pointer as the namespace is correct for using the current namespace.
> - unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
> + /// Returns task's pid namespace with elevated reference count
> + pub fn task_get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
Given that this is within `Task`, the full name of the function becomes
`Task::task_get_pid_ns`. So can this just be `get_pid_ns`?
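To illustrate why the getter returns an `Option` of a counted reference, here is a minimal user-space sketch under stated assumptions. `Arc` stands in for `ARef`, a `Mutex` stands in for the RCU read-side protection, and `release()` is a hypothetical model of `release_task()` clearing the namespace. None of these names are the kernel API.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `PidNamespace`.
pub struct PidNs {
    pub level: u32,
}

/// Hypothetical stand-in for `Task`.
pub struct Task {
    // The mutex models the protection the C helper gets from `guard(rcu)()`.
    ns: Mutex<Option<Arc<PidNs>>>,
}

impl Task {
    pub fn new(ns: Arc<PidNs>) -> Self {
        Task { ns: Mutex::new(Some(ns)) }
    }

    /// Takes a counted reference while protected, or returns `None` if the
    /// task has already been released.
    pub fn get_pid_ns(&self) -> Option<Arc<PidNs>> {
        self.ns.lock().unwrap().clone()
    }

    /// Models `release_task()`: the task drops its namespace pointer.
    pub fn release(&self) {
        *self.ns.lock().unwrap() = None;
    }
}

/// Callers are forced to handle the released case explicitly.
pub fn level_of(task: &Task) -> Option<u32> {
    task.get_pid_ns().map(|ns| ns.level)
}
```

A reference obtained before `release()` stays usable afterwards because the count keeps the namespace alive, while later lookups yield `None` instead of a dangling pointer.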
> + let ptr = unsafe { bindings::task_get_pid_ns(self.0.get()) };
> + if ptr.is_null() {
> + None
> + } else {
> + // SAFETY: `ptr` is valid by the safety requirements of this function. And we own a
> + // reference count via `task_get_pid_ns()`.
> + // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
> + Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
> + }
> + }
> +
> + /// Returns the given task's pid in the provided pid namespace.
> + pub fn task_tgid_nr_ns(&self, pidns: &PidNamespace) -> Pid {
Similarly, this can drop the `task_` prefix as it's already scoped to
`Task`.
PS. I think I quite like the more descriptive name in Alice's patch,
maybe `Task::tgid_in_ns` could be a good name for this?
If there's concern about documentation searchability, there is a
feature in rustdoc where you can put
#[doc(alias = "task_tgid_nr_ns")]
and then the function will be searchable with the C name.
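As a rough sketch of that suggestion, the attribute sits directly on the renamed method. Everything below is a simplified user-space model: the `Task` and `PidNamespace` types, the per-level `tgids` table, and the convention of returning 0 when the task is not visible in the namespace are assumptions made for illustration, not the kernel implementation.

```rust
/// Hypothetical stand-in for the kernel's `Pid` type alias.
pub type Pid = i32;

/// Hypothetical stand-in for `PidNamespace`, identified only by its nesting level.
pub struct PidNamespace {
    level: usize,
}

/// Hypothetical stand-in for `Task`.
pub struct Task {
    /// The tgid as seen from each namespace level; index 0 is the initial namespace.
    tgids: Vec<Pid>,
}

impl Task {
    /// Returns the task's tgid as seen from `ns`.
    ///
    /// The alias lets a rustdoc search for the C name find this method.
    #[doc(alias = "task_tgid_nr_ns")]
    pub fn tgid_nr_ns(&self, ns: &PidNamespace) -> Pid {
        // In this model, 0 means the task is not visible from `ns`.
        self.tgids.get(ns.level).copied().unwrap_or(0)
    }
}
```

With this in place the method reads as `Task::tgid_nr_ns`, and a documentation search for `task_tgid_nr_ns` still leads to it.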
> + // SAFETY: We know that `self.0.get()` is valid by the type invariant.
> + unsafe { bindings::task_tgid_nr_ns(self.0.get(), pidns.as_ptr()) }
> }
>
> /// Wakes up the task.
>
> ---
> base-commit: e9980e40804730de33c1563d9ac74d5b51591ec0
> change-id: 20241001-brauner-rust-pid_namespace-52b0c92c8359
>
>
Best,
Gary
* Re: [PATCH v2] rust: add PidNamespace
2024-10-01 15:45 ` Miguel Ojeda
@ 2024-10-02 10:14 ` Christian Brauner
2024-10-02 11:08 ` Miguel Ojeda
0 siblings, 1 reply; 52+ messages in thread
From: Christian Brauner @ 2024-10-02 10:14 UTC (permalink / raw)
To: Miguel Ojeda
Cc: Stephen Rothwell, Alice Ryhl, rust-for-linux, Paul Moore,
James Morris, Serge E. Hallyn, Miguel Ojeda, Alex Gaynor,
Wedson Almeida Filho, Boqun Feng, Bjoern Roy Baron, Benno Lossin,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjonnevag, Todd Kjos, Martijn Coenen, Joel Fernandes,
Carlos Llamas, Suren Baghdasaryan, Dan Williams, Matthew Wilcox,
Thomas Gleixner, Daniel Xu, Martin Rodriguez Reboredo,
Trevor Gross, linux-kernel, linux-security-module, linux-fsdevel,
Kees Cook, Andreas Hindborg
On Tue, Oct 01, 2024 at 05:45:15PM GMT, Miguel Ojeda wrote:
> On Tue, Oct 1, 2024 at 4:17 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > Ok. Why does it pass the build then? Seems like it should just fail the build.
>
> It is part of `make rustfmt` / `make rustfmtcheck`.
>
> I would be happy to make it part of the normal build if people agree
> -- though it could be annoying in some cases, e.g. iterating small
> changes while developing.
You could consider adding a way to turn it off then instead of turning
it on.
>
> If we do that, it would be nice if -next does it too, but I think
> Stephen is already building Rust for x86_64 allmodconfig (Cc'd).
Imho, since Rust enforces code formatting style I see no point in not
immediately failing the build because of formatting issues.
* Re: [PATCH v2] rust: add PidNamespace
2024-10-01 19:10 ` Gary Guo
@ 2024-10-02 11:05 ` Christian Brauner
0 siblings, 0 replies; 52+ messages in thread
From: Christian Brauner @ 2024-10-02 11:05 UTC (permalink / raw)
To: Gary Guo
Cc: Alice Ryhl, rust-for-linux, Paul Moore, James Morris,
Serge E. Hallyn, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
Boqun Feng, Bjoern Roy Baron, Benno Lossin, Peter Zijlstra,
Alexander Viro, Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos,
Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
Dan Williams, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
Martin Rodriguez Reboredo, Trevor Gross, linux-kernel,
linux-security-module, linux-fsdevel, Kees Cook, Andreas Hindborg
On Tue, Oct 01, 2024 at 08:10:54PM GMT, Gary Guo wrote:
> On Tue, 01 Oct 2024 11:43:42 +0200
> Christian Brauner <brauner@kernel.org> wrote:
>
> > The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> >
> > The `PidNamespace` of a `Task` doesn't ever change once the `Task` is
> > alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)`
> > will not have an effect on the calling `Task`'s pid namespace. It will
> > only effect the pid namespace of children created by the calling `Task`.
> > This invariant guarantees that after having acquired a reference to a
> > `Task`'s pid namespace it will remain unchanged.
> >
> > When a task has exited and been reaped `release_task()` will be called.
> > This will set the `PidNamespace` of the task to `NULL`. So retrieving
> > the `PidNamespace` of a task that is dead will return `NULL`. Note, that
> > neither holding the RCU lock nor holding a referencing count to the
> > `Task` will prevent `release_task()` being called.
> >
> > In order to retrieve the `PidNamespace` of a `Task` the
> > `task_active_pid_ns()` function can be used. There are two cases to
> > consider:
> >
> > (1) retrieving the `PidNamespace` of the `current` task (2) retrieving
> > the `PidNamespace` of a non-`current` task
> >
> > From system call context retrieving the `PidNamespace` for case (1) is
> > always safe and requires neither RCU locking nor a reference count to be
> > held. Retrieving the `PidNamespace` after `release_task()` for current
> > will return `NULL` but no codepath like that is exposed to Rust.
> >
> > Retrieving the `PidNamespace` from system call context for (2) requires
> > RCU protection. Accessing `PidNamespace` outside of RCU protection
> > requires a reference count that must've been acquired while holding the
> > RCU lock. Note that accessing a non-`current` task means `NULL` can be
> > returned as the non-`current` task could have already passed through
> > `release_task()`.
> >
> > To retrieve (1) the `current_pid_ns!()` macro should be used which
> > ensure that the returned `PidNamespace` cannot outlive the calling
> > scope. The associated `current_pid_ns()` function should not be called
> > directly as it could be abused to created an unbounded lifetime for
> > `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the
> > common case of accessing `current`'s `PidNamespace` without RCU
> > protection and without having to acquire a reference count.
> >
> > For (2) the `task_get_pid_ns()` method must be used. This will always
> > acquire a reference on `PidNamespace` and will return an `Option` to
> > force the caller to explicitly handle the case where `PidNamespace` is
> > `None`, something that tends to be forgotten when doing the equivalent
> > operation in `C`. Missing RCU primitives make it difficult to perform
> > operations that are otherwise safe without holding a reference count as
> > long as RCU protection is guaranteed. But it is not important currently.
> > But we do want it in the future.
> >
> > Note for (2) the required RCU protection around calling
> > `task_active_pid_ns()` synchronizes against putting the last reference
> > of the associated `struct pid` of `task->thread_pid`. The `struct pid`
> > stored in that field is used to retrieve the `PidNamespace` of the
> > caller. When `release_task()` is called `task->thread_pid` will be
> > `NULL`ed and `put_pid()` on said `struct pid` will be delayed in
> > `free_pid()` via `call_rcu()` allowing everyone with an RCU protected
> > access to the `struct pid` acquired from `task->thread_pid` to finish.
> >
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > ---
> > rust/helpers/helpers.c | 1 +
> > rust/helpers/pid_namespace.c | 26 ++++++++++
> > rust/kernel/lib.rs | 1 +
> > rust/kernel/pid_namespace.rs | 70 +++++++++++++++++++++++++
> > rust/kernel/task.rs | 119 ++++++++++++++++++++++++++++++++++++++++---
> > 5 files changed, 211 insertions(+), 6 deletions(-)
> >
> > diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> > index 62022b18caf5ec17231fd0e7be1234592d1146e3..d553ad9361ce17950d505c3b372a568730020e2f 100644
> > --- a/rust/helpers/helpers.c
> > +++ b/rust/helpers/helpers.c
> > @@ -17,6 +17,7 @@
> > #include "kunit.c"
> > #include "mutex.c"
> > #include "page.c"
> > +#include "pid_namespace.c"
> > #include "rbtree.c"
> > #include "refcount.c"
> > #include "security.c"
> > diff --git a/rust/helpers/pid_namespace.c b/rust/helpers/pid_namespace.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f41482bdec9a7c4e84b81ec141027fbd65251230
> > --- /dev/null
> > +++ b/rust/helpers/pid_namespace.c
> > @@ -0,0 +1,26 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/pid_namespace.h>
> > +#include <linux/cleanup.h>
> > +
> > +struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
> > +{
> > + return get_pid_ns(ns);
> > +}
> > +
> > +void rust_helper_put_pid_ns(struct pid_namespace *ns)
> > +{
> > + put_pid_ns(ns);
> > +}
> > +
> > +/* Get a reference on a task's pid namespace. */
> > +struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
> > +{
> > + struct pid_namespace *pid_ns;
> > +
> > + guard(rcu)();
> > + pid_ns = task_active_pid_ns(task);
> > + if (pid_ns)
> > + get_pid_ns(pid_ns);
> > + return pid_ns;
> > +}
> > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> > index ff7d88022c57ca232dc028066dfa062f3fc84d1c..0e78ec9d06e0199dfafc40988a2ae86cd5df949c 100644
> > --- a/rust/kernel/lib.rs
> > +++ b/rust/kernel/lib.rs
> > @@ -44,6 +44,7 @@
> > #[cfg(CONFIG_NET)]
> > pub mod net;
> > pub mod page;
> > +pub mod pid_namespace;
> > pub mod prelude;
> > pub mod print;
> > pub mod sizes;
> > diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9a0509e802b4939ad853a802ee6d069a5f00c9df
> > --- /dev/null
> > +++ b/rust/kernel/pid_namespace.rs
> > @@ -0,0 +1,70 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
> > +
> > +//! Pid namespaces.
> > +//!
> > +//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
> > +//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
> > +
> > +use crate::{
> > + bindings,
> > + types::{AlwaysRefCounted, Opaque},
> > +};
> > +use core::{
> > + ptr,
> > +};
> > +
> > +/// Wraps the kernel's `struct pid_namespace`. Thread safe.
> > +///
> > +/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
> > +/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
> > +/// code that we get passed from the C side.
> > +#[repr(transparent)]
> > +pub struct PidNamespace {
> > + inner: Opaque<bindings::pid_namespace>,
> > +}
> > +
> > +impl PidNamespace {
> > + /// Returns a raw pointer to the inner C struct.
> > + #[inline]
> > + pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
> > + self.inner.get()
> > + }
> > +
> > + /// Creates a reference to a [`PidNamespace`] from a valid pointer.
> > + ///
> > + /// # Safety
> > + ///
> > + /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
> > + /// returned [`PidNamespace`] reference.
> > + pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
> > + // SAFETY: The safety requirements guarantee the validity of the dereference, while the
> > + // `PidNamespace` type being transparent makes the cast ok.
> > + unsafe { &*ptr.cast() }
> > + }
> > +}
> > +
> > +// SAFETY: Instances of `PidNamespace` are always reference-counted.
> > +unsafe impl AlwaysRefCounted for PidNamespace {
> > + #[inline]
> > + fn inc_ref(&self) {
> > + // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> > + unsafe { bindings::get_pid_ns(self.as_ptr()) };
> > + }
> > +
> > + #[inline]
> > + unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
> > + // SAFETY: The safety requirements guarantee that the refcount is non-zero.
> > + unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
> > + }
> > +}
> > +
> > +// SAFETY:
> > +// - `PidNamespace::dec_ref` can be called from any thread.
> > +// - It is okay to send ownership of `PidNamespace` across thread boundaries.
> > +unsafe impl Send for PidNamespace {}
> > +
> > +// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
> > +// we're either accessing properties that don't change or that are properly synchronised by C code.
> > +unsafe impl Sync for PidNamespace {}
> > diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> > index 1a36a9f193685393e7211793b6e6dd7576af8bfd..92603cdb543d9617f1f7d092edb87ccb66c9f0c1 100644
> > --- a/rust/kernel/task.rs
> > +++ b/rust/kernel/task.rs
> > @@ -6,7 +6,8 @@
> >
> > use crate::{
> > bindings,
> > - types::{NotThreadSafe, Opaque},
> > + pid_namespace::PidNamespace,
> > + types::{ARef, NotThreadSafe, Opaque},
> > };
> > use core::{
> > cmp::{Eq, PartialEq},
> > @@ -36,6 +37,65 @@ macro_rules! current {
> > };
> > }
> >
> > +/// Returns the currently running task's pid namespace.
> > +///
> > +/// The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> > +///
> > +/// The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> > +/// `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the
> > +/// calling `Task`'s pid namespace. It will only effect the pid namespace of children created by
> > +/// the calling `Task`. This invariant guarantees that after having acquired a reference to a
> > +/// `Task`'s pid namespace it will remain unchanged.
> > +///
> > +/// When a task has exited and been reaped `release_task()` will be called. This will set the
> > +/// `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead
> > +/// will return `NULL`. Note, that neither holding the RCU lock nor holding a referencing count to
> > +/// the `Task` will prevent `release_task()` being called.
> > +///
> > +/// In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be
> > +/// used. There are two cases to consider:
> > +///
> > +/// (1) retrieving the `PidNamespace` of the `current` task
> > +/// (2) retrieving the `PidNamespace` of a non-`current` task
> > +///
> > +/// From system call context retrieving the `PidNamespace` for case (1) is always safe and requires
> > +/// neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after
> > +/// `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust.
> > +///
> > +/// Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> > +/// Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been
> > +/// acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can
> > +/// be returned as the non-`current` task could have already passed through `release_task()`.
> > +///
> > +/// To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the returned
> > +/// `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function
> > +/// should not be called directly as it could be abused to created an unbounded lifetime for
> > +/// `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of
> > +/// accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a
> > +/// reference count.
> > +///
> > +/// For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on
> > +/// `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case
> > +/// where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent
> > +/// operation in `C`. Missing RCU primitives make it difficult to perform operations that are
> > +/// otherwise safe without holding a reference count as long as RCU protection is guaranteed. But
> > +/// it is not important currently. But we do want it in the future.
> > +///
> > +/// Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes
> > +/// against putting the last reference of the associated `struct pid` of `task->thread_pid`.
> > +/// The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller.
> > +/// When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said
> > +/// `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU
> > +/// protected access to the `struct pid` acquired from `task->thread_pid` to finish.
>
> Is the comment here in the wrong place? The macro here is just getting
> `current` one. Perhaps move it to the `task_get_pid_ns`, and as a
> normal comment, since this is impl detail and not something for user to
> worry about (yet)?
Sure.
>
> > +#[macro_export]
> > +macro_rules! current_pid_ns {
> > + () => {
> > + // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
> > + // the caller.
> > + unsafe { &*$crate::task::Task::current_pid_ns() }
> > + };
> > +}
> > +
> > /// Wraps the kernel's `struct task_struct`.
> > ///
> > /// # Invariants
> > @@ -145,6 +205,41 @@ fn deref(&self) -> &Self::Target {
> > }
> > }
> >
> > + /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
> > + ///
> > + /// This function can be used to create an unbounded lifetime by e.g., storing the returned
> > + /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
> > + /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
> > + /// safe.
> > + ///
> > + /// # Safety
> > + ///
> > + /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> > + pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
> > + struct PidNamespaceRef<'a> {
> > + task: &'a PidNamespace,
> > + _not_send: NotThreadSafe,
> > + }
> > +
> > + impl Deref for PidNamespaceRef<'_> {
> > + type Target = PidNamespace;
> > +
> > + fn deref(&self) -> &Self::Target {
> > + self.task
> > + }
> > + }
> > +
> > + let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
> > + PidNamespaceRef {
> > + // SAFETY: If the current thread is still running, the current task and its associated
> > + // pid namespace are valid. Given that `PidNamespaceRef` is not `Send`, we know it
> > + // cannot be transferred to another thread (where it could potentially outlive the
> > + // current `Task`).
> > + task: unsafe { &*pidns.cast() },
> > + _not_send: NotThreadSafe,
> > + }
> > + }
> > +
> > /// Returns the group leader of the given task.
> > pub fn group_leader(&self) -> &Task {
> > // SAFETY: By the type invariant, we know that `self.0` is a valid task. Valid tasks always
> > @@ -182,11 +277,23 @@ pub fn signal_pending(&self) -> bool {
> > unsafe { bindings::signal_pending(self.0.get()) != 0 }
> > }
> >
> > - /// Returns the given task's pid in the current pid namespace.
> > - pub fn pid_in_current_ns(&self) -> Pid {
> > - // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
> > - // pointer as the namespace is correct for using the current namespace.
> > - unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
> > + /// Returns task's pid namespace with elevated reference count
> > + pub fn task_get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
>
> Given that this is within `Task`, the full name of the function became
> `Task::task_get_pid_ns`. So this can just be `get_pid_ns`?
Fair.
>
> > + let ptr = unsafe { bindings::task_get_pid_ns(self.0.get()) };
> > + if ptr.is_null() {
> > + None
> > + } else {
> > + // SAFETY: `ptr` is valid by the safety requirements of this function. And we own a
> > + // reference count via `task_get_pid_ns()`.
> > + // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
> > + Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
> > + }
> > + }
> > +
> > + /// Returns the given task's pid in the provided pid namespace.
> > + pub fn task_tgid_nr_ns(&self, pidns: &PidNamespace) -> Pid {
>
> Similarly, this can drop `task_` prefix as it's already scoped to
> `Task`.
>
> PS. I think I quite like the more descriptive name in Alice's patch,
> maybe `Task::tgid_in_ns` could be a good name for this?
I'm not fond of the "in" part. tgid_nr_ns() is fine with me.
>
> If there's concern about documentation searchability, there is a
> feature in rustdoc where you can put
>
> #[doc(alias = "task_tgid_nr_ns")]
>
> and then the function will be searchable with the C name.
Sounds good.
* Re: [PATCH v2] rust: add PidNamespace
2024-10-02 10:14 ` Christian Brauner
@ 2024-10-02 11:08 ` Miguel Ojeda
0 siblings, 0 replies; 52+ messages in thread
From: Miguel Ojeda @ 2024-10-02 11:08 UTC (permalink / raw)
To: Christian Brauner
Cc: Stephen Rothwell, Alice Ryhl, rust-for-linux, Paul Moore,
James Morris, Serge E. Hallyn, Miguel Ojeda, Alex Gaynor,
Wedson Almeida Filho, Boqun Feng, Bjoern Roy Baron, Benno Lossin,
Peter Zijlstra, Alexander Viro, Greg Kroah-Hartman,
Arve Hjonnevag, Todd Kjos, Martijn Coenen, Joel Fernandes,
Carlos Llamas, Suren Baghdasaryan, Dan Williams, Matthew Wilcox,
Thomas Gleixner, Daniel Xu, Martin Rodriguez Reboredo,
Trevor Gross, linux-kernel, linux-security-module, linux-fsdevel,
Kees Cook, Andreas Hindborg
On Wed, Oct 2, 2024 at 12:14 PM Christian Brauner <brauner@kernel.org> wrote:
>
> You could consider adding a way to turn it off then instead of turning
> it on.
>
> Imho, since Rust enforces code formatting style I see no point in not
> immediately failing the build because of formatting issues.
For maintainers, it would be better if we could unconditionally do it,
but like with other diagnostics, it is a balance.
If there is a way out (like something like `WERROR` or perhaps a "dev
mode" like `make D=1` that could encompass other bits), then I think
it should be OK. Any preference?
(We also need to be careful about `rustfmt` having e.g. bugs in future
versions that change the output, but since we are in the Rust CI and
we can test the nightly compiler, the risk should be low.)
Cheers,
Miguel
Thread overview: 52+ messages
2024-09-15 14:31 [PATCH v10 0/8] File abstractions needed by Rust Binder Alice Ryhl
2024-09-15 14:31 ` [PATCH v10 1/8] rust: types: add `NotThreadSafe` Alice Ryhl
2024-09-15 15:38 ` Gary Guo
2024-09-27 11:21 ` Miguel Ojeda
2024-09-24 19:45 ` Serge E. Hallyn
2024-09-25 11:06 ` Alice Ryhl
2024-09-25 13:59 ` Serge E. Hallyn
2024-09-27 10:20 ` Gary Guo
2024-09-15 14:31 ` [PATCH v10 2/8] rust: task: add `Task::current_raw` Alice Ryhl
2024-09-15 14:31 ` [PATCH v10 3/8] rust: file: add Rust abstraction for `struct file` Alice Ryhl
2024-09-15 21:51 ` Gary Guo
2024-09-15 14:31 ` [PATCH v10 4/8] rust: cred: add Rust abstraction for `struct cred` Alice Ryhl
2024-09-15 20:24 ` Kees Cook
2024-09-15 20:55 ` Alice Ryhl
2024-09-19 7:57 ` Paul Moore
2024-09-15 14:31 ` [PATCH v10 5/8] rust: security: add abstraction for secctx Alice Ryhl
2024-09-15 20:58 ` Kees Cook
2024-09-15 21:07 ` Alice Ryhl
2024-09-16 15:40 ` Casey Schaufler
2024-09-17 13:18 ` Paul Moore
2024-09-22 15:01 ` Alice Ryhl
2024-09-22 15:08 ` Alice Ryhl
2024-09-22 16:50 ` Casey Schaufler
2024-09-22 17:04 ` Alice Ryhl
2024-09-19 7:56 ` Paul Moore
2024-09-15 14:31 ` [PATCH v10 6/8] rust: file: add `FileDescriptorReservation` Alice Ryhl
2024-09-15 18:39 ` Al Viro
2024-09-15 19:34 ` Al Viro
2024-09-16 4:18 ` Al Viro
2024-09-15 20:13 ` Alice Ryhl
2024-09-15 22:01 ` Al Viro
2024-09-15 22:05 ` Al Viro
2024-09-15 14:31 ` [PATCH v10 7/8] rust: file: add `Kuid` wrapper Alice Ryhl
2024-09-15 22:02 ` Gary Guo
2024-09-23 9:13 ` Alice Ryhl
2024-09-26 16:33 ` Christian Brauner
2024-09-26 16:35 ` [PATCH] [RFC] rust: add PidNamespace wrapper Christian Brauner
2024-09-27 12:04 ` Alice Ryhl
2024-09-27 14:21 ` Christian Brauner
2024-09-27 14:58 ` Alice Ryhl
2024-10-01 9:43 ` [PATCH v2] rust: add PidNamespace Christian Brauner
2024-10-01 10:26 ` Alice Ryhl
2024-10-01 14:17 ` Christian Brauner
2024-10-01 15:45 ` Miguel Ojeda
2024-10-02 10:14 ` Christian Brauner
2024-10-02 11:08 ` Miguel Ojeda
2024-10-01 19:10 ` Gary Guo
2024-10-02 11:05 ` Christian Brauner
2024-09-15 14:31 ` [PATCH v10 8/8] rust: file: add abstraction for `poll_table` Alice Ryhl
2024-09-15 22:24 ` Gary Guo
2024-09-23 9:10 ` Alice Ryhl
2024-09-27 9:28 ` [PATCH v10 0/8] File abstractions needed by Rust Binder Christian Brauner