* [PATCH v19 7/8] rust: Add `OwnableRefCounted`
From: Andreas Hindborg @ 2026-06-26 11:54 UTC (permalink / raw)
To: Danilo Krummrich, Lorenzo Stoakes, Vlastimil Babka,
Liam R. Howlett, Uladzislau Rezki, Miguel Ojeda, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
Trevor Gross, Daniel Almeida, Tamir Duberstein, Alexandre Courbot,
Onur Özkan, Lyude Paul, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas, Rafael J. Wysocki, Dave Ertman, Ira Weiny,
Leon Romanovsky, Paul Moore, Serge Hallyn, David Airlie,
Simona Vetter, Alexander Viro, Jan Kara, Igor Korotin,
Viresh Kumar, Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
Krzysztof Wilczyński, Pavel Tikhomirov, Michal Wilczynski
Cc: Andreas Hindborg, Philipp Stanner, rust-for-linux, linux-kernel,
linux-mm, driver-core, linux-block, linux-security-module,
dri-devel, linux-fsdevel, linux-pm, linux-pci, linux-pwm,
Oliver Mangold
In-Reply-To: <20260626-unique-ref-v19-0-2607ca88dfdf@kernel.org>
From: Oliver Mangold <oliver.mangold@pm.me>
Types implementing one of these traits can safely convert between an
`ARef<T>` and an `Owned<T>`.
This is useful for types which generally are accessed through an `ARef`
but have methods which can only safely be called when the reference is
unique, like e.g. `block::mq::Request::end_ok()`.
Signed-off-by: Oliver Mangold <oliver.mangold@pm.me>
[ Andreas: Fix formatting, update documentation, fix error handling in
examples. ]
Co-developed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/kernel/owned.rs | 145 +++++++++++++++++++++++++++++++++++++++++++++--
rust/kernel/sync/aref.rs | 16 +++++-
rust/kernel/types.rs | 1 +
3 files changed, 156 insertions(+), 6 deletions(-)
diff --git a/rust/kernel/owned.rs b/rust/kernel/owned.rs
index a156267bf8bb1..acb611f084ff3 100644
--- a/rust/kernel/owned.rs
+++ b/rust/kernel/owned.rs
@@ -14,20 +14,26 @@
pin::Pin,
ptr::NonNull, //
};
+use kernel::{
+ sync::aref::ARef,
+ types::RefCounted, //
+};
use kernel::types::ForeignOwnable;
/// Types that specify their own way of performing allocation and destruction. Typically, this trait
/// is implemented on types from the C side.
///
-/// Implementing this trait allows types to be referenced via the [`Owned<Self>`] pointer type. This
-/// is useful when it is desirable to tie the lifetime of the reference to an owned object, rather
-/// than pass around a bare reference. [`Ownable`] types can define custom drop logic that is
-/// executed when the owned reference [`Owned<Self>`] pointing to the object is dropped.
+/// Implementing this trait allows types to be referenced via the [`Owned<Self>`] pointer type.
+/// - This is useful when it is desirable to tie the lifetime of an object reference to an owned
+/// object, rather than pass around a bare reference.
+/// - [`Ownable`] types can define custom drop logic that is executed when the owned reference
+/// of type [`Owned<_>`] pointing to the object is dropped.
///
/// Note: The underlying object is not required to provide internal reference counting, because it
/// represents a unique, owned reference. If reference counting (on the Rust side) is required,
-/// [`RefCounted`](crate::types::RefCounted) should be implemented.
+/// [`RefCounted`] should be implemented. [`OwnableRefCounted`] should be implemented if conversion
+/// between unique and shared (reference counted) ownership is needed.
///
/// # Examples
///
@@ -99,6 +105,8 @@ pub trait Ownable {
/// Callers must ensure that they have exclusive ownership of the `Self` pointed to by `this`,
/// and that this ownership is transferred to the `release` method. `this` must not be used
/// after calling this method, as the underlying object may have been freed.
+ ///
+ /// `this` is pinned and implementers of this method must observe this constraint.
unsafe fn release(this: NonNull<Self>);
}
@@ -136,6 +144,8 @@ pub unsafe fn from_raw(ptr: NonNull<T>) -> Self {
///
/// This function does not drop the underlying `T`. When this function returns, ownership of the
/// underlying `T` is with the caller.
+ ///
+ /// Note that the returned pointer is pinned.
#[inline]
pub fn into_raw(me: Self) -> NonNull<T> {
ManuallyDrop::new(me).ptr
@@ -236,3 +246,128 @@ unsafe fn borrow_mut<'a>(ptr: *mut kernel::ffi::c_void) -> Self::BorrowedMut<'a>
unsafe { Pin::new_unchecked(inner) }
}
}
+
+/// A trait for objects that can be wrapped in either one of the reference types [`Owned`] and
+/// [`ARef`].
+///
+/// # Examples
+///
+/// A minimal example implementation of [`OwnableRefCounted`], [`Ownable`] and its usage with
+/// [`ARef`] and [`Owned`] looks like this:
+///
+/// ```
+/// # #![expect(clippy::disallowed_names)]
+/// # use core::cell::Cell;
+/// # use core::ptr::NonNull;
+/// # use kernel::alloc::{flags, kbox::KBox, AllocError};
+/// # use kernel::sync::aref::{ARef, RefCounted};
+/// # use kernel::types::{Owned, Ownable, OwnableRefCounted};
+///
+/// // An internally refcounted struct for demonstration purposes.
+/// //
+/// // # Invariants
+/// //
+/// // - `refcount` is always non-zero for a valid object.
+/// // - `refcount` is >1 if there is more than one Rust reference to it.
+/// //
+/// struct Foo {
+/// refcount: Cell<usize>,
+/// }
+///
+/// impl Foo {
+/// fn new() -> Result<Owned<Self>> {
+/// // We are just using a `KBox` here to handle the actual allocation, as our `Foo` is
+/// // not actually a C-allocated object.
+/// // INVARIANT: We initialize `refcount` to 1, satisfying the invariants.
+/// let result = KBox::new(
+/// Foo {
+/// refcount: Cell::new(1),
+/// },
+/// flags::GFP_KERNEL,
+/// )?;
+/// let result = KBox::into_non_null(result);
+/// // SAFETY:
+/// // - We just allocated the `Self`, thus it is valid and we own it.
+/// // - We can transfer this ownership to the `from_raw` method.
+/// Ok(unsafe { Owned::from_raw(result) })
+/// }
+/// }
+///
+/// // SAFETY: We increment and decrement each time the respective function is called and only free
+/// // the `Foo` when the refcount reaches zero.
+/// unsafe impl RefCounted for Foo {
+/// fn inc_ref(&self) {
+/// self.refcount.replace(self.refcount.get() + 1);
+/// }
+///
+/// unsafe fn dec_ref(this: NonNull<Self>) {
+/// // SAFETY: By requirement on calling this function, the refcount is non-zero,
+/// // implying the underlying object is valid.
+/// let refcount = unsafe { &this.as_ref().refcount };
+/// let new_refcount = refcount.get() - 1;
+/// if new_refcount == 0 {
+/// // The `Foo` will be dropped when `KBox` goes out of scope.
+/// // SAFETY: The [`KBox<Foo>`] is still alive as the old refcount is 1. We can pass
+/// // ownership to the [`KBox`] as by requirement on calling this function,
+/// // the `Self` will no longer be used by the caller.
+/// unsafe { KBox::from_raw(this.as_ptr()) };
+/// } else {
+/// refcount.replace(new_refcount);
+/// }
+/// }
+/// }
+///
+/// impl OwnableRefCounted for Foo {
+/// fn try_from_shared(this: ARef<Self>) -> Result<Owned<Self>, ARef<Self>> {
+/// if this.refcount.get() == 1 {
+/// // SAFETY: The `Foo` is still alive and has no other Rust references as the refcount
+/// // is 1.
+/// Ok(unsafe { Owned::from_raw(ARef::into_raw(this)) })
+/// } else {
+/// Err(this)
+/// }
+/// }
+///
+/// fn into_shared(this: Owned<Self>) -> ARef<Self> {
+/// // SAFETY: An `Owned<Foo>` holds the unique reference (refcount 1), which we transfer to
+/// // the new `ARef`.
+/// unsafe { ARef::from_raw(Owned::into_raw(this)) }
+/// }
+/// }
+///
+/// impl Ownable for Foo {
+/// unsafe fn release(this: NonNull<Self>) {
+/// // SAFETY: Using `dec_ref()` from [`RefCounted`] to release is okay, as the refcount is
+/// // always 1 for an [`Owned<Foo>`].
+/// unsafe { Foo::dec_ref(this) };
+/// }
+/// }
+///
+/// let foo = Foo::new()?;
+/// let foo = ARef::from(foo);
+/// {
+/// let bar = foo.clone();
+/// assert!(Owned::try_from(bar).is_err());
+/// }
+/// assert!(Owned::try_from(foo).is_ok());
+/// # Ok::<(), Error>(())
+/// ```
+pub trait OwnableRefCounted: RefCounted + Ownable + Sized {
+ /// Checks if the [`ARef`] is unique and converts it to an [`Owned`] if that is the case.
+ /// Otherwise it returns again an [`ARef`] to the same underlying object.
+ fn try_from_shared(this: ARef<Self>) -> Result<Owned<Self>, ARef<Self>>;
+
+ /// Converts the [`Owned`] into an [`ARef`].
+ fn into_shared(this: Owned<Self>) -> ARef<Self>;
+}
+
+impl<T: OwnableRefCounted> TryFrom<ARef<T>> for Owned<T> {
+ type Error = ARef<T>;
+ /// Tries to convert the [`ARef`] to an [`Owned`] by calling
+ /// [`try_from_shared()`](OwnableRefCounted::try_from_shared). In case the [`ARef`] is not
+ /// unique, it returns again an [`ARef`] to the same underlying object.
+ #[inline]
+ fn try_from(b: ARef<T>) -> Result<Owned<T>, Self::Error> {
+ T::try_from_shared(b)
+ }
+}
diff --git a/rust/kernel/sync/aref.rs b/rust/kernel/sync/aref.rs
index f26ca39b84d0d..e6ffe7c650c39 100644
--- a/rust/kernel/sync/aref.rs
+++ b/rust/kernel/sync/aref.rs
@@ -23,6 +23,10 @@
ops::Deref,
ptr::NonNull, //
};
+use kernel::types::{
+ OwnableRefCounted,
+ Owned, //
+};
/// Types that are internally reference counted.
///
@@ -35,7 +39,10 @@
/// Note: Implementing this trait allows types to be wrapped in an [`ARef<Self>`]. It requires an
/// internal reference count and provides only shared references. If unique references are required
/// [`Ownable`](crate::types::Ownable) should be implemented which allows types to be wrapped in an
-/// [`Owned<Self>`](crate::types::Owned).
+/// [`Owned<Self>`](crate::types::Owned). Implementing the trait
+/// [`OwnableRefCounted`] allows to convert between unique and
+/// shared references (i.e. [`Owned<Self>`](crate::types::Owned) and
+/// [`ARef<Self>`]).
///
/// # Safety
///
@@ -188,6 +195,13 @@ fn from(b: &T) -> Self {
}
}
+impl<T: OwnableRefCounted> From<Owned<T>> for ARef<T> {
+ #[inline]
+ fn from(b: Owned<T>) -> Self {
+ T::into_shared(b)
+ }
+}
+
impl<T: RefCounted> Drop for ARef<T> {
fn drop(&mut self) {
// SAFETY: The type invariants guarantee that the `ARef` owns the reference we're about to
diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
index 5ef763717e59a..6aa760952cb63 100644
--- a/rust/kernel/types.rs
+++ b/rust/kernel/types.rs
@@ -18,6 +18,7 @@
pub use crate::{
owned::{
Ownable,
+ OwnableRefCounted,
Owned, //
},
sync::aref::{
--
2.51.2
^ permalink raw reply related
* [PATCH v19 4/8] rust: page: convert to `Ownable`
From: Andreas Hindborg @ 2026-06-26 11:54 UTC (permalink / raw)
To: Danilo Krummrich, Lorenzo Stoakes, Vlastimil Babka,
Liam R. Howlett, Uladzislau Rezki, Miguel Ojeda, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
Trevor Gross, Daniel Almeida, Tamir Duberstein, Alexandre Courbot,
Onur Özkan, Lyude Paul, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas, Rafael J. Wysocki, Dave Ertman, Ira Weiny,
Leon Romanovsky, Paul Moore, Serge Hallyn, David Airlie,
Simona Vetter, Alexander Viro, Jan Kara, Igor Korotin,
Viresh Kumar, Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
Krzysztof Wilczyński, Pavel Tikhomirov, Michal Wilczynski
Cc: Andreas Hindborg, Philipp Stanner, rust-for-linux, linux-kernel,
linux-mm, driver-core, linux-block, linux-security-module,
dri-devel, linux-fsdevel, linux-pm, linux-pci, linux-pwm,
Asahi Lina
In-Reply-To: <20260626-unique-ref-v19-0-2607ca88dfdf@kernel.org>
From: Asahi Lina <lina@asahilina.net>
This allows Page references to be returned as borrowed references,
without necessarily owning the struct page.
Remove `BorrowedPage` and update users to use `Owned<Page>`.
Signed-off-by: Asahi Lina <lina@asahilina.net>
[ Andreas: Fix formatting and add a safety comment, update users. ]
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
---
drivers/android/binder/page_range.rs | 10 +--
rust/kernel/alloc/allocator.rs | 19 +++---
rust/kernel/alloc/allocator/iter.rs | 6 +-
rust/kernel/page.rs | 122 +++++++++--------------------------
4 files changed, 46 insertions(+), 111 deletions(-)
diff --git a/drivers/android/binder/page_range.rs b/drivers/android/binder/page_range.rs
index e54a90e62402a..7941eb85b4ef4 100644
--- a/drivers/android/binder/page_range.rs
+++ b/drivers/android/binder/page_range.rs
@@ -33,7 +33,7 @@
sync::{aref::ARef, Mutex, SpinLock},
task::Pid,
transmute::FromBytes,
- types::Opaque,
+ types::{Opaque, Owned},
uaccess::UserSliceReader,
};
@@ -198,7 +198,7 @@ unsafe impl Send for Inner {}
#[repr(C)]
struct PageInfo {
lru: bindings::list_head,
- page: Option<Page>,
+ page: Option<Owned<Page>>,
range: *const ShrinkablePageRange,
}
@@ -206,7 +206,7 @@ impl PageInfo {
/// # Safety
///
/// The caller ensures that writing to `me.page` is ok, and that the page is not currently set.
- unsafe fn set_page(me: *mut PageInfo, page: Page) {
+ unsafe fn set_page(me: *mut PageInfo, page: Owned<Page>) {
// SAFETY: This pointer offset is in bounds.
let ptr = unsafe { &raw mut (*me).page };
@@ -229,13 +229,13 @@ unsafe fn get_page<'a>(me: *const PageInfo) -> Option<&'a Page> {
let ptr = unsafe { &raw const (*me).page };
// SAFETY: The pointer is valid for reading.
- unsafe { (*ptr).as_ref() }
+ unsafe { (*ptr).as_deref() }
}
/// # Safety
///
/// The caller ensures that writing to `me.page` is ok for the duration of 'a.
- unsafe fn take_page(me: *mut PageInfo) -> Option<Page> {
+ unsafe fn take_page(me: *mut PageInfo) -> Option<Owned<Page>> {
// SAFETY: This pointer offset is in bounds.
let ptr = unsafe { &raw mut (*me).page };
diff --git a/rust/kernel/alloc/allocator.rs b/rust/kernel/alloc/allocator.rs
index cd4203f27aed0..c7b9b069cf75d 100644
--- a/rust/kernel/alloc/allocator.rs
+++ b/rust/kernel/alloc/allocator.rs
@@ -169,7 +169,7 @@ unsafe fn realloc(
}
impl Vmalloc {
- /// Convert a pointer to a [`Vmalloc`] allocation to a [`page::BorrowedPage`].
+ /// Convert a pointer to a [`Vmalloc`] allocation to a [`Page`](page::Page) reference.
///
/// # Examples
///
@@ -202,20 +202,17 @@ impl Vmalloc {
///
/// - `ptr` must be a valid pointer to a [`Vmalloc`] allocation.
/// - `ptr` must remain valid for the entire duration of `'a`.
- pub unsafe fn to_page<'a>(ptr: NonNull<u8>) -> page::BorrowedPage<'a> {
+ pub unsafe fn to_page<'a>(ptr: NonNull<u8>) -> &'a page::Page {
// SAFETY: `ptr` is a valid pointer to `Vmalloc` memory.
let page = unsafe { bindings::vmalloc_to_page(ptr.as_ptr().cast()) };
- // SAFETY: `vmalloc_to_page` returns a valid pointer to a `struct page` for a valid pointer
- // to `Vmalloc` memory.
- let page = unsafe { NonNull::new_unchecked(page) };
-
// SAFETY:
- // - `page` is a valid pointer to a `struct page`, given that by the safety requirements of
- // this function `ptr` is a valid pointer to a `Vmalloc` allocation.
- // - By the safety requirements of this function `ptr` is valid for the entire lifetime of
- // `'a`.
- unsafe { page::BorrowedPage::from_raw(page) }
+ // - `vmalloc_to_page` returns a valid, non-null pointer to a `struct page` for a valid
+ // pointer to `Vmalloc` memory, given that by the safety requirements of this function
+ // `ptr` is a valid pointer to a `Vmalloc` allocation.
+ // - By the safety requirements of this function `ptr`, and hence the `struct page`, is
+ // valid for the entire lifetime of `'a`.
+ unsafe { &*page.cast() }
}
}
diff --git a/rust/kernel/alloc/allocator/iter.rs b/rust/kernel/alloc/allocator/iter.rs
index 02fda3ea5cae6..8dcc16ed89893 100644
--- a/rust/kernel/alloc/allocator/iter.rs
+++ b/rust/kernel/alloc/allocator/iter.rs
@@ -9,7 +9,7 @@
ptr::NonNull, //
};
-/// An [`Iterator`] of [`page::BorrowedPage`] items owned by a [`Vmalloc`] allocation.
+/// An [`Iterator`] of [`Page`](page::Page) references owned by a [`Vmalloc`] allocation.
///
/// # Guarantees
///
@@ -28,11 +28,11 @@ pub struct VmallocPageIter<'a> {
size: usize,
/// The current page index of the [`Iterator`].
index: usize,
- _p: PhantomData<page::BorrowedPage<'a>>,
+ _p: PhantomData<&'a page::Page>,
}
impl<'a> Iterator for VmallocPageIter<'a> {
- type Item = page::BorrowedPage<'a>;
+ type Item = &'a page::Page;
fn next(&mut self) -> Option<Self::Item> {
let offset = self.index.checked_mul(page::PAGE_SIZE)?;
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
index 8affd8262891b..6dc1c2395acaf 100644
--- a/rust/kernel/page.rs
+++ b/rust/kernel/page.rs
@@ -12,16 +12,16 @@
code::*,
Result, //
},
+ types::{
+ Opaque,
+ Ownable,
+ Owned, //
+ },
uaccess::UserSliceReader, //
};
-use core::{
- marker::PhantomData,
- mem::ManuallyDrop,
- ops::Deref,
- ptr::{
- self,
- NonNull, //
- }, //
+use core::ptr::{
+ self,
+ NonNull, //
};
/// A bitwise shift for the page size.
@@ -65,93 +65,29 @@ pub const fn page_align(addr: usize) -> Option<usize> {
Some(sum & PAGE_MASK)
}
-/// Representation of a non-owning reference to a [`Page`].
-///
-/// This type provides a borrowed version of a [`Page`] that is owned by some other entity, e.g. a
-/// [`Vmalloc`] allocation such as [`VBox`].
-///
-/// # Example
-///
-/// ```
-/// # use kernel::{bindings, prelude::*};
-/// use kernel::page::{BorrowedPage, Page, PAGE_SIZE};
-/// # use core::{mem::MaybeUninit, ptr, ptr::NonNull };
-///
-/// fn borrow_page<'a>(vbox: &'a mut VBox<MaybeUninit<[u8; PAGE_SIZE]>>) -> BorrowedPage<'a> {
-/// let ptr = ptr::from_ref(&**vbox);
-///
-/// // SAFETY: `ptr` is a valid pointer to `Vmalloc` memory.
-/// let page = unsafe { bindings::vmalloc_to_page(ptr.cast()) };
-///
-/// // SAFETY: `vmalloc_to_page` returns a valid pointer to a `struct page` for a valid
-/// // pointer to `Vmalloc` memory.
-/// let page = unsafe { NonNull::new_unchecked(page) };
-///
-/// // SAFETY:
-/// // - `self.0` is a valid pointer to a `struct page`.
-/// // - `self.0` is valid for the entire lifetime of `self`.
-/// unsafe { BorrowedPage::from_raw(page) }
-/// }
-///
-/// let mut vbox = VBox::<[u8; PAGE_SIZE]>::new_uninit(GFP_KERNEL)?;
-/// let page = borrow_page(&mut vbox);
-///
-/// // SAFETY: There is no concurrent read or write to this page.
-/// unsafe { page.fill_zero_raw(0, PAGE_SIZE)? };
-/// # Ok::<(), Error>(())
-/// ```
-///
-/// # Invariants
-///
-/// The borrowed underlying pointer to a `struct page` is valid for the entire lifetime `'a`.
-///
-/// [`VBox`]: kernel::alloc::VBox
-/// [`Vmalloc`]: kernel::alloc::allocator::Vmalloc
-pub struct BorrowedPage<'a>(ManuallyDrop<Page>, PhantomData<&'a Page>);
-
-impl<'a> BorrowedPage<'a> {
- /// Constructs a [`BorrowedPage`] from a raw pointer to a `struct page`.
- ///
- /// # Safety
- ///
- /// - `ptr` must point to a valid `bindings::page`.
- /// - `ptr` must remain valid for the entire lifetime `'a`.
- pub unsafe fn from_raw(ptr: NonNull<bindings::page>) -> Self {
- let page = Page { page: ptr };
-
- // INVARIANT: The safety requirements guarantee that `ptr` is valid for the entire lifetime
- // `'a`.
- Self(ManuallyDrop::new(page), PhantomData)
- }
-}
-
-impl<'a> Deref for BorrowedPage<'a> {
- type Target = Page;
-
- fn deref(&self) -> &Self::Target {
- &self.0
- }
-}
-
-/// Trait to be implemented by types which provide an [`Iterator`] implementation of
-/// [`BorrowedPage`] items, such as [`VmallocPageIter`](kernel::alloc::allocator::VmallocPageIter).
+/// Trait to be implemented by types which provide an [`Iterator`] of [`Page`] references, such as
+/// [`VmallocPageIter`](kernel::alloc::allocator::VmallocPageIter).
pub trait AsPageIter {
/// The [`Iterator`] type, e.g. [`VmallocPageIter`](kernel::alloc::allocator::VmallocPageIter).
- type Iter<'a>: Iterator<Item = BorrowedPage<'a>>
+ type Iter<'a>: Iterator<Item = &'a Page>
where
Self: 'a;
- /// Returns an [`Iterator`] of [`BorrowedPage`] items over all pages owned by `self`.
+ /// Returns an [`Iterator`] of [`Page`] references over all pages owned by `self`.
fn page_iter(&mut self) -> Self::Iter<'_>;
}
-/// A pointer to a page that owns the page allocation.
+/// A `struct page`.
+///
+/// A `Page` is accessed through a shared reference or through an owning [`Owned<Page>`]; the latter
+/// frees the page allocation when it is dropped.
///
/// # Invariants
///
-/// The pointer is valid, and has ownership over the page.
+/// The `Page` is backed by a valid `struct page`.
+#[repr(transparent)]
pub struct Page {
- page: NonNull<bindings::page>,
+ page: Opaque<bindings::page>,
}
// SAFETY: Pages have no logic that relies on them staying on a given thread, so moving them across
@@ -185,19 +121,20 @@ impl Page {
/// # Ok::<(), kernel::alloc::AllocError>(())
/// ```
#[inline]
- pub fn alloc_page(flags: Flags) -> Result<Self, AllocError> {
+ pub fn alloc_page(flags: Flags) -> Result<Owned<Self>, AllocError> {
// SAFETY: Depending on the value of `gfp_flags`, this call may sleep. Other than that, it
// is always safe to call this method.
let page = unsafe { bindings::alloc_pages(flags.as_raw(), 0) };
let page = NonNull::new(page).ok_or(AllocError)?;
- // INVARIANT: We just successfully allocated a page, so we now have ownership of the newly
- // allocated page. We transfer that ownership to the new `Page` object.
- Ok(Self { page })
+ // SAFETY: We just successfully allocated a page, so we now have ownership of the newly
+ // allocated page. We transfer that ownership to the new `Owned<Page>` object.
+ // Since `Page` is transparent, we can cast the pointer directly.
+ Ok(unsafe { Owned::from_raw(page.cast()) })
}
/// Returns a raw pointer to the page.
pub fn as_ptr(&self) -> *mut bindings::page {
- self.page.as_ptr()
+ Opaque::cast_into(&self.page)
}
/// Get the node id containing this page.
@@ -372,10 +309,11 @@ pub unsafe fn copy_from_user_slice_raw(
}
}
-impl Drop for Page {
+impl Ownable for Page {
#[inline]
- fn drop(&mut self) {
- // SAFETY: By the type invariants, we have ownership of the page and can free it.
- unsafe { bindings::__free_pages(self.page.as_ptr(), 0) };
+ unsafe fn release(this: NonNull<Self>) {
+ // SAFETY: By the function safety requirements, we have ownership of the page and can free
+ // it. Since Page is transparent, we can cast the raw pointer directly.
+ unsafe { bindings::__free_pages(this.as_ptr().cast(), 0) };
}
}
--
2.51.2
^ permalink raw reply related
* [PATCH v19 1/8] rust: alloc: add `KBox::into_non_null`
From: Andreas Hindborg @ 2026-06-26 11:53 UTC (permalink / raw)
To: Danilo Krummrich, Lorenzo Stoakes, Vlastimil Babka,
Liam R. Howlett, Uladzislau Rezki, Miguel Ojeda, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
Trevor Gross, Daniel Almeida, Tamir Duberstein, Alexandre Courbot,
Onur Özkan, Lyude Paul, Greg Kroah-Hartman,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas, Rafael J. Wysocki, Dave Ertman, Ira Weiny,
Leon Romanovsky, Paul Moore, Serge Hallyn, David Airlie,
Simona Vetter, Alexander Viro, Jan Kara, Igor Korotin,
Viresh Kumar, Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
Krzysztof Wilczyński, Pavel Tikhomirov, Michal Wilczynski
Cc: Andreas Hindborg, Philipp Stanner, rust-for-linux, linux-kernel,
linux-mm, driver-core, linux-block, linux-security-module,
dri-devel, linux-fsdevel, linux-pm, linux-pci, linux-pwm
In-Reply-To: <20260626-unique-ref-v19-0-2607ca88dfdf@kernel.org>
Add a method to consume a `Box<T, A>` and return a `NonNull<T>`. This
is a convenience wrapper around `Self::into_raw` for callers that need
a `NonNull` pointer rather than a raw pointer.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
---
rust/kernel/alloc/kbox.rs | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/rust/kernel/alloc/kbox.rs b/rust/kernel/alloc/kbox.rs
index 35d1e015848dd..d534e8adcf7b3 100644
--- a/rust/kernel/alloc/kbox.rs
+++ b/rust/kernel/alloc/kbox.rs
@@ -211,6 +211,15 @@ pub fn leak<'a>(b: Self) -> &'a mut T {
// which points to an initialized instance of `T`.
unsafe { &mut *Box::into_raw(b) }
}
+
+ /// Consumes the `Box<T,A>` and returns a `NonNull<T>`.
+ ///
+ /// Like [`Self::into_raw`], but returns a `NonNull`.
+ #[inline]
+ pub fn into_non_null(b: Self) -> NonNull<T> {
+ // SAFETY: `KBox::into_raw` returns a valid pointer.
+ unsafe { NonNull::new_unchecked(Self::into_raw(b)) }
+ }
}
impl<T, A> Box<MaybeUninit<T>, A>
--
2.51.2
^ permalink raw reply related
* Re: [RFC PATCH 1/4] capabily: Add new capable_noaudit
From: Darrick J. Wong @ 2026-06-26 15:16 UTC (permalink / raw)
To: cem
Cc: linux-fsdevel, jack, hch, serge, linux-security-module,
linux-kernel, linux-xfs
In-Reply-To: <20260626114533.102138-2-cem@kernel.org>
s/capabily/capability/ in the subject even if the typo actually makes it
easier to find the thread.
On Fri, Jun 26, 2026 at 01:45:20PM +0200, cem@kernel.org wrote:
> From: Carlos Maiolino <cem@kernel.org>
>
> In some situations (quota enforcement bypass in this case) we'd like to
> check for a specific capability without triggering spurious audit
> messages from security modules like selinux.
>
> Add a new helper so we don't need to use ns_capable_noaudit() directly.
>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> include/linux/capability.h | 5 +++++
> kernel/capability.c | 17 +++++++++++++++++
> 2 files changed, 22 insertions(+)
>
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index 37db92b3d6f8..873416ba884c 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -145,6 +145,7 @@ extern bool has_capability_noaudit(struct task_struct *t, int cap);
> extern bool has_ns_capability_noaudit(struct task_struct *t,
> struct user_namespace *ns, int cap);
> extern bool capable(int cap);
> +extern bool capable_noaudit(int cap);
> extern bool ns_capable(struct user_namespace *ns, int cap);
> extern bool ns_capable_noaudit(struct user_namespace *ns, int cap);
> extern bool ns_capable_setid(struct user_namespace *ns, int cap);
> @@ -167,6 +168,10 @@ static inline bool capable(int cap)
> {
> return true;
> }
> +static inline bool capable_noaudit(int cap)
> +{
> + return true;
> +}
> static inline bool ns_capable(struct user_namespace *ns, int cap)
> {
> return true;
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 829f49ae07b9..2c2d1e8300bd 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -416,6 +416,23 @@ bool capable(int cap)
> return ns_capable(&init_user_ns, cap);
> }
> EXPORT_SYMBOL(capable);
> +
> +/**
> + * capable_noaudit - Determine if the current task has a superior
> + * capability in effect (unaudited).
> + * @cap: The capability to be tested for
> + *
> + * This is the same as capable(), except it uses CAP_OPT_NOAUDIT as to prevent
> + * issuing spurious audit messages.
> + *
> + * This sets PF_SUPERPRIV on the task if the capability is available on the
> + * assumption that it's about to be used.
Can you mention that this checks the current process' effective
capabilities (as opposed to the real ones)? So that nobody else has to
suffer the confusion I pointed out in [1] which was the source of the
security bugs in the first place?
(I do like the wrapper though)
--D
[1] https://lore.kernel.org/linux-xfs/20260625160317.GY6078@frogsfrogsfrogs/
> + */
> +bool capable_noaudit(int cap)
> +{
> + return ns_capable_noaudit(&init_user_ns, cap);
> +}
> +EXPORT_SYMBOL(capable_noaudit);
> #endif /* CONFIG_MULTIUSER */
>
> /**
> --
> 2.54.0
>
>
^ permalink raw reply
* Re: [RFC PATCH 2/4] quota: Don't issue audit messages on quota enforcing
From: Darrick J. Wong @ 2026-06-26 15:18 UTC (permalink / raw)
To: cem
Cc: linux-fsdevel, jack, hch, serge, linux-security-module,
linux-kernel, linux-xfs
In-Reply-To: <20260626114533.102138-3-cem@kernel.org>
On Fri, Jun 26, 2026 at 01:45:21PM +0200, cem@kernel.org wrote:
> From: Carlos Maiolino <cem@kernel.org>
>
> Calling capable() to determine if we can bypass quota enforcement or not
> can trigger spurious audit messages. We don't really require it here so
> just use the capable_noaudit() version.
>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> fs/quota/dquot.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 64cf42721496..1122a29215f7 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -1308,7 +1308,7 @@ static int ignore_hardlimit(struct dquot *dquot)
> {
> struct mem_dqinfo *info = &sb_dqopt(dquot->dq_sb)->info[dquot->dq_id.type];
>
> - return capable(CAP_SYS_RESOURCE) &&
> + return capable_noaudit(CAP_SYS_RESOURCE) &&
Yeah, we're just checking if we're going to enforce hardlimits, not
actually denying something based on lack of capability. For all we know
the user is well under their disk quota limit.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> (info->dqi_format->qf_fmt_id != QFMT_VFS_OLD ||
> !(info->dqi_flags & DQF_ROOT_SQUASH));
> }
> --
> 2.54.0
>
>
^ permalink raw reply
* Re: [RFC PATCH 3/4] xfs: replace ns_capable_noaudit()
From: Darrick J. Wong @ 2026-06-26 15:19 UTC (permalink / raw)
To: cem
Cc: linux-fsdevel, jack, hch, serge, linux-security-module,
linux-kernel, linux-xfs
In-Reply-To: <20260626114533.102138-4-cem@kernel.org>
On Fri, Jun 26, 2026 at 01:45:22PM +0200, cem@kernel.org wrote:
> From: Carlos Maiolino <cem@kernel.org>
>
> We don't need to use ns_capable_noaudit() as all we care is the initial
> user namespace, use capable_noaudit() instead.
Might as well do the one in xfs_fsmap.c too, since it was originally a
capable() call.
--D
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> fs/xfs/xfs_trans_dquot.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
> index 50e5b323f7f1..30c2f6ec0aac 100644
> --- a/fs/xfs/xfs_trans_dquot.c
> +++ b/fs/xfs/xfs_trans_dquot.c
> @@ -835,7 +835,7 @@ xfs_trans_dqresv(
> if ((flags & XFS_QMOPT_FORCE_RES) == 0 &&
> dqp->q_id &&
> xfs_dquot_is_enforced(dqp) &&
> - !ns_capable_noaudit(&init_user_ns, CAP_SYS_RESOURCE)) {
> + !capable_noaudit(CAP_SYS_RESOURCE)) {
> int quota_nl;
> bool fatal;
>
> --
> 2.54.0
>
>
^ permalink raw reply
* Re: [RFC PATCH 1/4] capabily: Add new capable_noaudit
From: Paul Moore @ 2026-06-26 15:31 UTC (permalink / raw)
To: cem
Cc: linux-fsdevel, jack, djwong, hch, serge, linux-security-module,
linux-kernel, linux-xfs
In-Reply-To: <20260626114533.102138-2-cem@kernel.org>
On Fri, Jun 26, 2026 at 7:49 AM <cem@kernel.org> wrote:
>
> From: Carlos Maiolino <cem@kernel.org>
>
> In some situations (quota enforcement bypass in this case) we'd like to
> check for a specific capability without triggering spurious audit
> messages from security modules like selinux.
>
> Add a new helper so we don't need to use ns_capable_noaudit() directly.
>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> include/linux/capability.h | 5 +++++
> kernel/capability.c | 17 +++++++++++++++++
> 2 files changed, 22 insertions(+)
This is Serge's call, not mine, but FWIW, I somewhat prefer to see
code use the ns_capable_XXX() variants directly as I like to think it
means some thought went into ensuring the capability check is being
done in the right namespace. Yes, we all know that capable() just
uses the init namespace, but I like to think that having to type that
out in the parameter list might be a good double check ;)
--
paul-moore.com
^ permalink raw reply
* Re: [RFC PATCH 1/4] capabily: Add new capable_noaudit
From: Serge E. Hallyn @ 2026-06-26 17:46 UTC (permalink / raw)
To: Paul Moore
Cc: cem, linux-fsdevel, jack, djwong, hch, serge,
linux-security-module, linux-kernel, linux-xfs
In-Reply-To: <CAHC9VhQNURc=d4AOVDF-z29fjLasCiLf120Y-N3txEBccpkfSA@mail.gmail.com>
On Fri, Jun 26, 2026 at 11:31:06AM -0400, Paul Moore wrote:
> On Fri, Jun 26, 2026 at 7:49 AM <cem@kernel.org> wrote:
> >
> > From: Carlos Maiolino <cem@kernel.org>
> >
> > In some situations (quota enforcement bypass in this case) we'd like to
> > check for a specific capability without triggering spurious audit
> > messages from security modules like selinux.
> >
> > Add a new helper so we don't need to use ns_capable_noaudit() directly.
> >
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > ---
> > include/linux/capability.h | 5 +++++
> > kernel/capability.c | 17 +++++++++++++++++
> > 2 files changed, 22 insertions(+)
>
> This is Serge's call, not mine, but FWIW, I somewhat prefer to see
> code use the ns_capable_XXX() variants directly as I like to think it
> means some thought went into ensuring the capability check is being
> done in the right namespace. Yes, we all know that capable() just
> uses the init namespace, but I like to think that having to type that
> out in the parameter list might be a good double check ;)
Hm, yeah, on he one hand it seems like a nice shortcut, but I still
see people confusing what 'capable' really does, so standardizing on
ns_capable_noaudit(&init_user_ns, x) might be worthwhile.
(and then patch 3 can go)
^ permalink raw reply
* Re: [PATCH v2 stable/linux-6.18.y 0/2] Backport Fix incorrect overlayfs mmap() and mprotect() LSM access controls
From: Sasha Levin @ 2026-06-26 17:54 UTC (permalink / raw)
To: viro, brauner, jack, miklos, amir73il, paul, jmorris, serge,
stephen.smalley.work, omosnace, gregkh, bboscaccy, caixinchen1
Cc: Sasha Levin, linux-fsdevel, linux-kernel, linux-unionfs,
linux-security-module, selinux, bpf, stable, lujialin4
In-Reply-To: <20260626075035.143419-1-caixinchen1@huawei.com>
> [PATCH v2 stable/linux-6.18.y 0/2] Backport Fix incorrect overlayfs
> mmap() and mprotect() LSM access controls
Both patches queued for 6.18, thanks.
--
Thanks,
Sasha
^ permalink raw reply
* Re: [RFC PATCH 2/2] selftests/landlock: Add test for TCP fast open
From: Mickaël Salaün @ 2026-06-26 21:19 UTC (permalink / raw)
To: Matthieu Buffet
Cc: Bryam Vargas, Günther Noack, linux-security-module,
Mikhail Ivanov, Paul Moore, Eric Dumazet, Neal Cardwell,
linux-kernel, netdev
In-Reply-To: <20260617180526.15627-3-matthieu@buffet.re>
On Wed, Jun 17, 2026 at 08:05:24PM +0200, Matthieu Buffet wrote:
> Enforce that TCP Fast Open is controlled by
> LANDLOCK_ACCESS_NET_CONNECT_TCP. Semantics of connect() and
> sendmsg(MSG_FASTOPEN) should be identical from Landlock's perspective.
> Also enforce error code consistency, since UDP sockets ignore
> the MSG_FASTOPEN flag while Unix sockets reject it.
>
> Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
> ---
> tools/testing/selftests/landlock/net_test.c | 155 ++++++++++++++++++++
> 1 file changed, 155 insertions(+)
>
> diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
> index 0c256e7c8675..177ed28e70f6 100644
> --- a/tools/testing/selftests/landlock/net_test.c
> +++ b/tools/testing/selftests/landlock/net_test.c
> @@ -258,6 +258,64 @@ static int connect_variant(const int sock_fd,
> return connect_variant_addrlen(sock_fd, srv, get_addrlen(srv, false));
> }
>
> +static int sendto_variant_addrlen(const int sock_fd,
> + const struct service_fixture *const srv,
> + const socklen_t addrlen, void *buf,
> + size_t len, size_t flags)
> +{
> + const struct sockaddr *dst = NULL;
> + ssize_t ret;
> +
> + /*
> + * We never want our processes to be killed by SIGPIPE: we check return
> + * codes and errno, so that we have actual error messages.
> + */
There are some extra spaces above.
> + flags |= MSG_NOSIGNAL;
> +
> + if (srv != NULL) {
Just `if (srv) {`
> + switch (srv->protocol.domain) {
> + case AF_UNSPEC:
> + case AF_INET:
> + dst = (const struct sockaddr *)&srv->ipv4_addr;
> + break;
> +
> + case AF_INET6:
> + dst = (const struct sockaddr *)&srv->ipv6_addr;
> + break;
> +
> + case AF_UNIX:
> + dst = (const struct sockaddr *)&srv->unix_addr;
> + break;
> +
> + default:
> + errno = EAFNOSUPPORT;
> + return -errno;
> + }
> + }
> +
> + ret = sendto(sock_fd, buf, len, flags, dst, addrlen);
> + if (ret < 0)
> + return -errno;
> +
> + /* errno is not set in cases of partial writes. */
> + if (ret != len)
> + return -EINTR;
> +
> + return 0;
> +}
> +
> +static int sendto_variant(const int sock_fd,
> + const struct service_fixture *const srv, void *buf,
> + size_t len, size_t flags)
> +{
> + socklen_t addrlen = 0;
> +
> + if (srv != NULL)
ditto
> + addrlen = get_addrlen(srv, false);
> +
> + return sendto_variant_addrlen(sock_fd, srv, addrlen, buf, len, flags);
> +}
> +
> FIXTURE(protocol)
> {
> struct service_fixture srv0, srv1, srv2, unspec_any0, unspec_srv0;
> @@ -950,6 +1008,103 @@ TEST_F(protocol, connect_unspec)
> EXPECT_EQ(0, close(bind_fd));
> }
>
> +TEST_F(protocol, tcp_fastopen)
> +{
> + const bool restricted = variant->sandbox == TCP_SANDBOX &&
> + variant->prot.type == SOCK_STREAM &&
> + (variant->prot.protocol == IPPROTO_TCP ||
> + variant->prot.protocol == IPPROTO_IP) &&
> + (variant->prot.domain == AF_INET ||
> + variant->prot.domain == AF_INET6);
> + const struct landlock_ruleset_attr ruleset_attr = {
> + .handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
> + };
> + int bind_fd, client_fd, status;
> + char buf;
> + pid_t child;
> +
> + bind_fd = socket_variant(&self->srv0);
> + ASSERT_LE(0, bind_fd);
> + EXPECT_EQ(0, bind_variant(bind_fd, &self->srv0));
> + if (self->srv0.protocol.type == SOCK_STREAM)
> + EXPECT_EQ(0, listen(bind_fd, backlog));
> +
> + child = fork();
> + ASSERT_LE(0, child);
> + if (child == 0) {
> + int connect_fd, ret;
> +
> + /* Closes listening socket for the child. */
> + EXPECT_EQ(0, close(bind_fd));
> +
> + connect_fd = socket_variant(&self->srv0);
> + ASSERT_LE(0, connect_fd);
> +
> + if (variant->sandbox == TCP_SANDBOX) {
> + const int ruleset_fd = landlock_create_ruleset(
> + &ruleset_attr, sizeof(ruleset_attr), 0);
> + ASSERT_LE(0, ruleset_fd);
> +
> + enforce_ruleset(_metadata, ruleset_fd);
> + EXPECT_EQ(0, close(ruleset_fd));
> + }
> +
> + /* Fast Open with no address. */
> + ret = sendto_variant(connect_fd, NULL, NULL, 0, MSG_FASTOPEN);
> + if (self->srv0.protocol.domain == AF_UNIX) {
> + ASSERT_EQ(-ENOTCONN, ret);
> + } else if (self->srv0.protocol.type == SOCK_DGRAM) {
> + ASSERT_EQ(-EDESTADDRREQ, ret);
> + } else {
> + ASSERT_EQ(-EINVAL, ret);
> + }
> +
> + /* Fast Open to a denied address. */
> + ret = sendto_variant(connect_fd, &self->srv0, "A", 1,
> + MSG_FASTOPEN);
> + if (restricted) {
> + ASSERT_EQ(-EACCES, ret);
> + } else if (self->srv0.protocol.domain == AF_UNIX &&
> + self->srv0.protocol.type == SOCK_STREAM) {
> + ASSERT_EQ(-EOPNOTSUPP, ret);
> + } else {
> + ASSERT_EQ(0, ret);
> + }
All these ret checks should be done with EXPECT_EQ because they don't
block the test itself and we can get more info by checking more after
that.
> +
> + EXPECT_EQ(0, close(connect_fd));
> + _exit(_metadata->exit_code);
> + return;
> + }
> +
> + client_fd = bind_fd;
> + if (!restricted && self->srv0.protocol.type == SOCK_STREAM &&
> + self->srv0.protocol.domain != AF_UNIX) {
> + client_fd = accept(bind_fd, NULL, 0);
> + ASSERT_LE(0, client_fd);
> + }
> +
> + if (restricted) {
> + EXPECT_EQ(-1, read(client_fd, &buf, 1));
> + EXPECT_EQ(ENOTCONN, errno);
> + } else if (self->srv0.protocol.domain == AF_UNIX &&
> + self->srv0.protocol.type == SOCK_STREAM) {
> + EXPECT_EQ(-1, read(client_fd, &buf, 1));
> + EXPECT_EQ(EINVAL, errno);
> + } else {
> + EXPECT_EQ(1, read(client_fd, &buf, 1));
> + EXPECT_EQ('A', buf);
> + }
> +
> + EXPECT_EQ(child, waitpid(child, &status, 0));
> + EXPECT_EQ(1, WIFEXITED(status));
> + EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
> +
> + if (client_fd != bind_fd)
> + EXPECT_LE(0, close(client_fd));
> +
> + EXPECT_EQ(0, close(bind_fd));
> +}
> +
> FIXTURE(ipv4)
> {
> struct service_fixture srv0, srv1;
> --
> 2.47.3
>
>
^ permalink raw reply
* Re: [RFC PATCH 1/2] landlock: fix TCP Fast Open connection bypass
From: Mickaël Salaün @ 2026-06-26 20:40 UTC (permalink / raw)
To: Matthieu Buffet
Cc: Bryam Vargas, Günther Noack, linux-security-module,
Mikhail Ivanov, Paul Moore, Eric Dumazet, Neal Cardwell,
linux-kernel, netdev
In-Reply-To: <20260617180526.15627-2-matthieu@buffet.re>
Thanks Matthieu, could you please rebase this serise on the master
branch (especially on top of your UDP changes)?
This patch will be useful for backports though.
On Wed, Jun 17, 2026 at 08:05:23PM +0200, Matthieu Buffet wrote:
> The documentation of the socket_connect() LSM hook states that it
> controls connecting a socket to a remote address. It has not been the
> case since the addition of TCP Fast Open (RFC 7413) support, which allows
> opening a TCP connection (thus, setting a socket's destination address)
> via the MSG_FASTOPEN flag passed to sendto()/sendmsg()/sendmmsg(). The
> problem then got duplicated into MPTCP.
>
> Landlock did not take it into account when its TCP support was added,
> leaving a bypass of TCP connect policy.
>
> Ideally a call to the LSM hook would be added in the fastopen code path,
> in order to fix this generically. But connect() hooks are designed to run
> with the socket locked, unlike sendmsg() hooks.
>
> Closes: https://github.com/landlock-lsm/linux/issues/41
> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
> ---
> security/landlock/net.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/security/landlock/net.c b/security/landlock/net.c
> index 4ee4002a8f56..a2375762c18b 100644
> --- a/security/landlock/net.c
> +++ b/security/landlock/net.c
> @@ -246,9 +246,26 @@ static int hook_socket_connect(struct socket *const sock,
> access_request);
> }
>
> +static int hook_socket_sendmsg(struct socket *const sock,
> + struct msghdr *const msg, const int size)
> +{
> + struct sockaddr *const address = msg->msg_name;
> + const int addrlen = msg->msg_namelen;
> +
> + if (sk_is_tcp(sock->sk) && address != NULL &&
> + (msg->msg_flags & MSG_FASTOPEN) != 0) {
This might be a bit better:
if ((msg->msg_flags & MSG_FASTOPEN) && address && sk_is_tcp(sock->sk))
> + return current_check_access_socket(
> + sock, address, addrlen,
> + LANDLOCK_ACCESS_NET_CONNECT_TCP);
> + }
> +
> + return 0;
> +}
> +
> static struct security_hook_list landlock_hooks[] __ro_after_init = {
> LSM_HOOK_INIT(socket_bind, hook_socket_bind),
> LSM_HOOK_INIT(socket_connect, hook_socket_connect),
> + LSM_HOOK_INIT(socket_sendmsg, hook_socket_sendmsg),
> };
>
> __init void landlock_add_net_hooks(void)
> --
> 2.47.3
>
>
^ permalink raw reply
* Re: [PATCH bpf-next v2 1/5] bpf: Verify signed loader metadata at load time
From: KP Singh @ 2026-06-26 22:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Paul Moore, Daniel Borkmann, ast, James.Bottomley, bboscaccy,
memxor, torvalds, bpf, linux-security-module
In-Reply-To: <DJIL18C2F40B.39U9WHD43SDBR@gmail.com>
On Fri, Jun 26, 2026 at 3:16 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu Jun 25, 2026 at 5:59 PM PDT, Paul Moore wrote:
> >
> > For all the reasons I gave previously, I can't support moving the
> > existing security_bpf_prog_load() hook at this point in time.
>
> Paul,
> it's not up to you to approve or deny where security_bpf_prog_load()
> is called within bpf subsystem as long as it doesn't affect behavior.
> Daniel's patch doesn't change observable state from LSMs pov.
> It merely moves the call from syscall.c to verifier.c.
> So we're going to proceed.
>
> > I'm guessing you still haven't looked at Blaise's patchset from last
> > September.
>
> Blaise approach was Nacked because you guys ignored TOCTOU issue.
> I pointed it a year ago before AI was a thing. Then sashiko
> pointed it again and the bot explained it in detail. It was again
> ignored.
>
Agreed, with Alexei: I like Daniel's solution because it avoids the
complexity of machine independent BTF for loader programs and still
addresses the signing goal, without the TOCTOU. I am okay with the
current implementation as well. I think having kfuncs in loader
programs would be useful, but we don't need to rush it or tie it to
the signing effort.
- KP
> Daniel's v1 sadly had the same issue and sashiko spotted it too.
> Hence v2 is moving the location of security_bpf_prog_load().
>
> > on-list. As you can see from the lore archives, he has vehemently
> > opposed the approach you are proposing for quite a while.
>
> Exactly, because you kept ignoring TOCTOU issue.
> Claiming support for signed bpf that can be easily defeated
> is a shameless security scam.
>
^ permalink raw reply
* Re: [PATCH bpf-next v2 1/5] bpf: Verify signed loader metadata at load time
From: Paul Moore @ 2026-06-27 1:32 UTC (permalink / raw)
To: KP Singh
Cc: Alexei Starovoitov, Daniel Borkmann, ast, James.Bottomley,
bboscaccy, memxor, torvalds, bpf, linux-security-module
In-Reply-To: <CACYkzJ64SxZj1Qvf1JRDfpoOWkg+bk_hpXcfGfPvecSfSu2jdA@mail.gmail.com>
On Fri, Jun 26, 2026 at 6:01 PM KP Singh <kpsingh@kernel.org> wrote:
> On Fri, Jun 26, 2026 at 3:16 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Thu Jun 25, 2026 at 5:59 PM PDT, Paul Moore wrote:
> > >
> > > For all the reasons I gave previously, I can't support moving the
> > > existing security_bpf_prog_load() hook at this point in time.
> >
> > Paul,
> > it's not up to you to approve or deny where security_bpf_prog_load()
> > is called within bpf subsystem as long as it doesn't affect behavior.
> > Daniel's patch doesn't change observable state from LSMs pov.
> > It merely moves the call from syscall.c to verifier.c.
> > So we're going to proceed.
> >
> > > I'm guessing you still haven't looked at Blaise's patchset from last
> > > September.
> >
> > Blaise approach was Nacked because you guys ignored TOCTOU issue.
> > I pointed it a year ago before AI was a thing. Then sashiko
> > pointed it again and the bot explained it in detail. It was again
> > ignored.
>
> Agreed, with Alexei: I like Daniel's solution because it avoids the
> complexity of machine independent BTF for loader programs and still
> addresses the signing goal, without the TOCTOU. I am okay with the
> current implementation as well. I think having kfuncs in loader
> programs would be useful, but we don't need to rush it or tie it to
> the signing effort.
As I've said a couple of times now in Daniel's approach, I like the
basic idea in Daniel's approach; it aligns with the same scheme we
have been advocating for all along: run the PKCS7 signature over both
the lskel loader and the maps. I know you keep mentioning a TOCTOU
bug in that September patchset, but the main objection to all the
different variants of Blaise's patches was always the same basic idea:
you and Alexei wanted to verify the maps in the lskel loader BPF code,
while we wanted to directly verify the maps as part of the PKCS7
signature. Any TOCTOU issues were implementation details; the
difference in the signature scheme was the sticking point in many of
the arguments. For whatever it is worth, you and Alexei were not
immune to TOCTOU issues either. The frozen map problem was mentioned
on-list and ignored. We had to submit a PoC/report to security@kernel
to get it taken seriously and patched.
Regardless, of how we got here and whether or not you recognize
Daniel's signature scheme as a retread of Blaise's scheme, the point
is that you've come around to a concept we have advocated for over a
year. If you plan to merge it, good (although I do wonder what Linus
will think of the userspace breakage caused by abandoning the
signature scheme that shipped in v6.18, maybe it's not a big deal, who
knows). However, and this is important, we can't move the
security_bpf_prog_load() function to where Daniel proposed in his v2
patchset. The issue is that non-trivial work happens, e.g. the
potential allocation of a buffer to hold up to one million eBPF
instructions, and we want to ensure we have an LSM access control
point before that happens. If you, Daniel, and Alexei want to move
the bpf_prog_verify_signature() call into the verifier that's fine.
However, the security_bpf_prog_load() call needs to stay where it is.
It served an important purpose well before signed BPF programs were
supported, and adding BPF signatures doesn't invalidate that original
purpose. If you want an LSM hook *after* the
bpf_prog_verify_signature() call and you oppose adding a new hook for
this new functionality, use the security_bpf_prog() hook. It isn't
quite as conceptually clean of a solution as adding a new hook, but it
should work to allow you to control access to a BPF program based on
the signature verification result.
I'll support Daniel's patchset as long as you don't tamper with the
security_bpf_prog_load() hook. If you move the hook to where it is
currently proposed in Daniel's v2 patchset I will have to object. Do
we really want to keep going around on this stupid, petty carousel of
bickering and constant NACKs?
--
paul-moore.com
^ permalink raw reply
page: | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox