* [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
@ 2025-02-02 13:05 Asahi Lina
2025-02-02 13:05 ` [PATCH 1/6] rust: types: Add Ownable/Owned types Asahi Lina
` (7 more replies)
0 siblings, 8 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-02 13:05 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross
Cc: Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi,
Asahi Lina
This series refactors the existing Page wrapper to support borrowing
`struct page` objects without ownership on the Rust side, and converting
page references to/from physical memory addresses.
The series overlaps with the earlier submission in [1] and follows a
different approach, based on the discussion that happened there.
The primary use case for this is implementing IOMMU-style page table
management in Rust. This allows drivers for IOMMUs and MMU-containing
SoC devices to be written in Rust (such as embedded GPUs). The intended
logic is similar to how ARM SMMU page tables are managed in the
drivers/iommu tree.
First, introduce a concept of Owned<T> and an Ownable trait. These are
similar to ARef<T> and AlwaysRefCounted, but are used for types which
are not ref counted but rather have a single intended owner.
Then, refactor the existing Page support to use the new mechanism. Pages
returned from the page allocator are not intended to be ref counted by
consumers (see previous discussion in [1]), so this keeps Rust's view of
page ownership as a simple "owned or not". Of course, this is still
composable as Arc<Owned<Page>> if Rust code needs to reference count its
own Page allocations for whatever reason.
Then, make some existing private methods public, since this is needed to
reasonably use allocated pages as IOMMU page tables.
Along the way we also add a small module to represent a core kernel
address types (PhysicalAddr, DmaAddr, ResourceSize, Pfn). In the future,
this might grow with helpers to make address math safer and more
Rust-like.
Finally, add methods to:
- Get a page's physical address
- Convert an owned Page into its physical address
- Convert a physical address back to its owned Page
- Borrow a Page from a physical address, in both checked (with checks
that a struct page exists and is accessible as regular RAM) and not
checked forms (useful when the user knows the physaddr is valid,
for example because it got it from Page::into_phys()).
Of course, all but the first two have to be `unsafe` by nature, but that
comes with the territory of writing low level memory management code.
These methods allow page table code to know the physical address of
pages (needed to build intermediate level PTEs) and to essentially
transfer ownership of the pages into the page table structure itself,
and back into Page objects when freeing page tables. Without that, the
code would have to keep track of page allocations in duplicate, once in
Rust code and once in the page table structure itself, which is less
desirable.
For Apple GPUs, the address space shared between firmware and the driver
is actually pre-allocated by the bootloader, with the top level page
table already pre-allocated, and the firmware owning some PTEs within it
while the kernel populates others. This cooperation works well when the
kernel can reference this top level page table by physical address. The
only thing the driver needs to ensure is that it never attempts to free
it in this case, nor the page tables corresponding to virtual address
ranges it doesn't own. Without the ability to just borrow the
pre-allocated top level page and access it, the driver would have to
special-case this and manually manage the top level PTEs outside the
main page table code, as well as introduce different page table
configurations with different numbers of levels so the kernel's view is
one lever shallower.
The physical address borrow feature is also useful to generate virtual
address space dumps for crash dumps, including firmware pages. The
intent is that firmware pages are configured in the Device Tree as
reserved System RAM (without no-map), which creates struct page objects
for them and makes them available in the kernel's direct map. Then the
driver's page table code can walk the page tables and make a snapshot of
the entire address space, including firmware code and data pages,
pre-allocated shared segments, and driver-allocated objects (which are
GEM objects), again without special casing anything. The checks in
`Page::borrow_phys()` should ensure that the page is safe to access as
RAM, so this will skip MMIO pages and anything that wasn't declared to
the kernel in the DT.
Example usage:
https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/drivers/gpu/drm/asahi/pgtable.rs
The last patch is a minor cleanup to the Page abstraction noticed while
preparing this series.
[1] https://lore.kernel.org/lkml/20241119112408.779243-1-abdiel.janulgue@gmail.com/T/#u
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
Asahi Lina (6):
rust: types: Add Ownable/Owned types
rust: page: Convert to Ownable
rust: page: Make with_page_mapped() and with_pointer_into_page() public
rust: addr: Add a module to declare core address types
rust: page: Add physical address conversion functions
rust: page: Make Page::as_ptr() pub(crate)
rust/helpers/page.c | 26 ++++++++++++
rust/kernel/addr.rs | 15 +++++++
rust/kernel/lib.rs | 1 +
rust/kernel/page.rs | 101 ++++++++++++++++++++++++++++++++++++++--------
rust/kernel/types.rs | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 236 insertions(+), 17 deletions(-)
---
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
change-id: 20250202-rust-page-80892069fc78
Cheers,
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 1/6] rust: types: Add Ownable/Owned types
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
@ 2025-02-02 13:05 ` Asahi Lina
2025-02-03 9:13 ` Alice Ryhl
2025-02-19 8:37 ` Andreas Hindborg
2025-02-02 13:05 ` [PATCH 2/6] rust: page: Convert to Ownable Asahi Lina
` (6 subsequent siblings)
7 siblings, 2 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-02 13:05 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross
Cc: Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi,
Asahi Lina
By analogy to AlwaysRefCounted and ARef, an Ownable type is a (typically
C FFI) type that *may* be owned by Rust, but need not be. Unlike
AlwaysRefCounted, this mechanism expects the reference to be unique
within Rust, and does not allow cloning.
Conceptually, this is similar to a KBox<T>, except that it delegates
resource management to the T instead of using a generic allocator.
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
rust/kernel/types.rs | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 110 insertions(+)
diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
index ec6457bb3084ae327c38ba4ba79b1601aef38244..0bee56153dcea47fb1321162df6b8765b5436e9f 100644
--- a/rust/kernel/types.rs
+++ b/rust/kernel/types.rs
@@ -479,6 +479,116 @@ fn drop(&mut self) {
}
}
+/// Types that may be owned by Rust code or borrowed, but have a lifetime managed by C code.
+///
+/// It allows such types to define their own custom destructor function to be called when
+/// a Rust-owned reference is dropped.
+///
+/// This is usually implemented by wrappers to existing structures on the C side of the code.
+///
+/// # Safety
+///
+/// Implementers must ensure that any objects borrowed directly as `&T` stay alive for the duration
+/// of the lifetime, and that any objects owned by Rust as `Owned<T>`) stay alive while that owned
+/// reference exists, until the [`Ownable::release()`] trait method is called.
+pub unsafe trait Ownable {
+ /// Releases the object (frees it or returns it to foreign ownership).
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that the object is no longer referenced after this call.
+ unsafe fn release(this: NonNull<Self>);
+}
+
+/// A subtrait of Ownable that asserts that an `Owned<T>` Rust reference is not only unique
+/// within Rust and keeps the `T` alive, but also guarantees that the C code follows the
+/// usual mutable reference requirements. That is, the kernel will never mutate the
+/// `T` (excluding internal mutability that follows the usual rules) while Rust owns it.
+///
+/// When this type is implemented for an [`Ownable`] type, it allows `Owned<T>` to be
+/// dereferenced into a &mut T.
+///
+/// # Safety
+///
+/// Implementers must ensure that the kernel never mutates the underlying type while
+/// Rust owns it.
+pub unsafe trait OwnableMut: Ownable {}
+
+/// An owned reference to an ownable kernel object.
+///
+/// The object is automatically freed or released when an instance of [`Owned`] is
+/// dropped.
+///
+/// # Invariants
+///
+/// The pointer stored in `ptr` is non-null and valid for the lifetime of the [`Owned`] instance.
+pub struct Owned<T: Ownable> {
+ ptr: NonNull<T>,
+ _p: PhantomData<T>,
+}
+
+// SAFETY: It is safe to send `Owned<T>` to another thread when the underlying `T` is `Send` because
+// it effectively means sending a unique `&mut T` pointer (which is safe because `T` is `Send`).
+unsafe impl<T: Ownable + Send> Send for Owned<T> {}
+
+// SAFETY: It is safe to send `&Owned<T>` to another thread when the underlying `T` is `Sync`
+// because it effectively means sharing `&T` (which is safe because `T` is `Sync`).
+unsafe impl<T: Ownable + Sync> Sync for Owned<T> {}
+
+impl<T: Ownable> Owned<T> {
+ /// Creates a new instance of [`Owned`].
+ ///
+ /// It takes over ownership of the underlying object.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that the underlying object is acquired and can be considered owned by
+ /// Rust.
+ pub(crate) unsafe fn from_raw(ptr: NonNull<T>) -> Self {
+ // INVARIANT: The safety requirements guarantee that the new instance now owns the
+ // reference.
+ Self {
+ ptr,
+ _p: PhantomData,
+ }
+ }
+
+ /// Consumes the `Owned`, returning a raw pointer.
+ ///
+ /// This function does not actually relinquish ownership of the object.
+ /// After calling this function, the caller is responsible for ownership previously managed
+ /// by the `Owned`.
+ #[allow(dead_code)]
+ pub(crate) fn into_raw(me: Self) -> NonNull<T> {
+ ManuallyDrop::new(me).ptr
+ }
+}
+
+impl<T: Ownable> Deref for Owned<T> {
+ type Target = T;
+
+ fn deref(&self) -> &Self::Target {
+ // SAFETY: The type invariants guarantee that the object is valid.
+ unsafe { self.ptr.as_ref() }
+ }
+}
+
+impl<T: Ownable + OwnableMut> DerefMut for Owned<T> {
+ fn deref_mut(&mut self) -> &mut Self::Target {
+ // SAFETY: The type invariants guarantee that the object is valid,
+ // and that we can safely return a mutable reference to it.
+ unsafe { self.ptr.as_mut() }
+ }
+}
+
+impl<T: Ownable> Drop for Owned<T> {
+ fn drop(&mut self) {
+ // SAFETY: The type invariants guarantee that the `Owned` owns the object we're about to
+ // release.
+ unsafe { T::release(self.ptr) };
+ }
+}
+
/// A sum type that always holds either a value of type `L` or `R`.
///
/// # Examples
--
2.47.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 2/6] rust: page: Convert to Ownable
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
2025-02-02 13:05 ` [PATCH 1/6] rust: types: Add Ownable/Owned types Asahi Lina
@ 2025-02-02 13:05 ` Asahi Lina
2025-02-03 9:17 ` Alice Ryhl
` (2 more replies)
2025-02-02 13:05 ` [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public Asahi Lina
` (5 subsequent siblings)
7 siblings, 3 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-02 13:05 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross
Cc: Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi,
Asahi Lina
This allows Page references to be returned as borrowed references,
without necessarily owning the struct page.
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
rust/kernel/page.rs | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
index fdac6c375fe46e1ba589f1640afeae3e001e39ae..0b6cbe02522ab6e6e1810288ad23af4e4aa587d8 100644
--- a/rust/kernel/page.rs
+++ b/rust/kernel/page.rs
@@ -7,6 +7,7 @@
bindings,
error::code::*,
error::Result,
+ types::{Opaque, Ownable, Owned},
uaccess::UserSliceReader,
};
use core::ptr::{self, NonNull};
@@ -30,13 +31,10 @@ pub const fn page_align(addr: usize) -> usize {
(addr + (PAGE_SIZE - 1)) & PAGE_MASK
}
-/// A pointer to a page that owns the page allocation.
-///
-/// # Invariants
-///
-/// The pointer is valid, and has ownership over the page.
+/// An object representing a memory page in the kernel (`struct page`).
+#[repr(transparent)]
pub struct Page {
- page: NonNull<bindings::page>,
+ page: Opaque<bindings::page>,
}
// SAFETY: Pages have no logic that relies on them staying on a given thread, so moving them across
@@ -71,19 +69,20 @@ impl Page {
/// let page = Page::alloc_page(GFP_KERNEL | __GFP_ZERO)?;
/// # Ok(()) }
/// ```
- pub fn alloc_page(flags: Flags) -> Result<Self, AllocError> {
+ pub fn alloc_page(flags: Flags) -> Result<Owned<Self>, AllocError> {
// SAFETY: Depending on the value of `gfp_flags`, this call may sleep. Other than that, it
// is always safe to call this method.
let page = unsafe { bindings::alloc_pages(flags.as_raw(), 0) };
let page = NonNull::new(page).ok_or(AllocError)?;
- // INVARIANT: We just successfully allocated a page, so we now have ownership of the newly
- // allocated page. We transfer that ownership to the new `Page` object.
- Ok(Self { page })
+ // SAFETY: We just successfully allocated a page, so we now have ownership of the newly
+ // allocated page. We transfer that ownership to the new `Owned<Page>` object.
+ // Since `Page` is transparent, we can cast the pointer directly.
+ Ok(unsafe { Owned::from_raw(page.cast()) })
}
/// Returns a raw pointer to the page.
pub fn as_ptr(&self) -> *mut bindings::page {
- self.page.as_ptr()
+ Opaque::raw_get(&self.page)
}
/// Runs a piece of code with this page mapped to an address.
@@ -252,9 +251,12 @@ pub unsafe fn copy_from_user_slice_raw(
}
}
-impl Drop for Page {
- fn drop(&mut self) {
+// SAFETY: `Owned<Page>` objects returned by Page::alloc_page() follow the requirements of
+// the Ownable abstraction.
+unsafe impl Ownable for Page {
+ unsafe fn release(this: NonNull<Self>) {
// SAFETY: By the type invariants, we have ownership of the page and can free it.
- unsafe { bindings::__free_pages(self.page.as_ptr(), 0) };
+ // Since Page is transparent, we can cast the raw pointer directly.
+ unsafe { bindings::__free_pages(this.cast().as_ptr(), 0) };
}
}
--
2.47.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
2025-02-02 13:05 ` [PATCH 1/6] rust: types: Add Ownable/Owned types Asahi Lina
2025-02-02 13:05 ` [PATCH 2/6] rust: page: Convert to Ownable Asahi Lina
@ 2025-02-02 13:05 ` Asahi Lina
2025-02-03 9:10 ` Alice Ryhl
` (2 more replies)
2025-02-02 13:05 ` [PATCH 4/6] rust: addr: Add a module to declare core address types Asahi Lina
` (4 subsequent siblings)
7 siblings, 3 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-02 13:05 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross
Cc: Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi,
Asahi Lina
Lets users do (unsafe) complex page read/write operations without having
to repeatedly call into read_raw()/write_raw() (which may be expensive
in some cases).
The functions themselves are not unsafe, but they do take a closure that
receives a raw pointer, so actually making the access requires unsafe
code.
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
rust/kernel/page.rs | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
index 0b6cbe02522ab6e6e1810288ad23af4e4aa587d8..fe5f879f9d1a86083fd55c682fad9d52466f79a2 100644
--- a/rust/kernel/page.rs
+++ b/rust/kernel/page.rs
@@ -101,7 +101,7 @@ pub fn as_ptr(&self) -> *mut bindings::page {
/// different addresses. However, even if the addresses are different, the underlying memory is
/// still the same for these purposes (e.g., it's still a data race if they both write to the
/// same underlying byte at the same time).
- fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
+ pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
// SAFETY: `page` is valid due to the type invariants on `Page`.
let mapped_addr = unsafe { bindings::kmap_local_page(self.as_ptr()) };
@@ -142,7 +142,7 @@ fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
/// different addresses. However, even if the addresses are different, the underlying memory is
/// still the same for these purposes (e.g., it's still a data race if they both write to the
/// same underlying byte at the same time).
- fn with_pointer_into_page<T>(
+ pub fn with_pointer_into_page<T>(
&self,
off: usize,
len: usize,
--
2.47.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 4/6] rust: addr: Add a module to declare core address types
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
` (2 preceding siblings ...)
2025-02-02 13:05 ` [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public Asahi Lina
@ 2025-02-02 13:05 ` Asahi Lina
2025-02-03 9:09 ` Alice Ryhl
` (2 more replies)
2025-02-02 13:05 ` [PATCH 5/6] rust: page: Add physical address conversion functions Asahi Lina
` (3 subsequent siblings)
7 siblings, 3 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-02 13:05 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross
Cc: Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi,
Asahi Lina
Encapsulates the core physical/DMA address types, so they can be used by
Rust abstractions.
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
rust/kernel/addr.rs | 15 +++++++++++++++
rust/kernel/lib.rs | 1 +
2 files changed, 16 insertions(+)
diff --git a/rust/kernel/addr.rs b/rust/kernel/addr.rs
new file mode 100644
index 0000000000000000000000000000000000000000..06aff10a0332355597060c5518d7fd6e114cf630
--- /dev/null
+++ b/rust/kernel/addr.rs
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Kernel core address types.
+
+use bindings;
+use core::ffi;
+
+/// A physical memory address (which may be wider than the CPU pointer size)
+pub type PhysicalAddr = bindings::phys_addr_t;
+/// A DMA memory address (which may be narrower than `PhysicalAddr` on some systems)
+pub type DmaAddr = bindings::dma_addr_t;
+/// A physical resource size, typically the same width as `PhysicalAddr`
+pub type ResourceSize = bindings::resource_size_t;
+/// A raw page frame number, not to be confused with the C `pfn_t` which also encodes flags.
+pub type Pfn = ffi::c_ulong;
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index e1065a7551a39e68d6379031d80d4be336e652a3..eb1a80ba8ff83ab2d1b3b1d11fed4fb704c7a4f5 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -29,6 +29,7 @@
pub use ffi;
+pub mod addr;
pub mod alloc;
#[cfg(CONFIG_BLOCK)]
pub mod block;
--
2.47.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 5/6] rust: page: Add physical address conversion functions
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
` (3 preceding siblings ...)
2025-02-02 13:05 ` [PATCH 4/6] rust: addr: Add a module to declare core address types Asahi Lina
@ 2025-02-02 13:05 ` Asahi Lina
2025-02-03 9:35 ` Alice Ryhl
` (2 more replies)
2025-02-02 13:05 ` [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate) Asahi Lina
` (2 subsequent siblings)
7 siblings, 3 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-02 13:05 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross
Cc: Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi,
Asahi Lina
Add methods to allow code using the Page type to obtain the physical
address of a page, convert to and from an (owned) physical address, and
borrow a Page from a physical address. Most of these operations are, as
you might expect, unsafe.
These primitives are useful to implement page table structures in Rust,
and to implement arbitrary physical memory access (as needed to walk
arbitrary page tables and dereference through them). These mechanisms
are, of course, fraught with danger, and are only expected to be used
for core memory management code (in e.g. drivers with their own device
page table implementations) and for debug features such as crash dumps
of device memory.
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
rust/helpers/page.c | 26 +++++++++++++++++++++
rust/kernel/page.rs | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)
diff --git a/rust/helpers/page.c b/rust/helpers/page.c
index b3f2b8fbf87fc9aa89cb1636736c52be16411301..1c3bd68818d77f7ce7806329b8f040a7d4205bb3 100644
--- a/rust/helpers/page.c
+++ b/rust/helpers/page.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
+#include <linux/io.h>
#include <linux/gfp.h>
#include <linux/highmem.h>
@@ -17,3 +18,28 @@ void rust_helper_kunmap_local(const void *addr)
{
kunmap_local(addr);
}
+
+struct page *rust_helper_phys_to_page(phys_addr_t phys)
+{
+ return phys_to_page(phys);
+}
+
+phys_addr_t rust_helper_page_to_phys(struct page *page)
+{
+ return page_to_phys(page);
+}
+
+unsigned long rust_helper_phys_to_pfn(phys_addr_t phys)
+{
+ return __phys_to_pfn(phys);
+}
+
+struct page *rust_helper_pfn_to_page(unsigned long pfn)
+{
+ return pfn_to_page(pfn);
+}
+
+bool rust_helper_pfn_valid(unsigned long pfn)
+{
+ return pfn_valid(pfn);
+}
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
index fe5f879f9d1a86083fd55c682fad9d52466f79a2..67cd7006fa63ab5aed4c4de2be639ed8e1fbc2ba 100644
--- a/rust/kernel/page.rs
+++ b/rust/kernel/page.rs
@@ -3,6 +3,7 @@
//! Kernel page allocation and management.
use crate::{
+ addr::*,
alloc::{AllocError, Flags},
bindings,
error::code::*,
@@ -10,6 +11,7 @@
types::{Opaque, Ownable, Owned},
uaccess::UserSliceReader,
};
+use core::mem::ManuallyDrop;
use core::ptr::{self, NonNull};
/// A bitwise shift for the page size.
@@ -249,6 +251,69 @@ pub unsafe fn copy_from_user_slice_raw(
reader.read_raw(unsafe { core::slice::from_raw_parts_mut(dst.cast(), len) })
})
}
+
+ /// Returns the physical address of this page.
+ pub fn phys(&self) -> PhysicalAddr {
+ // SAFETY: `page` is valid due to the type invariants on `Page`.
+ unsafe { bindings::page_to_phys(self.as_ptr()) }
+ }
+
+ /// Converts a Rust-owned Page into its physical address.
+ ///
+ /// The caller is responsible for calling [`Page::from_phys()`] to avoid leaking memory.
+ pub fn into_phys(this: Owned<Self>) -> PhysicalAddr {
+ ManuallyDrop::new(this).phys()
+ }
+
+ /// Converts a physical address to a Rust-owned Page.
+ ///
+ /// # Safety
+ /// The caller must ensure that the physical address was previously returned by a call to
+ /// [`Page::into_phys()`], and that the physical address is no longer used after this call,
+ /// nor is [`Page::from_phys()`] called again on it.
+ pub unsafe fn from_phys(phys: PhysicalAddr) -> Owned<Self> {
+ // SAFETY: By the safety requirements, the physical address must be valid and
+ // have come from `into_phys()`, so phys_to_page() cannot fail and
+ // must return the original struct page pointer.
+ unsafe { Owned::from_raw(NonNull::new_unchecked(bindings::phys_to_page(phys)).cast()) }
+ }
+
+ /// Borrows a Page from a physical address, without taking over ownership.
+ ///
+ /// If the physical address does not have a `struct page` entry or is not
+ /// part of a System RAM region, returns None.
+ ///
+ /// # Safety
+ /// The caller must ensure that the physical address, if it is backed by a `struct page`,
+ /// remains available for the duration of the borrowed lifetime.
+ pub unsafe fn borrow_phys(phys: &PhysicalAddr) -> Option<&Self> {
+ // SAFETY: This is always safe, as it is just arithmetic
+ let pfn = unsafe { bindings::phys_to_pfn(*phys) };
+ // SAFETY: This function is safe to call with any pfn
+ if !unsafe { bindings::pfn_valid(pfn) && bindings::page_is_ram(pfn) != 0 } {
+ None
+ } else {
+ // SAFETY: We have just checked that the pfn is valid above, so it must
+ // have a corresponding struct page. By the safety requirements, we can
+ // return a borrowed reference to it.
+ Some(unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) })
+ }
+ }
+
+ /// Borrows a Page from a physical address, without taking over ownership
+ /// nor checking for validity.
+ ///
+ /// # Safety
+ /// The caller must ensure that the physical address is backed by a `struct page` and
+ /// corresponds to System RAM. This is true when the address was returned by
+ /// [`Page::into_phys()`].
+ pub unsafe fn borrow_phys_unchecked(phys: &PhysicalAddr) -> &Self {
+ // SAFETY: This is always safe, as it is just arithmetic
+ let pfn = unsafe { bindings::phys_to_pfn(*phys) };
+ // SAFETY: The caller guarantees that the pfn is valid. By the safety
+ // requirements, we can return a borrowed reference to it.
+ unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) }
+ }
}
// SAFETY: `Owned<Page>` objects returned by Page::alloc_page() follow the requirements of
--
2.47.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate)
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
` (4 preceding siblings ...)
2025-02-02 13:05 ` [PATCH 5/6] rust: page: Add physical address conversion functions Asahi Lina
@ 2025-02-02 13:05 ` Asahi Lina
2025-02-03 9:08 ` Alice Ryhl
2025-02-19 9:08 ` Andreas Hindborg
2025-02-03 9:58 ` [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Simona Vetter
2025-02-03 10:22 ` Danilo Krummrich
7 siblings, 2 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-02 13:05 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross
Cc: Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi,
Asahi Lina
There's no good reason for drivers to need access to the raw `struct
page` pointer, since that should be handled by Rust abstractions. Make
this method pub(crate) so it is only visible to the kernel crate.
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
rust/kernel/page.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
index 67cd7006fa63ab5aed4c4de2be639ed8e1fbc2ba..b7bce033b53682eca6e61f47e4fd3aa902da7900 100644
--- a/rust/kernel/page.rs
+++ b/rust/kernel/page.rs
@@ -83,7 +83,7 @@ pub fn alloc_page(flags: Flags) -> Result<Owned<Self>, AllocError> {
}
/// Returns a raw pointer to the page.
- pub fn as_ptr(&self) -> *mut bindings::page {
+ pub(crate) fn as_ptr(&self) -> *mut bindings::page {
Opaque::raw_get(&self.page)
}
--
2.47.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate)
2025-02-02 13:05 ` [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate) Asahi Lina
@ 2025-02-03 9:08 ` Alice Ryhl
2025-02-19 9:08 ` Andreas Hindborg
1 sibling, 0 replies; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 9:08 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>
> There's no good reason for drivers to need access to the raw `struct
> page` pointer, since that should be handled by Rust abstractions. Make
> this method pub(crate) so it is only visible to the kernel crate.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
The Rust Binder driver has a C component, so it needs various as_ptr()
methods to be public.
Alice
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 4/6] rust: addr: Add a module to declare core address types
2025-02-02 13:05 ` [PATCH 4/6] rust: addr: Add a module to declare core address types Asahi Lina
@ 2025-02-03 9:09 ` Alice Ryhl
2025-02-03 15:04 ` Boqun Feng
2025-02-19 8:51 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 9:09 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>
> Encapsulates the core physical/DMA address types, so they can be used by
> Rust abstractions.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public
2025-02-02 13:05 ` [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public Asahi Lina
@ 2025-02-03 9:10 ` Alice Ryhl
2025-02-03 9:43 ` Fiona Behrens
2025-02-19 8:48 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 9:10 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>
> Lets users do (unsafe) complex page read/write operations without having
> to repeatedly call into read_raw()/write_raw() (which may be expensive
> in some cases).
>
> The functions themselves are not unsafe, but they do take a closure that
> receives a raw pointer, so actually making the access requires unsafe
> code.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/6] rust: types: Add Ownable/Owned types
2025-02-02 13:05 ` [PATCH 1/6] rust: types: Add Ownable/Owned types Asahi Lina
@ 2025-02-03 9:13 ` Alice Ryhl
2025-02-03 14:17 ` Asahi Lina
2025-02-19 8:37 ` Andreas Hindborg
1 sibling, 1 reply; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 9:13 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>
> By analogy to AlwaysRefCounted and ARef, an Ownable type is a (typically
> C FFI) type that *may* be owned by Rust, but need not be. Unlike
> AlwaysRefCounted, this mechanism expects the reference to be unique
> within Rust, and does not allow cloning.
>
> Conceptually, this is similar to a KBox<T>, except that it delegates
> resource management to the T instead of using a generic allocator.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Overall LGTM.
> +/// A subtrait of Ownable that asserts that an `Owned<T>` Rust reference is not only unique
> +/// within Rust and keeps the `T` alive, but also guarantees that the C code follows the
> +/// usual mutable reference requirements. That is, the kernel will never mutate the
> +/// `T` (excluding internal mutability that follows the usual rules) while Rust owns it.
> +///
> +/// When this type is implemented for an [`Ownable`] type, it allows `Owned<T>` to be
> +/// dereferenced into a &mut T.
> +///
> +/// # Safety
> +///
> +/// Implementers must ensure that the kernel never mutates the underlying type while
> +/// Rust owns it.
> +pub unsafe trait OwnableMut: Ownable {}
Giving out mutable references allows users to call core::mem::swap on
the object. We must require that this is allowed.
> +impl<T: Ownable> Owned<T> {
> + /// Creates a new instance of [`Owned`].
> + ///
> + /// It takes over ownership of the underlying object.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that the underlying object is acquired and can be considered owned by
> + /// Rust.
> + pub(crate) unsafe fn from_raw(ptr: NonNull<T>) -> Self {
> + // INVARIANT: The safety requirements guarantee that the new instance now owns the
> + // reference.
> + Self {
> + ptr,
> + _p: PhantomData,
> + }
> + }
> +
> + /// Consumes the `Owned`, returning a raw pointer.
> + ///
> + /// This function does not actually relinquish ownership of the object.
> + /// After calling this function, the caller is responsible for ownership previously managed
> + /// by the `Owned`.
> + #[allow(dead_code)]
> + pub(crate) fn into_raw(me: Self) -> NonNull<T> {
I would just make these methods public, like the ARef ones. Then you
can drop the #[allow(dead_code)] annotation.
Alice
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 2/6] rust: page: Convert to Ownable
2025-02-02 13:05 ` [PATCH 2/6] rust: page: Convert to Ownable Asahi Lina
@ 2025-02-03 9:17 ` Alice Ryhl
2025-02-03 9:39 ` Fiona Behrens
2025-02-19 8:46 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 9:17 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>
> This allows Page references to be returned as borrowed references,
> without necessarily owning the struct page.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/6] rust: page: Add physical address conversion functions
2025-02-02 13:05 ` [PATCH 5/6] rust: page: Add physical address conversion functions Asahi Lina
@ 2025-02-03 9:35 ` Alice Ryhl
2025-02-04 11:43 ` Asahi Lina
2025-02-03 9:53 ` Fiona Behrens
2025-02-19 9:06 ` Andreas Hindborg
2 siblings, 1 reply; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 9:35 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>
> Add methods to allow code using the Page type to obtain the physical
> address of a page, convert to and from an (owned) physical address, and
> borrow a Page from a physical address. Most of these operations are, as
> you might expect, unsafe.
>
> These primitives are useful to implement page table structures in Rust,
> and to implement arbitrary physical memory access (as needed to walk
> arbitrary page tables and dereference through them). These mechanisms
> are, of course, fraught with danger, and are only expected to be used
> for core memory management code (in e.g. drivers with their own device
> page table implementations) and for debug features such as crash dumps
> of device memory.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
> rust/helpers/page.c | 26 +++++++++++++++++++++
> rust/kernel/page.rs | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 91 insertions(+)
>
> diff --git a/rust/helpers/page.c b/rust/helpers/page.c
> index b3f2b8fbf87fc9aa89cb1636736c52be16411301..1c3bd68818d77f7ce7806329b8f040a7d4205bb3 100644
> --- a/rust/helpers/page.c
> +++ b/rust/helpers/page.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0
>
> +#include <linux/io.h>
> #include <linux/gfp.h>
> #include <linux/highmem.h>
>
> @@ -17,3 +18,28 @@ void rust_helper_kunmap_local(const void *addr)
> {
> kunmap_local(addr);
> }
> +
> +struct page *rust_helper_phys_to_page(phys_addr_t phys)
> +{
> + return phys_to_page(phys);
> +}
> +
> +phys_addr_t rust_helper_page_to_phys(struct page *page)
> +{
> + return page_to_phys(page);
> +}
> +
> +unsigned long rust_helper_phys_to_pfn(phys_addr_t phys)
> +{
> + return __phys_to_pfn(phys);
> +}
> +
> +struct page *rust_helper_pfn_to_page(unsigned long pfn)
> +{
> + return pfn_to_page(pfn);
> +}
> +
> +bool rust_helper_pfn_valid(unsigned long pfn)
> +{
> + return pfn_valid(pfn);
> +}
> diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
> index fe5f879f9d1a86083fd55c682fad9d52466f79a2..67cd7006fa63ab5aed4c4de2be639ed8e1fbc2ba 100644
> --- a/rust/kernel/page.rs
> +++ b/rust/kernel/page.rs
> @@ -3,6 +3,7 @@
> //! Kernel page allocation and management.
>
> use crate::{
> + addr::*,
> alloc::{AllocError, Flags},
> bindings,
> error::code::*,
> @@ -10,6 +11,7 @@
> types::{Opaque, Ownable, Owned},
> uaccess::UserSliceReader,
> };
> +use core::mem::ManuallyDrop;
> use core::ptr::{self, NonNull};
>
> /// A bitwise shift for the page size.
> @@ -249,6 +251,69 @@ pub unsafe fn copy_from_user_slice_raw(
> reader.read_raw(unsafe { core::slice::from_raw_parts_mut(dst.cast(), len) })
> })
> }
> +
> + /// Returns the physical address of this page.
> + pub fn phys(&self) -> PhysicalAddr {
> + // SAFETY: `page` is valid due to the type invariants on `Page`.
> + unsafe { bindings::page_to_phys(self.as_ptr()) }
> + }
> +
> + /// Converts a Rust-owned Page into its physical address.
> + ///
> + /// The caller is responsible for calling [`Page::from_phys()`] to avoid leaking memory.
> + pub fn into_phys(this: Owned<Self>) -> PhysicalAddr {
> + ManuallyDrop::new(this).phys()
> + }
> +
> + /// Converts a physical address to a Rust-owned Page.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address was previously returned by a call to
> + /// [`Page::into_phys()`], and that the physical address is no longer used after this call,
> + /// nor is [`Page::from_phys()`] called again on it.
> + pub unsafe fn from_phys(phys: PhysicalAddr) -> Owned<Self> {
> + // SAFETY: By the safety requirements, the physical address must be valid and
> + // have come from `into_phys()`, so phys_to_page() cannot fail and
> + // must return the original struct page pointer.
> + unsafe { Owned::from_raw(NonNull::new_unchecked(bindings::phys_to_page(phys)).cast()) }
> + }
> +
> + /// Borrows a Page from a physical address, without taking over ownership.
> + ///
> + /// If the physical address does not have a `struct page` entry or is not
> + /// part of a System RAM region, returns None.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address, if it is backed by a `struct page`,
> + /// remains available for the duration of the borrowed lifetime.
> + pub unsafe fn borrow_phys(phys: &PhysicalAddr) -> Option<&Self> {
> + // SAFETY: This is always safe, as it is just arithmetic
> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
> + // SAFETY: This function is safe to call with any pfn
> + if !unsafe { bindings::pfn_valid(pfn) && bindings::page_is_ram(pfn) != 0 } {
> + None
> + } else {
> + // SAFETY: We have just checked that the pfn is valid above, so it must
> + // have a corresponding struct page. By the safety requirements, we can
> + // return a borrowed reference to it.
> + Some(unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) })
> + }
> + }
> +
> + /// Borrows a Page from a physical address, without taking over ownership
> + /// nor checking for validity.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address is backed by a `struct page` and
> + /// corresponds to System RAM. This is true when the address was returned by
> + /// [`Page::into_phys()`].
> + pub unsafe fn borrow_phys_unchecked(phys: &PhysicalAddr) -> &Self {
Should this be
pub unsafe fn borrow_phys_unchecked<'a>(phys: PhysicalAddr) -> &'a Self
? That's how the signature of these raw methods usually goes, and then
your safety requirements say that the requirements must hold for the
duration of 'a.
> + // SAFETY: This is always safe, as it is just arithmetic
> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
> + // SAFETY: The caller guarantees that the pfn is valid. By the safety
> + // requirements, we can return a borrowed reference to it.
> + unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) }
Can this just be
&*bindings::pfn_to_page(pfn).cast()
?
Alice
> + }
> }
>
> // SAFETY: `Owned<Page>` objects returned by Page::alloc_page() follow the requirements of
>
> --
> 2.47.1
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 2/6] rust: page: Convert to Ownable
2025-02-02 13:05 ` [PATCH 2/6] rust: page: Convert to Ownable Asahi Lina
2025-02-03 9:17 ` Alice Ryhl
@ 2025-02-03 9:39 ` Fiona Behrens
2025-02-19 8:46 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Fiona Behrens @ 2025-02-03 9:39 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
Asahi Lina <lina@asahilina.net> writes:
> This allows Page references to be returned as borrowed references,
> without necessarily owning the struct page.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Fiona Behrens <me@kloenk.dev>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public
2025-02-02 13:05 ` [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public Asahi Lina
2025-02-03 9:10 ` Alice Ryhl
@ 2025-02-03 9:43 ` Fiona Behrens
2025-02-19 8:48 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Fiona Behrens @ 2025-02-03 9:43 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
Asahi Lina <lina@asahilina.net> writes:
> Lets users do (unsafe) complex page read/write operations without having
> to repeatedly call into read_raw()/write_raw() (which may be expensive
> in some cases).
>
> The functions themselves are not unsafe, but they do take a closure that
> receives a raw pointer, so actually making the access requires unsafe
> code.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Fiona Behrens <me@kloenk.dev>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/6] rust: page: Add physical address conversion functions
2025-02-02 13:05 ` [PATCH 5/6] rust: page: Add physical address conversion functions Asahi Lina
2025-02-03 9:35 ` Alice Ryhl
@ 2025-02-03 9:53 ` Fiona Behrens
2025-02-03 10:01 ` Alice Ryhl
2025-02-19 9:06 ` Andreas Hindborg
2 siblings, 1 reply; 67+ messages in thread
From: Fiona Behrens @ 2025-02-03 9:53 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
Asahi Lina <lina@asahilina.net> writes:
> Add methods to allow code using the Page type to obtain the physical
> address of a page, convert to and from an (owned) physical address, and
> borrow a Page from a physical address. Most of these operations are, as
> you might expect, unsafe.
>
> These primitives are useful to implement page table structures in Rust,
> and to implement arbitrary physical memory access (as needed to walk
> arbitrary page tables and dereference through them). These mechanisms
> are, of course, fraught with danger, and are only expected to be used
> for core memory management code (in e.g. drivers with their own device
> page table implementations) and for debug features such as crash dumps
> of device memory.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
> rust/helpers/page.c | 26 +++++++++++++++++++++
> rust/kernel/page.rs | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 91 insertions(+)
>
> diff --git a/rust/helpers/page.c b/rust/helpers/page.c
> index b3f2b8fbf87fc9aa89cb1636736c52be16411301..1c3bd68818d77f7ce7806329b8f040a7d4205bb3 100644
> --- a/rust/helpers/page.c
> +++ b/rust/helpers/page.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0
>
> +#include <linux/io.h>
> #include <linux/gfp.h>
> #include <linux/highmem.h>
>
> @@ -17,3 +18,28 @@ void rust_helper_kunmap_local(const void *addr)
> {
> kunmap_local(addr);
> }
> +
> +struct page *rust_helper_phys_to_page(phys_addr_t phys)
> +{
> + return phys_to_page(phys);
> +}
> +
> +phys_addr_t rust_helper_page_to_phys(struct page *page)
> +{
> + return page_to_phys(page);
> +}
> +
> +unsigned long rust_helper_phys_to_pfn(phys_addr_t phys)
> +{
> + return __phys_to_pfn(phys);
> +}
> +
> +struct page *rust_helper_pfn_to_page(unsigned long pfn)
> +{
> + return pfn_to_page(pfn);
> +}
> +
> +bool rust_helper_pfn_valid(unsigned long pfn)
> +{
> + return pfn_valid(pfn);
> +}
> diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
> index fe5f879f9d1a86083fd55c682fad9d52466f79a2..67cd7006fa63ab5aed4c4de2be639ed8e1fbc2ba 100644
> --- a/rust/kernel/page.rs
> +++ b/rust/kernel/page.rs
> @@ -3,6 +3,7 @@
> //! Kernel page allocation and management.
>
> use crate::{
> + addr::*,
> alloc::{AllocError, Flags},
> bindings,
> error::code::*,
> @@ -10,6 +11,7 @@
> types::{Opaque, Ownable, Owned},
> uaccess::UserSliceReader,
> };
> +use core::mem::ManuallyDrop;
> use core::ptr::{self, NonNull};
>
> /// A bitwise shift for the page size.
> @@ -249,6 +251,69 @@ pub unsafe fn copy_from_user_slice_raw(
> reader.read_raw(unsafe { core::slice::from_raw_parts_mut(dst.cast(), len) })
> })
> }
> +
> + /// Returns the physical address of this page.
> + pub fn phys(&self) -> PhysicalAddr {
Rust uses for similar references `as_*` so `as_phys`, would it make
sense to use the same naming format here?
Thanks,
Fiona
> + // SAFETY: `page` is valid due to the type invariants on `Page`.
> + unsafe { bindings::page_to_phys(self.as_ptr()) }
> + }
> +
> + /// Converts a Rust-owned Page into its physical address.
> + ///
> + /// The caller is responsible for calling [`Page::from_phys()`] to avoid leaking memory.
> + pub fn into_phys(this: Owned<Self>) -> PhysicalAddr {
> + ManuallyDrop::new(this).phys()
> + }
> +
> + /// Converts a physical address to a Rust-owned Page.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address was previously returned by a call to
> + /// [`Page::into_phys()`], and that the physical address is no longer used after this call,
> + /// nor is [`Page::from_phys()`] called again on it.
> + pub unsafe fn from_phys(phys: PhysicalAddr) -> Owned<Self> {
> + // SAFETY: By the safety requirements, the physical address must be valid and
> + // have come from `into_phys()`, so phys_to_page() cannot fail and
> + // must return the original struct page pointer.
> + unsafe { Owned::from_raw(NonNull::new_unchecked(bindings::phys_to_page(phys)).cast()) }
> + }
> +
> + /// Borrows a Page from a physical address, without taking over ownership.
> + ///
> + /// If the physical address does not have a `struct page` entry or is not
> + /// part of a System RAM region, returns None.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address, if it is backed by a `struct page`,
> + /// remains available for the duration of the borrowed lifetime.
> + pub unsafe fn borrow_phys(phys: &PhysicalAddr) -> Option<&Self> {
> + // SAFETY: This is always safe, as it is just arithmetic
> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
> + // SAFETY: This function is safe to call with any pfn
> + if !unsafe { bindings::pfn_valid(pfn) && bindings::page_is_ram(pfn) != 0 } {
> + None
> + } else {
> + // SAFETY: We have just checked that the pfn is valid above, so it must
> + // have a corresponding struct page. By the safety requirements, we can
> + // return a borrowed reference to it.
> + Some(unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) })
> + }
> + }
> +
> + /// Borrows a Page from a physical address, without taking over ownership
> + /// nor checking for validity.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address is backed by a `struct page` and
> + /// corresponds to System RAM. This is true when the address was returned by
> + /// [`Page::into_phys()`].
> + pub unsafe fn borrow_phys_unchecked(phys: &PhysicalAddr) -> &Self {
> + // SAFETY: This is always safe, as it is just arithmetic
> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
> + // SAFETY: The caller guarantees that the pfn is valid. By the safety
> + // requirements, we can return a borrowed reference to it.
> + unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) }
> + }
> }
>
> // SAFETY: `Owned<Page>` objects returned by Page::alloc_page() follow the requirements of
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
` (5 preceding siblings ...)
2025-02-02 13:05 ` [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate) Asahi Lina
@ 2025-02-03 9:58 ` Simona Vetter
2025-02-03 14:32 ` Asahi Lina
2025-02-04 10:33 ` David Hildenbrand
2025-02-03 10:22 ` Danilo Krummrich
7 siblings, 2 replies; 67+ messages in thread
From: Simona Vetter @ 2025-02-03 9:58 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
> This series refactors the existing Page wrapper to support borrowing
> `struct page` objects without ownership on the Rust side, and converting
> page references to/from physical memory addresses.
>
> The series overlaps with the earlier submission in [1] and follows a
> different approach, based on the discussion that happened there.
>
> The primary use case for this is implementing IOMMU-style page table
> management in Rust. This allows drivers for IOMMUs and MMU-containing
> SoC devices to be written in Rust (such as embedded GPUs). The intended
> logic is similar to how ARM SMMU page tables are managed in the
> drivers/iommu tree.
>
> First, introduce a concept of Owned<T> and an Ownable trait. These are
> similar to ARef<T> and AlwaysRefCounted, but are used for types which
> are not ref counted but rather have a single intended owner.
>
> Then, refactor the existing Page support to use the new mechanism. Pages
> returned from the page allocator are not intended to be ref counted by
> consumers (see previous discussion in [1]), so this keeps Rust's view of
> page ownership as a simple "owned or not". Of course, this is still
> composable as Arc<Owned<Page>> if Rust code needs to reference count its
> own Page allocations for whatever reason.
I think there's a bit a potential mess here because the conversion to
folios isn't far enough yet that we can entirely ignore page refcounts and
just use folio refcounts. But I guess we can deal with that oddity if we
hit it (maybe folio conversion moves fast enough), since this only really
starts to become relevant for hmm/svm gpu stuff.
iow I think anticipating the future where struct page really doesn't have
a refcount is the right move. Aside from that it's really not a refcount
that works in the rust ARef sense, since struct page cannot disappear for
system memory, and for dev_pagemap memory it's an entirely different
reference you need (and then there's a few more special cases).
For dma/iommu stuff there's also a push to move towards pfn + metadata
model, so that p2pdma doesn't need struct page. But I haven't looked into
that much yet.
Cheers, Sima
> Then, make some existing private methods public, since this is needed to
> reasonably use allocated pages as IOMMU page tables.
>
> Along the way we also add a small module to represent a core kernel
> address types (PhysicalAddr, DmaAddr, ResourceSize, Pfn). In the future,
> this might grow with helpers to make address math safer and more
> Rust-like.
>
> Finally, add methods to:
> - Get a page's physical address
> - Convert an owned Page into its physical address
> - Convert a physical address back to its owned Page
> - Borrow a Page from a physical address, in both checked (with checks
> that a struct page exists and is accessible as regular RAM) and not
> checked forms (useful when the user knows the physaddr is valid,
> for example because it got it from Page::into_phys()).
>
> Of course, all but the first two have to be `unsafe` by nature, but that
> comes with the territory of writing low level memory management code.
>
> These methods allow page table code to know the physical address of
> pages (needed to build intermediate level PTEs) and to essentially
> transfer ownership of the pages into the page table structure itself,
> and back into Page objects when freeing page tables. Without that, the
> code would have to keep track of page allocations in duplicate, once in
> Rust code and once in the page table structure itself, which is less
> desirable.
>
> For Apple GPUs, the address space shared between firmware and the driver
> is actually pre-allocated by the bootloader, with the top level page
> table already pre-allocated, and the firmware owning some PTEs within it
> while the kernel populates others. This cooperation works well when the
> kernel can reference this top level page table by physical address. The
> only thing the driver needs to ensure is that it never attempts to free
> it in this case, nor the page tables corresponding to virtual address
> ranges it doesn't own. Without the ability to just borrow the
> pre-allocated top level page and access it, the driver would have to
> special-case this and manually manage the top level PTEs outside the
> main page table code, as well as introduce different page table
> configurations with different numbers of levels so the kernel's view is
> one lever shallower.
>
> The physical address borrow feature is also useful to generate virtual
> address space dumps for crash dumps, including firmware pages. The
> intent is that firmware pages are configured in the Device Tree as
> reserved System RAM (without no-map), which creates struct page objects
> for them and makes them available in the kernel's direct map. Then the
> driver's page table code can walk the page tables and make a snapshot of
> the entire address space, including firmware code and data pages,
> pre-allocated shared segments, and driver-allocated objects (which are
> GEM objects), again without special casing anything. The checks in
> `Page::borrow_phys()` should ensure that the page is safe to access as
> RAM, so this will skip MMIO pages and anything that wasn't declared to
> the kernel in the DT.
>
> Example usage:
> https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/drivers/gpu/drm/asahi/pgtable.rs
>
> The last patch is a minor cleanup to the Page abstraction noticed while
> preparing this series.
>
> [1] https://lore.kernel.org/lkml/20241119112408.779243-1-abdiel.janulgue@gmail.com/T/#u
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
> Asahi Lina (6):
> rust: types: Add Ownable/Owned types
> rust: page: Convert to Ownable
> rust: page: Make with_page_mapped() and with_pointer_into_page() public
> rust: addr: Add a module to declare core address types
> rust: page: Add physical address conversion functions
> rust: page: Make Page::as_ptr() pub(crate)
>
> rust/helpers/page.c | 26 ++++++++++++
> rust/kernel/addr.rs | 15 +++++++
> rust/kernel/lib.rs | 1 +
> rust/kernel/page.rs | 101 ++++++++++++++++++++++++++++++++++++++--------
> rust/kernel/types.rs | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 236 insertions(+), 17 deletions(-)
> ---
> base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
> change-id: 20250202-rust-page-80892069fc78
>
> Cheers,
> ~~ Lina
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/6] rust: page: Add physical address conversion functions
2025-02-03 9:53 ` Fiona Behrens
@ 2025-02-03 10:01 ` Alice Ryhl
0 siblings, 0 replies; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 10:01 UTC (permalink / raw)
To: Fiona Behrens
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Mon, Feb 3, 2025 at 10:54 AM Fiona Behrens <me@kloenk.dev> wrote:
>
> Asahi Lina <lina@asahilina.net> writes:
>
> > Add methods to allow code using the Page type to obtain the physical
> > address of a page, convert to and from an (owned) physical address, and
> > borrow a Page from a physical address. Most of these operations are, as
> > you might expect, unsafe.
> >
> > These primitives are useful to implement page table structures in Rust,
> > and to implement arbitrary physical memory access (as needed to walk
> > arbitrary page tables and dereference through them). These mechanisms
> > are, of course, fraught with danger, and are only expected to be used
> > for core memory management code (in e.g. drivers with their own device
> > page table implementations) and for debug features such as crash dumps
> > of device memory.
> >
> > Signed-off-by: Asahi Lina <lina@asahilina.net>
> > ---
> > rust/helpers/page.c | 26 +++++++++++++++++++++
> > rust/kernel/page.rs | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 91 insertions(+)
> >
> > diff --git a/rust/helpers/page.c b/rust/helpers/page.c
> > index b3f2b8fbf87fc9aa89cb1636736c52be16411301..1c3bd68818d77f7ce7806329b8f040a7d4205bb3 100644
> > --- a/rust/helpers/page.c
> > +++ b/rust/helpers/page.c
> > @@ -1,5 +1,6 @@
> > // SPDX-License-Identifier: GPL-2.0
> >
> > +#include <linux/io.h>
> > #include <linux/gfp.h>
> > #include <linux/highmem.h>
> >
> > @@ -17,3 +18,28 @@ void rust_helper_kunmap_local(const void *addr)
> > {
> > kunmap_local(addr);
> > }
> > +
> > +struct page *rust_helper_phys_to_page(phys_addr_t phys)
> > +{
> > + return phys_to_page(phys);
> > +}
> > +
> > +phys_addr_t rust_helper_page_to_phys(struct page *page)
> > +{
> > + return page_to_phys(page);
> > +}
> > +
> > +unsigned long rust_helper_phys_to_pfn(phys_addr_t phys)
> > +{
> > + return __phys_to_pfn(phys);
> > +}
> > +
> > +struct page *rust_helper_pfn_to_page(unsigned long pfn)
> > +{
> > + return pfn_to_page(pfn);
> > +}
> > +
> > +bool rust_helper_pfn_valid(unsigned long pfn)
> > +{
> > + return pfn_valid(pfn);
> > +}
> > diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
> > index fe5f879f9d1a86083fd55c682fad9d52466f79a2..67cd7006fa63ab5aed4c4de2be639ed8e1fbc2ba 100644
> > --- a/rust/kernel/page.rs
> > +++ b/rust/kernel/page.rs
> > @@ -3,6 +3,7 @@
> > //! Kernel page allocation and management.
> >
> > use crate::{
> > + addr::*,
> > alloc::{AllocError, Flags},
> > bindings,
> > error::code::*,
> > @@ -10,6 +11,7 @@
> > types::{Opaque, Ownable, Owned},
> > uaccess::UserSliceReader,
> > };
> > +use core::mem::ManuallyDrop;
> > use core::ptr::{self, NonNull};
> >
> > /// A bitwise shift for the page size.
> > @@ -249,6 +251,69 @@ pub unsafe fn copy_from_user_slice_raw(
> > reader.read_raw(unsafe { core::slice::from_raw_parts_mut(dst.cast(), len) })
> > })
> > }
> > +
> > + /// Returns the physical address of this page.
> > + pub fn phys(&self) -> PhysicalAddr {
>
> Rust uses for similar references `as_*` so `as_phys`, would it make
> sense to use the same naming format here?
The as_ prefix is only used when returning a borrow, which is not the
case here. I think not having a prefix is correct in this case.
https://rust-lang.github.io/api-guidelines/naming.html#ad-hoc-conversions-follow-as_-to_-into_-conventions-c-conv
Alice
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
` (6 preceding siblings ...)
2025-02-03 9:58 ` [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Simona Vetter
@ 2025-02-03 10:22 ` Danilo Krummrich
2025-02-03 14:41 ` Asahi Lina
7 siblings, 1 reply; 67+ messages in thread
From: Danilo Krummrich @ 2025-02-03 10:22 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
Hi Lina,
On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
> This series refactors the existing Page wrapper to support borrowing
> `struct page` objects without ownership on the Rust side, and converting
> page references to/from physical memory addresses.
Thanks for picking this up!
As you know, this has been previously worked on by Abdiel. Kindly drop a note
if you intend to pick up something up next time, such that we don't end up doing
duplicated work.
- Danilo
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/6] rust: types: Add Ownable/Owned types
2025-02-03 9:13 ` Alice Ryhl
@ 2025-02-03 14:17 ` Asahi Lina
2025-02-03 18:17 ` Alice Ryhl
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-03 14:17 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 2/3/25 6:13 PM, Alice Ryhl wrote:
> On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>>
>> By analogy to AlwaysRefCounted and ARef, an Ownable type is a (typically
>> C FFI) type that *may* be owned by Rust, but need not be. Unlike
>> AlwaysRefCounted, this mechanism expects the reference to be unique
>> within Rust, and does not allow cloning.
>>
>> Conceptually, this is similar to a KBox<T>, except that it delegates
>> resource management to the T instead of using a generic allocator.
>>
>> Signed-off-by: Asahi Lina <lina@asahilina.net>
>
> Overall LGTM.
>
>> +/// A subtrait of Ownable that asserts that an `Owned<T>` Rust reference is not only unique
>> +/// within Rust and keeps the `T` alive, but also guarantees that the C code follows the
>> +/// usual mutable reference requirements. That is, the kernel will never mutate the
>> +/// `T` (excluding internal mutability that follows the usual rules) while Rust owns it.
>> +///
>> +/// When this type is implemented for an [`Ownable`] type, it allows `Owned<T>` to be
>> +/// dereferenced into a &mut T.
>> +///
>> +/// # Safety
>> +///
>> +/// Implementers must ensure that the kernel never mutates the underlying type while
>> +/// Rust owns it.
>> +pub unsafe trait OwnableMut: Ownable {}
>
> Giving out mutable references allows users to call core::mem::swap on
> the object. We must require that this is allowed.
Hmm, yeah. I don't use this yet, and I'm not sure if it makes much sense
with that caveat. I'll drop it for v2.
>
>> +impl<T: Ownable> Owned<T> {
>> + /// Creates a new instance of [`Owned`].
>> + ///
>> + /// It takes over ownership of the underlying object.
>> + ///
>> + /// # Safety
>> + ///
>> + /// Callers must ensure that the underlying object is acquired and can be considered owned by
>> + /// Rust.
>> + pub(crate) unsafe fn from_raw(ptr: NonNull<T>) -> Self {
>> + // INVARIANT: The safety requirements guarantee that the new instance now owns the
>> + // reference.
>> + Self {
>> + ptr,
>> + _p: PhantomData,
>> + }
>> + }
>> +
>> + /// Consumes the `Owned`, returning a raw pointer.
>> + ///
>> + /// This function does not actually relinquish ownership of the object.
>> + /// After calling this function, the caller is responsible for ownership previously managed
>> + /// by the `Owned`.
>> + #[allow(dead_code)]
>> + pub(crate) fn into_raw(me: Self) -> NonNull<T> {
>
> I would just make these methods public, like the ARef ones. Then you
> can drop the #[allow(dead_code)] annotation.
Does it make sense to ever have drivers doing this? I feel like these
methods should be limited to the kernel crate.
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-03 9:58 ` [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Simona Vetter
@ 2025-02-03 14:32 ` Asahi Lina
2025-02-03 21:05 ` Zi Yan
2025-02-04 10:33 ` David Hildenbrand
1 sibling, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-03 14:32 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 2/3/25 6:58 PM, Simona Vetter wrote:
> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>> This series refactors the existing Page wrapper to support borrowing
>> `struct page` objects without ownership on the Rust side, and converting
>> page references to/from physical memory addresses.
>>
>> The series overlaps with the earlier submission in [1] and follows a
>> different approach, based on the discussion that happened there.
>>
>> The primary use case for this is implementing IOMMU-style page table
>> management in Rust. This allows drivers for IOMMUs and MMU-containing
>> SoC devices to be written in Rust (such as embedded GPUs). The intended
>> logic is similar to how ARM SMMU page tables are managed in the
>> drivers/iommu tree.
>>
>> First, introduce a concept of Owned<T> and an Ownable trait. These are
>> similar to ARef<T> and AlwaysRefCounted, but are used for types which
>> are not ref counted but rather have a single intended owner.
>>
>> Then, refactor the existing Page support to use the new mechanism. Pages
>> returned from the page allocator are not intended to be ref counted by
>> consumers (see previous discussion in [1]), so this keeps Rust's view of
>> page ownership as a simple "owned or not". Of course, this is still
>> composable as Arc<Owned<Page>> if Rust code needs to reference count its
>> own Page allocations for whatever reason.
>
> I think there's a bit a potential mess here because the conversion to
> folios isn't far enough yet that we can entirely ignore page refcounts and
> just use folio refcounts. But I guess we can deal with that oddity if we
> hit it (maybe folio conversion moves fast enough), since this only really
> starts to become relevant for hmm/svm gpu stuff.
>
> iow I think anticipating the future where struct page really doesn't have
> a refcount is the right move. Aside from that it's really not a refcount
> that works in the rust ARef sense, since struct page cannot disappear for
> system memory, and for dev_pagemap memory it's an entirely different
> reference you need (and then there's a few more special cases).
Right, as far as this abstraction is concerned, all that needs to hold
is that:
- alloc_pages() and __free_pages() work as intended, however that may
be, to reserve and return one page (for now, though I think extending
the Rust abstraction to handle higher-order folios is pretty easy, but
that can happen later).
- Whatever borrows pages knows what it's doing. In this case there's
only support for borrowing pages by physaddr, and it's only going to be
used in a driver for a platform without memory hot remove (so far) and
only for pages which have known usage (in principle) and are either
explicitly allocated or known pinned or reserved, so it's not a problem
right now. Future abstractions that return borrowed pages can do their
own locking/bookkeeping/whatever is necessary to keep it safe.
I would like to hear how memory hot-remove is supposed to work though,
to see if we should be doing something to make the abstraction safer
(though it's still unsafe and always will be). Is there a chance a
`struct page` could vanish out from under us under some conditions?
For dev_pagemap memory I imagine we'd have an entirely different
abstraction wrapping that, that can just return a borrowed &Page to give
the user access to page operations without going through Owned<Page>.
> For dma/iommu stuff there's also a push to move towards pfn + metadata
> model, so that p2pdma doesn't need struct page. But I haven't looked into
> that much yet.
Yeah, I don't know how that stuff works...
>
> Cheers, Sima
>
>> Then, make some existing private methods public, since this is needed to
>> reasonably use allocated pages as IOMMU page tables.
>>
>> Along the way we also add a small module to represent a core kernel
>> address types (PhysicalAddr, DmaAddr, ResourceSize, Pfn). In the future,
>> this might grow with helpers to make address math safer and more
>> Rust-like.
>>
>> Finally, add methods to:
>> - Get a page's physical address
>> - Convert an owned Page into its physical address
>> - Convert a physical address back to its owned Page
>> - Borrow a Page from a physical address, in both checked (with checks
>> that a struct page exists and is accessible as regular RAM) and not
>> checked forms (useful when the user knows the physaddr is valid,
>> for example because it got it from Page::into_phys()).
>>
>> Of course, all but the first two have to be `unsafe` by nature, but that
>> comes with the territory of writing low level memory management code.
>>
>> These methods allow page table code to know the physical address of
>> pages (needed to build intermediate level PTEs) and to essentially
>> transfer ownership of the pages into the page table structure itself,
>> and back into Page objects when freeing page tables. Without that, the
>> code would have to keep track of page allocations in duplicate, once in
>> Rust code and once in the page table structure itself, which is less
>> desirable.
>>
>> For Apple GPUs, the address space shared between firmware and the driver
>> is actually pre-allocated by the bootloader, with the top level page
>> table already pre-allocated, and the firmware owning some PTEs within it
>> while the kernel populates others. This cooperation works well when the
>> kernel can reference this top level page table by physical address. The
>> only thing the driver needs to ensure is that it never attempts to free
>> it in this case, nor the page tables corresponding to virtual address
>> ranges it doesn't own. Without the ability to just borrow the
>> pre-allocated top level page and access it, the driver would have to
>> special-case this and manually manage the top level PTEs outside the
>> main page table code, as well as introduce different page table
>> configurations with different numbers of levels so the kernel's view is
>> one lever shallower.
>>
>> The physical address borrow feature is also useful to generate virtual
>> address space dumps for crash dumps, including firmware pages. The
>> intent is that firmware pages are configured in the Device Tree as
>> reserved System RAM (without no-map), which creates struct page objects
>> for them and makes them available in the kernel's direct map. Then the
>> driver's page table code can walk the page tables and make a snapshot of
>> the entire address space, including firmware code and data pages,
>> pre-allocated shared segments, and driver-allocated objects (which are
>> GEM objects), again without special casing anything. The checks in
>> `Page::borrow_phys()` should ensure that the page is safe to access as
>> RAM, so this will skip MMIO pages and anything that wasn't declared to
>> the kernel in the DT.
>>
>> Example usage:
>> https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/drivers/gpu/drm/asahi/pgtable.rs
>>
>> The last patch is a minor cleanup to the Page abstraction noticed while
>> preparing this series.
>>
>> [1] https://lore.kernel.org/lkml/20241119112408.779243-1-abdiel.janulgue@gmail.com/T/#u
>>
>> Signed-off-by: Asahi Lina <lina@asahilina.net>
>> ---
>> Asahi Lina (6):
>> rust: types: Add Ownable/Owned types
>> rust: page: Convert to Ownable
>> rust: page: Make with_page_mapped() and with_pointer_into_page() public
>> rust: addr: Add a module to declare core address types
>> rust: page: Add physical address conversion functions
>> rust: page: Make Page::as_ptr() pub(crate)
>>
>> rust/helpers/page.c | 26 ++++++++++++
>> rust/kernel/addr.rs | 15 +++++++
>> rust/kernel/lib.rs | 1 +
>> rust/kernel/page.rs | 101 ++++++++++++++++++++++++++++++++++++++--------
>> rust/kernel/types.rs | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> 5 files changed, 236 insertions(+), 17 deletions(-)
>> ---
>> base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
>> change-id: 20250202-rust-page-80892069fc78
>>
>> Cheers,
>> ~~ Lina
>>
>
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-03 10:22 ` Danilo Krummrich
@ 2025-02-03 14:41 ` Asahi Lina
2025-02-15 19:47 ` Asahi Lina
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-03 14:41 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
On 2/3/25 7:22 PM, Danilo Krummrich wrote:
> Hi Lina,
>
> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>> This series refactors the existing Page wrapper to support borrowing
>> `struct page` objects without ownership on the Rust side, and converting
>> page references to/from physical memory addresses.
>
> Thanks for picking this up!
>
> As you know, this has been previously worked on by Abdiel. Kindly drop a note
> if you intend to pick up something up next time, such that we don't end up doing
> duplicated work.
Sorry! I wasn't sure if this was going to end up submitted over the
holidays so I wasn't in a rush, but I ended up switching gears in the
past couple of weeks and I really needed this feature now (for crashdump
support to debug a really annoying firmware crash).
I actually only just noticed that the previous v2 already had
Owned/Ownable... I only saw the v3 which didn't, so I didn't realize
there was already a version of that approach in the past.
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 4/6] rust: addr: Add a module to declare core address types
2025-02-02 13:05 ` [PATCH 4/6] rust: addr: Add a module to declare core address types Asahi Lina
2025-02-03 9:09 ` Alice Ryhl
@ 2025-02-03 15:04 ` Boqun Feng
2025-02-04 11:50 ` Asahi Lina
2025-02-19 8:51 ` Andreas Hindborg
2 siblings, 1 reply; 67+ messages in thread
From: Boqun Feng @ 2025-02-03 15:04 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
On Sun, Feb 02, 2025 at 10:05:46PM +0900, Asahi Lina wrote:
> Encapsulates the core physical/DMA address types, so they can be used by
> Rust abstractions.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
> rust/kernel/addr.rs | 15 +++++++++++++++
> rust/kernel/lib.rs | 1 +
> 2 files changed, 16 insertions(+)
>
> diff --git a/rust/kernel/addr.rs b/rust/kernel/addr.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..06aff10a0332355597060c5518d7fd6e114cf630
> --- /dev/null
> +++ b/rust/kernel/addr.rs
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Kernel core address types.
> +
> +use bindings;
> +use core::ffi;
I think we should use crate::ffi here. See:
d072acda4862 "rust: use custom FFI integer types"
> +
> +/// A physical memory address (which may be wider than the CPU pointer size)
> +pub type PhysicalAddr = bindings::phys_addr_t;
> +/// A DMA memory address (which may be narrower than `PhysicalAddr` on some systems)
> +pub type DmaAddr = bindings::dma_addr_t;
> +/// A physical resource size, typically the same width as `PhysicalAddr`
> +pub type ResourceSize = bindings::resource_size_t;
> +/// A raw page frame number, not to be confused with the C `pfn_t` which also encodes flags.
> +pub type Pfn = ffi::c_ulong;
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index e1065a7551a39e68d6379031d80d4be336e652a3..eb1a80ba8ff83ab2d1b3b1d11fed4fb704c7a4f5 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -29,6 +29,7 @@
>
> pub use ffi;
>
> +pub mod addr;
I want to share my worry on this `pub mod` list in kernel/lib.rs may get
too long ;-)
I was about to suggest putting `addr` and `page` into `mm` after Alice's
patchset merged, however, seems that `mm` mod only cover userspace
memory management (which is not my impression of what "mm" in kernel
development), thoughts? Alice, do you think we should extend `mm` mod to
contain all memory management mods?
Regards,
Boqun
> pub mod alloc;
> #[cfg(CONFIG_BLOCK)]
> pub mod block;
>
> --
> 2.47.1
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/6] rust: types: Add Ownable/Owned types
2025-02-03 14:17 ` Asahi Lina
@ 2025-02-03 18:17 ` Alice Ryhl
2025-02-03 19:17 ` Asahi Lina
0 siblings, 1 reply; 67+ messages in thread
From: Alice Ryhl @ 2025-02-03 18:17 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Mon, Feb 3, 2025 at 3:17 PM Asahi Lina <lina@asahilina.net> wrote:
>
>
>
> On 2/3/25 6:13 PM, Alice Ryhl wrote:
> > On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
> >> + /// Consumes the `Owned`, returning a raw pointer.
> >> + ///
> >> + /// This function does not actually relinquish ownership of the object.
> >> + /// After calling this function, the caller is responsible for ownership previously managed
> >> + /// by the `Owned`.
> >> + #[allow(dead_code)]
> >> + pub(crate) fn into_raw(me: Self) -> NonNull<T> {
> >
> > I would just make these methods public, like the ARef ones. Then you
> > can drop the #[allow(dead_code)] annotation.
>
> Does it make sense to ever have drivers doing this? I feel like these
> methods should be limited to the kernel crate.
Not having drivers use this is the ideal, but I don't think we should
always expect it to be possible. The Binder driver has a C component
for the binderfs component, and it also has some code that's
essentially an abstraction inside the driver that I was asked to move
into Binder because it's so specific to Binder that it's not useful
for anyone else.
Alice
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/6] rust: types: Add Ownable/Owned types
2025-02-03 18:17 ` Alice Ryhl
@ 2025-02-03 19:17 ` Asahi Lina
2025-02-19 8:34 ` Andreas Hindborg
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-03 19:17 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 2/4/25 3:17 AM, Alice Ryhl wrote:
> On Mon, Feb 3, 2025 at 3:17 PM Asahi Lina <lina@asahilina.net> wrote:
>>
>>
>>
>> On 2/3/25 6:13 PM, Alice Ryhl wrote:
>>> On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>>>> + /// Consumes the `Owned`, returning a raw pointer.
>>>> + ///
>>>> + /// This function does not actually relinquish ownership of the object.
>>>> + /// After calling this function, the caller is responsible for ownership previously managed
>>>> + /// by the `Owned`.
>>>> + #[allow(dead_code)]
>>>> + pub(crate) fn into_raw(me: Self) -> NonNull<T> {
>>>
>>> I would just make these methods public, like the ARef ones. Then you
>>> can drop the #[allow(dead_code)] annotation.
>>
>> Does it make sense to ever have drivers doing this? I feel like these
>> methods should be limited to the kernel crate.
>
> Not having drivers use this is the ideal, but I don't think we should
> always expect it to be possible. The Binder driver has a C component
> for the binderfs component, and it also has some code that's
> essentially an abstraction inside the driver that I was asked to move
> into Binder because it's so specific to Binder that it's not useful
> for anyone else.
That's fair, I'll make it pub.
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-03 14:32 ` Asahi Lina
@ 2025-02-03 21:05 ` Zi Yan
2025-02-04 10:26 ` David Hildenbrand
0 siblings, 1 reply; 67+ messages in thread
From: Zi Yan @ 2025-02-03 21:05 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, David Hildenbrand, Oscar Salvador,
Muchun Song
On 3 Feb 2025, at 9:32, Asahi Lina wrote:
> On 2/3/25 6:58 PM, Simona Vetter wrote:
>> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>>> This series refactors the existing Page wrapper to support borrowing
>>> `struct page` objects without ownership on the Rust side, and converting
>>> page references to/from physical memory addresses.
>>>
>>> The series overlaps with the earlier submission in [1] and follows a
>>> different approach, based on the discussion that happened there.
>>>
>>> The primary use case for this is implementing IOMMU-style page table
>>> management in Rust. This allows drivers for IOMMUs and MMU-containing
>>> SoC devices to be written in Rust (such as embedded GPUs). The intended
>>> logic is similar to how ARM SMMU page tables are managed in the
>>> drivers/iommu tree.
>>>
>>> First, introduce a concept of Owned<T> and an Ownable trait. These are
>>> similar to ARef<T> and AlwaysRefCounted, but are used for types which
>>> are not ref counted but rather have a single intended owner.
>>>
>>> Then, refactor the existing Page support to use the new mechanism. Pages
>>> returned from the page allocator are not intended to be ref counted by
>>> consumers (see previous discussion in [1]), so this keeps Rust's view of
>>> page ownership as a simple "owned or not". Of course, this is still
>>> composable as Arc<Owned<Page>> if Rust code needs to reference count its
>>> own Page allocations for whatever reason.
>>
>> I think there's a bit a potential mess here because the conversion to
>> folios isn't far enough yet that we can entirely ignore page refcounts and
>> just use folio refcounts. But I guess we can deal with that oddity if we
>> hit it (maybe folio conversion moves fast enough), since this only really
>> starts to become relevant for hmm/svm gpu stuff.
>>
>> iow I think anticipating the future where struct page really doesn't have
>> a refcount is the right move. Aside from that it's really not a refcount
>> that works in the rust ARef sense, since struct page cannot disappear for
>> system memory, and for dev_pagemap memory it's an entirely different
>> reference you need (and then there's a few more special cases).
>
> Right, as far as this abstraction is concerned, all that needs to hold
> is that:
>
> - alloc_pages() and __free_pages() work as intended, however that may
> be, to reserve and return one page (for now, though I think extending
> the Rust abstraction to handle higher-order folios is pretty easy, but
> that can happen later).
> - Whatever borrows pages knows what it's doing. In this case there's
> only support for borrowing pages by physaddr, and it's only going to be
> used in a driver for a platform without memory hot remove (so far) and
> only for pages which have known usage (in principle) and are either
> explicitly allocated or known pinned or reserved, so it's not a problem
> right now. Future abstractions that return borrowed pages can do their
> own locking/bookkeeping/whatever is necessary to keep it safe.
>
> I would like to hear how memory hot-remove is supposed to work though,
> to see if we should be doing something to make the abstraction safer
> (though it's still unsafe and always will be). Is there a chance a
> `struct page` could vanish out from under us under some conditions?
Add DavidH and OscarS for memory hot-remove questions.
IIUC, struct page could be freed if a chunk of memory is hot-removed.
Another case struct page can be freed is when hugetlb vmemmap optimization
is used. Muchun (cc'd) is the maintainer of hugetlbfs.
>
> For dev_pagemap memory I imagine we'd have an entirely different
> abstraction wrapping that, that can just return a borrowed &Page to give
> the user access to page operations without going through Owned<Page>.
>
>> For dma/iommu stuff there's also a push to move towards pfn + metadata
>> model, so that p2pdma doesn't need struct page. But I haven't looked into
>> that much yet.
>
> Yeah, I don't know how that stuff works...
>
>>
>> Cheers, Sima
>>
>>> Then, make some existing private methods public, since this is needed to
>>> reasonably use allocated pages as IOMMU page tables.
>>>
>>> Along the way we also add a small module to represent a core kernel
>>> address types (PhysicalAddr, DmaAddr, ResourceSize, Pfn). In the future,
>>> this might grow with helpers to make address math safer and more
>>> Rust-like.
>>>
>>> Finally, add methods to:
>>> - Get a page's physical address
>>> - Convert an owned Page into its physical address
>>> - Convert a physical address back to its owned Page
>>> - Borrow a Page from a physical address, in both checked (with checks
>>> that a struct page exists and is accessible as regular RAM) and not
>>> checked forms (useful when the user knows the physaddr is valid,
>>> for example because it got it from Page::into_phys()).
>>>
>>> Of course, all but the first two have to be `unsafe` by nature, but that
>>> comes with the territory of writing low level memory management code.
>>>
>>> These methods allow page table code to know the physical address of
>>> pages (needed to build intermediate level PTEs) and to essentially
>>> transfer ownership of the pages into the page table structure itself,
>>> and back into Page objects when freeing page tables. Without that, the
>>> code would have to keep track of page allocations in duplicate, once in
>>> Rust code and once in the page table structure itself, which is less
>>> desirable.
>>>
>>> For Apple GPUs, the address space shared between firmware and the driver
>>> is actually pre-allocated by the bootloader, with the top level page
>>> table already pre-allocated, and the firmware owning some PTEs within it
>>> while the kernel populates others. This cooperation works well when the
>>> kernel can reference this top level page table by physical address. The
>>> only thing the driver needs to ensure is that it never attempts to free
>>> it in this case, nor the page tables corresponding to virtual address
>>> ranges it doesn't own. Without the ability to just borrow the
>>> pre-allocated top level page and access it, the driver would have to
>>> special-case this and manually manage the top level PTEs outside the
>>> main page table code, as well as introduce different page table
>>> configurations with different numbers of levels so the kernel's view is
>>> one lever shallower.
>>>
>>> The physical address borrow feature is also useful to generate virtual
>>> address space dumps for crash dumps, including firmware pages. The
>>> intent is that firmware pages are configured in the Device Tree as
>>> reserved System RAM (without no-map), which creates struct page objects
>>> for them and makes them available in the kernel's direct map. Then the
>>> driver's page table code can walk the page tables and make a snapshot of
>>> the entire address space, including firmware code and data pages,
>>> pre-allocated shared segments, and driver-allocated objects (which are
>>> GEM objects), again without special casing anything. The checks in
>>> `Page::borrow_phys()` should ensure that the page is safe to access as
>>> RAM, so this will skip MMIO pages and anything that wasn't declared to
>>> the kernel in the DT.
>>>
>>> Example usage:
>>> https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/drivers/gpu/drm/asahi/pgtable.rs
>>>
>>> The last patch is a minor cleanup to the Page abstraction noticed while
>>> preparing this series.
>>>
>>> [1] https://lore.kernel.org/lkml/20241119112408.779243-1-abdiel.janulgue@gmail.com/T/#u
>>>
>>> Signed-off-by: Asahi Lina <lina@asahilina.net>
>>> ---
>>> Asahi Lina (6):
>>> rust: types: Add Ownable/Owned types
>>> rust: page: Convert to Ownable
>>> rust: page: Make with_page_mapped() and with_pointer_into_page() public
>>> rust: addr: Add a module to declare core address types
>>> rust: page: Add physical address conversion functions
>>> rust: page: Make Page::as_ptr() pub(crate)
>>>
>>> rust/helpers/page.c | 26 ++++++++++++
>>> rust/kernel/addr.rs | 15 +++++++
>>> rust/kernel/lib.rs | 1 +
>>> rust/kernel/page.rs | 101 ++++++++++++++++++++++++++++++++++++++--------
>>> rust/kernel/types.rs | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 5 files changed, 236 insertions(+), 17 deletions(-)
>>> ---
>>> base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
>>> change-id: 20250202-rust-page-80892069fc78
>>>
>>> Cheers,
>>> ~~ Lina
>>>
>>
>
> ~~ Lina
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-03 21:05 ` Zi Yan
@ 2025-02-04 10:26 ` David Hildenbrand
2025-02-04 11:41 ` Asahi Lina
0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 10:26 UTC (permalink / raw)
To: Zi Yan, Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 03.02.25 22:05, Zi Yan wrote:
> On 3 Feb 2025, at 9:32, Asahi Lina wrote:
>
>> On 2/3/25 6:58 PM, Simona Vetter wrote:
>>> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>>>> This series refactors the existing Page wrapper to support borrowing
>>>> `struct page` objects without ownership on the Rust side, and converting
>>>> page references to/from physical memory addresses.
>>>>
>>>> The series overlaps with the earlier submission in [1] and follows a
>>>> different approach, based on the discussion that happened there.
>>>>
>>>> The primary use case for this is implementing IOMMU-style page table
>>>> management in Rust. This allows drivers for IOMMUs and MMU-containing
>>>> SoC devices to be written in Rust (such as embedded GPUs). The intended
>>>> logic is similar to how ARM SMMU page tables are managed in the
>>>> drivers/iommu tree.
>>>>
>>>> First, introduce a concept of Owned<T> and an Ownable trait. These are
>>>> similar to ARef<T> and AlwaysRefCounted, but are used for types which
>>>> are not ref counted but rather have a single intended owner.
>>>>
>>>> Then, refactor the existing Page support to use the new mechanism. Pages
>>>> returned from the page allocator are not intended to be ref counted by
>>>> consumers (see previous discussion in [1]), so this keeps Rust's view of
>>>> page ownership as a simple "owned or not". Of course, this is still
>>>> composable as Arc<Owned<Page>> if Rust code needs to reference count its
>>>> own Page allocations for whatever reason.
>>>
>>> I think there's a bit a potential mess here because the conversion to
>>> folios isn't far enough yet that we can entirely ignore page refcounts and
>>> just use folio refcounts. But I guess we can deal with that oddity if we
>>> hit it (maybe folio conversion moves fast enough), since this only really
>>> starts to become relevant for hmm/svm gpu stuff.
>>>
>>> iow I think anticipating the future where struct page really doesn't have
>>> a refcount is the right move. Aside from that it's really not a refcount
>>> that works in the rust ARef sense, since struct page cannot disappear for
>>> system memory, and for dev_pagemap memory it's an entirely different
>>> reference you need (and then there's a few more special cases).
>>
>> Right, as far as this abstraction is concerned, all that needs to hold
>> is that:
>>
>> - alloc_pages() and __free_pages() work as intended, however that may
>> be, to reserve and return one page (for now, though I think extending
>> the Rust abstraction to handle higher-order folios is pretty easy, but
>> that can happen later).
>> - Whatever borrows pages knows what it's doing. In this case there's
>> only support for borrowing pages by physaddr, and it's only going to be
>> used in a driver for a platform without memory hot remove (so far) and
>> only for pages which have known usage (in principle) and are either
>> explicitly allocated or known pinned or reserved, so it's not a problem
>> right now. Future abstractions that return borrowed pages can do their
>> own locking/bookkeeping/whatever is necessary to keep it safe.
>>
>> I would like to hear how memory hot-remove is supposed to work though,
>> to see if we should be doing something to make the abstraction safer
>> (though it's still unsafe and always will be). Is there a chance a
>> `struct page` could vanish out from under us under some conditions?
>
> Add DavidH and OscarS for memory hot-remove questions.
>
> IIUC, struct page could be freed if a chunk of memory is hot-removed.
Right, but only after there are no users anymore (IOW, memory was freed
back to the buddy). PFN walkers might still stumble over them, but I
would not expect (or recommend) rust to do that.
>
> Another case struct page can be freed is when hugetlb vmemmap optimization
> is used. Muchun (cc'd) is the maintainer of hugetlbfs.
Here, the "struct page" remains valid though; it can still be accessed,
although we disallow writes (which would be wrong).
If you only allocate a page and free it later, there is no need to worry
about either on the rust side.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-03 9:58 ` [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Simona Vetter
2025-02-03 14:32 ` Asahi Lina
@ 2025-02-04 10:33 ` David Hildenbrand
2025-02-04 18:39 ` Jason Gunthorpe
1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 10:33 UTC (permalink / raw)
To: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 03.02.25 10:58, Simona Vetter wrote:
> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>> This series refactors the existing Page wrapper to support borrowing
>> `struct page` objects without ownership on the Rust side, and converting
>> page references to/from physical memory addresses.
>>
>> The series overlaps with the earlier submission in [1] and follows a
>> different approach, based on the discussion that happened there.
>>
>> The primary use case for this is implementing IOMMU-style page table
>> management in Rust. This allows drivers for IOMMUs and MMU-containing
>> SoC devices to be written in Rust (such as embedded GPUs). The intended
>> logic is similar to how ARM SMMU page tables are managed in the
>> drivers/iommu tree.
>>
>> First, introduce a concept of Owned<T> and an Ownable trait. These are
>> similar to ARef<T> and AlwaysRefCounted, but are used for types which
>> are not ref counted but rather have a single intended owner.
>>
>> Then, refactor the existing Page support to use the new mechanism. Pages
>> returned from the page allocator are not intended to be ref counted by
>> consumers (see previous discussion in [1]), so this keeps Rust's view of
>> page ownership as a simple "owned or not". Of course, this is still
>> composable as Arc<Owned<Page>> if Rust code needs to reference count its
>> own Page allocations for whatever reason.
>
> I think there's a bit a potential mess here because the conversion to
> folios isn't far enough yet that we can entirely ignore page refcounts and
> just use folio refcounts. But I guess we can deal with that oddity if we
> hit it (maybe folio conversion moves fast enough), since this only really
> starts to become relevant for hmm/svm gpu stuff.
I'll note that in the future only selected things will be folios (e.g.,
pagecache, anonymous memory). Everything else will either get a separate
memdesc (e.g., ptdesc), or work on bare pages.
Likely, when talking about page tables, "ptdesc" might be what you want
to allocate here, and not "folios".
So in the future, this interface here will likely look quite a bit
different.
(I know there was a discussion on that on RFC v3 with Willy, just
stumbled over the usage of "folio" here)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 10:26 ` David Hildenbrand
@ 2025-02-04 11:41 ` Asahi Lina
2025-02-04 11:59 ` David Hildenbrand
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 11:41 UTC (permalink / raw)
To: David Hildenbrand, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 2/4/25 7:26 PM, David Hildenbrand wrote:
> On 03.02.25 22:05, Zi Yan wrote:
>> On 3 Feb 2025, at 9:32, Asahi Lina wrote:
>>
>>> On 2/3/25 6:58 PM, Simona Vetter wrote:
>>>> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>>>>> This series refactors the existing Page wrapper to support borrowing
>>>>> `struct page` objects without ownership on the Rust side, and
>>>>> converting
>>>>> page references to/from physical memory addresses.
>>>>>
>>>>> The series overlaps with the earlier submission in [1] and follows a
>>>>> different approach, based on the discussion that happened there.
>>>>>
>>>>> The primary use case for this is implementing IOMMU-style page table
>>>>> management in Rust. This allows drivers for IOMMUs and MMU-containing
>>>>> SoC devices to be written in Rust (such as embedded GPUs). The
>>>>> intended
>>>>> logic is similar to how ARM SMMU page tables are managed in the
>>>>> drivers/iommu tree.
>>>>>
>>>>> First, introduce a concept of Owned<T> and an Ownable trait. These are
>>>>> similar to ARef<T> and AlwaysRefCounted, but are used for types which
>>>>> are not ref counted but rather have a single intended owner.
>>>>>
>>>>> Then, refactor the existing Page support to use the new mechanism.
>>>>> Pages
>>>>> returned from the page allocator are not intended to be ref counted by
>>>>> consumers (see previous discussion in [1]), so this keeps Rust's
>>>>> view of
>>>>> page ownership as a simple "owned or not". Of course, this is still
>>>>> composable as Arc<Owned<Page>> if Rust code needs to reference
>>>>> count its
>>>>> own Page allocations for whatever reason.
>>>>
>>>> I think there's a bit a potential mess here because the conversion to
>>>> folios isn't far enough yet that we can entirely ignore page
>>>> refcounts and
>>>> just use folio refcounts. But I guess we can deal with that oddity
>>>> if we
>>>> hit it (maybe folio conversion moves fast enough), since this only
>>>> really
>>>> starts to become relevant for hmm/svm gpu stuff.
>>>>
>>>> iow I think anticipating the future where struct page really doesn't
>>>> have
>>>> a refcount is the right move. Aside from that it's really not a
>>>> refcount
>>>> that works in the rust ARef sense, since struct page cannot
>>>> disappear for
>>>> system memory, and for dev_pagemap memory it's an entirely different
>>>> reference you need (and then there's a few more special cases).
>>>
>>> Right, as far as this abstraction is concerned, all that needs to hold
>>> is that:
>>>
>>> - alloc_pages() and __free_pages() work as intended, however that may
>>> be, to reserve and return one page (for now, though I think extending
>>> the Rust abstraction to handle higher-order folios is pretty easy, but
>>> that can happen later).
>>> - Whatever borrows pages knows what it's doing. In this case there's
>>> only support for borrowing pages by physaddr, and it's only going to be
>>> used in a driver for a platform without memory hot remove (so far) and
>>> only for pages which have known usage (in principle) and are either
>>> explicitly allocated or known pinned or reserved, so it's not a problem
>>> right now. Future abstractions that return borrowed pages can do their
>>> own locking/bookkeeping/whatever is necessary to keep it safe.
>>>
>>> I would like to hear how memory hot-remove is supposed to work though,
>>> to see if we should be doing something to make the abstraction safer
>>> (though it's still unsafe and always will be). Is there a chance a
>>> `struct page` could vanish out from under us under some conditions?
>>
>> Add DavidH and OscarS for memory hot-remove questions.
>>
>> IIUC, struct page could be freed if a chunk of memory is hot-removed.
>
> Right, but only after there are no users anymore (IOW, memory was freed
> back to the buddy). PFN walkers might still stumble over them, but I
> would not expect (or recommend) rust to do that.
The physaddr to page function does look up pages by pfn, but it's
intended to be used by drivers that know what they're doing. There are
two variants of the API, one that is completely unchecked (a fast path
for cases where the driver definitely allocated these pages itself, for
example just grabbing the `struct page` back from a decoded PTE it
wrote), and one that has this check:
pfn_valid(pfn) && page_is_ram(pfn)
Which is intended as a safety net to allow drivers to look up
firmware-reserved pages too, and fail gracefully if the kernel doesn't
know about them (because they weren't declared in the
bootloader/firmware memory map correctly) or doesn't have them mapped in
the direct map (because they were declared no-map).
Is there anything else that can reasonably be done here to make the API
safer to call on an arbitrary pfn?
If the answer is "no" then that's fine. It's still an unsafe function
and we need to document in the safety section that it should only be
used for memory that is either known to be allocated and pinned and will
not be freed while the `struct page` is borrowed, or memory that is
reserved and not owned by the buddy allocator, so in practice correct
use would not be racy with memory hot-remove anyway.
This is already the case for the drm/asahi use case, where the pfns
looked up will only ever be one of:
- GEM objects that are mapped to the GPU and whose physical pages are
therefore pinned (and the VM is locked while this happens so the objects
cannot become unpinned out from under the running code),
- Raw pages allocated from the page allocator for use as GPU page tables,
- System memory that is marked reserved by the firmware/bootloader,
- (Potentially) invalid PFNs that aren't part of the System RAM region
at all and don't have a struct page to begin with, which we check for,
so the API returns an error. This would only happen if the bootloader
didn't declare some used firmware ranges at all, so Linux doesn't know
about them.
>
>>
>> Another case struct page can be freed is when hugetlb vmemmap
>> optimization
>> is used. Muchun (cc'd) is the maintainer of hugetlbfs.
>
> Here, the "struct page" remains valid though; it can still be accessed,
> although we disallow writes (which would be wrong).
>
> If you only allocate a page and free it later, there is no need to worry
> about either on the rust side.
This is what the safe API does. (Also the unsafe physaddr APIs if all
you ever do is convert an allocated page to a physaddr and back, which
is the only thing the GPU page table code does during normal use. The
walking leaf PFNs story is only for GPU device coredumps when the
firmware crashes.)
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/6] rust: page: Add physical address conversion functions
2025-02-03 9:35 ` Alice Ryhl
@ 2025-02-04 11:43 ` Asahi Lina
0 siblings, 0 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 11:43 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 2/3/25 6:35 PM, Alice Ryhl wrote:
> On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>>
>> Add methods to allow code using the Page type to obtain the physical
>> address of a page, convert to and from an (owned) physical address, and
>> borrow a Page from a physical address. Most of these operations are, as
>> you might expect, unsafe.
>>
>> These primitives are useful to implement page table structures in Rust,
>> and to implement arbitrary physical memory access (as needed to walk
>> arbitrary page tables and dereference through them). These mechanisms
>> are, of course, fraught with danger, and are only expected to be used
>> for core memory management code (in e.g. drivers with their own device
>> page table implementations) and for debug features such as crash dumps
>> of device memory.
>>
>> Signed-off-by: Asahi Lina <lina@asahilina.net>
>> ---
>> rust/helpers/page.c | 26 +++++++++++++++++++++
>> rust/kernel/page.rs | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 91 insertions(+)
>>
>> diff --git a/rust/helpers/page.c b/rust/helpers/page.c
>> index b3f2b8fbf87fc9aa89cb1636736c52be16411301..1c3bd68818d77f7ce7806329b8f040a7d4205bb3 100644
>> --- a/rust/helpers/page.c
>> +++ b/rust/helpers/page.c
>> @@ -1,5 +1,6 @@
>> // SPDX-License-Identifier: GPL-2.0
>>
>> +#include <linux/io.h>
>> #include <linux/gfp.h>
>> #include <linux/highmem.h>
>>
>> @@ -17,3 +18,28 @@ void rust_helper_kunmap_local(const void *addr)
>> {
>> kunmap_local(addr);
>> }
>> +
>> +struct page *rust_helper_phys_to_page(phys_addr_t phys)
>> +{
>> + return phys_to_page(phys);
>> +}
>> +
>> +phys_addr_t rust_helper_page_to_phys(struct page *page)
>> +{
>> + return page_to_phys(page);
>> +}
>> +
>> +unsigned long rust_helper_phys_to_pfn(phys_addr_t phys)
>> +{
>> + return __phys_to_pfn(phys);
>> +}
>> +
>> +struct page *rust_helper_pfn_to_page(unsigned long pfn)
>> +{
>> + return pfn_to_page(pfn);
>> +}
>> +
>> +bool rust_helper_pfn_valid(unsigned long pfn)
>> +{
>> + return pfn_valid(pfn);
>> +}
>> diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
>> index fe5f879f9d1a86083fd55c682fad9d52466f79a2..67cd7006fa63ab5aed4c4de2be639ed8e1fbc2ba 100644
>> --- a/rust/kernel/page.rs
>> +++ b/rust/kernel/page.rs
>> @@ -3,6 +3,7 @@
>> //! Kernel page allocation and management.
>>
>> use crate::{
>> + addr::*,
>> alloc::{AllocError, Flags},
>> bindings,
>> error::code::*,
>> @@ -10,6 +11,7 @@
>> types::{Opaque, Ownable, Owned},
>> uaccess::UserSliceReader,
>> };
>> +use core::mem::ManuallyDrop;
>> use core::ptr::{self, NonNull};
>>
>> /// A bitwise shift for the page size.
>> @@ -249,6 +251,69 @@ pub unsafe fn copy_from_user_slice_raw(
>> reader.read_raw(unsafe { core::slice::from_raw_parts_mut(dst.cast(), len) })
>> })
>> }
>> +
>> + /// Returns the physical address of this page.
>> + pub fn phys(&self) -> PhysicalAddr {
>> + // SAFETY: `page` is valid due to the type invariants on `Page`.
>> + unsafe { bindings::page_to_phys(self.as_ptr()) }
>> + }
>> +
>> + /// Converts a Rust-owned Page into its physical address.
>> + ///
>> + /// The caller is responsible for calling [`Page::from_phys()`] to avoid leaking memory.
>> + pub fn into_phys(this: Owned<Self>) -> PhysicalAddr {
>> + ManuallyDrop::new(this).phys()
>> + }
>> +
>> + /// Converts a physical address to a Rust-owned Page.
>> + ///
>> + /// # Safety
>> + /// The caller must ensure that the physical address was previously returned by a call to
>> + /// [`Page::into_phys()`], and that the physical address is no longer used after this call,
>> + /// nor is [`Page::from_phys()`] called again on it.
>> + pub unsafe fn from_phys(phys: PhysicalAddr) -> Owned<Self> {
>> + // SAFETY: By the safety requirements, the physical address must be valid and
>> + // have come from `into_phys()`, so phys_to_page() cannot fail and
>> + // must return the original struct page pointer.
>> + unsafe { Owned::from_raw(NonNull::new_unchecked(bindings::phys_to_page(phys)).cast()) }
>> + }
>> +
>> + /// Borrows a Page from a physical address, without taking over ownership.
>> + ///
>> + /// If the physical address does not have a `struct page` entry or is not
>> + /// part of a System RAM region, returns None.
>> + ///
>> + /// # Safety
>> + /// The caller must ensure that the physical address, if it is backed by a `struct page`,
>> + /// remains available for the duration of the borrowed lifetime.
>> + pub unsafe fn borrow_phys(phys: &PhysicalAddr) -> Option<&Self> {
>> + // SAFETY: This is always safe, as it is just arithmetic
>> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
>> + // SAFETY: This function is safe to call with any pfn
>> + if !unsafe { bindings::pfn_valid(pfn) && bindings::page_is_ram(pfn) != 0 } {
>> + None
>> + } else {
>> + // SAFETY: We have just checked that the pfn is valid above, so it must
>> + // have a corresponding struct page. By the safety requirements, we can
>> + // return a borrowed reference to it.
>> + Some(unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) })
>> + }
>> + }
>> +
>> + /// Borrows a Page from a physical address, without taking over ownership
>> + /// nor checking for validity.
>> + ///
>> + /// # Safety
>> + /// The caller must ensure that the physical address is backed by a `struct page` and
>> + /// corresponds to System RAM. This is true when the address was returned by
>> + /// [`Page::into_phys()`].
>> + pub unsafe fn borrow_phys_unchecked(phys: &PhysicalAddr) -> &Self {
>
> Should this be
>
> pub unsafe fn borrow_phys_unchecked<'a>(phys: PhysicalAddr) -> &'a Self
>
> ? That's how the signature of these raw methods usually goes, and then
> your safety requirements say that the requirements must hold for the
> duration of 'a.
The idea was to have *some* lifetime bound on an incoming variable
instead of just being able to return any arbitrary lifetime, but I don't
know if that's something worth doing / idiomatic in any way. Obviously
we can't stop a caller from doing something wrong if they really want to.
>
>> + // SAFETY: This is always safe, as it is just arithmetic
>> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
>> + // SAFETY: The caller guarantees that the pfn is valid. By the safety
>> + // requirements, we can return a borrowed reference to it.
>> + unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) }
>
> Can this just be
>
> &*bindings::pfn_to_page(pfn).cast()
Yeah, I'll fix it for v2!
>
> ?
>
> Alice
>
>> + }
>> }
>>
>> // SAFETY: `Owned<Page>` objects returned by Page::alloc_page() follow the requirements of
>>
>> --
>> 2.47.1
>>
>
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 4/6] rust: addr: Add a module to declare core address types
2025-02-03 15:04 ` Boqun Feng
@ 2025-02-04 11:50 ` Asahi Lina
2025-02-04 14:50 ` Boqun Feng
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 11:50 UTC (permalink / raw)
To: Boqun Feng
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
On 2/4/25 12:04 AM, Boqun Feng wrote:
> On Sun, Feb 02, 2025 at 10:05:46PM +0900, Asahi Lina wrote:
>> Encapsulates the core physical/DMA address types, so they can be used by
>> Rust abstractions.
>>
>> Signed-off-by: Asahi Lina <lina@asahilina.net>
>> ---
>> rust/kernel/addr.rs | 15 +++++++++++++++
>> rust/kernel/lib.rs | 1 +
>> 2 files changed, 16 insertions(+)
>>
>> diff --git a/rust/kernel/addr.rs b/rust/kernel/addr.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..06aff10a0332355597060c5518d7fd6e114cf630
>> --- /dev/null
>> +++ b/rust/kernel/addr.rs
>> @@ -0,0 +1,15 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Kernel core address types.
>> +
>> +use bindings;
>> +use core::ffi;
>
> I think we should use crate::ffi here. See:
>
> d072acda4862 "rust: use custom FFI integer types"
>
>> +
>> +/// A physical memory address (which may be wider than the CPU pointer size)
>> +pub type PhysicalAddr = bindings::phys_addr_t;
>> +/// A DMA memory address (which may be narrower than `PhysicalAddr` on some systems)
>> +pub type DmaAddr = bindings::dma_addr_t;
>> +/// A physical resource size, typically the same width as `PhysicalAddr`
>> +pub type ResourceSize = bindings::resource_size_t;
>> +/// A raw page frame number, not to be confused with the C `pfn_t` which also encodes flags.
>> +pub type Pfn = ffi::c_ulong;
>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>> index e1065a7551a39e68d6379031d80d4be336e652a3..eb1a80ba8ff83ab2d1b3b1d11fed4fb704c7a4f5 100644
>> --- a/rust/kernel/lib.rs
>> +++ b/rust/kernel/lib.rs
>> @@ -29,6 +29,7 @@
>>
>> pub use ffi;
>>
>> +pub mod addr;
>
> I want to share my worry on this `pub mod` list in kernel/lib.rs may get
> too long ;-)
>
> I was about to suggest putting `addr` and `page` into `mm` after Alice's
> patchset merged, however, seems that `mm` mod only cover userspace
> memory management (which is not my impression of what "mm" in kernel
> development), thoughts? Alice, do you think we should extend `mm` mod to
> contain all memory management mods?
I think in kernel-speak "mm" is usually userspace memory management and
kernel page allocation and memory management, but doesn't include device
stuff (I/O ranges, DMA, etc.) or the low-level concepts of memory in an
architecture. The types I declare here are more relevant to device/arch
code, so I don't think they really belong under "mm", it's more general
(mm deals with them and so do other parts of the kernel code).
On the other hand, I think page.rs itself would fit under mm, since it
deals with the kernel mm's idea of pages and their allocation.
>
> Regards,
> Boqun
>
>> pub mod alloc;
>> #[cfg(CONFIG_BLOCK)]
>> pub mod block;
>>
>> --
>> 2.47.1
>>
>
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 11:41 ` Asahi Lina
@ 2025-02-04 11:59 ` David Hildenbrand
2025-02-04 13:05 ` Asahi Lina
0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 11:59 UTC (permalink / raw)
To: Asahi Lina, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
>>> Add DavidH and OscarS for memory hot-remove questions.
>>>
>>> IIUC, struct page could be freed if a chunk of memory is hot-removed.
>>
>> Right, but only after there are no users anymore (IOW, memory was freed
>> back to the buddy). PFN walkers might still stumble over them, but I
>> would not expect (or recommend) rust to do that.
>
> The physaddr to page function does look up pages by pfn, but it's
> intended to be used by drivers that know what they're doing. There are
> two variants of the API, one that is completely unchecked (a fast path
> for cases where the driver definitely allocated these pages itself, for
> example just grabbing the `struct page` back from a decoded PTE it
> wrote), and one that has this check:
>
> pfn_valid(pfn) && page_is_ram(pfn)
>
> Which is intended as a safety net to allow drivers to look up
> firmware-reserved pages too, and fail gracefully if the kernel doesn't
> know about them (because they weren't declared in the
> bootloader/firmware memory map correctly) or doesn't have them mapped in
> the direct map (because they were declared no-map).
> > Is there anything else that can reasonably be done here to make the API
> safer to call on an arbitrary pfn?
In PFN walkers we use pfn_to_online_page() to make sure that (1) the
memmap really exists; and (2) that it has meaning (e.g., was actually
initialized).
It can still race with memory offlining, and it refuses ZONE_DEVICE
pages. For the latter, we have a different way to check validity. See
memory_failure() that first calls pfn_to_online_page() to then check
get_dev_pagemap().
>
> If the answer is "no" then that's fine. It's still an unsafe function
> and we need to document in the safety section that it should only be
> used for memory that is either known to be allocated and pinned and will
> not be freed while the `struct page` is borrowed, or memory that is
> reserved and not owned by the buddy allocator, so in practice correct
> use would not be racy with memory hot-remove anyway.
>
> This is already the case for the drm/asahi use case, where the pfns
> looked up will only ever be one of:
>
> - GEM objects that are mapped to the GPU and whose physical pages are
> therefore pinned (and the VM is locked while this happens so the objects
> cannot become unpinned out from under the running code),
How exactly are these pages pinned/obtained?
> - Raw pages allocated from the page allocator for use as GPU page tables,
That makes sense.
> - System memory that is marked reserved by the firmware/bootloader,
E.g., in arch/x86/mm/ioremap.c:__ioremap_check_ram() we refuse anything
that has a valid memmap and is *not* marked as PageReserved, to prevent
remapping arbitrary *real* RAM.
Is that case here similar?
> - (Potentially) invalid PFNs that aren't part of the System RAM region
> at all and don't have a struct page to begin with, which we check for,
> so the API returns an error. This would only happen if the bootloader
> didn't declare some used firmware ranges at all, so Linux doesn't know
> about them.
>
>>
>>>
>>> Another case struct page can be freed is when hugetlb vmemmap
>>> optimization
>>> is used. Muchun (cc'd) is the maintainer of hugetlbfs.
>>
>> Here, the "struct page" remains valid though; it can still be accessed,
>> although we disallow writes (which would be wrong).
>>
>> If you only allocate a page and free it later, there is no need to worry
>> about either on the rust side.
>
> This is what the safe API does. (Also the unsafe physaddr APIs if all
> you ever do is convert an allocated page to a physaddr and back, which
> is the only thing the GPU page table code does during normal use. The
> walking leaf PFNs story is only for GPU device coredumps when the
> firmware crashes.)
I would hope that we can lock down this interface as much as possible.
Ideally, we would never go from pfn->page, unless
(a) we remember somehow that we came from page->pfn. E.g., we allocated
these pages or someone else provided us with these pages. The memmap
cannot go away. I know it's hard.
(b) the pages are flagged as being special, similar to
__ioremap_check_ram().
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 11:59 ` David Hildenbrand
@ 2025-02-04 13:05 ` Asahi Lina
2025-02-04 14:38 ` David Hildenbrand
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 13:05 UTC (permalink / raw)
To: David Hildenbrand, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 2/4/25 8:59 PM, David Hildenbrand wrote:
>>>> Add DavidH and OscarS for memory hot-remove questions.
>>>>
>>>> IIUC, struct page could be freed if a chunk of memory is hot-removed.
>>>
>>> Right, but only after there are no users anymore (IOW, memory was freed
>>> back to the buddy). PFN walkers might still stumble over them, but I
>>> would not expect (or recommend) rust to do that.
>>
>> The physaddr to page function does look up pages by pfn, but it's
>> intended to be used by drivers that know what they're doing. There are
>> two variants of the API, one that is completely unchecked (a fast path
>> for cases where the driver definitely allocated these pages itself, for
>> example just grabbing the `struct page` back from a decoded PTE it
>> wrote), and one that has this check:
>>
>> pfn_valid(pfn) && page_is_ram(pfn)
>>
>> Which is intended as a safety net to allow drivers to look up
>> firmware-reserved pages too, and fail gracefully if the kernel doesn't
>> know about them (because they weren't declared in the
>> bootloader/firmware memory map correctly) or doesn't have them mapped in
>> the direct map (because they were declared no-map).
>> > Is there anything else that can reasonably be done here to make the API
>> safer to call on an arbitrary pfn?
>
> In PFN walkers we use pfn_to_online_page() to make sure that (1) the
> memmap really exists; and (2) that it has meaning (e.g., was actually
> initialized).
>
> It can still race with memory offlining, and it refuses ZONE_DEVICE
> pages. For the latter, we have a different way to check validity. See
> memory_failure() that first calls pfn_to_online_page() to then check
> get_dev_pagemap().
I'll give it a shot with these functions. If they work for my use case,
then it's good to have extra checks and I'll add them for v2. Thanks!
>
>>
>> If the answer is "no" then that's fine. It's still an unsafe function
>> and we need to document in the safety section that it should only be
>> used for memory that is either known to be allocated and pinned and will
>> not be freed while the `struct page` is borrowed, or memory that is
>> reserved and not owned by the buddy allocator, so in practice correct
>> use would not be racy with memory hot-remove anyway.
>>
>> This is already the case for the drm/asahi use case, where the pfns
>> looked up will only ever be one of:
>>
>> - GEM objects that are mapped to the GPU and whose physical pages are
>> therefore pinned (and the VM is locked while this happens so the objects
>> cannot become unpinned out from under the running code),
>
> How exactly are these pages pinned/obtained?
Under the hood it's shmem. For pinning, it winds up at
`drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
a mapping set as unevictable.
I'm not very familiar with the innards of that codepath, but it's
definitely an invariant that GEM objects have to be pinned while they
are mapped in GPU page tables (otherwise the GPU would end up accessing
freed memory).
Since the code that walks the PT to dump pages is part of the same PT
object and takes a mutable reference, the Rust guarantees mean it's
impossible for the PT to be concurrently mutated or anything like that.
So if one of these objects *were* unpinned/freed somehow while the dump
code is running, that would be a major bug somewhere else, since there
would be dangling PTEs left over.
In practice, there's a big lock around each PT/VM at a higher level of
the driver, so any attempts to unmap/free any of those objects will be
stuck waiting for the lock on the VM they are mapped into.
>
>> - Raw pages allocated from the page allocator for use as GPU page tables,
>
> That makes sense.
>
>> - System memory that is marked reserved by the firmware/bootloader,
>
> E.g., in arch/x86/mm/ioremap.c:__ioremap_check_ram() we refuse anything
> that has a valid memmap and is *not* marked as PageReserved, to prevent
> remapping arbitrary *real* RAM.
>
> Is that case here similar?
I don't have an explicit check for that here but yes, the pages wind up
marked PageReserved. This includes both no-map ranges and map ranges.
The no-map ranges also have a valid struct page if within the declared
RAM range but they would oops if we try to access the contents via
direct map, so the page_is_ram() check is there to reject those (and
MMIO and anything else that isn't normally mapped as RAM even if it
winds up with a struct page).
>> - (Potentially) invalid PFNs that aren't part of the System RAM region
>> at all and don't have a struct page to begin with, which we check for,
>> so the API returns an error. This would only happen if the bootloader
>> didn't declare some used firmware ranges at all, so Linux doesn't know
>> about them.
>>
>>>
>>>>
>>>> Another case struct page can be freed is when hugetlb vmemmap
>>>> optimization
>>>> is used. Muchun (cc'd) is the maintainer of hugetlbfs.
>>>
>>> Here, the "struct page" remains valid though; it can still be accessed,
>>> although we disallow writes (which would be wrong).
>>>
>>> If you only allocate a page and free it later, there is no need to worry
>>> about either on the rust side.
>>
>> This is what the safe API does. (Also the unsafe physaddr APIs if all
>> you ever do is convert an allocated page to a physaddr and back, which
>> is the only thing the GPU page table code does during normal use. The
>> walking leaf PFNs story is only for GPU device coredumps when the
>> firmware crashes.)
>
> I would hope that we can lock down this interface as much as possible.
Right, that's why the safe API never does any of the weird pfn->page
stuff. Rust driver code has to use unsafe {} to access the raw pfn->page
interface, which requires a // SAFETY comment explaining why what it's
doing is safe, and then we need to document in the function signature
what the safety requirements are so those comments can be reviewed.
> Ideally, we would never go from pfn->page, unless
>
> (a) we remember somehow that we came from page->pfn. E.g., we allocated
> these pages or someone else provided us with these pages. The memmap
> cannot go away. I know it's hard.
This is the common case for the page tables. 99% of the time this is
what the driver will be doing, with a single exception (the root page
table of the firmware/privileged VM is a system reserved memory region,
and falls under (b). It's one single page globally in the system.).
The driver actually uses the completely unchecked interface in this
case, since it knows the pfns are definitely OK. I do a single check
with the checked interface at probe time for that one special-case pfn
so it can fail gracefully instead of oops if the DT config is
unusable/wrong.
> (b) the pages are flagged as being special, similar to
> __ioremap_check_ram().
This only ever happens during firmware crash dumps (plus the one
exception above).
The missing (c) case is the kernel/firmware shared memory GEM objects
during crash dumps. But I really need those to diagnose firmware
crashes. Of course, I could dump them separately through other APIs in
principle, but that would complicate the crashdump code quite a bit
since I'd have to go through all the kernel GPU memory allocators and
dig out all their backing GEM objects and copy the memory through their
vmap (they are all vmapped, which is yet another reason in practice the
pages are pinned) and merge it into the coredump file. I also wouldn't
have easy direct access to the matching GPU PTEs if I do that (I store
the PTE permission/caching bits in the coredump file, since those are
actually kind of critical to diagnose exactly what happened, as caching
issues are one major cause of firmware problems). Since I need the page
table walker code to grab the firmware pages anyway, I hope I can avoid
having to go through a completely different codepath for the kernel GEM
objects...
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 13:05 ` Asahi Lina
@ 2025-02-04 14:38 ` David Hildenbrand
2025-02-04 17:59 ` Asahi Lina
2025-02-05 7:40 ` Simona Vetter
0 siblings, 2 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 14:38 UTC (permalink / raw)
To: Asahi Lina, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
>> It can still race with memory offlining, and it refuses ZONE_DEVICE
>> pages. For the latter, we have a different way to check validity. See
>> memory_failure() that first calls pfn_to_online_page() to then check
>> get_dev_pagemap().
>
> I'll give it a shot with these functions. If they work for my use case,
> then it's good to have extra checks and I'll add them for v2. Thanks!
Let me know if you run into any issues.
>
>>
>>>
>>> If the answer is "no" then that's fine. It's still an unsafe function
>>> and we need to document in the safety section that it should only be
>>> used for memory that is either known to be allocated and pinned and will
>>> not be freed while the `struct page` is borrowed, or memory that is
>>> reserved and not owned by the buddy allocator, so in practice correct
>>> use would not be racy with memory hot-remove anyway.
>>>
>>> This is already the case for the drm/asahi use case, where the pfns
>>> looked up will only ever be one of:
>>>
>>> - GEM objects that are mapped to the GPU and whose physical pages are
>>> therefore pinned (and the VM is locked while this happens so the objects
>>> cannot become unpinned out from under the running code),
>>
>> How exactly are these pages pinned/obtained?
>
> Under the hood it's shmem. For pinning, it winds up at
> `drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
> a mapping set as unevictable.
Thanks. So we grab another folio reference via
shmem_read_folio_gfp()->shmem_get_folio_gfp().
Hm, I wonder if we might end up holding folios residing in
ZONE_MOVABLE/MIGRATE_CMA longer than we should.
Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and makes
sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
But that's a different discussion, just pointing it out, maybe I'm
missing something :)
>
> I'm not very familiar with the innards of that codepath, but it's
> definitely an invariant that GEM objects have to be pinned while they
> are mapped in GPU page tables (otherwise the GPU would end up accessing
> freed memory).
Right, there must be a raised reference.
>
> Since the code that walks the PT to dump pages is part of the same PT
> object and takes a mutable reference, the Rust guarantees mean it's
> impossible for the PT to be concurrently mutated or anything like that.
> So if one of these objects *were* unpinned/freed somehow while the dump
> code is running, that would be a major bug somewhere else, since there
> would be dangling PTEs left over.
>
> In practice, there's a big lock around each PT/VM at a higher level of
> the driver, so any attempts to unmap/free any of those objects will be
> stuck waiting for the lock on the VM they are mapped into.
Understood, thanks.
[...]
>>>>>
>>>>> Another case struct page can be freed is when hugetlb vmemmap
>>>>> optimization
>>>>> is used. Muchun (cc'd) is the maintainer of hugetlbfs.
>>>>
>>>> Here, the "struct page" remains valid though; it can still be accessed,
>>>> although we disallow writes (which would be wrong).
>>>>
>>>> If you only allocate a page and free it later, there is no need to worry
>>>> about either on the rust side.
>>>
>>> This is what the safe API does. (Also the unsafe physaddr APIs if all
>>> you ever do is convert an allocated page to a physaddr and back, which
>>> is the only thing the GPU page table code does during normal use. The
>>> walking leaf PFNs story is only for GPU device coredumps when the
>>> firmware crashes.)
>>
>> I would hope that we can lock down this interface as much as possible.
>
> Right, that's why the safe API never does any of the weird pfn->page
> stuff. Rust driver code has to use unsafe {} to access the raw pfn->page
> interface, which requires a // SAFETY comment explaining why what it's
> doing is safe, and then we need to document in the function signature
> what the safety requirements are so those comments can be reviewed.
>
>> Ideally, we would never go from pfn->page, unless
>>
>> (a) we remember somehow that we came from page->pfn. E.g., we allocated
>> these pages or someone else provided us with these pages. The memmap
>> cannot go away. I know it's hard.
>
> This is the common case for the page tables. 99% of the time this is
> what the driver will be doing, with a single exception (the root page
> table of the firmware/privileged VM is a system reserved memory region,
> and falls under (b). It's one single page globally in the system.).
Makes sense.
>
> The driver actually uses the completely unchecked interface in this
> case, since it knows the pfns are definitely OK. I do a single check
> with the checked interface at probe time for that one special-case pfn
> so it can fail gracefully instead of oops if the DT config is
> unusable/wrong.
>
>> (b) the pages are flagged as being special, similar to
>> __ioremap_check_ram().
>
> This only ever happens during firmware crash dumps (plus the one
> exception above).
>
> The missing (c) case is the kernel/firmware shared memory GEM objects
> during crash dumps.
If it's only for crash dumps etc. that might even be opt-in, it makes
the whole thing a lot less scary. Maybe this could be opt-in somewhere,
to "unlock" this interface? Just an idea.
> But I really need those to diagnose firmware
> crashes. Of course, I could dump them separately through other APIs in
> principle, but that would complicate the crashdump code quite a bit
> since I'd have to go through all the kernel GPU memory allocators and
> dig out all their backing GEM objects and copy the memory through their
> vmap (they are all vmapped, which is yet another reason in practice the
> pages are pinned) and merge it into the coredump file. I also wouldn't
> have easy direct access to the matching GPU PTEs if I do that (I store
> the PTE permission/caching bits in the coredump file, since those are
> actually kind of critical to diagnose exactly what happened, as caching
> issues are one major cause of firmware problems). Since I need the page
> table walker code to grab the firmware pages anyway, I hope I can avoid
> having to go through a completely different codepath for the kernel GEM
> objects...
Makes sense.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 4/6] rust: addr: Add a module to declare core address types
2025-02-04 11:50 ` Asahi Lina
@ 2025-02-04 14:50 ` Boqun Feng
0 siblings, 0 replies; 67+ messages in thread
From: Boqun Feng @ 2025-02-04 14:50 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
On Tue, Feb 04, 2025 at 08:50:17PM +0900, Asahi Lina wrote:
>
>
> On 2/4/25 12:04 AM, Boqun Feng wrote:
> > On Sun, Feb 02, 2025 at 10:05:46PM +0900, Asahi Lina wrote:
> >> Encapsulates the core physical/DMA address types, so they can be used by
> >> Rust abstractions.
> >>
> >> Signed-off-by: Asahi Lina <lina@asahilina.net>
> >> ---
> >> rust/kernel/addr.rs | 15 +++++++++++++++
> >> rust/kernel/lib.rs | 1 +
> >> 2 files changed, 16 insertions(+)
> >>
> >> diff --git a/rust/kernel/addr.rs b/rust/kernel/addr.rs
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..06aff10a0332355597060c5518d7fd6e114cf630
> >> --- /dev/null
> >> +++ b/rust/kernel/addr.rs
> >> @@ -0,0 +1,15 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +
> >> +//! Kernel core address types.
> >> +
> >> +use bindings;
> >> +use core::ffi;
> >
> > I think we should use crate::ffi here. See:
> >
> > d072acda4862 "rust: use custom FFI integer types"
> >
> >> +
> >> +/// A physical memory address (which may be wider than the CPU pointer size)
> >> +pub type PhysicalAddr = bindings::phys_addr_t;
> >> +/// A DMA memory address (which may be narrower than `PhysicalAddr` on some systems)
> >> +pub type DmaAddr = bindings::dma_addr_t;
> >> +/// A physical resource size, typically the same width as `PhysicalAddr`
> >> +pub type ResourceSize = bindings::resource_size_t;
> >> +/// A raw page frame number, not to be confused with the C `pfn_t` which also encodes flags.
> >> +pub type Pfn = ffi::c_ulong;
> >> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> >> index e1065a7551a39e68d6379031d80d4be336e652a3..eb1a80ba8ff83ab2d1b3b1d11fed4fb704c7a4f5 100644
> >> --- a/rust/kernel/lib.rs
> >> +++ b/rust/kernel/lib.rs
> >> @@ -29,6 +29,7 @@
> >>
> >> pub use ffi;
> >>
> >> +pub mod addr;
> >
> > I want to share my worry on this `pub mod` list in kernel/lib.rs may get
> > too long ;-)
> >
> > I was about to suggest putting `addr` and `page` into `mm` after Alice's
> > patchset merged, however, seems that `mm` mod only cover userspace
> > memory management (which is not my impression of what "mm" in kernel
> > development), thoughts? Alice, do you think we should extend `mm` mod to
> > contain all memory management mods?
>
> I think in kernel-speak "mm" is usually userspace memory management and
> kernel page allocation and memory management, but doesn't include device
> stuff (I/O ranges, DMA, etc.) or the low-level concepts of memory in an
> architecture. The types I declare here are more relevant to device/arch
> code, so I don't think they really belong under "mm", it's more general
> (mm deals with them and so do other parts of the kernel code).
>
Make sense.
> On the other hand, I think page.rs itself would fit under mm, since it
> deals with the kernel mm's idea of pages and their allocation.
>
Agreed. Probably some follow-up work after Alice's mm work merged.
Regards,
Boqun
> >
> > Regards,
> > Boqun
> >
> >> pub mod alloc;
> >> #[cfg(CONFIG_BLOCK)]
> >> pub mod block;
> >>
> >> --
> >> 2.47.1
> >>
> >
>
> ~~ Lina
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 14:38 ` David Hildenbrand
@ 2025-02-04 17:59 ` Asahi Lina
2025-02-04 20:10 ` David Hildenbrand
2025-02-05 7:40 ` Simona Vetter
1 sibling, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 17:59 UTC (permalink / raw)
To: David Hildenbrand, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>> If the answer is "no" then that's fine. It's still an unsafe function
>>>> and we need to document in the safety section that it should only be
>>>> used for memory that is either known to be allocated and pinned and
>>>> will
>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>> reserved and not owned by the buddy allocator, so in practice correct
>>>> use would not be racy with memory hot-remove anyway.
>>>>
>>>> This is already the case for the drm/asahi use case, where the pfns
>>>> looked up will only ever be one of:
>>>>
>>>> - GEM objects that are mapped to the GPU and whose physical pages are
>>>> therefore pinned (and the VM is locked while this happens so the
>>>> objects
>>>> cannot become unpinned out from under the running code),
>>>
>>> How exactly are these pages pinned/obtained?
>>
>> Under the hood it's shmem. For pinning, it winds up at
>> `drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
>> a mapping set as unevictable.
>
> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>shmem_get_folio_gfp().
>
> Hm, I wonder if we might end up holding folios residing in ZONE_MOVABLE/
> MIGRATE_CMA longer than we should.
>
> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and makes
> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>
> But that's a different discussion, just pointing it out, maybe I'm
> missing something :)
I think this is a little over my head. Though I only just realized that
we seem to be keeping the GEM objects pinned forever, even after unmap,
in the drm-shmem core API (I see no drm-shmem entry point that would
allow the sgt to be freed and its corresponding pages ref to be dropped,
other than a purge of purgeable objects or final destruction of the
object). I'll poke around since this feels wrong, I thought we were
supposed to be able to have shrinker support for swapping out whole GPU
VMs in the modern GPU MM model, but I guess there's no implementation of
that for gem-shmem drivers yet...?
That's a discussion for the DRM side though.
>
>>
>> I'm not very familiar with the innards of that codepath, but it's
>> definitely an invariant that GEM objects have to be pinned while they
>> are mapped in GPU page tables (otherwise the GPU would end up accessing
>> freed memory).
>
> Right, there must be a raised reference.
>
[...]
>>>>>> Another case struct page can be freed is when hugetlb vmemmap
>>>>>> optimization
>>>>>> is used. Muchun (cc'd) is the maintainer of hugetlbfs.
>>>>>
>>>>> Here, the "struct page" remains valid though; it can still be
>>>>> accessed,
>>>>> although we disallow writes (which would be wrong).
>>>>>
>>>>> If you only allocate a page and free it later, there is no need to
>>>>> worry
>>>>> about either on the rust side.
>>>>
>>>> This is what the safe API does. (Also the unsafe physaddr APIs if all
>>>> you ever do is convert an allocated page to a physaddr and back, which
>>>> is the only thing the GPU page table code does during normal use. The
>>>> walking leaf PFNs story is only for GPU device coredumps when the
>>>> firmware crashes.)
>>>
>>> I would hope that we can lock down this interface as much as possible.
>>
>> Right, that's why the safe API never does any of the weird pfn->page
>> stuff. Rust driver code has to use unsafe {} to access the raw pfn->page
>> interface, which requires a // SAFETY comment explaining why what it's
>> doing is safe, and then we need to document in the function signature
>> what the safety requirements are so those comments can be reviewed.
>>
>>> Ideally, we would never go from pfn->page, unless
>>>
>>> (a) we remember somehow that we came from page->pfn. E.g., we allocated
>>> these pages or someone else provided us with these pages. The
>>> memmap
>>> cannot go away. I know it's hard.
>>
>> This is the common case for the page tables. 99% of the time this is
>> what the driver will be doing, with a single exception (the root page
>> table of the firmware/privileged VM is a system reserved memory region,
>> and falls under (b). It's one single page globally in the system.).
>
> Makes sense.
>
>>
>> The driver actually uses the completely unchecked interface in this
>> case, since it knows the pfns are definitely OK. I do a single check
>> with the checked interface at probe time for that one special-case pfn
>> so it can fail gracefully instead of oops if the DT config is
>> unusable/wrong.
>>
>>> (b) the pages are flagged as being special, similar to
>>> __ioremap_check_ram().
>>
>> This only ever happens during firmware crash dumps (plus the one
>> exception above).
>>
>> The missing (c) case is the kernel/firmware shared memory GEM objects
>> during crash dumps.
>
> If it's only for crash dumps etc. that might even be opt-in, it makes
> the whole thing a lot less scary. Maybe this could be opt-in somewhere,
> to "unlock" this interface? Just an idea.
Just to make sure we're on the same page, I don't think there's anything
to unlock in the Rust abstraction side (this series). At the end of the
day, if nothing else, the unchecked interface (which the regular
non-crash page table management code uses for performance) will let you
use any pfn you want, it's up to documentation and human review to
specify how it should be used by drivers. What Rust gives us here is the
mandatory `unsafe {}`, so any attempts to use this API will necessarily
stick out during review as potentially dangerous code that needs extra
scrutiny.
For the client driver itself, I could gate the devcoredump stuff behind
a module parameter or something... but I don't think it's really worth
it. We don't have a way to reboot the firmware or recover from this
condition (platform limitations), so end users are stuck rebooting to
get back a usable machine anyway. If something goes wrong in the
crashdump code and the machine oopses or locks up worse... it doesn't
really make much of a difference for normal end users. I don't think
this will ever really happen given the constraints I described, but if
somehow it does (some other bug somewhere?), well... the machine was
already in an unrecoverable state anyway.
It would be nice to have userspace tooling deployed by default that
saves off the devcoredump somewhere, so we can have a chance at
debugging hard-to-hit firmware crashes... if it's opt-in, it would only
really be useful for developers and CI machines.
There *is* a system-global devcoredump disable, but it's not exposed
outside of the devcoredump core. It just lets coredumps happen and then
throws away the result. It might be worth sending out a patch to expose
that to drivers, so they can skip the whole coredump generation
machinery if it's unnecessary.
>> But I really need those to diagnose firmware
>> crashes. Of course, I could dump them separately through other APIs in
>> principle, but that would complicate the crashdump code quite a bit
>> since I'd have to go through all the kernel GPU memory allocators and
>> dig out all their backing GEM objects and copy the memory through their
>> vmap (they are all vmapped, which is yet another reason in practice the
>> pages are pinned) and merge it into the coredump file. I also wouldn't
>> have easy direct access to the matching GPU PTEs if I do that (I store
>> the PTE permission/caching bits in the coredump file, since those are
>> actually kind of critical to diagnose exactly what happened, as caching
>> issues are one major cause of firmware problems). Since I need the page
>> table walker code to grab the firmware pages anyway, I hope I can avoid
>> having to go through a completely different codepath for the kernel GEM
>> objects...
>
> Makes sense.
>
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 10:33 ` David Hildenbrand
@ 2025-02-04 18:39 ` Jason Gunthorpe
2025-02-04 19:01 ` Asahi Lina
2025-02-04 20:05 ` David Hildenbrand
0 siblings, 2 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-04 18:39 UTC (permalink / raw)
To: David Hildenbrand
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Tue, Feb 04, 2025 at 11:33:24AM +0100, David Hildenbrand wrote:
> On 03.02.25 10:58, Simona Vetter wrote:
> > On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
> > > This series refactors the existing Page wrapper to support borrowing
> > > `struct page` objects without ownership on the Rust side, and converting
> > > page references to/from physical memory addresses.
> > >
> > > The series overlaps with the earlier submission in [1] and follows a
> > > different approach, based on the discussion that happened there.
> > >
> > > The primary use case for this is implementing IOMMU-style page table
> > > management in Rust. This allows drivers for IOMMUs and MMU-containing
> > > SoC devices to be written in Rust (such as embedded GPUs). The intended
> > > logic is similar to how ARM SMMU page tables are managed in the
> > > drivers/iommu tree.
> > >
> > > First, introduce a concept of Owned<T> and an Ownable trait. These are
> > > similar to ARef<T> and AlwaysRefCounted, but are used for types which
> > > are not ref counted but rather have a single intended owner.
> > >
> > > Then, refactor the existing Page support to use the new mechanism. Pages
> > > returned from the page allocator are not intended to be ref counted by
> > > consumers (see previous discussion in [1]), so this keeps Rust's view of
> > > page ownership as a simple "owned or not". Of course, this is still
> > > composable as Arc<Owned<Page>> if Rust code needs to reference count its
> > > own Page allocations for whatever reason.
> >
> > I think there's a bit a potential mess here because the conversion to
> > folios isn't far enough yet that we can entirely ignore page refcounts and
> > just use folio refcounts. But I guess we can deal with that oddity if we
> > hit it (maybe folio conversion moves fast enough), since this only really
> > starts to become relevant for hmm/svm gpu stuff.
>
> I'll note that in the future only selected things will be folios (e.g.,
> pagecache, anonymous memory). Everything else will either get a separate
> memdesc (e.g., ptdesc), or work on bare pages.
>
> Likely, when talking about page tables, "ptdesc" might be what you want to
> allocate here, and not "folios".
I just posted a series to clean up the iommu code that this is
cribbing the interface from: add an ioptdesc, remove all the struct
page from the API and so forth:
https://lore.kernel.org/all/0-v1-416f64558c7c+2a5-iommu_pages_jgg@nvidia.com/
I strongly suggest that rust not expose these old schemes that will
need another cleanup like the above. Page tables just need an
allocator using simple void *, kmalloc is good enough for simple
cases.
Drivers should not be exposing or touching struct page just to
implement a page table.
Jason
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 18:39 ` Jason Gunthorpe
@ 2025-02-04 19:01 ` Asahi Lina
2025-02-04 20:05 ` David Hildenbrand
1 sibling, 0 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 19:01 UTC (permalink / raw)
To: Jason Gunthorpe, David Hildenbrand
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 2/5/25 3:39 AM, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2025 at 11:33:24AM +0100, David Hildenbrand wrote:
>> On 03.02.25 10:58, Simona Vetter wrote:
>>> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>>>> This series refactors the existing Page wrapper to support borrowing
>>>> `struct page` objects without ownership on the Rust side, and converting
>>>> page references to/from physical memory addresses.
>>>>
>>>> The series overlaps with the earlier submission in [1] and follows a
>>>> different approach, based on the discussion that happened there.
>>>>
>>>> The primary use case for this is implementing IOMMU-style page table
>>>> management in Rust. This allows drivers for IOMMUs and MMU-containing
>>>> SoC devices to be written in Rust (such as embedded GPUs). The intended
>>>> logic is similar to how ARM SMMU page tables are managed in the
>>>> drivers/iommu tree.
>>>>
>>>> First, introduce a concept of Owned<T> and an Ownable trait. These are
>>>> similar to ARef<T> and AlwaysRefCounted, but are used for types which
>>>> are not ref counted but rather have a single intended owner.
>>>>
>>>> Then, refactor the existing Page support to use the new mechanism. Pages
>>>> returned from the page allocator are not intended to be ref counted by
>>>> consumers (see previous discussion in [1]), so this keeps Rust's view of
>>>> page ownership as a simple "owned or not". Of course, this is still
>>>> composable as Arc<Owned<Page>> if Rust code needs to reference count its
>>>> own Page allocations for whatever reason.
>>>
>>> I think there's a bit a potential mess here because the conversion to
>>> folios isn't far enough yet that we can entirely ignore page refcounts and
>>> just use folio refcounts. But I guess we can deal with that oddity if we
>>> hit it (maybe folio conversion moves fast enough), since this only really
>>> starts to become relevant for hmm/svm gpu stuff.
>>
>> I'll note that in the future only selected things will be folios (e.g.,
>> pagecache, anonymous memory). Everything else will either get a separate
>> memdesc (e.g., ptdesc), or work on bare pages.
>>
>> Likely, when talking about page tables, "ptdesc" might be what you want to
>> allocate here, and not "folios".
>
> I just posted a series to clean up the iommu code that this is
> cribbing the interface from: add an ioptdesc, remove all the struct
> page from the API and so forth:
>
> https://lore.kernel.org/all/0-v1-416f64558c7c+2a5-iommu_pages_jgg@nvidia.com/
>
> I strongly suggest that rust not expose these old schemes that will
> need another cleanup like the above. Page tables just need an
> allocator using simple void *, kmalloc is good enough for simple
> cases.
>
> Drivers should not be exposing or touching struct page just to
> implement a page table.
iommu-pages.h looks like a private API internal to drivers/iommu. Is
there a plan to move that out so it can be used by non-iommu drivers?
If you think I should just use kmalloc, then I guess I could literally
just use KBox<PageTable> where PageTable is just a wrapper around [u64;
PTES_PER_PAGE] with an alignment attribute, or something like that. I
guess then just virt_to_phys() and back for the PTEs? So we still need
to add at least trivial wrappers to export those for Rust drivers to use.
I still need to be able to grab pointers to leaf pages for the crashdump
page table walker code though, and I want to keep the pfn and memory
type checks to make sure it won't crash trying to dereference the page
contents. So we also still need those too.
I'm not entirely sure what a higher-level Rust abstraction for this
looks like at this point, or whether it makes sense at all instead of
just trivially wrapping the C helpers... I don't particularly mind just
writing this part of the driver code "like C" (it's pretty much the
lowest-level bit of code in the driver anyway) but I guess we have to do
some thinking here.
Maybe this is one of those cases where we just do it ad-hoc for now, and
wait until a second driver that needs to implement page tables comes
around and we figure out what can be shared then, and what the API
should look like.
Hmm, starting to think maybe these should literally be impls on the
address types (PhysicalAddr::to_pfn(), PhysicalAddr::to_virt(), etc.).
Though that won't work without newtyping everything, but maybe we should?
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 18:39 ` Jason Gunthorpe
2025-02-04 19:01 ` Asahi Lina
@ 2025-02-04 20:05 ` David Hildenbrand
2025-02-04 20:26 ` Jason Gunthorpe
1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 20:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 04.02.25 19:39, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2025 at 11:33:24AM +0100, David Hildenbrand wrote:
>> On 03.02.25 10:58, Simona Vetter wrote:
>>> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>>>> This series refactors the existing Page wrapper to support borrowing
>>>> `struct page` objects without ownership on the Rust side, and converting
>>>> page references to/from physical memory addresses.
>>>>
>>>> The series overlaps with the earlier submission in [1] and follows a
>>>> different approach, based on the discussion that happened there.
>>>>
>>>> The primary use case for this is implementing IOMMU-style page table
>>>> management in Rust. This allows drivers for IOMMUs and MMU-containing
>>>> SoC devices to be written in Rust (such as embedded GPUs). The intended
>>>> logic is similar to how ARM SMMU page tables are managed in the
>>>> drivers/iommu tree.
>>>>
>>>> First, introduce a concept of Owned<T> and an Ownable trait. These are
>>>> similar to ARef<T> and AlwaysRefCounted, but are used for types which
>>>> are not ref counted but rather have a single intended owner.
>>>>
>>>> Then, refactor the existing Page support to use the new mechanism. Pages
>>>> returned from the page allocator are not intended to be ref counted by
>>>> consumers (see previous discussion in [1]), so this keeps Rust's view of
>>>> page ownership as a simple "owned or not". Of course, this is still
>>>> composable as Arc<Owned<Page>> if Rust code needs to reference count its
>>>> own Page allocations for whatever reason.
>>>
>>> I think there's a bit a potential mess here because the conversion to
>>> folios isn't far enough yet that we can entirely ignore page refcounts and
>>> just use folio refcounts. But I guess we can deal with that oddity if we
>>> hit it (maybe folio conversion moves fast enough), since this only really
>>> starts to become relevant for hmm/svm gpu stuff.
>>
>> I'll note that in the future only selected things will be folios (e.g.,
>> pagecache, anonymous memory). Everything else will either get a separate
>> memdesc (e.g., ptdesc), or work on bare pages.
>>
>> Likely, when talking about page tables, "ptdesc" might be what you want to
>> allocate here, and not "folios".
>
> I just posted a series to clean up the iommu code that this is
> cribbing the interface from: add an ioptdesc, remove all the struct
> page from the API and so forth:
>
> https://lore.kernel.org/all/0-v1-416f64558c7c+2a5-iommu_pages_jgg@nvidia.com/
>
> I strongly suggest that rust not expose these old schemes that will
> need another cleanup like the above. Page tables just need an
> allocator using simple void *, kmalloc is good enough for simple
> cases.
>
> Drivers should not be exposing or touching struct page just to
> implement a page table.
Fully agreed, this is going into the right direction. Dumping what's
mapped is a different story. Maybe that dumping logic could simply be
written in C for the time being?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 17:59 ` Asahi Lina
@ 2025-02-04 20:10 ` David Hildenbrand
2025-02-04 21:06 ` Asahi Lina
0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 20:10 UTC (permalink / raw)
To: Asahi Lina, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 04.02.25 18:59, Asahi Lina wrote:
> On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>>> If the answer is "no" then that's fine. It's still an unsafe function
>>>>> and we need to document in the safety section that it should only be
>>>>> used for memory that is either known to be allocated and pinned and
>>>>> will
>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>> reserved and not owned by the buddy allocator, so in practice correct
>>>>> use would not be racy with memory hot-remove anyway.
>>>>>
>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>> looked up will only ever be one of:
>>>>>
>>>>> - GEM objects that are mapped to the GPU and whose physical pages are
>>>>> therefore pinned (and the VM is locked while this happens so the
>>>>> objects
>>>>> cannot become unpinned out from under the running code),
>>>>
>>>> How exactly are these pages pinned/obtained?
>>>
>>> Under the hood it's shmem. For pinning, it winds up at
>>> `drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
>>> a mapping set as unevictable.
>>
>> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>> shmem_get_folio_gfp().
>>
>> Hm, I wonder if we might end up holding folios residing in ZONE_MOVABLE/
>> MIGRATE_CMA longer than we should.
>>
>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and makes
>> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>
>> But that's a different discussion, just pointing it out, maybe I'm
>> missing something :)
>
> I think this is a little over my head. Though I only just realized that
> we seem to be keeping the GEM objects pinned forever, even after unmap,
> in the drm-shmem core API (I see no drm-shmem entry point that would
> allow the sgt to be freed and its corresponding pages ref to be dropped,
> other than a purge of purgeable objects or final destruction of the
> object). I'll poke around since this feels wrong, I thought we were
> supposed to be able to have shrinker support for swapping out whole GPU
> VMs in the modern GPU MM model, but I guess there's no implementation of
> that for gem-shmem drivers yet...?
I recall that shrinker as well, ... or at least a discussion around it.
[...]
>>
>> If it's only for crash dumps etc. that might even be opt-in, it makes
>> the whole thing a lot less scary. Maybe this could be opt-in somewhere,
>> to "unlock" this interface? Just an idea.
>
> Just to make sure we're on the same page, I don't think there's anything
> to unlock in the Rust abstraction side (this series). At the end of the
> day, if nothing else, the unchecked interface (which the regular
> non-crash page table management code uses for performance) will let you
> use any pfn you want, it's up to documentation and human review to
> specify how it should be used by drivers. What Rust gives us here is the
> mandatory `unsafe {}`, so any attempts to use this API will necessarily
> stick out during review as potentially dangerous code that needs extra
> scrutiny.
>
> For the client driver itself, I could gate the devcoredump stuff behind
> a module parameter or something... but I don't think it's really worth
> it. We don't have a way to reboot the firmware or recover from this
> condition (platform limitations), so end users are stuck rebooting to
> get back a usable machine anyway. If something goes wrong in the
> crashdump code and the machine oopses or locks up worse... it doesn't
> really make much of a difference for normal end users. I don't think
> this will ever really happen given the constraints I described, but if
> somehow it does (some other bug somewhere?), well... the machine was
> already in an unrecoverable state anyway.
>
> It would be nice to have userspace tooling deployed by default that
> saves off the devcoredump somewhere, so we can have a chance at
> debugging hard-to-hit firmware crashes... if it's opt-in, it would only
> really be useful for developers and CI machines.
Is this something that possibly kdump can save or analyze? Because that
is our default "oops, kernel crashed, let's dump the old content so we
can dump it" mechanism on production systems.
but ... I am not familiar with devcoredump. So I don't know when/how it
runs, and if the source system is still alive (and remains alive -- in
contrast to a kernel crash).
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 20:05 ` David Hildenbrand
@ 2025-02-04 20:26 ` Jason Gunthorpe
2025-02-04 20:41 ` David Hildenbrand
0 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-04 20:26 UTC (permalink / raw)
To: David Hildenbrand
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Tue, Feb 04, 2025 at 09:05:47PM +0100, David Hildenbrand wrote:
> Fully agreed, this is going into the right direction. Dumping what's mapped
> is a different story. Maybe that dumping logic could simply be written in C
> for the time being?
?
Isn't dumping just a
decode pte -> phys_to_virt() -> for_each_u64(virt) -> printk?
If you can populate the page table in rust you have to be able to dump
it, surely.
Or do you mean trying to decode the leaf entires into some struct page
detail? Does a GPU need to do that? Agree that would be better as some
C function to take in a KVA instead of a struct page and populate a
string with details about that KVA from the struct page.
Jason
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 20:26 ` Jason Gunthorpe
@ 2025-02-04 20:41 ` David Hildenbrand
2025-02-04 20:47 ` David Hildenbrand
2025-02-04 20:49 ` Jason Gunthorpe
0 siblings, 2 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 20:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 04.02.25 21:26, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2025 at 09:05:47PM +0100, David Hildenbrand wrote:
>> Fully agreed, this is going into the right direction. Dumping what's mapped
>> is a different story. Maybe that dumping logic could simply be written in C
>> for the time being?
>
> ?
>
> Isn't dumping just a
> decode pte -> phys_to_virt() -> for_each_u64(virt) -> printk?
>
IIUC, the problematic bit is that you might not have a directmap such
that phys_to_virt() would tell you the whole story.
> If you can populate the page table in rust you have to be able to dump
> it, surely.
>
> Or do you mean trying to decode the leaf entires into some struct page
> detail? Does a GPU need to do that? Agree that would be better as some
> C function to take in a KVA instead of a struct page and populate a
> string with details about that KVA from the struct page.
>
> Jason
>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 20:41 ` David Hildenbrand
@ 2025-02-04 20:47 ` David Hildenbrand
2025-02-04 21:18 ` Asahi Lina
2025-02-04 20:49 ` Jason Gunthorpe
1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-02-04 20:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 04.02.25 21:41, David Hildenbrand wrote:
> On 04.02.25 21:26, Jason Gunthorpe wrote:
>> On Tue, Feb 04, 2025 at 09:05:47PM +0100, David Hildenbrand wrote:
>>> Fully agreed, this is going into the right direction. Dumping what's mapped
>>> is a different story. Maybe that dumping logic could simply be written in C
>>> for the time being?
>>
>> ?
>>
>> Isn't dumping just a
>> decode pte -> phys_to_virt() -> for_each_u64(virt) -> printk?
>>
>
> IIUC, the problematic bit is that you might not have a directmap such
> that phys_to_virt() would tell you the whole story.
... but it's late and I am confused. For dumping the *page table* that
would not be required, only when dumping mapped page content (and at
this point I am not sure if that is a requirement).
So hopefully Asahi Lina can clarify what the issue was (if there is any
:) ).
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 20:41 ` David Hildenbrand
2025-02-04 20:47 ` David Hildenbrand
@ 2025-02-04 20:49 ` Jason Gunthorpe
2025-02-05 23:17 ` Matthew Wilcox
1 sibling, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-04 20:49 UTC (permalink / raw)
To: David Hildenbrand
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On Tue, Feb 04, 2025 at 09:41:29PM +0100, David Hildenbrand wrote:
> On 04.02.25 21:26, Jason Gunthorpe wrote:
> > On Tue, Feb 04, 2025 at 09:05:47PM +0100, David Hildenbrand wrote:
> > > Fully agreed, this is going into the right direction. Dumping what's mapped
> > > is a different story. Maybe that dumping logic could simply be written in C
> > > for the time being?
> >
> > ?
> >
> > Isn't dumping just a
> > decode pte -> phys_to_virt() -> for_each_u64(virt) -> printk?
> >
>
> IIUC, the problematic bit is that you might not have a directmap such that
> phys_to_virt() would tell you the whole story.
The phys_to_virt() I mean is on the page table pages themselves, not
on a leaf.
All page table pages came from alloc_pages_node() and you'd do
virt_to_phys() to stick them into a next-table PTE, then
phys_to_virt() to bring them back:
phys_to_virt(virt_to_phys(page_address(alloc_pages_node())))
Effectively.
The leaf pages could be anything and you should enver phys_to_virt
those, this is what I mean by:
> > Or do you mean trying to decode the leaf entires into some struct page
> > detail? Does a GPU need to do that? Agree that would be better as some
> > C function to take in a KVA instead of a struct page and populate a
> > string with details about that KVA from the struct page.
Though I should have said phys_addr_t not KVA
Jason
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 20:10 ` David Hildenbrand
@ 2025-02-04 21:06 ` Asahi Lina
2025-02-06 17:58 ` David Hildenbrand
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 21:06 UTC (permalink / raw)
To: David Hildenbrand, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 2/5/25 5:10 AM, David Hildenbrand wrote:
> On 04.02.25 18:59, Asahi Lina wrote:
>> On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>>>> If the answer is "no" then that's fine. It's still an unsafe function
>>>>>> and we need to document in the safety section that it should only be
>>>>>> used for memory that is either known to be allocated and pinned and
>>>>>> will
>>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>>> reserved and not owned by the buddy allocator, so in practice correct
>>>>>> use would not be racy with memory hot-remove anyway.
>>>>>>
>>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>>> looked up will only ever be one of:
>>>>>>
>>>>>> - GEM objects that are mapped to the GPU and whose physical pages are
>>>>>> therefore pinned (and the VM is locked while this happens so the
>>>>>> objects
>>>>>> cannot become unpinned out from under the running code),
>>>>>
>>>>> How exactly are these pages pinned/obtained?
>>>>
>>>> Under the hood it's shmem. For pinning, it winds up at
>>>> `drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
>>>> a mapping set as unevictable.
>>>
>>> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>>> shmem_get_folio_gfp().
>>>
>>> Hm, I wonder if we might end up holding folios residing in ZONE_MOVABLE/
>>> MIGRATE_CMA longer than we should.
>>>
>>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and makes
>>> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>>
>>> But that's a different discussion, just pointing it out, maybe I'm
>>> missing something :)
>>
>> I think this is a little over my head. Though I only just realized that
>> we seem to be keeping the GEM objects pinned forever, even after unmap,
>> in the drm-shmem core API (I see no drm-shmem entry point that would
>> allow the sgt to be freed and its corresponding pages ref to be dropped,
>> other than a purge of purgeable objects or final destruction of the
>> object). I'll poke around since this feels wrong, I thought we were
>> supposed to be able to have shrinker support for swapping out whole GPU
>> VMs in the modern GPU MM model, but I guess there's no implementation of
>> that for gem-shmem drivers yet...?
>
> I recall that shrinker as well, ... or at least a discussion around it.
>
> [...]
>
>>>
>>> If it's only for crash dumps etc. that might even be opt-in, it makes
>>> the whole thing a lot less scary. Maybe this could be opt-in somewhere,
>>> to "unlock" this interface? Just an idea.
>>
>> Just to make sure we're on the same page, I don't think there's anything
>> to unlock in the Rust abstraction side (this series). At the end of the
>> day, if nothing else, the unchecked interface (which the regular
>> non-crash page table management code uses for performance) will let you
>> use any pfn you want, it's up to documentation and human review to
>> specify how it should be used by drivers. What Rust gives us here is the
>> mandatory `unsafe {}`, so any attempts to use this API will necessarily
>> stick out during review as potentially dangerous code that needs extra
>> scrutiny.
>>
>> For the client driver itself, I could gate the devcoredump stuff behind
>> a module parameter or something... but I don't think it's really worth
>> it. We don't have a way to reboot the firmware or recover from this
>> condition (platform limitations), so end users are stuck rebooting to
>> get back a usable machine anyway. If something goes wrong in the
>> crashdump code and the machine oopses or locks up worse... it doesn't
>> really make much of a difference for normal end users. I don't think
>> this will ever really happen given the constraints I described, but if
>> somehow it does (some other bug somewhere?), well... the machine was
>> already in an unrecoverable state anyway.
>>
>> It would be nice to have userspace tooling deployed by default that
>> saves off the devcoredump somewhere, so we can have a chance at
>> debugging hard-to-hit firmware crashes... if it's opt-in, it would only
>> really be useful for developers and CI machines.
>
> Is this something that possibly kdump can save or analyze? Because that
> is our default "oops, kernel crashed, let's dump the old content so we
> can dump it" mechanism on production systems.
kdump does not work on Apple ARM systems because kexec is broken and
cannot be fully fixed, due to multiple platform/firmware limitations. A
very limited version of kexec might work well enough for kdump, but I
don't think anyone has looked into making that work yet...
> but ... I am not familiar with devcoredump. So I don't know when/how it
> runs, and if the source system is still alive (and remains alive -- in
> contrast to a kernel crash).
Devcoredump just makes the dump available via /sys so it can be
collected by the user. The system is still alive, the GPU is just dead
and all future GPU job submissions fail. You can still SSH in or (at
least in theory, if enough moving parts are graceful about it) VT-switch
to a TTY. The display controller is not part of the GPU, it is separate
hardware.
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 20:47 ` David Hildenbrand
@ 2025-02-04 21:18 ` Asahi Lina
2025-02-06 18:02 ` David Hildenbrand
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-04 21:18 UTC (permalink / raw)
To: David Hildenbrand, Jason Gunthorpe
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 2/5/25 5:47 AM, David Hildenbrand wrote:
> On 04.02.25 21:41, David Hildenbrand wrote:
>> On 04.02.25 21:26, Jason Gunthorpe wrote:
>>> On Tue, Feb 04, 2025 at 09:05:47PM +0100, David Hildenbrand wrote:
>>>> Fully agreed, this is going into the right direction. Dumping what's
>>>> mapped
>>>> is a different story. Maybe that dumping logic could simply be
>>>> written in C
>>>> for the time being?
>>>
>>> ?
>>>
>>> Isn't dumping just a
>>> decode pte -> phys_to_virt() -> for_each_u64(virt) -> printk?
>>>
>>
>> IIUC, the problematic bit is that you might not have a directmap such
>> that phys_to_virt() would tell you the whole story.
>
> ... but it's late and I am confused. For dumping the *page table* that
> would not be required, only when dumping mapped page content (and at
> this point I am not sure if that is a requirement).
>
> So hopefully Asahi Lina can clarify what the issue was (if there is
> any :) ).
Yes, the crash dumper has to dump the mapped page content. In fact, I
don't care about the page tables themselves other than PTE permission
bits, and the page tables alone are not useful for debugging firmware
crashes (and aren't even included in the dump verbatim, at least not the
kernel-allocated ones). The goal of the crash dumper is to take a
complete dump of firmware virtual memory address space (which includes
the kinds of memory I mentioned in [1]). The output is an ELF format
core dump with all memory that the GPU firmware can access, that the
kernel was able to successfully dump (which should be everything except
for MMIO if the bootloader/DT have the right reserved-memory setup).
I *think* phys_to_virt should work just fine for *this* specific use
case on this platform, but I'm not entirely sure. I still want to use
the various pfn check functions before doing that, to exclude ranges
that would definitely not work.
I don't really want to write any part of the driver in C, but whatever
logic is required, I don't think there should be a good reason for that.
We can export the C functions required to Rust code in some form, we
just need to decide what form that should be. Worst case, Rust drivers
can directly access C functions via the `bindings` crate, though that's
something I want to avoid (even if there's only a very thin or trivial
Rust abstraction layer, there should be *something* so at least we get
Rust docs and the ability to make subtle improvements to the
interface/types/etc).
[1]
https://lore.kernel.org/rust-for-linux/1e9ae833-4293-4e48-83b2-c0af36cb3fdc@asahilina.net/
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 14:38 ` David Hildenbrand
2025-02-04 17:59 ` Asahi Lina
@ 2025-02-05 7:40 ` Simona Vetter
2025-02-12 19:07 ` David Hildenbrand
1 sibling, 1 reply; 67+ messages in thread
From: Simona Vetter @ 2025-02-05 7:40 UTC (permalink / raw)
To: David Hildenbrand
Cc: Asahi Lina, Zi Yan, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, Jann Horn, Matthew Wilcox,
Paolo Bonzini, Danilo Krummrich, Wedson Almeida Filho,
Valentin Obst, Andrew Morton, linux-mm, airlied, Abdiel Janulgue,
rust-for-linux, linux-kernel, asahi, Oscar Salvador, Muchun Song
On Tue, Feb 04, 2025 at 03:38:17PM +0100, David Hildenbrand wrote:
> > > It can still race with memory offlining, and it refuses ZONE_DEVICE
> > > pages. For the latter, we have a different way to check validity. See
> > > memory_failure() that first calls pfn_to_online_page() to then check
> > > get_dev_pagemap().
> >
> > I'll give it a shot with these functions. If they work for my use case,
> > then it's good to have extra checks and I'll add them for v2. Thanks!
>
> Let me know if you run into any issues.
>
> >
> > >
> > > >
> > > > If the answer is "no" then that's fine. It's still an unsafe function
> > > > and we need to document in the safety section that it should only be
> > > > used for memory that is either known to be allocated and pinned and will
> > > > not be freed while the `struct page` is borrowed, or memory that is
> > > > reserved and not owned by the buddy allocator, so in practice correct
> > > > use would not be racy with memory hot-remove anyway.
> > > >
> > > > This is already the case for the drm/asahi use case, where the pfns
> > > > looked up will only ever be one of:
> > > >
> > > > - GEM objects that are mapped to the GPU and whose physical pages are
> > > > therefore pinned (and the VM is locked while this happens so the objects
> > > > cannot become unpinned out from under the running code),
> > >
> > > How exactly are these pages pinned/obtained?
> >
> > Under the hood it's shmem. For pinning, it winds up at
> > `drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
> > a mapping set as unevictable.
>
> Thanks. So we grab another folio reference via
> shmem_read_folio_gfp()->shmem_get_folio_gfp().
>
> Hm, I wonder if we might end up holding folios residing in
> ZONE_MOVABLE/MIGRATE_CMA longer than we should.
>
> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and makes sure
> to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>
> But that's a different discussion, just pointing it out, maybe I'm missing
> something :)
Good GPU Drivers (tm) are supposed to have a shrinker so we can at least
nuke some of them again. Some folks even looked into hooking up a migrate
callback through the address_space (or wherever that hook was, this is
from memory) so we can make this somewhat reliable. So yeah we're hogging
ZONE_MOVEABLE unduly still.
The other side is that there's about 2-3 good drivers (msm, i915, xe
should have a shrinker now too but I didn't check). The others all fall
various levels of short, or still have 3 times cargo-culted versions of
i915's pin-as-a-lock design and get it completely wrong.
So yeah I'm aware this isn't great, and we're at least glacially slowly
moving towards a common shrinker infrastructure that maybe in a glorious
future gets all this right. I mean it took us 15+ years to get to a
cgroups controller after all too, and that was also a well known issue of
just being able to hog memory with no controls and potentially cause
havoc.
Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 20:49 ` Jason Gunthorpe
@ 2025-02-05 23:17 ` Matthew Wilcox
2025-02-06 18:04 ` David Hildenbrand
0 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2025-02-05 23:17 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: David Hildenbrand, Asahi Lina, Miguel Ojeda, Alex Gaynor,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Jann Horn,
Paolo Bonzini, Danilo Krummrich, Wedson Almeida Filho,
Valentin Obst, Andrew Morton, linux-mm, airlied, Abdiel Janulgue,
rust-for-linux, linux-kernel, asahi
On Tue, Feb 04, 2025 at 04:49:47PM -0400, Jason Gunthorpe wrote:
> The phys_to_virt() I mean is on the page table pages themselves, not
> on a leaf.
>
> All page table pages came from alloc_pages_node() and you'd do
> virt_to_phys() to stick them into a next-table PTE, then
> phys_to_virt() to bring them back:
>
> phys_to_virt(virt_to_phys(page_address(alloc_pages_node())))
>
> Effectively.
except for s390 & ppc where the page table can be smaller than
PAGE_SIZE, so you really only have a physical address that can be
converted to a virtual address.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 21:06 ` Asahi Lina
@ 2025-02-06 17:58 ` David Hildenbrand
2025-02-06 19:18 ` Asahi Lina
0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-02-06 17:58 UTC (permalink / raw)
To: Asahi Lina, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 04.02.25 22:06, Asahi Lina wrote:
>
>
> On 2/5/25 5:10 AM, David Hildenbrand wrote:
>> On 04.02.25 18:59, Asahi Lina wrote:
>>> On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>>>>> If the answer is "no" then that's fine. It's still an unsafe function
>>>>>>> and we need to document in the safety section that it should only be
>>>>>>> used for memory that is either known to be allocated and pinned and
>>>>>>> will
>>>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>>>> reserved and not owned by the buddy allocator, so in practice correct
>>>>>>> use would not be racy with memory hot-remove anyway.
>>>>>>>
>>>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>>>> looked up will only ever be one of:
>>>>>>>
>>>>>>> - GEM objects that are mapped to the GPU and whose physical pages are
>>>>>>> therefore pinned (and the VM is locked while this happens so the
>>>>>>> objects
>>>>>>> cannot become unpinned out from under the running code),
>>>>>>
>>>>>> How exactly are these pages pinned/obtained?
>>>>>
>>>>> Under the hood it's shmem. For pinning, it winds up at
>>>>> `drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
>>>>> a mapping set as unevictable.
>>>>
>>>> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>>>> shmem_get_folio_gfp().
>>>>
>>>> Hm, I wonder if we might end up holding folios residing in ZONE_MOVABLE/
>>>> MIGRATE_CMA longer than we should.
>>>>
>>>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and makes
>>>> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>>>
>>>> But that's a different discussion, just pointing it out, maybe I'm
>>>> missing something :)
>>>
>>> I think this is a little over my head. Though I only just realized that
>>> we seem to be keeping the GEM objects pinned forever, even after unmap,
>>> in the drm-shmem core API (I see no drm-shmem entry point that would
>>> allow the sgt to be freed and its corresponding pages ref to be dropped,
>>> other than a purge of purgeable objects or final destruction of the
>>> object). I'll poke around since this feels wrong, I thought we were
>>> supposed to be able to have shrinker support for swapping out whole GPU
>>> VMs in the modern GPU MM model, but I guess there's no implementation of
>>> that for gem-shmem drivers yet...?
>>
>> I recall that shrinker as well, ... or at least a discussion around it.
>>
>> [...]
>>
>>>>
>>>> If it's only for crash dumps etc. that might even be opt-in, it makes
>>>> the whole thing a lot less scary. Maybe this could be opt-in somewhere,
>>>> to "unlock" this interface? Just an idea.
>>>
>>> Just to make sure we're on the same page, I don't think there's anything
>>> to unlock in the Rust abstraction side (this series). At the end of the
>>> day, if nothing else, the unchecked interface (which the regular
>>> non-crash page table management code uses for performance) will let you
>>> use any pfn you want, it's up to documentation and human review to
>>> specify how it should be used by drivers. What Rust gives us here is the
>>> mandatory `unsafe {}`, so any attempts to use this API will necessarily
>>> stick out during review as potentially dangerous code that needs extra
>>> scrutiny.
>>>
>>> For the client driver itself, I could gate the devcoredump stuff behind
>>> a module parameter or something... but I don't think it's really worth
>>> it. We don't have a way to reboot the firmware or recover from this
>>> condition (platform limitations), so end users are stuck rebooting to
>>> get back a usable machine anyway. If something goes wrong in the
>>> crashdump code and the machine oopses or locks up worse... it doesn't
>>> really make much of a difference for normal end users. I don't think
>>> this will ever really happen given the constraints I described, but if
>>> somehow it does (some other bug somewhere?), well... the machine was
>>> already in an unrecoverable state anyway.
>>>
>>> It would be nice to have userspace tooling deployed by default that
>>> saves off the devcoredump somewhere, so we can have a chance at
>>> debugging hard-to-hit firmware crashes... if it's opt-in, it would only
>>> really be useful for developers and CI machines.
>>
>> Is this something that possibly kdump can save or analyze? Because that
>> is our default "oops, kernel crashed, let's dump the old content so we
>> can dump it" mechanism on production systems.
>
> kdump does not work on Apple ARM systems because kexec is broken and
> cannot be fully fixed, due to multiple platform/firmware limitations. A
> very limited version of kexec might work well enough for kdump, but I
> don't think anyone has looked into making that work yet...
>
>> but ... I am not familiar with devcoredump. So I don't know when/how it
>> runs, and if the source system is still alive (and remains alive -- in
>> contrast to a kernel crash).
>
> Devcoredump just makes the dump available via /sys so it can be
> collected by the user. The system is still alive, the GPU is just dead
> and all future GPU job submissions fail. You can still SSH in or (at
> least in theory, if enough moving parts are graceful about it) VT-switch
> to a TTY. The display controller is not part of the GPU, it is separate
> hardware.
Thanks for all the details (and sorry for the delay, I'm on PTO until
Monday ... :)
(regarding the other mail) Adding that stuff to rust just so we have a
devcoredump that ideally wouldn't exist is a bit unfortunate.
So I'm curious: we do have /proc/kcore, where we do all of the required
filtering, only allowing for reading memory that is online, not
hwpoisoned etc.
makedumpfile already supports /proc/kcore.
Would it be possible to avoid Devcoredump completely either by dumping
/proc/kcore directly or by having a user-space script that walks the
page tables to dump the content purely based on /proc/kcore?
If relevant memory ranges are inaccessible from /proc/kcore, we could
look into exposing them.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-04 21:18 ` Asahi Lina
@ 2025-02-06 18:02 ` David Hildenbrand
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-02-06 18:02 UTC (permalink / raw)
To: Asahi Lina, Jason Gunthorpe
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi
On 04.02.25 22:18, Asahi Lina wrote:
>
>
> On 2/5/25 5:47 AM, David Hildenbrand wrote:
>> On 04.02.25 21:41, David Hildenbrand wrote:
>>> On 04.02.25 21:26, Jason Gunthorpe wrote:
>>>> On Tue, Feb 04, 2025 at 09:05:47PM +0100, David Hildenbrand wrote:
>>>>> Fully agreed, this is going into the right direction. Dumping what's
>>>>> mapped
>>>>> is a different story. Maybe that dumping logic could simply be
>>>>> written in C
>>>>> for the time being?
>>>>
>>>> ?
>>>>
>>>> Isn't dumping just a
>>>> decode pte -> phys_to_virt() -> for_each_u64(virt) -> printk?
>>>>
>>>
>>> IIUC, the problematic bit is that you might not have a directmap such
>>> that phys_to_virt() would tell you the whole story.
>>
>> ... but it's late and I am confused. For dumping the *page table* that
>> would not be required, only when dumping mapped page content (and at
>> this point I am not sure if that is a requirement).
>>
>> So hopefully Asahi Lina can clarify what the issue was (if there is
>> any :) ).
>
> Yes, the crash dumper has to dump the mapped page content. In fact, I
> don't care about the page tables themselves other than PTE permission
> bits, and the page tables alone are not useful for debugging firmware
> crashes (and aren't even included in the dump verbatim, at least not the
> kernel-allocated ones). The goal of the crash dumper is to take a
> complete dump of firmware virtual memory address space (which includes
> the kinds of memory I mentioned in [1]). The output is an ELF format
> core dump with all memory that the GPU firmware can access, that the
> kernel was able to successfully dump (which should be everything except
> for MMIO if the bootloader/DT have the right reserved-memory setup).
>
> I *think* phys_to_virt should work just fine for *this* specific use
> case on this platform, but I'm not entirely sure. I still want to use
> the various pfn check functions before doing that, to exclude ranges
> that would definitely not work.
See my other mail, in /proc/kcore we do exactly that. See
fs/proc/kcore.c, especially the KCORE_RAM case in read_kcore_iter().
So I would hope that at least for the Devcoredump we can find an
alternative that does not require rust bindings, and maybe not even new
c code at all (in a perfect world where you could just reuse /proc/kcore).
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-05 23:17 ` Matthew Wilcox
@ 2025-02-06 18:04 ` David Hildenbrand
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-02-06 18:04 UTC (permalink / raw)
To: Matthew Wilcox, Jason Gunthorpe
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
On 06.02.25 00:17, Matthew Wilcox wrote:
> On Tue, Feb 04, 2025 at 04:49:47PM -0400, Jason Gunthorpe wrote:
>> The phys_to_virt() I mean is on the page table pages themselves, not
>> on a leaf.
>>
>> All page table pages came from alloc_pages_node() and you'd do
>> virt_to_phys() to stick them into a next-table PTE, then
>> phys_to_virt() to bring them back:
>>
>> phys_to_virt(virt_to_phys(page_address(alloc_pages_node())))
>>
>> Effectively.
>
> except for s390 & ppc where the page table can be smaller than
> PAGE_SIZE, so you really only have a physical address that can be
> converted to a virtual address.
I recall in the context of IOMMU page tables we don't perform this
packing. So what you say is certainly true for process page tables, but
likely not for IOMMU etc in the context of this series.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-06 17:58 ` David Hildenbrand
@ 2025-02-06 19:18 ` Asahi Lina
2025-02-06 19:27 ` Asahi Lina
2025-02-12 19:01 ` David Hildenbrand
0 siblings, 2 replies; 67+ messages in thread
From: Asahi Lina @ 2025-02-06 19:18 UTC (permalink / raw)
To: David Hildenbrand, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 2/7/25 2:58 AM, David Hildenbrand wrote:
> On 04.02.25 22:06, Asahi Lina wrote:
>>
>>
>> On 2/5/25 5:10 AM, David Hildenbrand wrote:
>>> On 04.02.25 18:59, Asahi Lina wrote:
>>>> On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>>>>>> If the answer is "no" then that's fine. It's still an unsafe
>>>>>>>> function
>>>>>>>> and we need to document in the safety section that it should
>>>>>>>> only be
>>>>>>>> used for memory that is either known to be allocated and pinned and
>>>>>>>> will
>>>>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>>>>> reserved and not owned by the buddy allocator, so in practice
>>>>>>>> correct
>>>>>>>> use would not be racy with memory hot-remove anyway.
>>>>>>>>
>>>>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>>>>> looked up will only ever be one of:
>>>>>>>>
>>>>>>>> - GEM objects that are mapped to the GPU and whose physical
>>>>>>>> pages are
>>>>>>>> therefore pinned (and the VM is locked while this happens so the
>>>>>>>> objects
>>>>>>>> cannot become unpinned out from under the running code),
>>>>>>>
>>>>>>> How exactly are these pages pinned/obtained?
>>>>>>
>>>>>> Under the hood it's shmem. For pinning, it winds up at
>>>>>> `drm_gem_get_pages()`, which I think does a
>>>>>> `shmem_read_folio_gfp()` on
>>>>>> a mapping set as unevictable.
>>>>>
>>>>> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>>>>> shmem_get_folio_gfp().
>>>>>
>>>>> Hm, I wonder if we might end up holding folios residing in
>>>>> ZONE_MOVABLE/
>>>>> MIGRATE_CMA longer than we should.
>>>>>
>>>>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and
>>>>> makes
>>>>> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>>>>
>>>>> But that's a different discussion, just pointing it out, maybe I'm
>>>>> missing something :)
>>>>
>>>> I think this is a little over my head. Though I only just realized that
>>>> we seem to be keeping the GEM objects pinned forever, even after unmap,
>>>> in the drm-shmem core API (I see no drm-shmem entry point that would
>>>> allow the sgt to be freed and its corresponding pages ref to be
>>>> dropped,
>>>> other than a purge of purgeable objects or final destruction of the
>>>> object). I'll poke around since this feels wrong, I thought we were
>>>> supposed to be able to have shrinker support for swapping out whole GPU
>>>> VMs in the modern GPU MM model, but I guess there's no
>>>> implementation of
>>>> that for gem-shmem drivers yet...?
>>>
>>> I recall that shrinker as well, ... or at least a discussion around it.
>>>
>>> [...]
>>>
>>>>>
>>>>> If it's only for crash dumps etc. that might even be opt-in, it makes
>>>>> the whole thing a lot less scary. Maybe this could be opt-in
>>>>> somewhere,
>>>>> to "unlock" this interface? Just an idea.
>>>>
>>>> Just to make sure we're on the same page, I don't think there's
>>>> anything
>>>> to unlock in the Rust abstraction side (this series). At the end of the
>>>> day, if nothing else, the unchecked interface (which the regular
>>>> non-crash page table management code uses for performance) will let you
>>>> use any pfn you want, it's up to documentation and human review to
>>>> specify how it should be used by drivers. What Rust gives us here is
>>>> the
>>>> mandatory `unsafe {}`, so any attempts to use this API will necessarily
>>>> stick out during review as potentially dangerous code that needs extra
>>>> scrutiny.
>>>>
>>>> For the client driver itself, I could gate the devcoredump stuff behind
>>>> a module parameter or something... but I don't think it's really worth
>>>> it. We don't have a way to reboot the firmware or recover from this
>>>> condition (platform limitations), so end users are stuck rebooting to
>>>> get back a usable machine anyway. If something goes wrong in the
>>>> crashdump code and the machine oopses or locks up worse... it doesn't
>>>> really make much of a difference for normal end users. I don't think
>>>> this will ever really happen given the constraints I described, but if
>>>> somehow it does (some other bug somewhere?), well... the machine was
>>>> already in an unrecoverable state anyway.
>>>>
>>>> It would be nice to have userspace tooling deployed by default that
>>>> saves off the devcoredump somewhere, so we can have a chance at
>>>> debugging hard-to-hit firmware crashes... if it's opt-in, it would only
>>>> really be useful for developers and CI machines.
>>>
>>> Is this something that possibly kdump can save or analyze? Because that
>>> is our default "oops, kernel crashed, let's dump the old content so we
>>> can dump it" mechanism on production systems.
>>
>> kdump does not work on Apple ARM systems because kexec is broken and
>> cannot be fully fixed, due to multiple platform/firmware limitations. A
>> very limited version of kexec might work well enough for kdump, but I
>> don't think anyone has looked into making that work yet...
>>
>>> but ... I am not familiar with devcoredump. So I don't know when/how it
>>> runs, and if the source system is still alive (and remains alive -- in
>>> contrast to a kernel crash).
>>
>> Devcoredump just makes the dump available via /sys so it can be
>> collected by the user. The system is still alive, the GPU is just dead
>> and all future GPU job submissions fail. You can still SSH in or (at
>> least in theory, if enough moving parts are graceful about it) VT-switch
>> to a TTY. The display controller is not part of the GPU, it is separate
>> hardware.
>
>
> Thanks for all the details (and sorry for the delay, I'm on PTO until
> Monday ... :)
>
> (regarding the other mail) Adding that stuff to rust just so we have a
> devcoredump that ideally wouldn't exist is a bit unfortunate.
>
> So I'm curious: we do have /proc/kcore, where we do all of the required
> filtering, only allowing for reading memory that is online, not
> hwpoisoned etc.
>
> makedumpfile already supports /proc/kcore.
>
> Would it be possible to avoid Devcoredump completely either by dumping /
> proc/kcore directly or by having a user-space script that walks the page
> tables to dump the content purely based on /proc/kcore?
>
> If relevant memory ranges are inaccessible from /proc/kcore, we could
> look into exposing them.
I'm not sure that's a good idea... the dump code runs when the GPU
crashes, and makes copies of all the memory pages into newly allocated
pages (this is around 16MB for a typical dump, and if allocation fails
we just bail and clean up). Then userspace can read the coredump at its
leisure. AIUI, this is exactly the intended use case of devcoredump. It
also means that anyone can grab a core dump with just a `cp`, without
needing any bespoke tools.
After the snapshot is taken, the kernel will complete (fail) all GPU
jobs, which means much of the shared memory will be freed and some
structures will change contents. If we defer the coredump to userspace,
then it would not be able to capture the state of all relevant memory
exactly at the crash time, which could be very confusing.
In theory I could change the allocators to not free or touch anything
after a crash, and add guards to any mutations in the driver to avoid
any changes after a crash... but that feels a lot more brittle and
error-prone than just taking the core dump at the right time.
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-06 19:18 ` Asahi Lina
@ 2025-02-06 19:27 ` Asahi Lina
2025-02-12 19:06 ` David Hildenbrand
2025-02-12 19:01 ` David Hildenbrand
1 sibling, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-06 19:27 UTC (permalink / raw)
To: David Hildenbrand, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 2/7/25 4:18 AM, Asahi Lina wrote:
>
>
> On 2/7/25 2:58 AM, David Hildenbrand wrote:
>> On 04.02.25 22:06, Asahi Lina wrote:
>>>
>>>
>>> On 2/5/25 5:10 AM, David Hildenbrand wrote:
>>>> On 04.02.25 18:59, Asahi Lina wrote:
>>>>> On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>>>>>>> If the answer is "no" then that's fine. It's still an unsafe
>>>>>>>>> function
>>>>>>>>> and we need to document in the safety section that it should
>>>>>>>>> only be
>>>>>>>>> used for memory that is either known to be allocated and pinned and
>>>>>>>>> will
>>>>>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>>>>>> reserved and not owned by the buddy allocator, so in practice
>>>>>>>>> correct
>>>>>>>>> use would not be racy with memory hot-remove anyway.
>>>>>>>>>
>>>>>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>>>>>> looked up will only ever be one of:
>>>>>>>>>
>>>>>>>>> - GEM objects that are mapped to the GPU and whose physical
>>>>>>>>> pages are
>>>>>>>>> therefore pinned (and the VM is locked while this happens so the
>>>>>>>>> objects
>>>>>>>>> cannot become unpinned out from under the running code),
>>>>>>>>
>>>>>>>> How exactly are these pages pinned/obtained?
>>>>>>>
>>>>>>> Under the hood it's shmem. For pinning, it winds up at
>>>>>>> `drm_gem_get_pages()`, which I think does a
>>>>>>> `shmem_read_folio_gfp()` on
>>>>>>> a mapping set as unevictable.
>>>>>>
>>>>>> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>>>>>> shmem_get_folio_gfp().
>>>>>>
>>>>>> Hm, I wonder if we might end up holding folios residing in
>>>>>> ZONE_MOVABLE/
>>>>>> MIGRATE_CMA longer than we should.
>>>>>>
>>>>>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and
>>>>>> makes
>>>>>> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>>>>>
>>>>>> But that's a different discussion, just pointing it out, maybe I'm
>>>>>> missing something :)
>>>>>
>>>>> I think this is a little over my head. Though I only just realized that
>>>>> we seem to be keeping the GEM objects pinned forever, even after unmap,
>>>>> in the drm-shmem core API (I see no drm-shmem entry point that would
>>>>> allow the sgt to be freed and its corresponding pages ref to be
>>>>> dropped,
>>>>> other than a purge of purgeable objects or final destruction of the
>>>>> object). I'll poke around since this feels wrong, I thought we were
>>>>> supposed to be able to have shrinker support for swapping out whole GPU
>>>>> VMs in the modern GPU MM model, but I guess there's no
>>>>> implementation of
>>>>> that for gem-shmem drivers yet...?
>>>>
>>>> I recall that shrinker as well, ... or at least a discussion around it.
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>> If it's only for crash dumps etc. that might even be opt-in, it makes
>>>>>> the whole thing a lot less scary. Maybe this could be opt-in
>>>>>> somewhere,
>>>>>> to "unlock" this interface? Just an idea.
>>>>>
>>>>> Just to make sure we're on the same page, I don't think there's
>>>>> anything
>>>>> to unlock in the Rust abstraction side (this series). At the end of the
>>>>> day, if nothing else, the unchecked interface (which the regular
>>>>> non-crash page table management code uses for performance) will let you
>>>>> use any pfn you want, it's up to documentation and human review to
>>>>> specify how it should be used by drivers. What Rust gives us here is
>>>>> the
>>>>> mandatory `unsafe {}`, so any attempts to use this API will necessarily
>>>>> stick out during review as potentially dangerous code that needs extra
>>>>> scrutiny.
>>>>>
>>>>> For the client driver itself, I could gate the devcoredump stuff behind
>>>>> a module parameter or something... but I don't think it's really worth
>>>>> it. We don't have a way to reboot the firmware or recover from this
>>>>> condition (platform limitations), so end users are stuck rebooting to
>>>>> get back a usable machine anyway. If something goes wrong in the
>>>>> crashdump code and the machine oopses or locks up worse... it doesn't
>>>>> really make much of a difference for normal end users. I don't think
>>>>> this will ever really happen given the constraints I described, but if
>>>>> somehow it does (some other bug somewhere?), well... the machine was
>>>>> already in an unrecoverable state anyway.
>>>>>
>>>>> It would be nice to have userspace tooling deployed by default that
>>>>> saves off the devcoredump somewhere, so we can have a chance at
>>>>> debugging hard-to-hit firmware crashes... if it's opt-in, it would only
>>>>> really be useful for developers and CI machines.
>>>>
>>>> Is this something that possibly kdump can save or analyze? Because that
>>>> is our default "oops, kernel crashed, let's dump the old content so we
>>>> can dump it" mechanism on production systems.
>>>
>>> kdump does not work on Apple ARM systems because kexec is broken and
>>> cannot be fully fixed, due to multiple platform/firmware limitations. A
>>> very limited version of kexec might work well enough for kdump, but I
>>> don't think anyone has looked into making that work yet...
>>>
>>>> but ... I am not familiar with devcoredump. So I don't know when/how it
>>>> runs, and if the source system is still alive (and remains alive -- in
>>>> contrast to a kernel crash).
>>>
>>> Devcoredump just makes the dump available via /sys so it can be
>>> collected by the user. The system is still alive, the GPU is just dead
>>> and all future GPU job submissions fail. You can still SSH in or (at
>>> least in theory, if enough moving parts are graceful about it) VT-switch
>>> to a TTY. The display controller is not part of the GPU, it is separate
>>> hardware.
>>
>>
>> Thanks for all the details (and sorry for the delay, I'm on PTO until
>> Monday ... :)
>>
>> (regarding the other mail) Adding that stuff to rust just so we have a
>> devcoredump that ideally wouldn't exist is a bit unfortunate.
>>
>> So I'm curious: we do have /proc/kcore, where we do all of the required
>> filtering, only allowing for reading memory that is online, not
>> hwpoisoned etc.
>>
>> makedumpfile already supports /proc/kcore.
>>
>> Would it be possible to avoid Devcoredump completely either by dumping /
>> proc/kcore directly or by having a user-space script that walks the page
>> tables to dump the content purely based on /proc/kcore?
>>
>> If relevant memory ranges are inaccessible from /proc/kcore, we could
>> look into exposing them.
>
> I'm not sure that's a good idea... the dump code runs when the GPU
> crashes, and makes copies of all the memory pages into newly allocated
> pages (this is around 16MB for a typical dump, and if allocation fails
> we just bail and clean up). Then userspace can read the coredump at its
> leisure. AIUI, this is exactly the intended use case of devcoredump. It
> also means that anyone can grab a core dump with just a `cp`, without
> needing any bespoke tools.
>
> After the snapshot is taken, the kernel will complete (fail) all GPU
> jobs, which means much of the shared memory will be freed and some
> structures will change contents. If we defer the coredump to userspace,
> then it would not be able to capture the state of all relevant memory
> exactly at the crash time, which could be very confusing.
>
> In theory I could change the allocators to not free or touch anything
> after a crash, and add guards to any mutations in the driver to avoid
> any changes after a crash... but that feels a lot more brittle and
> error-prone than just taking the core dump at the right time.
>
If the arbitrary page lookups are that big a problem, I think I would
rather just memremap the all the bootloader-mapped firmware areas, hook
into all the allocators to provide a backdoor into the backing objects,
and just piece everything together by mapping page addresses to those.
It would be a bunch of extra code and scaffolding in the driver, and
require device tree and bootloader changes to link up the GPU node to
its firmware nodes, but it's still better than trying to do it all from
userspace IMO...
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-06 19:18 ` Asahi Lina
2025-02-06 19:27 ` Asahi Lina
@ 2025-02-12 19:01 ` David Hildenbrand
1 sibling, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-02-12 19:01 UTC (permalink / raw)
To: Asahi Lina, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 06.02.25 20:18, Asahi Lina wrote:
>
>
> On 2/7/25 2:58 AM, David Hildenbrand wrote:
>> On 04.02.25 22:06, Asahi Lina wrote:
>>>
>>>
>>> On 2/5/25 5:10 AM, David Hildenbrand wrote:
>>>> On 04.02.25 18:59, Asahi Lina wrote:
>>>>> On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>>>>>>> If the answer is "no" then that's fine. It's still an unsafe
>>>>>>>>> function
>>>>>>>>> and we need to document in the safety section that it should
>>>>>>>>> only be
>>>>>>>>> used for memory that is either known to be allocated and pinned and
>>>>>>>>> will
>>>>>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>>>>>> reserved and not owned by the buddy allocator, so in practice
>>>>>>>>> correct
>>>>>>>>> use would not be racy with memory hot-remove anyway.
>>>>>>>>>
>>>>>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>>>>>> looked up will only ever be one of:
>>>>>>>>>
>>>>>>>>> - GEM objects that are mapped to the GPU and whose physical
>>>>>>>>> pages are
>>>>>>>>> therefore pinned (and the VM is locked while this happens so the
>>>>>>>>> objects
>>>>>>>>> cannot become unpinned out from under the running code),
>>>>>>>>
>>>>>>>> How exactly are these pages pinned/obtained?
>>>>>>>
>>>>>>> Under the hood it's shmem. For pinning, it winds up at
>>>>>>> `drm_gem_get_pages()`, which I think does a
>>>>>>> `shmem_read_folio_gfp()` on
>>>>>>> a mapping set as unevictable.
>>>>>>
>>>>>> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>>>>>> shmem_get_folio_gfp().
>>>>>>
>>>>>> Hm, I wonder if we might end up holding folios residing in
>>>>>> ZONE_MOVABLE/
>>>>>> MIGRATE_CMA longer than we should.
>>>>>>
>>>>>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and
>>>>>> makes
>>>>>> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>>>>>
>>>>>> But that's a different discussion, just pointing it out, maybe I'm
>>>>>> missing something :)
>>>>>
>>>>> I think this is a little over my head. Though I only just realized that
>>>>> we seem to be keeping the GEM objects pinned forever, even after unmap,
>>>>> in the drm-shmem core API (I see no drm-shmem entry point that would
>>>>> allow the sgt to be freed and its corresponding pages ref to be
>>>>> dropped,
>>>>> other than a purge of purgeable objects or final destruction of the
>>>>> object). I'll poke around since this feels wrong, I thought we were
>>>>> supposed to be able to have shrinker support for swapping out whole GPU
>>>>> VMs in the modern GPU MM model, but I guess there's no
>>>>> implementation of
>>>>> that for gem-shmem drivers yet...?
>>>>
>>>> I recall that shrinker as well, ... or at least a discussion around it.
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>> If it's only for crash dumps etc. that might even be opt-in, it makes
>>>>>> the whole thing a lot less scary. Maybe this could be opt-in
>>>>>> somewhere,
>>>>>> to "unlock" this interface? Just an idea.
>>>>>
>>>>> Just to make sure we're on the same page, I don't think there's
>>>>> anything
>>>>> to unlock in the Rust abstraction side (this series). At the end of the
>>>>> day, if nothing else, the unchecked interface (which the regular
>>>>> non-crash page table management code uses for performance) will let you
>>>>> use any pfn you want, it's up to documentation and human review to
>>>>> specify how it should be used by drivers. What Rust gives us here is
>>>>> the
>>>>> mandatory `unsafe {}`, so any attempts to use this API will necessarily
>>>>> stick out during review as potentially dangerous code that needs extra
>>>>> scrutiny.
>>>>>
>>>>> For the client driver itself, I could gate the devcoredump stuff behind
>>>>> a module parameter or something... but I don't think it's really worth
>>>>> it. We don't have a way to reboot the firmware or recover from this
>>>>> condition (platform limitations), so end users are stuck rebooting to
>>>>> get back a usable machine anyway. If something goes wrong in the
>>>>> crashdump code and the machine oopses or locks up worse... it doesn't
>>>>> really make much of a difference for normal end users. I don't think
>>>>> this will ever really happen given the constraints I described, but if
>>>>> somehow it does (some other bug somewhere?), well... the machine was
>>>>> already in an unrecoverable state anyway.
>>>>>
>>>>> It would be nice to have userspace tooling deployed by default that
>>>>> saves off the devcoredump somewhere, so we can have a chance at
>>>>> debugging hard-to-hit firmware crashes... if it's opt-in, it would only
>>>>> really be useful for developers and CI machines.
>>>>
>>>> Is this something that possibly kdump can save or analyze? Because that
>>>> is our default "oops, kernel crashed, let's dump the old content so we
>>>> can dump it" mechanism on production systems.
>>>
>>> kdump does not work on Apple ARM systems because kexec is broken and
>>> cannot be fully fixed, due to multiple platform/firmware limitations. A
>>> very limited version of kexec might work well enough for kdump, but I
>>> don't think anyone has looked into making that work yet...
>>>
>>>> but ... I am not familiar with devcoredump. So I don't know when/how it
>>>> runs, and if the source system is still alive (and remains alive -- in
>>>> contrast to a kernel crash).
>>>
>>> Devcoredump just makes the dump available via /sys so it can be
>>> collected by the user. The system is still alive, the GPU is just dead
>>> and all future GPU job submissions fail. You can still SSH in or (at
>>> least in theory, if enough moving parts are graceful about it) VT-switch
>>> to a TTY. The display controller is not part of the GPU, it is separate
>>> hardware.
>>
>>
>> Thanks for all the details (and sorry for the delay, I'm on PTO until
>> Monday ... :)
>>
>> (regarding the other mail) Adding that stuff to rust just so we have a
>> devcoredump that ideally wouldn't exist is a bit unfortunate.
>>
>> So I'm curious: we do have /proc/kcore, where we do all of the required
>> filtering, only allowing for reading memory that is online, not
>> hwpoisoned etc.
>>
>> makedumpfile already supports /proc/kcore.
>>
>> Would it be possible to avoid Devcoredump completely either by dumping /
>> proc/kcore directly or by having a user-space script that walks the page
>> tables to dump the content purely based on /proc/kcore?
>>
>> If relevant memory ranges are inaccessible from /proc/kcore, we could
>> look into exposing them.
>
> I'm not sure that's a good idea... the dump code runs when the GPU
> crashes, and makes copies of all the memory pages into newly allocated
> pages (this is around 16MB for a typical dump, and if allocation fails
> we just bail and clean up). Then userspace can read the coredump at its
> leisure. AIUI, this is exactly the intended use case of devcoredump. It
> also means that anyone can grab a core dump with just a `cp`, without
> needing any bespoke tools.
>
> After the snapshot is taken, the kernel will complete (fail) all GPU
> jobs, which means much of the shared memory will be freed and some
> structures will change contents.
Ah, okay, that's an issue.
> If we defer the coredump to userspace,
> then it would not be able to capture the state of all relevant memory
> exactly at the crash time, which could be very confusing.
>
> In theory I could change the allocators to not free or touch anything
> after a crash, and add guards to any mutations in the driver to avoid
> any changes after a crash... but that feels a lot more brittle and
> error-prone than just taking the core dump at the right time.
Agreed.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-06 19:27 ` Asahi Lina
@ 2025-02-12 19:06 ` David Hildenbrand
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-02-12 19:06 UTC (permalink / raw)
To: Asahi Lina, Zi Yan
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Danilo Krummrich, Wedson Almeida Filho, Valentin Obst,
Andrew Morton, linux-mm, airlied, Abdiel Janulgue, rust-for-linux,
linux-kernel, asahi, Oscar Salvador, Muchun Song
On 06.02.25 20:27, Asahi Lina wrote:
>
>
> On 2/7/25 4:18 AM, Asahi Lina wrote:
>>
>>
>> On 2/7/25 2:58 AM, David Hildenbrand wrote:
>>> On 04.02.25 22:06, Asahi Lina wrote:
>>>>
>>>>
>>>> On 2/5/25 5:10 AM, David Hildenbrand wrote:
>>>>> On 04.02.25 18:59, Asahi Lina wrote:
>>>>>> On 2/4/25 11:38 PM, David Hildenbrand wrote:
>>>>>>>>>> If the answer is "no" then that's fine. It's still an unsafe
>>>>>>>>>> function
>>>>>>>>>> and we need to document in the safety section that it should
>>>>>>>>>> only be
>>>>>>>>>> used for memory that is either known to be allocated and pinned and
>>>>>>>>>> will
>>>>>>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>>>>>>> reserved and not owned by the buddy allocator, so in practice
>>>>>>>>>> correct
>>>>>>>>>> use would not be racy with memory hot-remove anyway.
>>>>>>>>>>
>>>>>>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>>>>>>> looked up will only ever be one of:
>>>>>>>>>>
>>>>>>>>>> - GEM objects that are mapped to the GPU and whose physical
>>>>>>>>>> pages are
>>>>>>>>>> therefore pinned (and the VM is locked while this happens so the
>>>>>>>>>> objects
>>>>>>>>>> cannot become unpinned out from under the running code),
>>>>>>>>>
>>>>>>>>> How exactly are these pages pinned/obtained?
>>>>>>>>
>>>>>>>> Under the hood it's shmem. For pinning, it winds up at
>>>>>>>> `drm_gem_get_pages()`, which I think does a
>>>>>>>> `shmem_read_folio_gfp()` on
>>>>>>>> a mapping set as unevictable.
>>>>>>>
>>>>>>> Thanks. So we grab another folio reference via shmem_read_folio_gfp()-
>>>>>>>> shmem_get_folio_gfp().
>>>>>>>
>>>>>>> Hm, I wonder if we might end up holding folios residing in
>>>>>>> ZONE_MOVABLE/
>>>>>>> MIGRATE_CMA longer than we should.
>>>>>>>
>>>>>>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and
>>>>>>> makes
>>>>>>> sure to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>>>>>>
>>>>>>> But that's a different discussion, just pointing it out, maybe I'm
>>>>>>> missing something :)
>>>>>>
>>>>>> I think this is a little over my head. Though I only just realized that
>>>>>> we seem to be keeping the GEM objects pinned forever, even after unmap,
>>>>>> in the drm-shmem core API (I see no drm-shmem entry point that would
>>>>>> allow the sgt to be freed and its corresponding pages ref to be
>>>>>> dropped,
>>>>>> other than a purge of purgeable objects or final destruction of the
>>>>>> object). I'll poke around since this feels wrong, I thought we were
>>>>>> supposed to be able to have shrinker support for swapping out whole GPU
>>>>>> VMs in the modern GPU MM model, but I guess there's no
>>>>>> implementation of
>>>>>> that for gem-shmem drivers yet...?
>>>>>
>>>>> I recall that shrinker as well, ... or at least a discussion around it.
>>>>>
>>>>> [...]
>>>>>
>>>>>>>
>>>>>>> If it's only for crash dumps etc. that might even be opt-in, it makes
>>>>>>> the whole thing a lot less scary. Maybe this could be opt-in
>>>>>>> somewhere,
>>>>>>> to "unlock" this interface? Just an idea.
>>>>>>
>>>>>> Just to make sure we're on the same page, I don't think there's
>>>>>> anything
>>>>>> to unlock in the Rust abstraction side (this series). At the end of the
>>>>>> day, if nothing else, the unchecked interface (which the regular
>>>>>> non-crash page table management code uses for performance) will let you
>>>>>> use any pfn you want, it's up to documentation and human review to
>>>>>> specify how it should be used by drivers. What Rust gives us here is
>>>>>> the
>>>>>> mandatory `unsafe {}`, so any attempts to use this API will necessarily
>>>>>> stick out during review as potentially dangerous code that needs extra
>>>>>> scrutiny.
>>>>>>
>>>>>> For the client driver itself, I could gate the devcoredump stuff behind
>>>>>> a module parameter or something... but I don't think it's really worth
>>>>>> it. We don't have a way to reboot the firmware or recover from this
>>>>>> condition (platform limitations), so end users are stuck rebooting to
>>>>>> get back a usable machine anyway. If something goes wrong in the
>>>>>> crashdump code and the machine oopses or locks up worse... it doesn't
>>>>>> really make much of a difference for normal end users. I don't think
>>>>>> this will ever really happen given the constraints I described, but if
>>>>>> somehow it does (some other bug somewhere?), well... the machine was
>>>>>> already in an unrecoverable state anyway.
>>>>>>
>>>>>> It would be nice to have userspace tooling deployed by default that
>>>>>> saves off the devcoredump somewhere, so we can have a chance at
>>>>>> debugging hard-to-hit firmware crashes... if it's opt-in, it would only
>>>>>> really be useful for developers and CI machines.
>>>>>
>>>>> Is this something that possibly kdump can save or analyze? Because that
>>>>> is our default "oops, kernel crashed, let's dump the old content so we
>>>>> can dump it" mechanism on production systems.
>>>>
>>>> kdump does not work on Apple ARM systems because kexec is broken and
>>>> cannot be fully fixed, due to multiple platform/firmware limitations. A
>>>> very limited version of kexec might work well enough for kdump, but I
>>>> don't think anyone has looked into making that work yet...
>>>>
>>>>> but ... I am not familiar with devcoredump. So I don't know when/how it
>>>>> runs, and if the source system is still alive (and remains alive -- in
>>>>> contrast to a kernel crash).
>>>>
>>>> Devcoredump just makes the dump available via /sys so it can be
>>>> collected by the user. The system is still alive, the GPU is just dead
>>>> and all future GPU job submissions fail. You can still SSH in or (at
>>>> least in theory, if enough moving parts are graceful about it) VT-switch
>>>> to a TTY. The display controller is not part of the GPU, it is separate
>>>> hardware.
>>>
>>>
>>> Thanks for all the details (and sorry for the delay, I'm on PTO until
>>> Monday ... :)
>>>
>>> (regarding the other mail) Adding that stuff to rust just so we have a
>>> devcoredump that ideally wouldn't exist is a bit unfortunate.
>>>
>>> So I'm curious: we do have /proc/kcore, where we do all of the required
>>> filtering, only allowing for reading memory that is online, not
>>> hwpoisoned etc.
>>>
>>> makedumpfile already supports /proc/kcore.
>>>
>>> Would it be possible to avoid Devcoredump completely either by dumping /
>>> proc/kcore directly or by having a user-space script that walks the page
>>> tables to dump the content purely based on /proc/kcore?
>>>
>>> If relevant memory ranges are inaccessible from /proc/kcore, we could
>>> look into exposing them.
>>
>> I'm not sure that's a good idea... the dump code runs when the GPU
>> crashes, and makes copies of all the memory pages into newly allocated
>> pages (this is around 16MB for a typical dump, and if allocation fails
>> we just bail and clean up). Then userspace can read the coredump at its
>> leisure. AIUI, this is exactly the intended use case of devcoredump. It
>> also means that anyone can grab a core dump with just a `cp`, without
>> needing any bespoke tools.
>>
>> After the snapshot is taken, the kernel will complete (fail) all GPU
>> jobs, which means much of the shared memory will be freed and some
>> structures will change contents. If we defer the coredump to userspace,
>> then it would not be able to capture the state of all relevant memory
>> exactly at the crash time, which could be very confusing.
>>
>> In theory I could change the allocators to not free or touch anything
>> after a crash, and add guards to any mutations in the driver to avoid
>> any changes after a crash... but that feels a lot more brittle and
>> error-prone than just taking the core dump at the right time.
>>
>
> If the arbitrary page lookups are that big a problem, I think I would
> rather just memremap the all the bootloader-mapped firmware areas, hook
> into all the allocators to provide a backdoor into the backing objects,
> and just piece everything together by mapping page addresses to those.
> It would be a bunch of extra code and scaffolding in the driver, and
> require device tree and bootloader changes to link up the GPU node to
> its firmware nodes, but it's still better than trying to do it all from
> userspace IMO...
Yes. Ideally, we'd not open up the can of worms of arbitrary pfn -> page
conversions (including the pfn_to_online_page() etc nastiness) if it can
be avoided in rust. Once there is an interface do do it, it's likely
that new users will pop up that are not just "create a simple dump, I
know what I am doing and only want sanity checks".
So best if we could prevent new pfn walkers in rust somehow; they are
already a pain to maintain+fix in C (including the upcoming more severe
folio/memdesc work), that would be good.
But if it's too hard to avoid, then it also doesn't make sense to
overcomplicate things to work around it.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-05 7:40 ` Simona Vetter
@ 2025-02-12 19:07 ` David Hildenbrand
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-02-12 19:07 UTC (permalink / raw)
To: Asahi Lina, Zi Yan, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, Jann Horn, Matthew Wilcox,
Paolo Bonzini, Danilo Krummrich, Wedson Almeida Filho,
Valentin Obst, Andrew Morton, linux-mm, airlied, Abdiel Janulgue,
rust-for-linux, linux-kernel, asahi, Oscar Salvador, Muchun Song
On 05.02.25 08:40, Simona Vetter wrote:
> On Tue, Feb 04, 2025 at 03:38:17PM +0100, David Hildenbrand wrote:
>>>> It can still race with memory offlining, and it refuses ZONE_DEVICE
>>>> pages. For the latter, we have a different way to check validity. See
>>>> memory_failure() that first calls pfn_to_online_page() to then check
>>>> get_dev_pagemap().
>>>
>>> I'll give it a shot with these functions. If they work for my use case,
>>> then it's good to have extra checks and I'll add them for v2. Thanks!
>>
>> Let me know if you run into any issues.
>>
>>>
>>>>
>>>>>
>>>>> If the answer is "no" then that's fine. It's still an unsafe function
>>>>> and we need to document in the safety section that it should only be
>>>>> used for memory that is either known to be allocated and pinned and will
>>>>> not be freed while the `struct page` is borrowed, or memory that is
>>>>> reserved and not owned by the buddy allocator, so in practice correct
>>>>> use would not be racy with memory hot-remove anyway.
>>>>>
>>>>> This is already the case for the drm/asahi use case, where the pfns
>>>>> looked up will only ever be one of:
>>>>>
>>>>> - GEM objects that are mapped to the GPU and whose physical pages are
>>>>> therefore pinned (and the VM is locked while this happens so the objects
>>>>> cannot become unpinned out from under the running code),
>>>>
>>>> How exactly are these pages pinned/obtained?
>>>
>>> Under the hood it's shmem. For pinning, it winds up at
>>> `drm_gem_get_pages()`, which I think does a `shmem_read_folio_gfp()` on
>>> a mapping set as unevictable.
>>
>> Thanks. So we grab another folio reference via
>> shmem_read_folio_gfp()->shmem_get_folio_gfp().
>>
>> Hm, I wonder if we might end up holding folios residing in
>> ZONE_MOVABLE/MIGRATE_CMA longer than we should.
>>
>> Compared to memfd_pin_folios(), which simulates FOLL_LONGTERM and makes sure
>> to migrate pages out of ZONE_MOVABLE/MIGRATE_CMA.
>>
>> But that's a different discussion, just pointing it out, maybe I'm missing
>> something :)
>
> Good GPU Drivers (tm) are supposed to have a shrinker so we can at least
> nuke some of them again. Some folks even looked into hooking up a migrate
> callback through the address_space (or wherever that hook was, this is
> from memory) so we can make this somewhat reliable. So yeah we're hogging
> ZONE_MOVEABLE unduly still.
Hmmm, so we should really be migrating pages out of
ZONE_MOVABLE/MIGRATE_CMA here, just like we now properly do in
memfd_pin_folios().
>
> The other side is that there's about 2-3 good drivers (msm, i915, xe
> should have a shrinker now too but I didn't check). The others all fall
> various levels of short, or still have 3 times cargo-culted versions of
> i915's pin-as-a-lock design and get it completely wrong.
:(
>
> So yeah I'm aware this isn't great, and we're at least glacially slowly
> moving towards a common shrinker infrastructure that maybe in a glorious
> future gets all this right. I mean it took us 15+ years to get to a
> cgroups controller after all too, and that was also a well known issue of
> just being able to hog memory with no controls and potentially cause
> havoc.
I guess that means job security for us, haha :)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-03 14:41 ` Asahi Lina
@ 2025-02-15 19:47 ` Asahi Lina
2025-02-17 8:50 ` Abdiel Janulgue
0 siblings, 1 reply; 67+ messages in thread
From: Asahi Lina @ 2025-02-15 19:47 UTC (permalink / raw)
To: Abdiel Janulgue
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, rust-for-linux, linux-kernel, asahi, Danilo Krummrich
On 2/3/25 11:41 PM, Asahi Lina wrote:
>
>
> On 2/3/25 7:22 PM, Danilo Krummrich wrote:
>> Hi Lina,
>>
>> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote:
>>> This series refactors the existing Page wrapper to support borrowing
>>> `struct page` objects without ownership on the Rust side, and converting
>>> page references to/from physical memory addresses.
>>
>> Thanks for picking this up!
>>
>> As you know, this has been previously worked on by Abdiel. Kindly drop a note
>> if you intend to pick up something up next time, such that we don't end up doing
>> duplicated work.
>
> Sorry! I wasn't sure if this was going to end up submitted over the
> holidays so I wasn't in a rush, but I ended up switching gears in the
> past couple of weeks and I really needed this feature now (for crashdump
> support to debug a really annoying firmware crash).
>
> I actually only just noticed that the previous v2 already had
> Owned/Ownable... I only saw the v3 which didn't, so I didn't realize
> there was already a version of that approach in the past.
>
> ~~ Lina
Given the discussion in the other subthreads, it looks like this is not
the best approach for page table management and using it for crash dumps
of arbitrary physical memory pages raises some eyebrows, so I'm going to
abandon this series.
Abdiel, if the first two patches are helpful to your work, please feel
free to incorporate them into a v4. Otherwise, just go ahead with
whatever works for you!
~~ Lina
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-15 19:47 ` Asahi Lina
@ 2025-02-17 8:50 ` Abdiel Janulgue
2025-02-19 9:24 ` Andreas Hindborg
0 siblings, 1 reply; 67+ messages in thread
From: Abdiel Janulgue @ 2025-02-17 8:50 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Jann Horn, Matthew Wilcox, Paolo Bonzini,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, rust-for-linux, linux-kernel, asahi, Danilo Krummrich
Hi,
On 15/02/2025 21:47, Asahi Lina wrote:
>
> Given the discussion in the other subthreads, it looks like this is not
> the best approach for page table management and using it for crash dumps
> of arbitrary physical memory pages raises some eyebrows, so I'm going to
> abandon this series.
>
> Abdiel, if the first two patches are helpful to your work, please feel
> free to incorporate them into a v4. Otherwise, just go ahead with
> whatever works for you!
>
Thanks you for improving the series, much appreciated! I will try to
incorporate your improvements.
Btw, I'll most probably pick this up again once I'm finished with the
rust dma coherent allocator bindings.
Kind regards,
Abdiel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/6] rust: types: Add Ownable/Owned types
2025-02-03 19:17 ` Asahi Lina
@ 2025-02-19 8:34 ` Andreas Hindborg
0 siblings, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 8:34 UTC (permalink / raw)
To: Asahi Lina
Cc: Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Trevor Gross, Jann Horn,
Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
"Asahi Lina" <lina@asahilina.net> writes:
> On 2/4/25 3:17 AM, Alice Ryhl wrote:
>> On Mon, Feb 3, 2025 at 3:17 PM Asahi Lina <lina@asahilina.net> wrote:
>>>
>>>
>>>
>>> On 2/3/25 6:13 PM, Alice Ryhl wrote:
>>>> On Sun, Feb 2, 2025 at 2:06 PM Asahi Lina <lina@asahilina.net> wrote:
>>>>> + /// Consumes the `Owned`, returning a raw pointer.
>>>>> + ///
>>>>> + /// This function does not actually relinquish ownership of the object.
>>>>> + /// After calling this function, the caller is responsible for ownership previously managed
>>>>> + /// by the `Owned`.
>>>>> + #[allow(dead_code)]
>>>>> + pub(crate) fn into_raw(me: Self) -> NonNull<T> {
>>>>
>>>> I would just make these methods public, like the ARef ones. Then you
>>>> can drop the #[allow(dead_code)] annotation.
>>>
>>> Does it make sense to ever have drivers doing this? I feel like these
>>> methods should be limited to the kernel crate.
>>
>> Not having drivers use this is the ideal, but I don't think we should
>> always expect it to be possible. The Binder driver has a C component
>> for the binderfs component, and it also has some code that's
>> essentially an abstraction inside the driver that I was asked to move
>> into Binder because it's so specific to Binder that it's not useful
>> for anyone else.
>
> That's fair, I'll make it pub.
Also, the kernel crate will split at some point. We might as well have
them public now and avoid the churn.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/6] rust: types: Add Ownable/Owned types
2025-02-02 13:05 ` [PATCH 1/6] rust: types: Add Ownable/Owned types Asahi Lina
2025-02-03 9:13 ` Alice Ryhl
@ 2025-02-19 8:37 ` Andreas Hindborg
1 sibling, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 8:37 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
"Asahi Lina" <lina@asahilina.net> writes:
> By analogy to AlwaysRefCounted and ARef, an Ownable type is a (typically
> C FFI) type that *may* be owned by Rust, but need not be. Unlike
> AlwaysRefCounted, this mechanism expects the reference to be unique
> within Rust, and does not allow cloning.
>
> Conceptually, this is similar to a KBox<T>, except that it delegates
> resource management to the T instead of using a generic allocator.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Over all looks good to me - I agree with the points already discussed.
Could you put it in a separate module and file? `kernel::types::ownable`
and then `pub use` in `types.rs`. `types.rs` is getting quite busy.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 2/6] rust: page: Convert to Ownable
2025-02-02 13:05 ` [PATCH 2/6] rust: page: Convert to Ownable Asahi Lina
2025-02-03 9:17 ` Alice Ryhl
2025-02-03 9:39 ` Fiona Behrens
@ 2025-02-19 8:46 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 8:46 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
"Asahi Lina" <lina@asahilina.net> writes:
> This allows Page references to be returned as borrowed references,
> without necessarily owning the struct page.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
> rust/kernel/page.rs | 30 ++++++++++++++++--------------
> 1 file changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
> index fdac6c375fe46e1ba589f1640afeae3e001e39ae..0b6cbe02522ab6e6e1810288ad23af4e4aa587d8 100644
> --- a/rust/kernel/page.rs
> +++ b/rust/kernel/page.rs
> @@ -7,6 +7,7 @@
> bindings,
> error::code::*,
> error::Result,
> + types::{Opaque, Ownable, Owned},
> uaccess::UserSliceReader,
> };
> use core::ptr::{self, NonNull};
> @@ -30,13 +31,10 @@ pub const fn page_align(addr: usize) -> usize {
> (addr + (PAGE_SIZE - 1)) & PAGE_MASK
> }
>
> -/// A pointer to a page that owns the page allocation.
> -///
> -/// # Invariants
> -///
> -/// The pointer is valid, and has ownership over the page.
> +/// An object representing a memory page in the kernel (`struct page`).
Can you link to the header file containing `struct page`?
.. memory page in the kernel ([`struct page`]).
[`struct page`]: srctree/include/linux/mm_types.h
Otherwise looks good 👍
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public
2025-02-02 13:05 ` [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public Asahi Lina
2025-02-03 9:10 ` Alice Ryhl
2025-02-03 9:43 ` Fiona Behrens
@ 2025-02-19 8:48 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 8:48 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
"Asahi Lina" <lina@asahilina.net> writes:
> Lets users do (unsafe) complex page read/write operations without having
> to repeatedly call into read_raw()/write_raw() (which may be expensive
> in some cases).
>
> The functions themselves are not unsafe, but they do take a closure that
> receives a raw pointer, so actually making the access requires unsafe
> code.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 4/6] rust: addr: Add a module to declare core address types
2025-02-02 13:05 ` [PATCH 4/6] rust: addr: Add a module to declare core address types Asahi Lina
2025-02-03 9:09 ` Alice Ryhl
2025-02-03 15:04 ` Boqun Feng
@ 2025-02-19 8:51 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 8:51 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
"Asahi Lina" <lina@asahilina.net> writes:
> Encapsulates the core physical/DMA address types, so they can be used by
> Rust abstractions.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
> rust/kernel/addr.rs | 15 +++++++++++++++
> rust/kernel/lib.rs | 1 +
> 2 files changed, 16 insertions(+)
>
> diff --git a/rust/kernel/addr.rs b/rust/kernel/addr.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..06aff10a0332355597060c5518d7fd6e114cf630
> --- /dev/null
> +++ b/rust/kernel/addr.rs
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Kernel core address types.
How about just "Kernel address tyeps"? What does "core" mean here?
> +
> +use bindings;
> +use core::ffi;
> +
> +/// A physical memory address (which may be wider than the CPU pointer size)
Missing period at end of sentence.
> +pub type PhysicalAddr = bindings::phys_addr_t;
> +/// A DMA memory address (which may be narrower than `PhysicalAddr` on some systems)
Same.
> +pub type DmaAddr = bindings::dma_addr_t;
> +/// A physical resource size, typically the same width as `PhysicalAddr`
Same.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/6] rust: page: Add physical address conversion functions
2025-02-02 13:05 ` [PATCH 5/6] rust: page: Add physical address conversion functions Asahi Lina
2025-02-03 9:35 ` Alice Ryhl
2025-02-03 9:53 ` Fiona Behrens
@ 2025-02-19 9:06 ` Andreas Hindborg
2 siblings, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 9:06 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
"Asahi Lina" <lina@asahilina.net> writes:
[...]
> /// A bitwise shift for the page size.
> @@ -249,6 +251,69 @@ pub unsafe fn copy_from_user_slice_raw(
> reader.read_raw(unsafe { core::slice::from_raw_parts_mut(dst.cast(), len) })
> })
> }
> +
> + /// Returns the physical address of this page.
> + pub fn phys(&self) -> PhysicalAddr {
> + // SAFETY: `page` is valid due to the type invariants on `Page`.
> + unsafe { bindings::page_to_phys(self.as_ptr()) }
> + }
> +
> + /// Converts a Rust-owned Page into its physical address.
> + ///
> + /// The caller is responsible for calling [`Page::from_phys()`] to avoid leaking memory.
> + pub fn into_phys(this: Owned<Self>) -> PhysicalAddr {
> + ManuallyDrop::new(this).phys()
> + }
> +
> + /// Converts a physical address to a Rust-owned Page.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address was previously returned by a call to
> + /// [`Page::into_phys()`], and that the physical address is no longer used after this call,
> + /// nor is [`Page::from_phys()`] called again on it.
Do we really need the `PhysicalAddr` to come from a call to
`Page::into_phys`? Could we relax this and say that we don't care how
you came about the `PhysicalAddr` as long as you can guarantee that
ownership is correct? That would make interop with C easer in some cases.
> + pub unsafe fn from_phys(phys: PhysicalAddr) -> Owned<Self> {
> + // SAFETY: By the safety requirements, the physical address must be valid and
> + // have come from `into_phys()`, so phys_to_page() cannot fail and
> + // must return the original struct page pointer.
> + unsafe { Owned::from_raw(NonNull::new_unchecked(bindings::phys_to_page(phys)).cast()) }
> + }
> +
> + /// Borrows a Page from a physical address, without taking over ownership.
> + ///
> + /// If the physical address does not have a `struct page` entry or is not
> + /// part of a System RAM region, returns None.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address, if it is backed by a `struct page`,
> + /// remains available for the duration of the borrowed lifetime.
> + pub unsafe fn borrow_phys(phys: &PhysicalAddr) -> Option<&Self> {
> + // SAFETY: This is always safe, as it is just arithmetic
> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
> + // SAFETY: This function is safe to call with any pfn
> + if !unsafe { bindings::pfn_valid(pfn) && bindings::page_is_ram(pfn) != 0 } {
> + None
> + } else {
> + // SAFETY: We have just checked that the pfn is valid above, so it must
> + // have a corresponding struct page. By the safety requirements, we can
> + // return a borrowed reference to it.
> + Some(unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) })
I think you can maybe go
`bindings::pfn_to_page(pfn).cast::<Self>().cast_const()` here.
> + }
> + }
> +
> + /// Borrows a Page from a physical address, without taking over ownership
> + /// nor checking for validity.
> + ///
> + /// # Safety
> + /// The caller must ensure that the physical address is backed by a `struct page` and
> + /// corresponds to System RAM. This is true when the address was returned by
> + /// [`Page::into_phys()`].
> + pub unsafe fn borrow_phys_unchecked(phys: &PhysicalAddr) -> &Self {
> + // SAFETY: This is always safe, as it is just arithmetic
> + let pfn = unsafe { bindings::phys_to_pfn(*phys) };
> + // SAFETY: The caller guarantees that the pfn is valid. By the safety
> + // requirements, we can return a borrowed reference to it.
> + unsafe { &*(bindings::pfn_to_page(pfn) as *mut Self as *const Self) }
Same applies here.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate)
2025-02-02 13:05 ` [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate) Asahi Lina
2025-02-03 9:08 ` Alice Ryhl
@ 2025-02-19 9:08 ` Andreas Hindborg
1 sibling, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 9:08 UTC (permalink / raw)
To: Asahi Lina
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Danilo Krummrich,
Wedson Almeida Filho, Valentin Obst, Andrew Morton, linux-mm,
airlied, Abdiel Janulgue, rust-for-linux, linux-kernel, asahi
"Asahi Lina" <lina@asahilina.net> writes:
> There's no good reason for drivers to need access to the raw `struct
> page` pointer, since that should be handled by Rust abstractions. Make
> this method pub(crate) so it is only visible to the kernel crate.
>
As I mentioned on another patch, we are planning to split the kernel
crate per subsystem, and this would need to be pub again then.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
2025-02-17 8:50 ` Abdiel Janulgue
@ 2025-02-19 9:24 ` Andreas Hindborg
0 siblings, 0 replies; 67+ messages in thread
From: Andreas Hindborg @ 2025-02-19 9:24 UTC (permalink / raw)
To: Abdiel Janulgue
Cc: Asahi Lina, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Jann Horn, Matthew Wilcox, Paolo Bonzini, Wedson Almeida Filho,
Valentin Obst, Andrew Morton, linux-mm, airlied, rust-for-linux,
linux-kernel, asahi, Danilo Krummrich
"Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:
> Hi,
>
> On 15/02/2025 21:47, Asahi Lina wrote:
>>
>> Given the discussion in the other subthreads, it looks like this is not
>> the best approach for page table management and using it for crash dumps
>> of arbitrary physical memory pages raises some eyebrows, so I'm going to
>> abandon this series.
>>
>> Abdiel, if the first two patches are helpful to your work, please feel
>> free to incorporate them into a v4. Otherwise, just go ahead with
>> whatever works for you!
>>
>
> Thanks you for improving the series, much appreciated! I will try to
> incorporate your improvements.
>
> Btw, I'll most probably pick this up again once I'm finished with the
> rust dma coherent allocator bindings.
The series has application in `rnull` as well and I will probably add it
as a dependency for next version.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion
@ 2025-03-06 19:21 Oliver Mangold
0 siblings, 0 replies; 67+ messages in thread
From: Oliver Mangold @ 2025-03-06 19:21 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Abdiel Janulgue, Asahi Lina, Miguel Ojeda, Alex Gaynor,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Alice Ryhl, Trevor Gross, Jann Horn, Matthew Wilcox,
Paolo Bonzini, Wedson Almeida Filho, Valentin Obst, Andrew Morton,
linux-mm, airlied, rust-for-linux, linux-kernel, asahi,
Danilo Krummrich
On 250219 1024, Andreas Hindborg wrote:
>
>
> The series has application in `rnull` as well and I will probably add it
> as a dependency for next version.
>
>
Hi,
would anyone mind if I pick this up? I'm working on that patch:
https://lore.kernel.org/all/20250305-unique-ref-v4-1-a8fdef7b1c2c@pm.me/
There is a lot of overlap, so it would make sense, I think, to base
it on this one.
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
end of thread, other threads:[~2025-03-06 19:21 UTC | newest]
Thread overview: 67+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-02 13:05 [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Asahi Lina
2025-02-02 13:05 ` [PATCH 1/6] rust: types: Add Ownable/Owned types Asahi Lina
2025-02-03 9:13 ` Alice Ryhl
2025-02-03 14:17 ` Asahi Lina
2025-02-03 18:17 ` Alice Ryhl
2025-02-03 19:17 ` Asahi Lina
2025-02-19 8:34 ` Andreas Hindborg
2025-02-19 8:37 ` Andreas Hindborg
2025-02-02 13:05 ` [PATCH 2/6] rust: page: Convert to Ownable Asahi Lina
2025-02-03 9:17 ` Alice Ryhl
2025-02-03 9:39 ` Fiona Behrens
2025-02-19 8:46 ` Andreas Hindborg
2025-02-02 13:05 ` [PATCH 3/6] rust: page: Make with_page_mapped() and with_pointer_into_page() public Asahi Lina
2025-02-03 9:10 ` Alice Ryhl
2025-02-03 9:43 ` Fiona Behrens
2025-02-19 8:48 ` Andreas Hindborg
2025-02-02 13:05 ` [PATCH 4/6] rust: addr: Add a module to declare core address types Asahi Lina
2025-02-03 9:09 ` Alice Ryhl
2025-02-03 15:04 ` Boqun Feng
2025-02-04 11:50 ` Asahi Lina
2025-02-04 14:50 ` Boqun Feng
2025-02-19 8:51 ` Andreas Hindborg
2025-02-02 13:05 ` [PATCH 5/6] rust: page: Add physical address conversion functions Asahi Lina
2025-02-03 9:35 ` Alice Ryhl
2025-02-04 11:43 ` Asahi Lina
2025-02-03 9:53 ` Fiona Behrens
2025-02-03 10:01 ` Alice Ryhl
2025-02-19 9:06 ` Andreas Hindborg
2025-02-02 13:05 ` [PATCH 6/6] rust: page: Make Page::as_ptr() pub(crate) Asahi Lina
2025-02-03 9:08 ` Alice Ryhl
2025-02-19 9:08 ` Andreas Hindborg
2025-02-03 9:58 ` [PATCH 0/6] rust: page: Support borrowing `struct page` and physaddr conversion Simona Vetter
2025-02-03 14:32 ` Asahi Lina
2025-02-03 21:05 ` Zi Yan
2025-02-04 10:26 ` David Hildenbrand
2025-02-04 11:41 ` Asahi Lina
2025-02-04 11:59 ` David Hildenbrand
2025-02-04 13:05 ` Asahi Lina
2025-02-04 14:38 ` David Hildenbrand
2025-02-04 17:59 ` Asahi Lina
2025-02-04 20:10 ` David Hildenbrand
2025-02-04 21:06 ` Asahi Lina
2025-02-06 17:58 ` David Hildenbrand
2025-02-06 19:18 ` Asahi Lina
2025-02-06 19:27 ` Asahi Lina
2025-02-12 19:06 ` David Hildenbrand
2025-02-12 19:01 ` David Hildenbrand
2025-02-05 7:40 ` Simona Vetter
2025-02-12 19:07 ` David Hildenbrand
2025-02-04 10:33 ` David Hildenbrand
2025-02-04 18:39 ` Jason Gunthorpe
2025-02-04 19:01 ` Asahi Lina
2025-02-04 20:05 ` David Hildenbrand
2025-02-04 20:26 ` Jason Gunthorpe
2025-02-04 20:41 ` David Hildenbrand
2025-02-04 20:47 ` David Hildenbrand
2025-02-04 21:18 ` Asahi Lina
2025-02-06 18:02 ` David Hildenbrand
2025-02-04 20:49 ` Jason Gunthorpe
2025-02-05 23:17 ` Matthew Wilcox
2025-02-06 18:04 ` David Hildenbrand
2025-02-03 10:22 ` Danilo Krummrich
2025-02-03 14:41 ` Asahi Lina
2025-02-15 19:47 ` Asahi Lina
2025-02-17 8:50 ` Abdiel Janulgue
2025-02-19 9:24 ` Andreas Hindborg
-- strict thread matches above, loose matches on Subject: below --
2025-03-06 19:21 Oliver Mangold
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).