* [PATCH v4] io: add io_pgtable abstraction
@ 2025-12-19 10:50 Alice Ryhl
2025-12-19 11:04 ` Daniel Almeida
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Alice Ryhl @ 2025-12-19 10:50 UTC (permalink / raw)
To: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Robin Murphy, Jason Gunthorpe
Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Danilo Krummrich, Joerg Roedel,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm, Alice Ryhl
From: Asahi Lina <lina+kernel@asahilina.net>
This will be used by the Tyr driver to create and modify the page table
of each address space on the GPU. Each time a mapping gets created or
removed by userspace, Tyr will call into GPUVM, which will figure out
which calls to map_pages and unmap_pages are required to map the data in
question in the page table so that the GPU may access those pages when
using that address space.
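As a rough sketch of that flow (illustrative only, not part of this patch; `map_range`, `flush_gpu_tlb` and `PGSIZE` are hypothetical driver-side names), a caller would map a run of pages, flush the GPU TLB itself, and roll back the partially-mapped prefix on error:

use kernel::{
    io::{pgtable::{prot, ARM64LPAES1, IoPageTable}, PhysAddr},
    prelude::*,
};

/// Illustrative page size; a real driver derives this from `pgsize_bitmap`.
const PGSIZE: usize = 4096;

/// Hypothetical driver helper standing in for the GPU-specific flush sequence.
fn flush_gpu_tlb(_iova: usize, _len: usize) {}

fn map_range(
    pgt: &IoPageTable<ARM64LPAES1>,
    iova: usize,
    paddr: PhysAddr,
    pgcount: usize,
) -> Result {
    // SAFETY: the caller holds the address-space lock for this range, the range
    // is currently unmapped, and the physical pages stay valid while mapped.
    let (mapped, res) = unsafe {
        pgt.map_pages(iova, paddr, PGSIZE, pgcount, prot::READ | prot::WRITE, GFP_KERNEL)
    };
    if res.is_err() {
        // A prefix of `mapped` bytes may still have been installed; tear it down.
        // SAFETY: same exclusivity argument as above.
        let _ = unsafe { pgt.unmap_pages(iova, PGSIZE, mapped / PGSIZE) };
        return res;
    }
    // The abstraction never touches the TLB; the driver must flush before the
    // GPU dereferences the new mapping.
    flush_gpu_tlb(iova, mapped);
    Ok(())
}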
The Rust type wraps the struct using a raw pointer rather than the usual
Opaque+ARef approach because Opaque+ARef requires the target type to be
refcounted.
Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
Changes in v4:
- Rename prot::PRIV to prot::PRIVILEGED
- Adjust map_pages to return the length even on error.
- Explain return value in docs of map_pages and unmap_pages.
- Explain in map_pages that the caller must explicitly flush the TLB
before accessing the resulting mapping.
- Add a safety requirement that access to a given range is required to
be exclusive.
- Reword comment on NOOP_FLUSH_OPS.
- Rebase on v6.19-rc1 and pick up tags.
- Link to v3: https://lore.kernel.org/r/20251112-io-pgtable-v3-1-b00c2e6b951a@google.com
Changes in v3:
- Almost entirely rewritten from scratch.
- Link to v2: https://lore.kernel.org/all/20250623-io_pgtable-v2-1-fd72daac75f1@collabora.com/
---
rust/bindings/bindings_helper.h | 3 +-
rust/kernel/io.rs | 1 +
rust/kernel/io/pgtable.rs | 278 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 281 insertions(+), 1 deletion(-)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index a067038b4b422b4256f4a2b75fe644d47e6e82c8..1b05a5e4cfb4780fdc27813d708a8f1a6a2d9913 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -56,9 +56,10 @@
#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/firmware.h>
-#include <linux/interrupt.h>
#include <linux/fs.h>
#include <linux/i2c.h>
+#include <linux/interrupt.h>
+#include <linux/io-pgtable.h>
#include <linux/ioport.h>
#include <linux/jiffies.h>
#include <linux/jump_label.h>
diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs
index 98e8b84e68d11ef74b2026d8c3d847a127f4672d..88253158448cbf493ca200a87ef9ba958255e761 100644
--- a/rust/kernel/io.rs
+++ b/rust/kernel/io.rs
@@ -10,6 +10,7 @@
};
pub mod mem;
+pub mod pgtable;
pub mod poll;
pub mod resource;
diff --git a/rust/kernel/io/pgtable.rs b/rust/kernel/io/pgtable.rs
new file mode 100644
index 0000000000000000000000000000000000000000..11096acfa41d45125e866876e41459a347e9afe6
--- /dev/null
+++ b/rust/kernel/io/pgtable.rs
@@ -0,0 +1,278 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! IOMMU page table management.
+//!
+//! C header: [`include/linux/io-pgtable.h`](srctree/include/linux/io-pgtable.h)
+
+use core::{
+ marker::PhantomData,
+ ptr::NonNull, //
+};
+
+use crate::{
+ alloc,
+ bindings,
+ device::{Bound, Device},
+ devres::Devres,
+ error::to_result,
+ io::PhysAddr,
+ prelude::*, //
+};
+
+use bindings::io_pgtable_fmt;
+
+/// Protection flags used with IOMMU mappings.
+pub mod prot {
+ /// Read access.
+ pub const READ: u32 = bindings::IOMMU_READ;
+ /// Write access.
+ pub const WRITE: u32 = bindings::IOMMU_WRITE;
+ /// Request cache coherency.
+ pub const CACHE: u32 = bindings::IOMMU_CACHE;
+ /// Request no-execute permission.
+ pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
+ /// MMIO peripheral mapping.
+ pub const MMIO: u32 = bindings::IOMMU_MMIO;
+ /// Privileged mapping.
+ pub const PRIVILEGED: u32 = bindings::IOMMU_PRIV;
+}
+
+/// Represents a requested `io_pgtable` configuration.
+pub struct Config {
+ /// Quirk bitmask (type-specific).
+ pub quirks: usize,
+ /// Valid page sizes, as a bitmask of powers of two.
+ pub pgsize_bitmap: usize,
+ /// Input address space size in bits.
+ pub ias: u32,
+ /// Output address space size in bits.
+ pub oas: u32,
+ /// IOMMU uses coherent accesses for page table walks.
+ pub coherent_walk: bool,
+}
+
+/// An io page table using a specific format.
+///
+/// # Invariants
+///
+/// The pointer references a valid io page table.
+pub struct IoPageTable<F: IoPageTableFmt> {
+ ptr: NonNull<bindings::io_pgtable_ops>,
+ _marker: PhantomData<F>,
+}
+
+// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
+unsafe impl<F: IoPageTableFmt> Send for IoPageTable<F> {}
+// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
+unsafe impl<F: IoPageTableFmt> Sync for IoPageTable<F> {}
+
+/// The format used by this page table.
+pub trait IoPageTableFmt: 'static {
+ /// The value representing this format.
+ const FORMAT: io_pgtable_fmt;
+}
+
+impl<F: IoPageTableFmt> IoPageTable<F> {
+ /// Create a new `IoPageTable` as a device resource.
+ #[inline]
+ pub fn new(
+ dev: &Device<Bound>,
+ config: Config,
+ ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
+ // SAFETY: Devres ensures that the value is dropped during device unbind.
+ Devres::new(dev, unsafe { Self::new_raw(dev, config) })
+ }
+
+ /// Create a new `IoPageTable`.
+ ///
+ /// # Safety
+ ///
+ /// If successful, then the returned value must be dropped before the device is unbound.
+ #[inline]
+ pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
+ let mut raw_cfg = bindings::io_pgtable_cfg {
+ quirks: config.quirks,
+ pgsize_bitmap: config.pgsize_bitmap,
+ ias: config.ias,
+ oas: config.oas,
+ coherent_walk: config.coherent_walk,
+ tlb: &raw const NOOP_FLUSH_OPS,
+ iommu_dev: dev.as_raw(),
+ // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
+ ..unsafe { core::mem::zeroed() }
+ };
+
+ // SAFETY:
+ // * The raw_cfg pointer is valid for the duration of this call.
+ // * The provided `NOOP_FLUSH_OPS` contains valid function pointers that accept a null pointer
+ // as cookie.
+ // * The caller ensures that the io pgtable does not outlive the device.
+ let ops = unsafe {
+ bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
+ };
+ // INVARIANT: We successfully created a valid page table.
+ Ok(IoPageTable {
+ ptr: NonNull::new(ops).ok_or(ENOMEM)?,
+ _marker: PhantomData,
+ })
+ }
+
+ /// Obtain a raw pointer to the underlying `struct io_pgtable_ops`.
+ #[inline]
+ pub fn raw_ops(&self) -> *mut bindings::io_pgtable_ops {
+ self.ptr.as_ptr()
+ }
+
+ /// Obtain a raw pointer to the underlying `struct io_pgtable`.
+ #[inline]
+ pub fn raw_pgtable(&self) -> *mut bindings::io_pgtable {
+ // SAFETY: The `io_pgtable_ops` is always embedded as the `ops` field of a `struct io_pgtable`.
+ unsafe { kernel::container_of!(self.raw_ops(), bindings::io_pgtable, ops) }
+ }
+
+ /// Obtain a raw pointer to the underlying `struct io_pgtable_cfg`.
+ #[inline]
+ pub fn raw_cfg(&self) -> *mut bindings::io_pgtable_cfg {
+ // SAFETY: The `raw_pgtable()` method returns a valid pointer.
+ unsafe { &raw mut (*self.raw_pgtable()).cfg }
+ }
+
+ /// Map a physically contiguous range of pages of the same size.
+ ///
+ /// Even if successful, this operation may not map the entire range. In that case, only a
+ /// prefix of the range is mapped, and the returned integer indicates its length in bytes. In
+ /// this case, the caller will usually call `map_pages` again for the remaining range.
+ ///
+ /// The returned [`Result`] indicates whether an error was encountered while mapping pages.
+ /// Note that this may return a non-zero length even if an error was encountered. The caller
+ /// will usually [unmap the relevant pages](Self::unmap_pages) on error.
+ ///
+ /// The caller must flush the TLB before using the pgtable to access the newly created mapping.
+ ///
+ /// # Safety
+ ///
+ /// * No other io-pgtable operation may access the range `iova .. iova+pgsize*pgcount` while
+ /// this `map_pages` operation executes.
+ /// * This page table must not contain any mapping that overlaps with the mapping created by
+ /// this call.
+ /// * If this page table is live, then the caller must ensure that it's okay to access the
+ /// physical address being mapped for the duration in which it is mapped.
+ #[inline]
+ #[must_use]
+ pub unsafe fn map_pages(
+ &self,
+ iova: usize,
+ paddr: PhysAddr,
+ pgsize: usize,
+ pgcount: usize,
+ prot: u32,
+ flags: alloc::Flags,
+ ) -> (usize, Result) {
+ let mut mapped: usize = 0;
+
+ // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
+ let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
+
+ // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
+ let ret = to_result(unsafe {
+ (map_pages)(
+ self.raw_ops(),
+ iova,
+ paddr,
+ pgsize,
+ pgcount,
+ prot as i32,
+ flags.as_raw(),
+ &mut mapped,
+ )
+ });
+
+ (mapped, ret)
+ }
+
+ /// Unmap a range of virtually contiguous pages of the same size.
+ ///
+ /// This may not unmap the entire range, and returns the length of the unmapped prefix in
+ /// bytes.
+ ///
+ /// # Safety
+ ///
+ /// * No other io-pgtable operation may access the range `iova .. iova+pgsize*pgcount` while
+ /// this `unmap_pages` operation executes.
+ /// * This page table must contain one or more consecutive mappings starting at `iova` whose
+ /// total size is `pgcount * pgsize`.
+ #[inline]
+ #[must_use]
+ pub unsafe fn unmap_pages(&self, iova: usize, pgsize: usize, pgcount: usize) -> usize {
+ // SAFETY: The `unmap_pages` function in `io_pgtable_ops` is never null.
+ let unmap_pages = unsafe { (*self.raw_ops()).unmap_pages.unwrap_unchecked() };
+
+ // SAFETY: The safety requirements of this method are sufficient to call `unmap_pages`.
+ unsafe { (unmap_pages)(self.raw_ops(), iova, pgsize, pgcount, core::ptr::null_mut()) }
+ }
+}
+
+// For now, we do not provide the ability to flush the TLB via the built-in callback mechanism.
+// Instead, the `map_pages` function requires the caller to explicitly flush the TLB before the
+// pgtable is used to access the newly created range.
+//
+// This is done because the initial user of this abstraction may perform many calls to `map_pages`
+// in a single batched operation, and wishes to only flush the TLB once after performing the entire
+// batch of mappings. These callbacks would flush too often for that use-case.
+//
+// Support for flushing the TLB in these callbacks may be added in the future.
+static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
+ tlb_flush_all: Some(rust_tlb_flush_all_noop),
+ tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
+ tlb_add_page: None,
+};
+
+#[no_mangle]
+extern "C" fn rust_tlb_flush_all_noop(_cookie: *mut core::ffi::c_void) {}
+
+#[no_mangle]
+extern "C" fn rust_tlb_flush_walk_noop(
+ _iova: usize,
+ _size: usize,
+ _granule: usize,
+ _cookie: *mut core::ffi::c_void,
+) {
+}
+
+impl<F: IoPageTableFmt> Drop for IoPageTable<F> {
+ fn drop(&mut self) {
+ // SAFETY: The caller of `Self::ttbr()` promised that the page table is not live when this
+ // destructor runs.
+ unsafe { bindings::free_io_pgtable_ops(self.raw_ops()) };
+ }
+}
+
+/// The `ARM_64_LPAE_S1` page table format.
+pub enum ARM64LPAES1 {}
+
+impl IoPageTableFmt for ARM64LPAES1 {
+ const FORMAT: io_pgtable_fmt = bindings::io_pgtable_fmt_ARM_64_LPAE_S1 as io_pgtable_fmt;
+}
+
+impl IoPageTable<ARM64LPAES1> {
+ /// Access the `ttbr` field of the configuration.
+ ///
+ /// This is the physical address of the page table, which may be passed to the device that
+ /// needs to use it.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that the device stops using the page table before dropping it.
+ #[inline]
+ pub unsafe fn ttbr(&self) -> u64 {
+ // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
+ unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.ttbr }
+ }
+
+ /// Access the `mair` field of the configuration.
+ #[inline]
+ pub fn mair(&self) -> u64 {
+ // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
+ unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.mair }
+ }
+}
---
base-commit: 3e7f562e20ee87a25e104ef4fce557d39d62fa85
change-id: 20251111-io-pgtable-fe0822b4ebdd
Best regards,
--
Alice Ryhl <aliceryhl@google.com>
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 10:50 [PATCH v4] io: add io_pgtable abstraction Alice Ryhl
@ 2025-12-19 11:04 ` Daniel Almeida
2025-12-19 11:43 ` Alice Ryhl
2025-12-19 14:05 ` Jason Gunthorpe
2025-12-21 0:06 ` kernel test robot
2 siblings, 1 reply; 12+ messages in thread
From: Daniel Almeida @ 2025-12-19 11:04 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Boris Brezillon, Robin Murphy,
Jason Gunthorpe, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
Hi Alice,
> On 19 Dec 2025, at 07:50, Alice Ryhl <aliceryhl@google.com> wrote:
>
> From: Asahi Lina <lina+kernel@asahilina.net>
>
> This will be used by the Tyr driver to create and modify the page table
> of each address space on the GPU. Each time a mapping gets created or
> removed by userspace, Tyr will call into GPUVM, which will figure out
> which calls to map_pages and unmap_pages are required to map the data in
> question in the page table so that the GPU may access those pages when
> using that address space.
>
> The Rust type wraps the struct using a raw pointer rather than the usual
> Opaque+ARef approach because Opaque+ARef requires the target type to be
> refcounted.
>
> Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
> Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
> Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> Changes in v4:
> - Rename prot::PRIV to prot::PRIVILEGED
> - Adjust map_pages to return the length even on error.
> - Explain return value in docs of map_pages and unmap_pages.
> - Explain in map_pages that the caller must explicitly flush the TLB
> before accessing the resulting mapping.
> - Add a safety requirement that access to a given range is required to
> be exclusive.
> - Reword comment on NOOP_FLUSH_OPS.
> - Rebase on v6.19-rc1 and pick up tags.
> - Link to v3: https://lore.kernel.org/r/20251112-io-pgtable-v3-1-b00c2e6b951a@google.com
>
> Changes in v3:
> - Almost entirely rewritten from scratch.
> - Link to v2: https://lore.kernel.org/all/20250623-io_pgtable-v2-1-fd72daac75f1@collabora.com/
> ---
> rust/bindings/bindings_helper.h | 3 +-
> rust/kernel/io.rs | 1 +
> rust/kernel/io/pgtable.rs | 278 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 281 insertions(+), 1 deletion(-)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index a067038b4b422b4256f4a2b75fe644d47e6e82c8..1b05a5e4cfb4780fdc27813d708a8f1a6a2d9913 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -56,9 +56,10 @@
> #include <linux/fdtable.h>
> #include <linux/file.h>
> #include <linux/firmware.h>
> -#include <linux/interrupt.h>
> #include <linux/fs.h>
> #include <linux/i2c.h>
> +#include <linux/interrupt.h>
> +#include <linux/io-pgtable.h>
> #include <linux/ioport.h>
> #include <linux/jiffies.h>
> #include <linux/jump_label.h>
> diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs
> index 98e8b84e68d11ef74b2026d8c3d847a127f4672d..88253158448cbf493ca200a87ef9ba958255e761 100644
> --- a/rust/kernel/io.rs
> +++ b/rust/kernel/io.rs
> @@ -10,6 +10,7 @@
> };
>
> pub mod mem;
> +pub mod pgtable;
> pub mod poll;
> pub mod resource;
>
> diff --git a/rust/kernel/io/pgtable.rs b/rust/kernel/io/pgtable.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..11096acfa41d45125e866876e41459a347e9afe6
> --- /dev/null
> +++ b/rust/kernel/io/pgtable.rs
> @@ -0,0 +1,278 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! IOMMU page table management.
> +//!
> +//! C header: [`include/io-pgtable.h`](srctree/include/io-pgtable.h)
> +
> +use core::{
> + marker::PhantomData,
> + ptr::NonNull, //
> +};
> +
> +use crate::{
> + alloc,
> + bindings,
> + device::{Bound, Device},
> + devres::Devres,
> + error::to_result,
> + io::PhysAddr,
> + prelude::*, //
> +};
> +
> +use bindings::io_pgtable_fmt;
> +
> +/// Protection flags used with IOMMU mappings.
> +pub mod prot {
> + /// Read access.
> + pub const READ: u32 = bindings::IOMMU_READ;
> + /// Write access.
> + pub const WRITE: u32 = bindings::IOMMU_WRITE;
> + /// Request cache coherency.
> + pub const CACHE: u32 = bindings::IOMMU_CACHE;
> + /// Request no-execute permission.
> + pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
> + /// MMIO peripheral mapping.
> + pub const MMIO: u32 = bindings::IOMMU_MMIO;
> + /// Privileged mapping.
> + pub const PRIVILEGED: u32 = bindings::IOMMU_PRIV;
> +}
> +
> +/// Represents a requested `io_pgtable` configuration.
> +pub struct Config {
> + /// Quirk bitmask (type-specific).
> + pub quirks: usize,
> + /// Valid page sizes, as a bitmask of powers of two.
> + pub pgsize_bitmap: usize,
> + /// Input address space size in bits.
> + pub ias: u32,
> + /// Output address space size in bits.
> + pub oas: u32,
> + /// IOMMU uses coherent accesses for page table walks.
> + pub coherent_walk: bool,
> +}
> +
> +/// An io page table using a specific format.
> +///
> +/// # Invariants
> +///
> +/// The pointer references a valid io page table.
> +pub struct IoPageTable<F: IoPageTableFmt> {
> + ptr: NonNull<bindings::io_pgtable_ops>,
> + _marker: PhantomData<F>,
> +}
> +
> +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
> +unsafe impl<F: IoPageTableFmt> Send for IoPageTable<F> {}
> +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
> +unsafe impl<F: IoPageTableFmt> Sync for IoPageTable<F> {}
> +
> +/// The format used by this page table.
> +pub trait IoPageTableFmt: 'static {
> + /// The value representing this format.
> + const FORMAT: io_pgtable_fmt;
> +}
> +
> +impl<F: IoPageTableFmt> IoPageTable<F> {
I don’t see a reason to keep struct Foo and impl Foo separate.
IMHO, these should always be together, as the first thing one wants
to read after a type declaration is its implementation.
> + /// Create a new `IoPageTable` as a device resource.
> + #[inline]
> + pub fn new(
> + dev: &Device<Bound>,
> + config: Config,
> + ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
> + // SAFETY: Devres ensures that the value is dropped during device unbind.
> + Devres::new(dev, unsafe { Self::new_raw(dev, config) })
> + }
> +
> + /// Create a new `IoPageTable`.
> + ///
> + /// # Safety
> + ///
> + /// If successful, then the returned value must be dropped before the device is unbound.
> + #[inline]
> + pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
> + let mut raw_cfg = bindings::io_pgtable_cfg {
> + quirks: config.quirks,
> + pgsize_bitmap: config.pgsize_bitmap,
> + ias: config.ias,
> + oas: config.oas,
> + coherent_walk: config.coherent_walk,
> + tlb: &raw const NOOP_FLUSH_OPS,
> + iommu_dev: dev.as_raw(),
> + // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
> + ..unsafe { core::mem::zeroed() }
> + };
> +
> + // SAFETY:
> + // * The raw_cfg pointer is valid for the duration of this call.
> + // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
> + // as cookie.
> + // * The caller ensures that the io pgtable does not outlive the device.
We should probably tailor the sentence above for Devres?
> + let ops = unsafe {
> + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
> + };
I’d add a blank line here.
> + // INVARIANT: We successfully created a valid page table.
> + Ok(IoPageTable {
> + ptr: NonNull::new(ops).ok_or(ENOMEM)?,
> + _marker: PhantomData,
> + })
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_ops`.
> + #[inline]
> + pub fn raw_ops(&self) -> *mut bindings::io_pgtable_ops {
> + self.ptr.as_ptr()
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable`.
> + #[inline]
> + pub fn raw_pgtable(&self) -> *mut bindings::io_pgtable {
> + // SAFETY: The io_pgtable_ops of an io-pgtable is always the ops field of a io_pgtable.
> + unsafe { kernel::container_of!(self.raw_ops(), bindings::io_pgtable, ops) }
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_cfg`.
> + #[inline]
> + pub fn raw_cfg(&self) -> *mut bindings::io_pgtable_cfg {
> + // SAFETY: The `raw_pgtable()` method returns a valid pointer.
> + unsafe { &raw mut (*self.raw_pgtable()).cfg }
> + }
> +
> + /// Map a physically contiguous range of pages of the same size.
> + ///
> + /// Even if successful, this operation may not map the entire range. In that case, only a
> + /// prefix of the range is mapped, and the returned integer indicates its length in bytes. In
> + /// this case, the caller will usually call `map_pages` again for the remaining range.
> + ///
> + /// The returned [`Result`] indicates whether an error was encountered while mapping pages.
> + /// Note that this may return a non-zero length even if an error was encountered. The caller
> + /// will usually [unmap the relevant pages](Self::unmap_pages) on error.
> + ///
> + /// The caller must flush the TLB before using the pgtable to access the newly created mapping.
> + ///
> + /// # Safety
> + ///
> + /// * No other io-pgtable operation may access the range `iova .. iova+pgsize*pgcount` while
> + /// this `map_pages` operation executes.
> + /// * This page table must not contain any mapping that overlaps with the mapping created by
> + /// this call.
> + /// * If this page table is live, then the caller must ensure that it's okay to access the
> + /// physical address being mapped for the duration in which it is mapped.
> + #[inline]
> + #[must_use]
> + pub unsafe fn map_pages(
> + &self,
> + iova: usize,
> + paddr: PhysAddr,
> + pgsize: usize,
> + pgcount: usize,
> + prot: u32,
> + flags: alloc::Flags,
> + ) -> (usize, Result) {
> + let mut mapped: usize = 0;
> +
> + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
> + let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
> + let ret = to_result(unsafe {
> + (map_pages)(
> + self.raw_ops(),
> + iova,
> + paddr,
> + pgsize,
> + pgcount,
> + prot as i32,
> + flags.as_raw(),
> + &mut mapped,
> + )
> + });
> +
> + (mapped, ret)
> + }
> +
> + /// Unmap a range of virtually contiguous pages of the same size.
> + ///
> + /// This may not unmap the entire range, and returns the length of the unmapped prefix in
> + /// bytes.
> + ///
> + /// # Safety
> + ///
> + /// * No other io-pgtable operation may access the range `iova .. iova+pgsize*pgcount` while
> + /// this `unmap_pages` operation executes.
> + /// * This page table must contain one or more consecutive mappings starting at `iova` whose
> + /// total size is `pgcount * pgsize`.
> + #[inline]
> + #[must_use]
> + pub unsafe fn unmap_pages(&self, iova: usize, pgsize: usize, pgcount: usize) -> usize {
> + // SAFETY: The `unmap_pages` function in `io_pgtable_ops` is never null.
> + let unmap_pages = unsafe { (*self.raw_ops()).unmap_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `unmap_pages`.
> + unsafe { (unmap_pages)(self.raw_ops(), iova, pgsize, pgcount, core::ptr::null_mut()) }
> + }
> +}
> +
> +// For now, we do not provide the ability to flush the TLB via the built-in callback mechanism.
> +// Instead, the `map_pages` function requires the caller to explicitly flush the TLB before the
> +// pgtable is used to access the newly created range.
> +//
> +// This is done because the initial user of this abstraction may perform many calls to `map_pages`
> +// in a single batched operation, and wishes to only flush the TLB once after performing the entire
> +// batch of mappings. These callbacks would flush too often for that use-case.
> +//
> +// Support for flushing the TLB in these callbacks may be added in the future.
> +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
> + tlb_flush_all: Some(rust_tlb_flush_all_noop),
> + tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
> + tlb_add_page: None,
> +};
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_all_noop(_cookie: *mut core::ffi::c_void) {}
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_walk_noop(
> + _iova: usize,
> + _size: usize,
> + _granule: usize,
> + _cookie: *mut core::ffi::c_void,
> +) {
> +}
> +
> +impl<F: IoPageTableFmt> Drop for IoPageTable<F> {
> + fn drop(&mut self) {
> + // SAFETY: The caller of `ttbr` promised that the page table is not live when this
> + // destructor runs.
Not sure I understand this sentence. Perhaps we should remove the word “ttbr” from here? ttbr is a register.
> + unsafe { bindings::free_io_pgtable_ops(self.raw_ops()) };
> + }
> +}
> +
> +/// The `ARM_64_LPAE_S1` page table format.
> +pub enum ARM64LPAES1 {}
> +
> +impl IoPageTableFmt for ARM64LPAES1 {
> + const FORMAT: io_pgtable_fmt = bindings::io_pgtable_fmt_ARM_64_LPAE_S1 as io_pgtable_fmt;
> +}
> +
> +impl IoPageTable<ARM64LPAES1> {
> + /// Access the `ttbr` field of the configuration.
> + ///
> + /// This is the physical address of the page table, which may be passed to the device that
> + /// needs to use it.
> + ///
> + /// # Safety
> + ///
> + /// The caller must ensure that the device stops using the page table before dropping it.
> + #[inline]
> + pub unsafe fn ttbr(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.ttbr }
> + }
> +
> + /// Access the `mair` field of the configuration.
> + #[inline]
> + pub fn mair(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.mair }
> + }
> +}
>
> ---
> base-commit: 3e7f562e20ee87a25e104ef4fce557d39d62fa85
> change-id: 20251111-io-pgtable-fe0822b4ebdd
>
> Best regards,
> --
> Alice Ryhl <aliceryhl@google.com>
>
Looks good to me. Please wait for Deborah Brouwer’s Tested-by tag if
there are no further comments from others.
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 11:04 ` Daniel Almeida
@ 2025-12-19 11:43 ` Alice Ryhl
2025-12-19 11:50 ` Daniel Almeida
0 siblings, 1 reply; 12+ messages in thread
From: Alice Ryhl @ 2025-12-19 11:43 UTC (permalink / raw)
To: Daniel Almeida
Cc: Miguel Ojeda, Will Deacon, Boris Brezillon, Robin Murphy,
Jason Gunthorpe, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, Dec 19, 2025 at 08:04:17AM -0300, Daniel Almeida wrote:
> Hi Alice,
>
> > On 19 Dec 2025, at 07:50, Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > From: Asahi Lina <lina+kernel@asahilina.net>
> >
> > This will be used by the Tyr driver to create and modify the page table
> > of each address space on the GPU. Each time a mapping gets created or
> > removed by userspace, Tyr will call into GPUVM, which will figure out
> > which calls to map_pages and unmap_pages are required to map the data in
> > question in the page table so that the GPU may access those pages when
> > using that address space.
> >
> > The Rust type wraps the struct using a raw pointer rather than the usual
> > Opaque+ARef approach because Opaque+ARef requires the target type to be
> > refcounted.
> >
> > Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
> > Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > +/// An io page table using a specific format.
> > +///
> > +/// # Invariants
> > +///
> > +/// The pointer references a valid io page table.
> > +pub struct IoPageTable<F: IoPageTableFmt> {
> > + ptr: NonNull<bindings::io_pgtable_ops>,
> > + _marker: PhantomData<F>,
> > +}
> > +
> > +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
> > +unsafe impl<F: IoPageTableFmt> Send for IoPageTable<F> {}
> > +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
> > +unsafe impl<F: IoPageTableFmt> Sync for IoPageTable<F> {}
> > +
> > +/// The format used by this page table.
> > +pub trait IoPageTableFmt: 'static {
> > + /// The value representing this format.
> > + const FORMAT: io_pgtable_fmt;
> > +}
> > +
> > +impl<F: IoPageTableFmt> IoPageTable<F> {
>
> I don’t see a reason to keep struct Foo and impl Foo separate.
>
> IMHO, these should always be together, as the first thing one wants
> to read after a type declaration is its implementation.
I thought it was pretty natural like this. First we describe the page
table, then we say it's thread safe, then we describe that a page table
must specify a FORMAT, then we describe that it has a constructor,
then we say you can map pages, etc. etc.
> > + /// Create a new `IoPageTable` as a device resource.
> > + #[inline]
> > + pub fn new(
> > + dev: &Device<Bound>,
> > + config: Config,
> > + ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
> > + // SAFETY: Devres ensures that the value is dropped during device unbind.
> > + Devres::new(dev, unsafe { Self::new_raw(dev, config) })
> > + }
> > +
> > + /// Create a new `IoPageTable`.
> > + ///
> > + /// # Safety
> > + ///
> > + /// If successful, then the returned value must be dropped before the device is unbound.
> > + #[inline]
> > + pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
> > + let mut raw_cfg = bindings::io_pgtable_cfg {
> > + quirks: config.quirks,
> > + pgsize_bitmap: config.pgsize_bitmap,
> > + ias: config.ias,
> > + oas: config.oas,
> > + coherent_walk: config.coherent_walk,
> > + tlb: &raw const NOOP_FLUSH_OPS,
> > + iommu_dev: dev.as_raw(),
> > + // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
> > + ..unsafe { core::mem::zeroed() }
> > + };
> > +
> > + // SAFETY:
> > + // * The raw_cfg pointer is valid for the duration of this call.
> > + // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
> > + // as cookie.
> > + // * The caller ensures that the io pgtable does not outlive the device.
>
> We should probably tailor the sentence above for Devres?
Maybe "does not outlive device unbind" is better worded, but not sure
what you're looking for with Devres tailoring.
> > + let ops = unsafe {
> > + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
> > + };
>
> I’d add a blank here.
>
> > +impl<F: IoPageTableFmt> Drop for IoPageTable<F> {
> > + fn drop(&mut self) {
> > + // SAFETY: The caller of `ttbr` promised that the page table is not live when this
> > + // destructor runs.
>
>
> Not sure I understand this sentence. Perhaps we should remove the word “ttbr” from here? ttbr is a register.
ttbr is a method defined below with a safety requirement.
Alice
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 11:43 ` Alice Ryhl
@ 2025-12-19 11:50 ` Daniel Almeida
2025-12-19 11:56 ` Alice Ryhl
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Almeida @ 2025-12-19 11:50 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Boris Brezillon, Robin Murphy,
Jason Gunthorpe, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
> On 19 Dec 2025, at 08:43, Alice Ryhl <aliceryhl@google.com> wrote:
>
> On Fri, Dec 19, 2025 at 08:04:17AM -0300, Daniel Almeida wrote:
>> Hi Alice,
>>
>>> On 19 Dec 2025, at 07:50, Alice Ryhl <aliceryhl@google.com> wrote:
>>>
>>> From: Asahi Lina <lina+kernel@asahilina.net>
>>>
>>> This will be used by the Tyr driver to create and modify the page table
>>> of each address space on the GPU. Each time a mapping gets created or
>>> removed by userspace, Tyr will call into GPUVM, which will figure out
>>> which calls to map_pages and unmap_pages are required to map the data in
>>> question in the page table so that the GPU may access those pages when
>>> using that address space.
>>>
>>> The Rust type wraps the struct using a raw pointer rather than the usual
>>> Opaque+ARef approach because Opaque+ARef requires the target type to be
>>> refcounted.
>>>
>>> Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
>>> Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
>>> Co-developed-by: Alice Ryhl <aliceryhl@google.com>
>>> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
>>> +/// An io page table using a specific format.
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// The pointer references a valid io page table.
>>> +pub struct IoPageTable<F: IoPageTableFmt> {
>>> + ptr: NonNull<bindings::io_pgtable_ops>,
>>> + _marker: PhantomData<F>,
>>> +}
>>> +
>>> +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
>>> +unsafe impl<F: IoPageTableFmt> Send for IoPageTable<F> {}
>>> +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
>>> +unsafe impl<F: IoPageTableFmt> Sync for IoPageTable<F> {}
>>> +
>>> +/// The format used by this page table.
>>> +pub trait IoPageTableFmt: 'static {
>>> + /// The value representing this format.
>>> + const FORMAT: io_pgtable_fmt;
>>> +}
>>> +
>>> +impl<F: IoPageTableFmt> IoPageTable<F> {
>>
>> I don’t see a reason to keep struct Foo and impl Foo separate.
>>
>> IMHO, these should always be together, as the first thing one wants
>> to read after a type declaration is its implementation.
>
> I thought it was pretty natural like this. First we describe the page
> table, then we say it's thread safe, then we describe that a page table
> must specify a FORMAT, then we describe that it has a constructor,
> then we say you can map pages, etc. etc.
Right, this is more a personal preference thing anyways. Fine with me if you
want to keep it like this.
>
>>> + /// Create a new `IoPageTable` as a device resource.
>>> + #[inline]
>>> + pub fn new(
>>> + dev: &Device<Bound>,
>>> + config: Config,
>>> + ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
>>> + // SAFETY: Devres ensures that the value is dropped during device unbind.
>>> + Devres::new(dev, unsafe { Self::new_raw(dev, config) })
>>> + }
>>> +
>>> + /// Create a new `IoPageTable`.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// If successful, then the returned value must be dropped before the device is unbound.
>>> + #[inline]
>>> + pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
>>> + let mut raw_cfg = bindings::io_pgtable_cfg {
>>> + quirks: config.quirks,
>>> + pgsize_bitmap: config.pgsize_bitmap,
>>> + ias: config.ias,
>>> + oas: config.oas,
>>> + coherent_walk: config.coherent_walk,
>>> + tlb: &raw const NOOP_FLUSH_OPS,
>>> + iommu_dev: dev.as_raw(),
>>> + // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
>>> + ..unsafe { core::mem::zeroed() }
>>> + };
>>> +
>>> + // SAFETY:
>>> + // * The raw_cfg pointer is valid for the duration of this call.
>>> + // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
>>> + // as cookie.
>>> + // * The caller ensures that the io pgtable does not outlive the device.
>>
>> We should probably tailor the sentence above for Devres?
>
> Maybe "does not outlive device unbind" is better worded, but not sure
> what you're looking for with Devres tailoring.
What about “Devres ensures that the io pgtable does not outlive device
unbind by revoking access”, or something along these lines?
>
>>> + let ops = unsafe {
>>> + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
>>> + };
>>
>> I’d add a blank here.
>>
>>> +impl<F: IoPageTableFmt> Drop for IoPageTable<F> {
>>> + fn drop(&mut self) {
>>> + // SAFETY: The caller of `ttbr` promised that the page table is not live when this
>>> + // destructor runs.
>>
>>
>> Not sure I understand this sentence. Perhaps we should remove the word “ttbr” from here? ttbr is a register.
>
> ttbr is a method defined below with a safety requirement.
Can't we link to that then? i.e.: [`ttbr`]: Self::ttbr, or whatever the right
syntax is. Because it’s more natural to think about ttbr the register vs
ttbr the method.
>
> Alice
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 11:50 ` Daniel Almeida
@ 2025-12-19 11:56 ` Alice Ryhl
0 siblings, 0 replies; 12+ messages in thread
From: Alice Ryhl @ 2025-12-19 11:56 UTC (permalink / raw)
To: Daniel Almeida
Cc: Miguel Ojeda, Will Deacon, Boris Brezillon, Robin Murphy,
Jason Gunthorpe, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, Dec 19, 2025 at 08:50:06AM -0300, Daniel Almeida wrote:
>
>
> > On 19 Dec 2025, at 08:43, Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > On Fri, Dec 19, 2025 at 08:04:17AM -0300, Daniel Almeida wrote:
> >> Hi Alice,
> >>
> >>> On 19 Dec 2025, at 07:50, Alice Ryhl <aliceryhl@google.com> wrote:
> >>>
> >>> From: Asahi Lina <lina+kernel@asahilina.net>
> >>>
> >>> This will be used by the Tyr driver to create and modify the page table
> >>> of each address space on the GPU. Each time a mapping gets created or
> >>> removed by userspace, Tyr will call into GPUVM, which will figure out
> >>> which calls to map_pages and unmap_pages are required to map the data in
> >>> question in the page table so that the GPU may access those pages when
> >>> using that address space.
> >>>
> >>> The Rust type wraps the struct using a raw pointer rather than the usual
> >>> Opaque+ARef approach because Opaque+ARef requires the target type to be
> >>> refcounted.
> >>>
> >>> Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
> >>> Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
> >>> Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> >>> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> >
> >>> +/// An io page table using a specific format.
> >>> +///
> >>> +/// # Invariants
> >>> +///
> >>> +/// The pointer references a valid io page table.
> >>> +pub struct IoPageTable<F: IoPageTableFmt> {
> >>> + ptr: NonNull<bindings::io_pgtable_ops>,
> >>> + _marker: PhantomData<F>,
> >>> +}
> >>> +
> >>> +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
> >>> +unsafe impl<F: IoPageTableFmt> Send for IoPageTable<F> {}
> >>> +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
> >>> +unsafe impl<F: IoPageTableFmt> Sync for IoPageTable<F> {}
> >>> +
> >>> +/// The format used by this page table.
> >>> +pub trait IoPageTableFmt: 'static {
> >>> + /// The value representing this format.
> >>> + const FORMAT: io_pgtable_fmt;
> >>> +}
> >>> +
> >>> +impl<F: IoPageTableFmt> IoPageTable<F> {
> >>
> >> I don’t see a reason to keep struct Foo and impl Foo separate.
> >>
> >> IMHO, these should always be together, as the first thing one wants
> >> to read after a type declaration is its implementation.
> >
> > I thought it was pretty natural like this. First we describe the page
> > table, then we say it's thread safe, then we describe that a page table
> > must specify a FORMAT, then we describe that it has a constructor,
> > then we say you can map pages, etc. etc.
>
> Right, this is more a personal preference thing anyways. Fine with me if you
> want to keep it like this.
>
> >
> >>> + /// Create a new `IoPageTable` as a device resource.
> >>> + #[inline]
> >>> + pub fn new(
> >>> + dev: &Device<Bound>,
> >>> + config: Config,
> >>> + ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
> >>> + // SAFETY: Devres ensures that the value is dropped during device unbind.
> >>> + Devres::new(dev, unsafe { Self::new_raw(dev, config) })
> >>> + }
> >>> +
> >>> + /// Create a new `IoPageTable`.
> >>> + ///
> >>> + /// # Safety
> >>> + ///
> >>> + /// If successful, then the returned value must be dropped before the device is unbound.
> >>> + #[inline]
> >>> + pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
> >>> + let mut raw_cfg = bindings::io_pgtable_cfg {
> >>> + quirks: config.quirks,
> >>> + pgsize_bitmap: config.pgsize_bitmap,
> >>> + ias: config.ias,
> >>> + oas: config.oas,
> >>> + coherent_walk: config.coherent_walk,
> >>> + tlb: &raw const NOOP_FLUSH_OPS,
> >>> + iommu_dev: dev.as_raw(),
> >>> + // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
> >>> + ..unsafe { core::mem::zeroed() }
> >>> + };
> >>> +
> >>> + // SAFETY:
> >>> + // * The raw_cfg pointer is valid for the duration of this call.
> >>> + // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
> >>> + // as cookie.
> >>> + // * The caller ensures that the io pgtable does not outlive the device.
> >>
> >> We should probably tailor the sentence above for Devres?
> >
> > Maybe "does not outlive device unbind" is better worded, but not sure
> > what you're looking for with Devres tailoring.
>
> What about “Devres ensures that the io pgtable does not outlive device
> unbind by revoking access”, or something along these lines?
The `new_raw` method does not require the caller to use `Devres` to do
that, so it's not necessarily the case that it is Devres that ensures
this. An end-user could call `new_raw` directly and use some other
mechanism if they wish.
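For illustration, the two paths look roughly like this (a sketch only; `devres_path`, `manual_path` and the `Config` values are made up, with the usual prelude imports assumed):

use core::pin::Pin;
use kernel::{
    device::{Bound, Device},
    devres::Devres,
    io::pgtable::{ARM64LPAES1, Config, IoPageTable},
    prelude::*,
};

/// Devres-managed path: the page table is dropped automatically at device unbind.
fn devres_path(dev: &Device<Bound>) -> Result<Pin<KBox<Devres<IoPageTable<ARM64LPAES1>>>>> {
    let cfg = Config {
        quirks: 0,
        pgsize_bitmap: 0x1000, // 4 KiB pages only; purely illustrative
        ias: 48,
        oas: 40,
        coherent_walk: true,
    };
    KBox::pin_init(IoPageTable::new(dev, cfg), GFP_KERNEL)
}

/// Manual path: the caller takes over the drop-before-unbind obligation.
///
/// # Safety
///
/// The returned page table must be dropped before `dev` is unbound.
unsafe fn manual_path(dev: &Device<Bound>, cfg: Config) -> Result<IoPageTable<ARM64LPAES1>> {
    // SAFETY: forwarded to the caller of `manual_path`.
    unsafe { IoPageTable::new_raw(dev, cfg) }
}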
> >>> + let ops = unsafe {
> >>> + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
> >>> + };
> >>
> >> I’d add a blank here.
> >>
> >>> +impl<F: IoPageTableFmt> Drop for IoPageTable<F> {
> >>> + fn drop(&mut self) {
> >>> + // SAFETY: The caller of `ttbr` promised that the page table is not live when this
> >>> + // destructor runs.
> >>
> >>
> >> Not sure I understand this sentence. Perhaps we should remove the word “ttbr” from here? ttbr is a register.
> >
> > ttbr is a method defined below with a safety requirement.
>
> Can't we link to that then? i.e.: [`ttbr`]: Self::ttbr, or whatever the right
> syntax is. Because it’s more natural to think about ttbr the register vs
> ttbr the method.
This isn't a doc comment. There's no such thing as a link in normal
comments. But I can write:
The caller of `Self::ttbr()` promised that the ...
Alice
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 10:50 [PATCH v4] io: add io_pgtable abstraction Alice Ryhl
2025-12-19 11:04 ` Daniel Almeida
@ 2025-12-19 14:05 ` Jason Gunthorpe
2025-12-19 14:38 ` Alice Ryhl
2025-12-21 0:06 ` kernel test robot
2 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2025-12-19 14:05 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Robin Murphy, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, Dec 19, 2025 at 10:50:52AM +0000, Alice Ryhl wrote:
> +// For now, we do not provide the ability to flush the TLB via the built-in callback mechanism.
> +// Instead, the `map_pages` function requires the caller to explicitly flush the TLB before the
> +// pgtable is used to access the newly created range.
> +//
> +// This is done because the initial user of this abstraction may perform many calls to `map_pages`
> +// in a single batched operation, and wishes to only flush the TLB once after performing the entire
> +// batch of mappings. These callbacks would flush too often for that use-case.
> +//
> +// Support for flushing the TLB in these callbacks may be added in the future.
> +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
> + tlb_flush_all: Some(rust_tlb_flush_all_noop),
> + tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
> + tlb_add_page: None,
> +};
This comment seems quite off..
Usually you don't flush on map, you flush on unmap. The TLB should be
empty upon mapping and not need flushing - except for the rarer
special cases of clearing the walk cache which cannot be detected any
other way than using these callbacks. Doing a big flush on map to deal
with the walk cache would be worse than implementing these callbacks.
The flush on unmap, at least for ARM style invalidations, needs these
callbacks because they provide required information. If the actual HW
does not use an ARM style invalidation system then this page table
code is not optimal for it.
Jason
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 14:05 ` Jason Gunthorpe
@ 2025-12-19 14:38 ` Alice Ryhl
2025-12-19 15:11 ` Boris Brezillon
0 siblings, 1 reply; 12+ messages in thread
From: Alice Ryhl @ 2025-12-19 14:38 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Robin Murphy, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, Dec 19, 2025 at 10:05:57AM -0400, Jason Gunthorpe wrote:
> On Fri, Dec 19, 2025 at 10:50:52AM +0000, Alice Ryhl wrote:
> > +// For now, we do not provide the ability to flush the TLB via the built-in callback mechanism.
> > +// Instead, the `map_pages` function requires the caller to explicitly flush the TLB before the
> > +// pgtable is used to access the newly created range.
> > +//
> > +// This is done because the initial user of this abstraction may perform many calls to `map_pages`
> > +// in a single batched operation, and wishes to only flush the TLB once after performing the entire
> > +// batch of mappings. These callbacks would flush too often for that use-case.
> > +//
> > +// Support for flushing the TLB in these callbacks may be added in the future.
> > +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
> > + tlb_flush_all: Some(rust_tlb_flush_all_noop),
> > + tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
> > + tlb_add_page: None,
> > +};
>
> This comment seems quite off..
>
> Usually you don't flush on map, you flush on unmap. The TLB should be
> empty upon mapping and not need flushing - except for the rarer
> special cases of clearing the walk cache which cannot be detected any
> other way than using these callbacks. Doing a big flush on map to deal
> with the walk cache would be worse than implementing these callbacks.
>
> The flush on unmap, at least for ARM style invalidations, needs these
> callbacks because they provide required information. If the actual HW
> does not use an ARM style invalidation system then this page table
> code is not optimal for it.
You should not assume that the way I worded something implies that the
GPU hardware does something weird. It's more likely that I just got
something wrong.
It looks like panthor / tyr flush the range that was modified after both
map and unmap operations.
Alice
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 14:38 ` Alice Ryhl
@ 2025-12-19 15:11 ` Boris Brezillon
2025-12-19 15:14 ` Jason Gunthorpe
0 siblings, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2025-12-19 15:11 UTC (permalink / raw)
To: Alice Ryhl
Cc: Jason Gunthorpe, Miguel Ojeda, Will Deacon, Daniel Almeida,
Robin Murphy, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, 19 Dec 2025 14:38:51 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Fri, Dec 19, 2025 at 10:05:57AM -0400, Jason Gunthorpe wrote:
> > On Fri, Dec 19, 2025 at 10:50:52AM +0000, Alice Ryhl wrote:
> > > +// For now, we do not provide the ability to flush the TLB via the built-in callback mechanism.
> > > +// Instead, the `map_pages` function requires the caller to explicitly flush the TLB before the
> > > +// pgtable is used to access the newly created range.
> > > +//
> > > +// This is done because the initial user of this abstraction may perform many calls to `map_pages`
> > > +// in a single batched operation, and wishes to only flush the TLB once after performing the entire
> > > +// batch of mappings. These callbacks would flush too often for that use-case.
> > > +//
> > > +// Support for flushing the TLB in these callbacks may be added in the future.
> > > +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
> > > + tlb_flush_all: Some(rust_tlb_flush_all_noop),
> > > + tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
> > > + tlb_add_page: None,
> > > +};
> >
> > This comment seems quite off..
> >
> > Usually you don't flush on map, you flush on unmap. The TLB should be
> > empty upon mapping and not need flushing - except for the rarer
> > special cases of clearing the walk cache which cannot be detected any
> > other way than using these callbacks. Doing a big flush on map to deal
> > with the walk cache would be worse than implementing these callbacks.
> >
> > The flush on unmap, at least for ARM style invalidations, needs these
> > callbacks because they provide required information. If the actual HW
> > does not use an ARM style invalidation system then this page table
> > code is not optimal for it.
>
> You should not assume that the way I worded something implies that the
> GPU hardware does something weird. It's more likely that I just got
> something wrong.
>
> It looks like panthor / tyr flush the range that was modified after both
> map and unmap operations.
There's actually a confusion between TLB invalidation and L1/L2 cache
flush/invalidation. The things we can decide to flush/invalidate around
map/unmap ops are L1/L2 caches. The TLB invalidate, we don't have
direct control on: it happens as part of the LOCK+UNLOCK sequence, and
no matter what you execute (map or unmap), you have to surround it with
a LOCK/UNLOCK to provide support for atomic updates (GPU is blocked if
anything accesses the locked range while an update is on-going).
Robin, feel free to correct me if I'm wrong.
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 15:11 ` Boris Brezillon
@ 2025-12-19 15:14 ` Jason Gunthorpe
2025-12-19 15:27 ` Boris Brezillon
0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2025-12-19 15:14 UTC (permalink / raw)
To: Boris Brezillon
Cc: Alice Ryhl, Miguel Ojeda, Will Deacon, Daniel Almeida,
Robin Murphy, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, Dec 19, 2025 at 04:11:53PM +0100, Boris Brezillon wrote:
> There's actually a confusion between TLB invalidation and L1/L2 cache
> flush/invalidation. The things we can decide to flush/invalidate around
> map/unmap ops are L1/L2 caches. The TLB invalidate, we don't have
> direct control on: it happens as part of the LOCK+UNLOCK sequence, and
> no matter what you execute (map or unmap), you have to surround it with
> a LOCK/UNLOCK to provide support for atomic updates (GPU is blocked if
> anything accesses the locked range while an update is on-going).
That makes more sense, so these GPU drivers just flush the entire TLB
every time they change it - built into the UNLOCK operation?
Sounds slow, but is a reasonable explanation why they don't use the
callbacks.
Jason
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 15:14 ` Jason Gunthorpe
@ 2025-12-19 15:27 ` Boris Brezillon
2025-12-19 17:32 ` Jason Gunthorpe
0 siblings, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2025-12-19 15:27 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alice Ryhl, Miguel Ojeda, Will Deacon, Daniel Almeida,
Robin Murphy, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, 19 Dec 2025 11:14:34 -0400
Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Fri, Dec 19, 2025 at 04:11:53PM +0100, Boris Brezillon wrote:
>
> > There's actually a confusion between TLB invalidation and L1/L2 cache
> > flush/invalidation. The things we can decide to flush/invalidate around
> > map/unmap ops are L1/L2 caches. The TLB invalidate, we don't have
> > direct control on: it happens as part of the LOCK+UNLOCK sequence, and
> > no matter what you execute (map or unmap), you have to surround it with
> > a LOCK/UNLOCK to provide support for atomic updates (GPU is blocked if
> > anything accesses the locked range while an update is on-going).
>
> That makes more sense, so these GPU drivers just flush the entire TLB
> every time they change it - built into the UNLOCK operation?
I don't have implementation details, so I can't really tell what
happens internally. What's sure is that LOCK takes a range, so they
might be optimizing the TLB flush to only evict entries covered by this
range, dunno.
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 15:27 ` Boris Brezillon
@ 2025-12-19 17:32 ` Jason Gunthorpe
0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2025-12-19 17:32 UTC (permalink / raw)
To: Boris Brezillon
Cc: Alice Ryhl, Miguel Ojeda, Will Deacon, Daniel Almeida,
Robin Murphy, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Fri, Dec 19, 2025 at 04:27:34PM +0100, Boris Brezillon wrote:
> On Fri, 19 Dec 2025 11:14:34 -0400
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> > On Fri, Dec 19, 2025 at 04:11:53PM +0100, Boris Brezillon wrote:
> >
> > > There's actually a confusion between TLB invalidation and L1/L2 cache
> > > flush/invalidation. The things we can decide to flush/invalidate around
> > > map/unmap ops are L1/L2 caches. The TLB invalidate, we don't have
> > > direct control on: it happens as part of the LOCK+UNLOCK sequence, and
> > > no matter what you execute (map or unmap), you have to surround it with
> > > a LOCK/UNLOCK to provide support for atomic updates (GPU is blocked if
> > > anything accesses the locked range while an update is on-going).
> >
> > That makes more sense, so these GPU drivers just flush the entire TLB
> > every time they change it - built into the UNLOCK operation?
>
> I don't have implementation details, so I can't really tell what
> happens internally. What's sure is that LOCK takes a range, so they
> might be optimizing the TLB flush to only evict entries covered by this
> range, dunno.
So, I'd probably just simplify that comment:
For the initial users of these rust bindings the GPU FW is managing the
IOTLB and performs all required invalidations using a range. There is no
need for it to get ARM style invalidation instructions from the page
table code.
Jason
* Re: [PATCH v4] io: add io_pgtable abstraction
2025-12-19 10:50 [PATCH v4] io: add io_pgtable abstraction Alice Ryhl
2025-12-19 11:04 ` Daniel Almeida
2025-12-19 14:05 ` Jason Gunthorpe
@ 2025-12-21 0:06 ` kernel test robot
2 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2025-12-21 0:06 UTC (permalink / raw)
To: Alice Ryhl, Miguel Ojeda, Will Deacon, Daniel Almeida,
Boris Brezillon, Robin Murphy, Jason Gunthorpe
Cc: oe-kbuild-all, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
Joerg Roedel, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm, Alice Ryhl
Hi Alice,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 3e7f562e20ee87a25e104ef4fce557d39d62fa85]
url: https://github.com/intel-lab-lkp/linux/commits/Alice-Ryhl/io-add-io_pgtable-abstraction/20251219-185557
base: 3e7f562e20ee87a25e104ef4fce557d39d62fa85
patch link: https://lore.kernel.org/r/20251219-io-pgtable-v4-1-68aaa7a40380%40google.com
patch subject: [PATCH v4] io: add io_pgtable abstraction
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20251221/202512210130.LVNt1EfI-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251221/202512210130.LVNt1EfI-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512210130.LVNt1EfI-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> warning: this function has a `#[must_use]` attribute with no message, but returns a type already marked as `#[must_use]`
--> rust/kernel/io/pgtable.rs:162:5
|
162 | / pub unsafe fn map_pages(
163 | | &self,
164 | | iova: usize,
165 | | paddr: PhysAddr,
... |
169 | | flags: alloc::Flags,
170 | | ) -> (usize, Result) {
| |________________________^
|
= help: either add some descriptive message or remove the attribute
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#double_must_use
= note: `-W clippy::double-must-use` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::double_must_use)]`
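For illustration, the two remedies clippy names look like this on a toy function (standalone userspace Rust; the names and the plain `Result<(), i32>` are stand-ins, not the kernel's types):

// Option 1: drop the attribute; the Result in the tuple already warns on unuse.
fn map_demo_plain() -> (usize, Result<(), i32>) {
    (4096, Ok(()))
}

// Option 2: keep the attribute but give it a message, which silences
// clippy::double_must_use and documents why the tuple must be inspected.
#[must_use = "even on error a prefix may have been mapped and must be unmapped"]
fn map_demo_msg() -> (usize, Result<(), i32>) {
    (0, Err(-12))
}

fn main() {
    let (len, res) = map_demo_plain();
    assert!(res.is_ok() && len == 4096);
    let (len, res) = map_demo_msg();
    assert!(res.is_err() && len == 0);
}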
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki