rust-for-linux.vger.kernel.org archive mirror
* [PATCH v12 0/3] rust: add dma coherent allocator abstraction
@ 2025-02-24 11:49 Abdiel Janulgue
  2025-02-24 11:49 ` [PATCH v12 1/3] rust: error: Add EOVERFLOW Abdiel Janulgue
                   ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-24 11:49 UTC (permalink / raw)
  To: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS,
	Abdiel Janulgue

This series adds the rust bindings for the dma coherent allocator which
is needed for nova-core[0]. This is tested on an Nvidia GA104 GPU device
using PoC code which parses and loads the GSP firmware via DMA.

Changes since v11:
- Ensure robust handling for potential pointer arithmetic overflows
  in bounds checking (Petr Tesařík).
- Add dma mask helpers (Daniel Almeida).
- Clarify the safety aspects of the as_slice/as_slice_mut API,
  use ManuallyDrop as a replacement for into_parts(),
  add dma_read!/dma_write! helper macros (Alice Ryhl).
- Link to v11: https://lore.kernel.org/lkml/20250123104333.1340512-1-abdiel.janulgue@gmail.com/

Changes since v10:
- rename read() to as_slice() (Boqun Feng)
- Do a bitwise copy of ARef<Device> in into_parts() when returning the
  device reference (Alice Ryhl).

Changes since v9:
- Use ARef<Device> in the constructor arguments; docs clarification: avoid
  manually dropping the refcount for the device in into_parts(); use
  add() instead of wrapping_add() in the pointer arithmetic for performance
  (Alice Ryhl).

Changes since v8:
- Add MAINTAINERS entry
- Fix build issues due to the switch from core::ffi to crate::ffi in bindgen.
- Ensure the wrapped attribute is non-pub in struct Attrs, declare it 
  #[repr(transparent)] as well (Daniel Sedlak)

Changes since v7:
- Remove cpu_buf() and cpu_buf_mut(), as exporting an r/w interface via
  a slice is undefined behaviour due to slice's requirement that the
  underlying pointer should not be modified (Alice Ryhl, Robin Murphy).
- Reintroduce r/w helpers instead which includes proper safety
  invariants (Daniel Almeida).

Changes since v6:
- Include the dma_attrs in the constructor, use alloc::Flags as inspiration.

Changes since v5:
- Remove unnecessary lifetime annotation when returning the CPU buffer.

Changes since v4:
- Documentation and example fixes, use Markdown formatting (Miguel Ojeda).
- Discard read()/write() helpers to remove bound on Copy and fix overhead
  (Daniel Almeida).
- Improve error-handling in the constructor block (Andreas Hindborg).

Changes since v3:
- Reject ZST types by checking the type size in the constructor in
  addition to requiring FromBytes/AsBytes traits for the type (Alice Ryhl).

Changes since v2:
- Fixed missing header for generating the bindings.

Changes since v1:
- Fix missing info in commit log where EOVERFLOW is used.
- Restrict the dma coherent allocator to numeric types for now for valid
  behaviour (Daniel Almeida).
- Build slice dynamically.

[0] https://lore.kernel.org/lkml/20250131220432.17717-1-dakr@kernel.org/

Abdiel Janulgue (3):
  rust: error: Add EOVERFLOW
  rust: add dma coherent allocator abstraction.
  MAINTAINERS: add entry for Rust dma mapping helpers device driver API

 MAINTAINERS                     |  12 +
 rust/bindings/bindings_helper.h |   1 +
 rust/helpers/dma.c              |  14 ++
 rust/helpers/helpers.c          |   1 +
 rust/kernel/dma.rs              | 411 ++++++++++++++++++++++++++++++++
 rust/kernel/error.rs            |   1 +
 rust/kernel/lib.rs              |   1 +
 7 files changed, 441 insertions(+)
 create mode 100644 rust/helpers/dma.c
 create mode 100644 rust/kernel/dma.rs


base-commit: beeb78d46249cab8b2b8359a2ce8fa5376b5ad2d
-- 
2.43.0


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v12 1/3] rust: error: Add EOVERFLOW
  2025-02-24 11:49 [PATCH v12 0/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
@ 2025-02-24 11:49 ` Abdiel Janulgue
  2025-02-24 13:11   ` Andreas Hindborg
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
  2025-02-24 11:49 ` [PATCH v12 3/3] MAINTAINERS: add entry for Rust dma mapping helpers device driver API Abdiel Janulgue
  2 siblings, 1 reply; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-24 11:49 UTC (permalink / raw)
  To: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS,
	Abdiel Janulgue

Trivial addition for the missing EOVERFLOW error code. This is used by
a subsequent patch that may need to return EOVERFLOW as the result of a
failed `checked_mul`.
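
As a standalone illustration (not part of the patch), the overflow
pattern in question can be sketched in plain Rust; the string error
value below is only a stand-in for the kernel's `Error` type:

```rust
// Compute `count * elem_size`, reporting overflow instead of wrapping;
// this mirrors mapping a failed `checked_mul` to EOVERFLOW.
fn region_size(count: usize, elem_size: usize) -> Result<usize, &'static str> {
    count.checked_mul(elem_size).ok_or("EOVERFLOW")
}

fn main() {
    assert_eq!(region_size(4, 8), Ok(32));
    assert!(region_size(usize::MAX, 8).is_err());
}
```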

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/kernel/error.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index f6ecf09cb65f..1e510181432c 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -64,6 +64,7 @@ macro_rules! declare_err {
     declare_err!(EPIPE, "Broken pipe.");
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
+    declare_err!(EOVERFLOW, "Value too large for defined data type.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");

base-commit: beeb78d46249cab8b2b8359a2ce8fa5376b5ad2d
-- 
2.43.0



* [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 [PATCH v12 0/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
  2025-02-24 11:49 ` [PATCH v12 1/3] rust: error: Add EOVERFLOW Abdiel Janulgue
@ 2025-02-24 11:49 ` Abdiel Janulgue
  2025-02-24 13:21   ` Alice Ryhl
                     ` (7 more replies)
  2025-02-24 11:49 ` [PATCH v12 3/3] MAINTAINERS: add entry for Rust dma mapping helpers device driver API Abdiel Janulgue
  2 siblings, 8 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-24 11:49 UTC (permalink / raw)
  To: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS,
	Abdiel Janulgue

Add a simple dma coherent allocator rust abstraction. Based on
Andreas Hindborg's dma abstractions from the rnvme driver, which
was also based on earlier work by Wedson Almeida Filho.

Nacked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
---
 rust/bindings/bindings_helper.h |   1 +
 rust/helpers/dma.c              |  13 +
 rust/helpers/helpers.c          |   1 +
 rust/kernel/dma.rs              | 421 ++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs              |   1 +
 5 files changed, 437 insertions(+)
 create mode 100644 rust/helpers/dma.c
 create mode 100644 rust/kernel/dma.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 55354e4dec14..f69b05025e52 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -11,6 +11,7 @@
 #include <linux/blk_types.h>
 #include <linux/blkdev.h>
 #include <linux/cred.h>
+#include <linux/dma-mapping.h>
 #include <linux/errname.h>
 #include <linux/ethtool.h>
 #include <linux/file.h>
diff --git a/rust/helpers/dma.c b/rust/helpers/dma.c
new file mode 100644
index 000000000000..30da079d366c
--- /dev/null
+++ b/rust/helpers/dma.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/dma-mapping.h>
+
+int rust_helper_dma_set_mask_and_coherent(struct device *dev, u64 mask)
+{
+	return dma_set_mask_and_coherent(dev, mask);
+}
+
+int rust_helper_dma_set_mask(struct device *dev, u64 mask)
+{
+	return dma_set_mask(dev, mask);
+}
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 0640b7e115be..8f3808c8b7fe 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -13,6 +13,7 @@
 #include "build_bug.c"
 #include "cred.c"
 #include "device.c"
+#include "dma.c"
 #include "err.c"
 #include "fs.c"
 #include "io.c"
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
new file mode 100644
index 000000000000..b4dd5d411711
--- /dev/null
+++ b/rust/kernel/dma.rs
@@ -0,0 +1,421 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Direct memory access (DMA).
+//!
+//! C header: [`include/linux/dma-mapping.h`](srctree/include/linux/dma-mapping.h)
+
+use crate::{
+    bindings, build_assert,
+    device::Device,
+    error::code::*,
+    error::Result,
+    transmute::{AsBytes, FromBytes},
+    types::ARef,
+};
+
+/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
+/// both streaming and coherent APIs together.
+pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
+    // SAFETY: The device pointer is guaranteed to be valid by the invariant on `Device`.
+    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
+}
+
+/// Same as `dma_set_mask_and_coherent`, but sets the mask only for streaming mappings.
+pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
+    // SAFETY: The device pointer is guaranteed to be valid by the invariant on `Device`.
+    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
+}
+
+/// Possible attributes associated with a DMA mapping.
+///
+/// They can be combined with the operators `|`, `&`, and `!`.
+///
+/// Values can be used from the [`attrs`] module.
+#[derive(Clone, Copy, PartialEq)]
+#[repr(transparent)]
+pub struct Attrs(u32);
+
+impl Attrs {
+    /// Get the raw representation of this attribute.
+    pub(crate) fn as_raw(self) -> crate::ffi::c_ulong {
+        self.0 as _
+    }
+
+    /// Check whether `flags` is contained in `self`.
+    pub fn contains(self, flags: Attrs) -> bool {
+        (self & flags) == flags
+    }
+}
+
+impl core::ops::BitOr for Attrs {
+    type Output = Self;
+    fn bitor(self, rhs: Self) -> Self::Output {
+        Self(self.0 | rhs.0)
+    }
+}
+
+impl core::ops::BitAnd for Attrs {
+    type Output = Self;
+    fn bitand(self, rhs: Self) -> Self::Output {
+        Self(self.0 & rhs.0)
+    }
+}
+
+impl core::ops::Not for Attrs {
+    type Output = Self;
+    fn not(self) -> Self::Output {
+        Self(!self.0)
+    }
+}
+
+/// DMA mapping attributes.
+pub mod attrs {
+    use super::Attrs;
+
+    /// Specifies that reads and writes to the mapping may be weakly ordered, that is, reads
+    /// and writes may pass each other.
+    pub const DMA_ATTR_WEAK_ORDERING: Attrs = Attrs(bindings::DMA_ATTR_WEAK_ORDERING);
+
+    /// Specifies that writes to the mapping may be buffered to improve performance.
+    pub const DMA_ATTR_WRITE_COMBINE: Attrs = Attrs(bindings::DMA_ATTR_WRITE_COMBINE);
+
+    /// Lets the platform avoid creating a kernel virtual mapping for the allocated buffer.
+    pub const DMA_ATTR_NO_KERNEL_MAPPING: Attrs = Attrs(bindings::DMA_ATTR_NO_KERNEL_MAPPING);
+
+    /// Allows platform code to skip synchronization of the CPU cache for the given buffer,
+    /// assuming that it has already been transferred to the 'device' domain.
+    pub const DMA_ATTR_SKIP_CPU_SYNC: Attrs = Attrs(bindings::DMA_ATTR_SKIP_CPU_SYNC);
+
+    /// Forces contiguous allocation of the buffer in physical memory.
+    pub const DMA_ATTR_FORCE_CONTIGUOUS: Attrs = Attrs(bindings::DMA_ATTR_FORCE_CONTIGUOUS);
+
+    /// This is a hint to the DMA-mapping subsystem that it's probably not worth the time to try
+    /// to allocate memory in a way that gives better TLB efficiency.
+    pub const DMA_ATTR_ALLOC_SINGLE_PAGES: Attrs = Attrs(bindings::DMA_ATTR_ALLOC_SINGLE_PAGES);
+
+    /// This tells the DMA-mapping subsystem to suppress allocation failure reports (similarly to
+    /// __GFP_NOWARN).
+    pub const DMA_ATTR_NO_WARN: Attrs = Attrs(bindings::DMA_ATTR_NO_WARN);
+
+    /// Used to indicate that the buffer is fully accessible at an elevated privilege level (and
+    /// ideally inaccessible or at least read-only at lesser-privileged levels).
+    pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
+}
+
+/// An abstraction of the `dma_alloc_coherent` API.
+///
+/// This is an abstraction around the `dma_alloc_coherent` API which is used to allocate and map
+/// large consistent DMA regions.
+///
+/// A [`CoherentAllocation`] instance contains a pointer to the allocated region (in the
+/// processor's virtual address space) and the device address which can be given to the device
+/// as the DMA address base of the region. The region is released once [`CoherentAllocation`]
+/// is dropped.
+///
+/// # Invariants
+///
+/// For the lifetime of an instance of [`CoherentAllocation`], the cpu address is a valid pointer
+/// to an allocated region of consistent memory and we hold a reference to the device.
+pub struct CoherentAllocation<T: AsBytes + FromBytes> {
+    dev: ARef<Device>,
+    dma_handle: bindings::dma_addr_t,
+    count: usize,
+    cpu_addr: *mut T,
+    dma_attrs: Attrs,
+}
+
+impl<T: AsBytes + FromBytes> CoherentAllocation<T> {
+    /// Allocates a region of `size_of::<T>() * count` bytes of consistent memory.
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use kernel::device::Device;
+    /// use kernel::dma::{attrs::*, CoherentAllocation};
+    ///
+    /// # fn test(dev: &Device) -> Result {
+    /// let c: CoherentAllocation<u64> = CoherentAllocation::alloc_attrs(dev.into(), 4, GFP_KERNEL,
+    ///                                                                  DMA_ATTR_NO_WARN)?;
+    /// # Ok::<(), Error>(()) }
+    /// ```
+    pub fn alloc_attrs(
+        dev: ARef<Device>,
+        count: usize,
+        gfp_flags: kernel::alloc::Flags,
+        dma_attrs: Attrs,
+    ) -> Result<CoherentAllocation<T>> {
+        build_assert!(
+            core::mem::size_of::<T>() > 0,
+            "It doesn't make sense for the allocated type to be a ZST"
+        );
+
+        let size = count
+            .checked_mul(core::mem::size_of::<T>())
+            .ok_or(EOVERFLOW)?;
+        let mut dma_handle = 0;
+        // SAFETY: The device pointer is guaranteed to be valid by the invariant on `Device`.
+        // A null return from `dma_alloc_attrs` is caught below and reported as ENOMEM.
+        let ret = unsafe {
+            bindings::dma_alloc_attrs(
+                dev.as_raw(),
+                size,
+                &mut dma_handle,
+                gfp_flags.as_raw(),
+                dma_attrs.as_raw(),
+            )
+        };
+        if ret.is_null() {
+            return Err(ENOMEM);
+        }
+        // INVARIANT: We just successfully allocated a coherent region which is accessible for
+        // `count` elements, hence the cpu address is valid. We also hold a refcounted reference
+        // to the device.
+        Ok(Self {
+            dev,
+            dma_handle,
+            count,
+            cpu_addr: ret as *mut T,
+            dma_attrs,
+        })
+    }
+
+    /// Performs the same functionality as `alloc_attrs`, except that `dma_attrs` defaults to 0.
+    pub fn alloc_coherent(
+        dev: ARef<Device>,
+        count: usize,
+        gfp_flags: kernel::alloc::Flags,
+    ) -> Result<CoherentAllocation<T>> {
+        CoherentAllocation::alloc_attrs(dev, count, gfp_flags, Attrs(0))
+    }
+
+    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.
+    pub fn skip_drop(self) -> CoherentAllocation<T> {
+        let me = core::mem::ManuallyDrop::new(self);
+        Self {
+            // SAFETY: The refcount of `dev` will not be decremented because this doesn't actually
+            // duplicate the `ARef` and the use of `ManuallyDrop` forgets the original.
+            dev: unsafe { core::ptr::read(&me.dev) },
+            dma_handle: me.dma_handle,
+            count: me.count,
+            cpu_addr: me.cpu_addr,
+            dma_attrs: me.dma_attrs,
+        }
+    }
+
+    /// Returns the base address to the allocated region in the CPU's virtual address space.
+    pub fn start_ptr(&self) -> *const T {
+        self.cpu_addr
+    }
+
+    /// Returns the base address to the allocated region in the CPU's virtual address space as
+    /// a mutable pointer.
+    pub fn start_ptr_mut(&mut self) -> *mut T {
+        self.cpu_addr
+    }
+
+    /// Returns a DMA handle which may be given to the device as the DMA address base of
+    /// the region.
+    pub fn dma_handle(&self) -> bindings::dma_addr_t {
+        self.dma_handle
+    }
+
+    /// Returns the data from the region starting from `offset` as a slice.
+    /// `offset` and `count` are in units of `T`, not the number of bytes.
+    ///
+    /// Due to the safety requirements of slices, the caller should consider that the region could
+    /// be modified by the device at any time (see the safety section below). For ring-buffer style
+    /// r/w access, or for use-cases where a pointer to the live data is needed, `start_ptr()` or
+    /// `start_ptr_mut()` could be used instead.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that no hardware operations that involve the buffer are currently
+    /// taking place while the returned slice is live.
+    pub unsafe fn as_slice(&self, offset: usize, count: usize) -> Result<&[T]> {
+        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
+        if end > self.count {
+            return Err(EINVAL);
+        }
+        // SAFETY:
+        // - The pointer is valid due to type invariant on `CoherentAllocation`,
+        // we've just checked that the range and index are within bounds. The immutability of
+        // the data is also guaranteed by the safety requirements of the function.
+        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
+        // that `self.count` won't overflow early in the constructor.
+        Ok(unsafe { core::slice::from_raw_parts(self.cpu_addr.add(offset), count) })
+    }
+
+    /// Performs the same functionality as `as_slice`, except that a mutable slice is returned.
+    /// See that method for documentation and safety requirements.
+    ///
+    /// # Safety
+    ///
+    /// It is the caller's responsibility to avoid separate read and write accesses to the region
+    /// while the returned slice is live.
+    pub unsafe fn as_slice_mut(&self, offset: usize, count: usize) -> Result<&mut [T]> {
+        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
+        if end > self.count {
+            return Err(EINVAL);
+        }
+        // SAFETY:
+        // - The pointer is valid due to type invariant on `CoherentAllocation`,
+        // we've just checked that the range and index are within bounds. Exclusive access to
+        // the data is guaranteed by the safety requirements of the function.
+        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
+        // that `self.count` won't overflow early in the constructor.
+        Ok(unsafe { core::slice::from_raw_parts_mut(self.cpu_addr.add(offset), count) })
+    }
+
+    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
+    /// number of bytes.
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
+    /// let somedata: [u8; 4] = [0xf; 4];
+    /// let buf: &[u8] = &somedata;
+    /// alloc.write(buf, 0)?;
+    /// # Ok::<(), Error>(()) }
+    /// ```
+    pub fn write(&self, src: &[T], offset: usize) -> Result {
+        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
+        if end > self.count {
+            return Err(EINVAL);
+        }
+        // SAFETY:
+        // - The pointer is valid due to type invariant on `CoherentAllocation`
+        // and we've just checked that the range and index are within bounds.
+        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
+        // that `self.count` won't overflow early in the constructor.
+        unsafe {
+            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
+        };
+        Ok(())
+    }
+
+    /// Retrieve a single entry from the region with bounds checking. `offset` is in units of `T`,
+    /// not the number of bytes.
+    pub fn item_from_index(&self, offset: usize) -> Result<*mut T> {
+        if offset >= self.count {
+            return Err(EINVAL);
+        }
+        // SAFETY:
+        // - The pointer is valid due to type invariant on `CoherentAllocation`
+        // and we've just checked that the range and index are within bounds.
+        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
+        // that `self.count` won't overflow early in the constructor.
+        Ok(unsafe { &mut *self.cpu_addr.add(offset) })
+    }
+
+    /// Reads the value of `field` and ensures that its type is `FromBytes`.
+    ///
+    /// # Safety
+    ///
+    /// This must be called from the `dma_read` macro which ensures that the `field` pointer is
+    /// validated beforehand.
+    ///
+    /// Public but hidden since it should only be used from `dma_read` macro.
+    #[doc(hidden)]
+    pub unsafe fn field_read<F: FromBytes>(&self, field: *const F) -> F {
+        // SAFETY: By the safety requirements field is valid
+        unsafe { field.read() }
+    }
+
+    /// Writes a value to `field` and ensures that its type is `AsBytes`.
+    ///
+    /// # Safety
+    ///
+    /// This must be called from the `dma_write` macro which ensures that the `field` pointer is
+    /// validated beforehand.
+    ///
+    /// Public but hidden since it should only be used from `dma_write` macro.
+    #[doc(hidden)]
+    pub unsafe fn field_write<F: AsBytes>(&self, field: *mut F, val: F) {
+        // SAFETY: By the safety requirements field is valid
+        unsafe { field.write(val) }
+    }
+}
+
+/// Reads a field of an item from an allocated region of structs.
+///
+/// # Examples
+///
+/// ```
+/// struct MyStruct { field: u32, }
+/// // SAFETY: All bit patterns are acceptable values for MyStruct.
+/// unsafe impl kernel::transmute::FromBytes for MyStruct {}
+/// // SAFETY: Instances of MyStruct have no uninitialized portions.
+/// unsafe impl kernel::transmute::AsBytes for MyStruct {}
+///
+/// # fn test(alloc: &kernel::dma::CoherentAllocation<MyStruct>) -> Result {
+/// let whole = kernel::dma_read!(alloc[2]);
+/// let field = kernel::dma_read!(alloc[1].field);
+/// # Ok::<(), Error>(()) }
+/// ```
+#[macro_export]
+macro_rules! dma_read {
+    ($dma:ident [ $idx:expr ] $($field:tt)* ) => {{
+        let item = $dma.item_from_index($idx)?;
+        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
+        // dereferenced. The compiler also further validates the expression on whether `field`
+        // is a member of `item` when expanded by the macro.
+        unsafe {
+            let ptr_field = ::core::ptr::addr_of!((*item) $($field)*);
+            $dma.field_read(ptr_field)
+        }
+    }};
+}
+
+/// Writes to a field of an item from an allocated region of structs.
+///
+/// # Examples
+///
+/// ```
+/// struct MyStruct { member: u32, }
+/// // SAFETY: All bit patterns are acceptable values for MyStruct.
+/// unsafe impl kernel::transmute::FromBytes for MyStruct {}
+/// // SAFETY: Instances of MyStruct have no uninitialized portions.
+/// unsafe impl kernel::transmute::AsBytes for MyStruct {}
+///
+/// # fn test(alloc: &mut kernel::dma::CoherentAllocation<MyStruct>) -> Result {
+/// kernel::dma_write!(alloc[2].member = 0xf);
+/// kernel::dma_write!(alloc[1] = MyStruct { member: 0xf });
+/// # Ok::<(), Error>(()) }
+/// ```
+#[macro_export]
+macro_rules! dma_write {
+    ($dma:ident [ $idx:expr ] $($field:tt)*) => {{
+        kernel::dma_write!($dma, $idx, $($field)*);
+    }};
+    ($dma:ident, $idx: expr, = $val:expr) => {
+        let item = $dma.item_from_index($idx)?;
+        // SAFETY: `item_from_index` ensures that `item` is always a valid item.
+        unsafe { $dma.field_write(item, $val) }
+    };
+    ($dma:ident, $idx: expr, $(.$field:ident)* = $val:expr) => {
+        let item = $dma.item_from_index($idx)?;
+        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
+        // dereferenced. The compiler also further validates the expression on whether `field`
+        // is a member of `item` when expanded by the macro.
+        unsafe {
+            let ptr_field = ::core::ptr::addr_of_mut!((*item) $(.$field)*);
+            $dma.field_write(ptr_field, $val)
+        }
+    };
+}
+
+impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
+    fn drop(&mut self) {
+        let size = self.count * core::mem::size_of::<T>();
+        // SAFETY: the device, cpu address, and the dma handle are valid due to the
+        // type invariants on `CoherentAllocation`.
+        unsafe {
+            bindings::dma_free_attrs(
+                self.dev.as_raw(),
+                size,
+                self.cpu_addr as _,
+                self.dma_handle,
+                self.dma_attrs.as_raw(),
+            )
+        }
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 496ed32b0911..5081cb66b2f9 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -44,6 +44,7 @@
 pub mod device;
 pub mod device_id;
 pub mod devres;
+pub mod dma;
 pub mod driver;
 pub mod error;
 #[cfg(CONFIG_RUST_FW_LOADER_ABSTRACTIONS)]
-- 
2.43.0



* [PATCH v12 3/3] MAINTAINERS: add entry for Rust dma mapping helpers device driver API
  2025-02-24 11:49 [PATCH v12 0/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
  2025-02-24 11:49 ` [PATCH v12 1/3] rust: error: Add EOVERFLOW Abdiel Janulgue
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
@ 2025-02-24 11:49 ` Abdiel Janulgue
  2025-02-24 13:10   ` Andreas Hindborg
  2 siblings, 1 reply; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-24 11:49 UTC (permalink / raw)
  To: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS,
	Abdiel Janulgue

Add an entry for the Rust dma mapping helpers abstractions.

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
---
 MAINTAINERS | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8d9e8187eb0..3bf130c0502c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6900,6 +6900,18 @@ F:	include/linux/dma-mapping.h
 F:	include/linux/swiotlb.h
 F:	kernel/dma/
 
+DMA MAPPING HELPERS DEVICE DRIVER API [RUST]
+M:	Abdiel Janulgue <abdiel.janulgue@gmail.com>
+M:	Danilo Krummrich <dakr@kernel.org>
+R:	Daniel Almeida <daniel.almeida@collabora.com>
+R:	Robin Murphy <robin.murphy@arm.com>
+R:	Andreas Hindborg <a.hindborg@kernel.org>
+L:	rust-for-linux@vger.kernel.org
+S:	Supported
+W:	https://rust-for-linux.com
+T:	git https://github.com/Rust-for-Linux/linux.git rust-next
+F:	rust/kernel/dma.rs
+
 DMA-BUF HEAPS FRAMEWORK
 M:	Sumit Semwal <sumit.semwal@linaro.org>
 R:	Benjamin Gaignard <benjamin.gaignard@collabora.com>
-- 
2.43.0



* Re: [PATCH v12 3/3] MAINTAINERS: add entry for Rust dma mapping helpers device driver API
  2025-02-24 11:49 ` [PATCH v12 3/3] MAINTAINERS: add entry for Rust dma mapping helpers device driver API Abdiel Janulgue
@ 2025-02-24 13:10   ` Andreas Hindborg
  0 siblings, 0 replies; 70+ messages in thread
From: Andreas Hindborg @ 2025-02-24 13:10 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu

"Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:

> Add an entry for the Rust dma mapping helpers abstractions.
>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>

Acked-by: Andreas Hindborg <a.hindborg@kernel.org>


Best regards,
Andreas Hindborg




* Re: [PATCH v12 1/3] rust: error: Add EOVERFLOW
  2025-02-24 11:49 ` [PATCH v12 1/3] rust: error: Add EOVERFLOW Abdiel Janulgue
@ 2025-02-24 13:11   ` Andreas Hindborg
  0 siblings, 0 replies; 70+ messages in thread
From: Andreas Hindborg @ 2025-02-24 13:11 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu

"Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:

> Trivial addition for the missing EOVERFLOW error code. This is used by
> a subsequent patch that may need to return EOVERFLOW as the result of a
> failed `checked_mul`.
>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
> Reviewed-by: Alice Ryhl <aliceryhl@google.com>

Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>


Best regards,
Andreas Hindborg





* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
@ 2025-02-24 13:21   ` Alice Ryhl
  2025-02-24 16:27     ` Abdiel Janulgue
  2025-02-24 13:30   ` QUENTIN BOYER
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 70+ messages in thread
From: Alice Ryhl @ 2025-02-24 13:21 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: dakr, robin.murphy, daniel.almeida, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Valentin Obst,
	open list, Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Mon, Feb 24, 2025 at 12:50 PM Abdiel Janulgue
<abdiel.janulgue@gmail.com> wrote:
>
> Add a simple dma coherent allocator rust abstraction. Based on
> Andreas Hindborg's dma abstractions from the rnvme driver, which
> was also based on earlier work by Wedson Almeida Filho.
>
> Nacked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>

> +    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.
> +    pub fn skip_drop(self) -> CoherentAllocation<T> {
> +        let me = core::mem::ManuallyDrop::new(self);
> +        Self {
> +            // SAFETY: The refcount of `dev` will not be decremented because this doesn't actually
> +            // duplicate the `ARef` and the use of `ManuallyDrop` forgets the original.
> +            dev: unsafe { core::ptr::read(&me.dev) },
> +            dma_handle: me.dma_handle,
> +            count: me.count,
> +            cpu_addr: me.cpu_addr,
> +            dma_attrs: me.dma_attrs,
> +        }
> +    }

The skip_drop pattern requires the return value to use a different
struct with the same fields, because otherwise you don't really skip
the destructor. But I don't think you have a user for this method
anymore, so maybe just drop it.
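For illustration, the difference can be demonstrated in plain userspace Rust (the types below are invented stand-ins, not the kernel API): a `skip_drop` that returns `Self` still runs the destructor on the returned value, while returning a field-for-field copy in a `Drop`-less struct does not.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Alloc {
    id: u32,
}

impl Drop for Alloc {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Same fields, but no `Drop` impl: letting this go out of scope runs
// no destructor.
struct AllocParts {
    id: u32,
}

impl Alloc {
    // Returning `Self` does not skip the destructor: the returned
    // value is still an `Alloc` and is dropped in turn.
    fn skip_drop_broken(self) -> Alloc {
        let me = ManuallyDrop::new(self);
        Alloc { id: me.id }
    }

    // Returning a field-for-field copy in a `Drop`-less type is what
    // actually skips the destructor.
    fn skip_drop(self) -> AllocParts {
        let me = ManuallyDrop::new(self);
        AllocParts { id: me.id }
    }
}

// Returns how many destructors ran after each variant.
fn run_demo() -> (usize, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _a = Alloc { id: 1 }.skip_drop_broken();
    }
    let after_broken = DROPS.load(Ordering::SeqCst) - before;
    {
        let _b = Alloc { id: 2 }.skip_drop();
    }
    let after_skip = DROPS.load(Ordering::SeqCst) - before;
    (after_broken, after_skip)
}

fn main() {
    // The broken variant drops once; the real one adds no drop.
    assert_eq!(run_demo(), (1, 1));
    println!("ok");
}
```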

> +    /// Retrieve a single entry from the region with bounds checking. `offset` is in units of `T`,
> +    /// not the number of bytes.
> +    pub fn item_from_index(&self, offset: usize) -> Result<*mut T> {
> +        if offset >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> +        // and we've just checked that the range and index is within bounds.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        Ok(unsafe { &mut *self.cpu_addr.add(offset) })

The point of the dma_read/dma_write macros is to avoid creating
references to the dma memory, so don't create a reference here.

Ok(unsafe { self.cpu_addr.add(offset) })
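To illustrate the reference-free pattern Alice is describing, here is a small standalone userspace sketch (buffer and helper names are made up for the example): the element is located by pointer arithmetic and accessed only via `ptr::read`/`ptr::write`, so no `&T` or `&mut T` is ever formed over memory a device might concurrently modify.

```rust
use std::ptr;

// Bounds-checked indexing that hands back a raw pointer, not a
// reference, mirroring what item_from_index() should do.
fn item_from_index(base: *mut u32, count: usize, offset: usize) -> Option<*mut u32> {
    if offset >= count {
        return None;
    }
    // SAFETY (stands in for the allocation invariant): `base` is valid
    // for `count` elements and `offset` is in bounds.
    Some(unsafe { base.add(offset) })
}

fn main() {
    let mut buf = [1u32, 2, 3, 4];
    let item = item_from_index(buf.as_mut_ptr(), buf.len(), 2).unwrap();
    // Access goes through the raw pointer only.
    unsafe { ptr::write(item, 30) };
    assert_eq!(unsafe { ptr::read(item) }, 30);
    // Out-of-bounds offsets are rejected before any pointer math.
    assert!(item_from_index(buf.as_mut_ptr(), buf.len(), 9).is_none());
    println!("ok");
}
```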

Alice


* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
  2025-02-24 13:21   ` Alice Ryhl
@ 2025-02-24 13:30   ` QUENTIN BOYER
  2025-02-24 16:30     ` Abdiel Janulgue
  2025-02-24 14:40   ` Andreas Hindborg
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 70+ messages in thread
From: QUENTIN BOYER @ 2025-02-24 13:30 UTC (permalink / raw)
  To: Abdiel Janulgue, aliceryhl@google.com, dakr@kernel.org,
	robin.murphy@arm.com, daniel.almeida@collabora.com,
	rust-for-linux@vger.kernel.org
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied@redhat.com,
	open list:DMA MAPPING HELPERS

Wouldn't it be safer if the `as_slice_mut` function took a `&mut self`,
allowing the compiler to correctly check the borrows (like `start_ptr_mut`)?

Quentin
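As a rough userspace sketch of the suggestion (hypothetical types, not the patch itself): when the accessor takes `&mut self`, the returned slice is an exclusive borrow of the whole object, so the compiler rejects any other access while it is live.

```rust
struct Region {
    buf: Vec<u8>,
}

impl Region {
    fn as_slice(&self, offset: usize, count: usize) -> Option<&[u8]> {
        self.buf.get(offset..offset.checked_add(count)?)
    }

    // Taking `&mut self` makes the returned slice an exclusive borrow
    // of the whole `Region` for its lifetime.
    fn as_slice_mut(&mut self, offset: usize, count: usize) -> Option<&mut [u8]> {
        self.buf.get_mut(offset..offset.checked_add(count)?)
    }
}

fn main() {
    let mut r = Region { buf: vec![0; 8] };
    {
        let s = r.as_slice_mut(0, 4).unwrap();
        s.fill(0xf);
        // `let t = r.as_slice(0, 4);` here would fail to compile:
        // `r` is mutably borrowed while `s` is live.
    }
    assert_eq!(r.as_slice(0, 4).unwrap(), &[0xf_u8; 4]);
    println!("ok");
}
```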

On Mon Feb 24, 2025 at 12:49 PM CET, Abdiel Janulgue wrote:
> Add a simple dma coherent allocator rust abstraction. Based on
> Andreas Hindborg's dma abstractions from the rnvme driver, which
> was also based on earlier work by Wedson Almeida Filho.
>
> Nacked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
> ---
>  rust/bindings/bindings_helper.h |   1 +
>  rust/helpers/dma.c              |  13 +
>  rust/helpers/helpers.c          |   1 +
>  rust/kernel/dma.rs              | 421 ++++++++++++++++++++++++++++++++
>  rust/kernel/lib.rs              |   1 +
>  5 files changed, 437 insertions(+)
>  create mode 100644 rust/helpers/dma.c
>  create mode 100644 rust/kernel/dma.rs
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 55354e4dec14..f69b05025e52 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -11,6 +11,7 @@
>  #include <linux/blk_types.h>
>  #include <linux/blkdev.h>
>  #include <linux/cred.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/errname.h>
>  #include <linux/ethtool.h>
>  #include <linux/file.h>
> diff --git a/rust/helpers/dma.c b/rust/helpers/dma.c
> new file mode 100644
> index 000000000000..30da079d366c
> --- /dev/null
> +++ b/rust/helpers/dma.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/dma-mapping.h>
> +
> +int rust_helper_dma_set_mask_and_coherent(struct device *dev, u64 mask)
> +{
> +       return dma_set_mask_and_coherent(dev, mask);
> +}
> +
> +int rust_helper_dma_set_mask(struct device *dev, u64 mask)
> +{
> +       return dma_set_mask(dev, mask);
> +}
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 0640b7e115be..8f3808c8b7fe 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -13,6 +13,7 @@
>  #include "build_bug.c"
>  #include "cred.c"
>  #include "device.c"
> +#include "dma.c"
>  #include "err.c"
>  #include "fs.c"
>  #include "io.c"
> diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
> new file mode 100644
> index 000000000000..b4dd5d411711
> --- /dev/null
> +++ b/rust/kernel/dma.rs
> @@ -0,0 +1,421 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct memory access (DMA).
> +//!
> +//! C header: [`include/linux/dma-mapping.h`](srctree/include/linux/dma-mapping.h)
> +
> +use crate::{
> +    bindings, build_assert,
> +    device::Device,
> +    error::code::*,
> +    error::Result,
> +    transmute::{AsBytes, FromBytes},
> +    types::ARef,
> +};
> +
> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
> +/// both streaming and coherent APIs together.
> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
> +}
> +
> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
> +}
> +
> +/// Possible attributes associated with a DMA mapping.
> +///
> +/// They can be combined with the operators `|`, `&`, and `!`.
> +///
> +/// Values can be used from the [`attrs`] module.
> +#[derive(Clone, Copy, PartialEq)]
> +#[repr(transparent)]
> +pub struct Attrs(u32);
> +
> +impl Attrs {
> +    /// Get the raw representation of this attribute.
> +    pub(crate) fn as_raw(self) -> crate::ffi::c_ulong {
> +        self.0 as _
> +    }
> +
> +    /// Check whether `flags` is contained in `self`.
> +    pub fn contains(self, flags: Attrs) -> bool {
> +        (self & flags) == flags
> +    }
> +}
> +
> +impl core::ops::BitOr for Attrs {
> +    type Output = Self;
> +    fn bitor(self, rhs: Self) -> Self::Output {
> +        Self(self.0 | rhs.0)
> +    }
> +}
> +
> +impl core::ops::BitAnd for Attrs {
> +    type Output = Self;
> +    fn bitand(self, rhs: Self) -> Self::Output {
> +        Self(self.0 & rhs.0)
> +    }
> +}
> +
> +impl core::ops::Not for Attrs {
> +    type Output = Self;
> +    fn not(self) -> Self::Output {
> +        Self(!self.0)
> +    }
> +}
> +
> +/// DMA mapping attrributes.
> +pub mod attrs {
> +    use super::Attrs;
> +
> +    /// Specifies that reads and writes to the mapping may be weakly ordered, that is that reads
> +    /// and writes may pass each other.
> +    pub const DMA_ATTR_WEAK_ORDERING: Attrs = Attrs(bindings::DMA_ATTR_WEAK_ORDERING);
> +
> +    /// Specifies that writes to the mapping may be buffered to improve performance.
> +    pub const DMA_ATTR_WRITE_COMBINE: Attrs = Attrs(bindings::DMA_ATTR_WRITE_COMBINE);
> +
> +    /// Lets the platform avoid creating a kernel virtual mapping for the allocated buffer.
> +    pub const DMA_ATTR_NO_KERNEL_MAPPING: Attrs = Attrs(bindings::DMA_ATTR_NO_KERNEL_MAPPING);
> +
> +    /// Allows platform code to skip synchronization of the CPU cache for the given buffer assuming
> +    /// that it has been already transferred to 'device' domain.
> +    pub const DMA_ATTR_SKIP_CPU_SYNC: Attrs = Attrs(bindings::DMA_ATTR_SKIP_CPU_SYNC);
> +
> +    /// Forces contiguous allocation of the buffer in physical memory.
> +    pub const DMA_ATTR_FORCE_CONTIGUOUS: Attrs = Attrs(bindings::DMA_ATTR_FORCE_CONTIGUOUS);
> +
> +    /// This is a hint to the DMA-mapping subsystem that it's probably not worth the time to try
> +    /// to allocate memory in a way that gives better TLB efficiency.
> +    pub const DMA_ATTR_ALLOC_SINGLE_PAGES: Attrs = Attrs(bindings::DMA_ATTR_ALLOC_SINGLE_PAGES);
> +
> +    /// This tells the DMA-mapping subsystem to suppress allocation failure reports (similarly to
> +    /// __GFP_NOWARN).
> +    pub const DMA_ATTR_NO_WARN: Attrs = Attrs(bindings::DMA_ATTR_NO_WARN);
> +
> +    /// Used to indicate that the buffer is fully accessible at an elevated privilege level (and
> +    /// ideally inaccessible or at least read-only at lesser-privileged levels).
> +    pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
> +}
> +
> +/// An abstraction of the `dma_alloc_coherent` API.
> +///
> +/// This is an abstraction around the `dma_alloc_coherent` API which is used to allocate and map
> +/// large consistent DMA regions.
> +///
> +/// A [`CoherentAllocation`] instance contains a pointer to the allocated region (in the
> +/// processor's virtual address space) and the device address which can be given to the device
> +/// as the DMA address base of the region. The region is released once [`CoherentAllocation`]
> +/// is dropped.
> +///
> +/// # Invariants
> +///
> +/// For the lifetime of an instance of [`CoherentAllocation`], the cpu address is a valid pointer
> +/// to an allocated region of consistent memory and we hold a reference to the device.
> +pub struct CoherentAllocation<T: AsBytes + FromBytes> {
> +    dev: ARef<Device>,
> +    dma_handle: bindings::dma_addr_t,
> +    count: usize,
> +    cpu_addr: *mut T,
> +    dma_attrs: Attrs,
> +}
> +
> +impl<T: AsBytes + FromBytes> CoherentAllocation<T> {
> +    /// Allocates a region of `size_of::<T> * count` of consistent memory.
> +    ///
> +    /// # Examples
> +    ///
> +    /// ```
> +    /// use kernel::device::Device;
> +    /// use kernel::dma::{attrs::*, CoherentAllocation};
> +    ///
> +    /// # fn test(dev: &Device) -> Result {
> +    /// let c: CoherentAllocation<u64> = CoherentAllocation::alloc_attrs(dev.into(), 4, GFP_KERNEL,
> +    ///                                                                  DMA_ATTR_NO_WARN)?;
> +    /// # Ok::<(), Error>(()) }
> +    /// ```
> +    pub fn alloc_attrs(
> +        dev: ARef<Device>,
> +        count: usize,
> +        gfp_flags: kernel::alloc::Flags,
> +        dma_attrs: Attrs,
> +    ) -> Result<CoherentAllocation<T>> {
> +        build_assert!(
> +            core::mem::size_of::<T>() > 0,
> +            "It doesn't make sense for the allocated type to be a ZST"
> +        );
> +
> +        let size = count
> +            .checked_mul(core::mem::size_of::<T>())
> +            .ok_or(EOVERFLOW)?;
> +        let mut dma_handle = 0;
> +        // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +        // We check the returned pointer below and return ENOMEM on allocation failure.
> +        let ret = unsafe {
> +            bindings::dma_alloc_attrs(
> +                dev.as_raw(),
> +                size,
> +                &mut dma_handle,
> +                gfp_flags.as_raw(),
> +                dma_attrs.as_raw(),
> +            )
> +        };
> +        if ret.is_null() {
> +            return Err(ENOMEM);
> +        }
> +        // INVARIANT: We just successfully allocated a coherent region which is accessible for
> +        // `count` elements, hence the cpu address is valid. We also hold a refcounted reference
> +        // to the device.
> +        Ok(Self {
> +            dev,
> +            dma_handle,
> +            count,
> +            cpu_addr: ret as *mut T,
> +            dma_attrs,
> +        })
> +    }
> +
> +    /// Performs the same functionality as `alloc_attrs`, except the `dma_attrs` is 0 by default.
> +    pub fn alloc_coherent(
> +        dev: ARef<Device>,
> +        count: usize,
> +        gfp_flags: kernel::alloc::Flags,
> +    ) -> Result<CoherentAllocation<T>> {
> +        CoherentAllocation::alloc_attrs(dev, count, gfp_flags, Attrs(0))
> +    }
> +
> +    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.
> +    pub fn skip_drop(self) -> CoherentAllocation<T> {
> +        let me = core::mem::ManuallyDrop::new(self);
> +        Self {
> +            // SAFETY: The refcount of `dev` will not be decremented because this doesn't actually
> +            // duplicate `ARef` and the use of `ManuallyDrop` forgets the originals.
> +            dev: unsafe { core::ptr::read(&me.dev) },
> +            dma_handle: me.dma_handle,
> +            count: me.count,
> +            cpu_addr: me.cpu_addr,
> +            dma_attrs: me.dma_attrs,
> +        }
> +    }
> +
> +    /// Returns the base address to the allocated region in the CPU's virtual address space.
> +    pub fn start_ptr(&self) -> *const T {
> +        self.cpu_addr
> +    }
> +
> +    /// Returns the base address to the allocated region in the CPU's virtual address space as
> +    /// a mutable pointer.
> +    pub fn start_ptr_mut(&mut self) -> *mut T {
> +        self.cpu_addr
> +    }
> +
> +    /// Returns a DMA handle which may be given to the device as the DMA address base of
> +    /// the region.
> +    pub fn dma_handle(&self) -> bindings::dma_addr_t {
> +        self.dma_handle
> +    }
> +
> +    /// Returns the data from the region starting from `offset` as a slice.
> +    /// `offset` and `count` are in units of `T`, not the number of bytes.
> +    ///
> +    /// Due to the safety requirements of slice, the caller should consider that the region could
> +    /// be modified by the device at any time (see the safety block below). For ringbuffer type of
> +    /// r/w access or use-cases where the pointer to the live data is needed, `start_ptr()` or
> +    /// `start_ptr_mut()` could be used instead.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that no hardware operations that involve the buffer are currently
> +    /// taking place while the returned slice is live.
> +    pub unsafe fn as_slice(&self, offset: usize, count: usize) -> Result<&[T]> {
> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
> +        if end >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to the type invariant on `CoherentAllocation`, and
> +        // we've just checked that the range and index are within bounds. The immutability
> +        // of the data is also guaranteed by the safety requirements of the function.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        Ok(unsafe { core::slice::from_raw_parts(self.cpu_addr.add(offset), count) })
> +    }
> +
> +    /// Performs the same functionality as `as_slice`, except that a mutable slice is returned.
> +    /// See that method for documentation and safety requirements.
> +    ///
> +    /// # Safety
> +    ///
> +    /// It is the caller's responsibility to avoid separate read and write accesses to the region
> +    /// while the returned slice is live.
> +    pub unsafe fn as_slice_mut(&self, offset: usize, count: usize) -> Result<&mut [T]> {
> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
> +        if end >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to the type invariant on `CoherentAllocation`, and
> +        // we've just checked that the range and index are within bounds. Exclusive access
> +        // to the data is guaranteed by the safety requirements of the function.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        Ok(unsafe { core::slice::from_raw_parts_mut(self.cpu_addr.add(offset), count) })
> +    }
> +
> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
> +    /// number of bytes.
> +    ///
> +    /// # Examples
> +    ///
> +    /// ```
> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
> +    /// let somedata: [u8; 4] = [0xf; 4];
> +    /// let buf: &[u8] = &somedata;
> +    /// alloc.write(buf, 0)?;
> +    /// # Ok::<(), Error>(()) }
> +    /// ```
> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
> +        if end >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> +        // and we've just checked that the range and index is within bounds.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        unsafe {
> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> +        };
> +        Ok(())
> +    }
> +
> +    /// Retrieve a single entry from the region with bounds checking. `offset` is in units of `T`,
> +    /// not the number of bytes.
> +    pub fn item_from_index(&self, offset: usize) -> Result<*mut T> {
> +        if offset >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> +        // and we've just checked that the range and index is within bounds.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        Ok(unsafe { &mut *self.cpu_addr.add(offset) })
> +    }
> +
> +    /// Reads the value of `field` and ensures that its type is `FromBytes`
> +    ///
> +    /// # Safety:
> +    ///
> +    /// This must be called from the `dma_read` macro which ensures that the `field` pointer is
> +    /// validated beforehand.
> +    ///
> +    /// Public but hidden since it should only be used from `dma_read` macro.
> +    #[doc(hidden)]
> +    pub unsafe fn field_read<F: FromBytes>(&self, field: *const F) -> F {
> +        // SAFETY: By the safety requirements field is valid
> +        unsafe { field.read() }
> +    }
> +
> +    /// Writes a value to `field` and ensures that its type is `AsBytes`
> +    ///
> +    /// # Safety:
> +    ///
> +    /// This must be called from the `dma_write` macro which ensures that the `field` pointer is
> +    /// validated beforehand.
> +    ///
> +    /// Public but hidden since it should only be used from `dma_write` macro.
> +    #[doc(hidden)]
> +    pub unsafe fn field_write<F: AsBytes>(&self, field: *mut F, val: F) {
> +        // SAFETY: By the safety requirements field is valid
> +        unsafe { field.write(val) }
> +    }
> +}
> +
> +/// Reads a field of an item from an allocated region of structs.
> +/// # Examples
> +///
> +/// ```
> +/// struct MyStruct { field: u32, }
> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.
> +/// unsafe impl kernel::transmute::FromBytes for MyStruct{};
> +/// // SAFETY: Instances of MyStruct have no uninitialized portions.
> +/// unsafe impl kernel::transmute::AsBytes for MyStruct{};
> +///
> +/// # fn test(alloc: &kernel::dma::CoherentAllocation<MyStruct>) -> Result {
> +/// let whole = kernel::dma_read!(alloc[2]);
> +/// let field = kernel::dma_read!(alloc[1].field);
> +/// # Ok::<(), Error>(()) }
> +/// ```
> +#[macro_export]
> +macro_rules! dma_read {
> +    ($dma:ident [ $idx:expr ] $($field:tt)* ) => {{
> +        let item = $dma.item_from_index($idx)?;
> +        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
> +        // dereferenced. The compiler also further validates the expression on whether `field`
> +        // is a member of `item` when expanded by the macro.
> +        unsafe {
> +            let ptr_field = ::core::ptr::addr_of!((*item) $($field)*);
> +            $dma.field_read(ptr_field)
> +        }
> +    }};
> +}
> +
> +/// Writes to a field of an item from an allocated region of structs.
> +/// # Examples
> +///
> +/// ```
> +/// struct MyStruct { member: u32, }
> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.
> +/// unsafe impl kernel::transmute::FromBytes for MyStruct{};
> +/// // SAFETY: Instances of MyStruct have no uninitialized portions.
> +/// unsafe impl kernel::transmute::AsBytes for MyStruct{};
> +///
> +/// # fn test(alloc: &mut kernel::dma::CoherentAllocation<MyStruct>) -> Result {
> +/// kernel::dma_write!(alloc[2].member = 0xf);
> +/// kernel::dma_write!(alloc[1] = MyStruct { member: 0xf });
> +/// # Ok::<(), Error>(()) }
> +/// ```
> +#[macro_export]
> +macro_rules! dma_write {
> +    ($dma:ident [ $idx:expr ] $($field:tt)*) => {{
> +        kernel::dma_write!($dma, $idx, $($field)*);
> +    }};
> +    ($dma:ident, $idx: expr, = $val:expr) => {
> +        let item = $dma.item_from_index($idx)?;
> +        // SAFETY: `item_from_index` ensures that `item` is always a valid item.
> +        unsafe { $dma.field_write(item, $val) }
> +    };
> +    ($dma:ident, $idx: expr, $(.$field:ident)* = $val:expr) => {
> +        let item = $dma.item_from_index($idx)?;
> +        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
> +        // dereferenced. The compiler also further validates the expression on whether `field`
> +        // is a member of `item` when expanded by the macro.
> +        unsafe {
> +            let ptr_field = ::core::ptr::addr_of_mut!((*item) $(.$field)*);
> +            $dma.field_write(ptr_field, $val)
> +        }
> +    };
> +}
> +
> +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
> +    fn drop(&mut self) {
> +        let size = self.count * core::mem::size_of::<T>();
> +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
> +        // type invariants on `CoherentAllocation`.
> +        unsafe {
> +            bindings::dma_free_attrs(
> +                self.dev.as_raw(),
> +                size,
> +                self.cpu_addr as _,
> +                self.dma_handle,
> +                self.dma_attrs.as_raw(),
> +            )
> +        }
> +    }
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 496ed32b0911..5081cb66b2f9 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -44,6 +44,7 @@
>  pub mod device;
>  pub mod device_id;
>  pub mod devres;
> +pub mod dma;
>  pub mod driver;
>  pub mod error;
>  #[cfg(CONFIG_RUST_FW_LOADER_ABSTRACTIONS)]
> --
> 2.43.0


* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
  2025-02-24 13:21   ` Alice Ryhl
  2025-02-24 13:30   ` QUENTIN BOYER
@ 2025-02-24 14:40   ` Andreas Hindborg
  2025-02-24 16:27     ` Abdiel Janulgue
  2025-02-24 20:07   ` Benno Lossin
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-02-24 14:40 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu

"Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:

[...]

> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
> +/// both streaming and coherent APIs together.
> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
> +}
> +
> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
> +}

Sorry if it was asked before, I am late to the party. But would it make
sense to put these two functions on `Device` and make them take `&self`?
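A rough userspace sketch of the shape this suggests (the types and the error convention are hypothetical stand-ins; real kernel methods would wrap `bindings::dma_set_mask*` and could take `&self`, since they only pass a raw pointer to C):

```rust
// Toy stand-ins for the kernel types, just to show the API shape.
struct Device {
    streaming_mask: u64,
    coherent_mask: u64,
}

impl Device {
    // Hypothetical method form of dma_set_mask(): mask for streaming
    // mappings only.
    fn dma_set_mask(&mut self, mask: u64) -> Result<(), i32> {
        self.streaming_mask = mask; // real code: bindings::dma_set_mask()
        Ok(())
    }

    // Hypothetical method form of dma_set_mask_and_coherent(): sets
    // both the streaming and the coherent mask together.
    fn dma_set_mask_and_coherent(&mut self, mask: u64) -> Result<(), i32> {
        self.dma_set_mask(mask)?;
        self.coherent_mask = mask;
        Ok(())
    }
}

fn main() {
    let mut dev = Device { streaming_mask: 0, coherent_mask: 0 };
    dev.dma_set_mask_and_coherent(u64::MAX >> 24).unwrap();
    assert_eq!(dev.streaming_mask, dev.coherent_mask);
    println!("ok");
}
```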

> +
> +/// Possible attributes associated with a DMA mapping.
> +///
> +/// They can be combined with the operators `|`, `&`, and `!`.
> +///
> +/// Values can be used from the [`attrs`] module.
> +#[derive(Clone, Copy, PartialEq)]
> +#[repr(transparent)]
> +pub struct Attrs(u32);
> +
> +impl Attrs {
> +    /// Get the raw representation of this attribute.
> +    pub(crate) fn as_raw(self) -> crate::ffi::c_ulong {
> +        self.0 as _
> +    }
> +
> +    /// Check whether `flags` is contained in `self`.
> +    pub fn contains(self, flags: Attrs) -> bool {
> +        (self & flags) == flags
> +    }
> +}
> +
> +impl core::ops::BitOr for Attrs {
> +    type Output = Self;
> +    fn bitor(self, rhs: Self) -> Self::Output {
> +        Self(self.0 | rhs.0)
> +    }
> +}
> +
> +impl core::ops::BitAnd for Attrs {
> +    type Output = Self;
> +    fn bitand(self, rhs: Self) -> Self::Output {
> +        Self(self.0 & rhs.0)
> +    }
> +}
> +
> +impl core::ops::Not for Attrs {
> +    type Output = Self;
> +    fn not(self) -> Self::Output {
> +        Self(!self.0)
> +    }
> +}
> +
> +/// DMA mapping attrributes.

Typo in attributes.


Best regards,
Andreas Hindborg




* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 13:21   ` Alice Ryhl
@ 2025-02-24 16:27     ` Abdiel Janulgue
  0 siblings, 0 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-24 16:27 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: dakr, robin.murphy, daniel.almeida, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Valentin Obst,
	open list, Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS



On 24/02/2025 15:21, Alice Ryhl wrote:
> On Mon, Feb 24, 2025 at 12:50 PM Abdiel Janulgue
> <abdiel.janulgue@gmail.com> wrote:
>>
>> Add a simple dma coherent allocator rust abstraction. Based on
>> Andreas Hindborg's dma abstractions from the rnvme driver, which
>> was also based on earlier work by Wedson Almeida Filho.
>>
>> Nacked-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
> 
>> +    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.
>> +    pub fn skip_drop(self) -> CoherentAllocation<T> {
>> +        let me = core::mem::ManuallyDrop::new(self);
>> +        Self {
>> +            // SAFETY: The refcount of `dev` will not be decremented because this doesn't actually
>> +            // duplicate `ARef` and the use of `ManuallyDrop` forgets the originals.
>> +            dev: unsafe { core::ptr::read(&me.dev) },
>> +            dma_handle: me.dma_handle,
>> +            count: me.count,
>> +            cpu_addr: me.cpu_addr,
>> +            dma_attrs: me.dma_attrs,
>> +        }
>> +    }
> 
> The skip_drop pattern requires the return value to use a different
> struct with the same fields, because otherwise you don't really skip
> the destructor. But I don't think you have a user for this method
> anymore, so maybe just drop it.

Ah, yep. I agree.

> 
>> +    /// Retrieve a single entry from the region with bounds checking. `offset` is in units of `T`,
>> +    /// not the number of bytes.
>> +    pub fn item_from_index(&self, offset: usize) -> Result<*mut T> {
>> +        if offset >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> +        // and we've just checked that the range and index is within bounds.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        Ok(unsafe { &mut *self.cpu_addr.add(offset) })
> 
> The point of the dma_read/dma_write macros is to avoid creating
> references to the dma memory, so don't create a reference here.
> 

This is embarrassing, I thought I had already changed this in the patch.
My local tree has it fixed. Anyway, thanks for catching this!

/Abdiel



* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 14:40   ` Andreas Hindborg
@ 2025-02-24 16:27     ` Abdiel Janulgue
  2025-02-24 22:35       ` Daniel Almeida
  2025-02-28  8:35       ` Alexandre Courbot
  0 siblings, 2 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-24 16:27 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu


On 24/02/2025 16:40, Andreas Hindborg wrote:
> "Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:
> 
> [...]
> 
>> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
>> +/// both streaming and coherent APIs together.
>> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
>> +}
>> +
>> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
>> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
>> +}
> 
> Sorry if it was asked before, I am late to the party. But would it make
> sense to put these two functions on `Device` and make them take `&self`?

Thanks for checking this. The API is about the DMA addressing
capabilities of the device, so my thought would be to group them with
the rest of the DMA API. But either way, I don't have a strong
preference. I'll let others comment.

Daniel, Danilo?

Regards,
Abdiel

> 
>> +
>> +/// Possible attributes associated with a DMA mapping.
>> +///
>> +/// They can be combined with the operators `|`, `&`, and `!`.
>> +///
>> +/// Values can be used from the [`attrs`] module.
>> +#[derive(Clone, Copy, PartialEq)]
>> +#[repr(transparent)]
>> +pub struct Attrs(u32);
>> +
>> +impl Attrs {
>> +    /// Get the raw representation of this attribute.
>> +    pub(crate) fn as_raw(self) -> crate::ffi::c_ulong {
>> +        self.0 as _
>> +    }
>> +
>> +    /// Check whether `flags` is contained in `self`.
>> +    pub fn contains(self, flags: Attrs) -> bool {
>> +        (self & flags) == flags
>> +    }
>> +}
>> +
>> +impl core::ops::BitOr for Attrs {
>> +    type Output = Self;
>> +    fn bitor(self, rhs: Self) -> Self::Output {
>> +        Self(self.0 | rhs.0)
>> +    }
>> +}
>> +
>> +impl core::ops::BitAnd for Attrs {
>> +    type Output = Self;
>> +    fn bitand(self, rhs: Self) -> Self::Output {
>> +        Self(self.0 & rhs.0)
>> +    }
>> +}
>> +
>> +impl core::ops::Not for Attrs {
>> +    type Output = Self;
>> +    fn not(self) -> Self::Output {
>> +        Self(!self.0)
>> +    }
>> +}
>> +
>> +/// DMA mapping attrributes.
> 
> Typo in attributes.
> 
> 
> Best regards,
> Andreas Hindborg
> 
> 



* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 13:30   ` QUENTIN BOYER
@ 2025-02-24 16:30     ` Abdiel Janulgue
  0 siblings, 0 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-24 16:30 UTC (permalink / raw)
  To: QUENTIN BOYER, aliceryhl@google.com, dakr@kernel.org,
	robin.murphy@arm.com, daniel.almeida@collabora.com,
	rust-for-linux@vger.kernel.org
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied@redhat.com,
	open list:DMA MAPPING HELPERS



On 24/02/2025 15:30, QUENTIN BOYER wrote:
> Wouldn't it be safer if the `as_slice_mut` function took a `&mut self`,
> allowing the compiler to correctly check the borrows (like `start_ptr_mut`)
> 
> Quentin

Yes, that would make sense. I appreciate the feedback!
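Quentin's point can be illustrated with a toy userspace model (the names here are illustrative, not the kernel API): when the accessor takes `&mut self`, the borrow checker itself forbids a second live view of the buffer, which is exactly the guarantee `&self` gives up.

```rust
// Toy model of the borrow-checking point under discussion.
struct Buffer {
    data: Vec<u32>,
}

impl Buffer {
    // Analogous to `as_slice_mut` taking `&mut self`: the returned borrow
    // keeps `self` exclusively borrowed for as long as the slice is live.
    fn view_mut(&mut self, offset: usize, count: usize) -> Option<&mut [u32]> {
        // The real patch does a checked_add plus a bounds test; here the
        // range-based `get_mut` provides the same bounds safety.
        let end = offset.checked_add(count)?;
        self.data.get_mut(offset..end)
    }
}

fn main() {
    let mut buf = Buffer { data: vec![0; 8] };
    let s = buf.view_mut(0, 4).unwrap();
    s[0] = 0xf;
    // A second `buf.view_mut(..)` here, while `s` is still live, would be
    // rejected at compile time -- precisely the check `&self` would forfeit.
    assert_eq!(buf.data[0], 0xf);
}
```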

Regards,
Abdiel

> 
> On Mon Feb 24, 2025 at 12:49 PM CET, Abdiel Janulgue wrote:
>>
>> Add a simple dma coherent allocator rust abstraction. Based on
>> Andreas Hindborg's dma abstractions from the rnvme driver, which
>> was also based on earlier work by Wedson Almeida Filho.
>>
>> Nacked-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
>> ---
>>   rust/bindings/bindings_helper.h |   1 +
>>   rust/helpers/dma.c              |  13 +
>>   rust/helpers/helpers.c          |   1 +
>>   rust/kernel/dma.rs              | 421 ++++++++++++++++++++++++++++++++
>>   rust/kernel/lib.rs              |   1 +
>>   5 files changed, 437 insertions(+)
>>   create mode 100644 rust/helpers/dma.c
>>   create mode 100644 rust/kernel/dma.rs
>>
>> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
>> index 55354e4dec14..f69b05025e52 100644
>> --- a/rust/bindings/bindings_helper.h
>> +++ b/rust/bindings/bindings_helper.h
>> @@ -11,6 +11,7 @@
>>   #include <linux/blk_types.h>
>>   #include <linux/blkdev.h>
>>   #include <linux/cred.h>
>> +#include <linux/dma-mapping.h>
>>   #include <linux/errname.h>
>>   #include <linux/ethtool.h>
>>   #include <linux/file.h>
>> diff --git a/rust/helpers/dma.c b/rust/helpers/dma.c
>> new file mode 100644
>> index 000000000000..30da079d366c
>> --- /dev/null
>> +++ b/rust/helpers/dma.c
>> @@ -0,0 +1,13 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include <linux/dma-mapping.h>
>> +
>> +int rust_helper_dma_set_mask_and_coherent(struct device *dev, u64 mask)
>> +{
>> +       return dma_set_mask_and_coherent(dev, mask);
>> +}
>> +
>> +int rust_helper_dma_set_mask(struct device *dev, u64 mask)
>> +{
>> +       return dma_set_mask(dev, mask);
>> +}
>> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
>> index 0640b7e115be..8f3808c8b7fe 100644
>> --- a/rust/helpers/helpers.c
>> +++ b/rust/helpers/helpers.c
>> @@ -13,6 +13,7 @@
>>   #include "build_bug.c"
>>   #include "cred.c"
>>   #include "device.c"
>> +#include "dma.c"
>>   #include "err.c"
>>   #include "fs.c"
>>   #include "io.c"
>> diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
>> new file mode 100644
>> index 000000000000..b4dd5d411711
>> --- /dev/null
>> +++ b/rust/kernel/dma.rs
>> @@ -0,0 +1,421 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Direct memory access (DMA).
>> +//!
>> +//! C header: [`include/linux/dma-mapping.h`](srctree/include/linux/dma-mapping.h)
>> +
>> +use crate::{
>> +    bindings, build_assert,
>> +    device::Device,
>> +    error::code::*,
>> +    error::Result,
>> +    transmute::{AsBytes, FromBytes},
>> +    types::ARef,
>> +};
>> +
>> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
>> +/// both streaming and coherent APIs together.
>> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
>> +}
>> +
>> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
>> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
>> +}
>> +
>> +/// Possible attributes associated with a DMA mapping.
>> +///
>> +/// They can be combined with the operators `|`, `&`, and `!`.
>> +///
>> +/// Values can be used from the [`attrs`] module.
>> +#[derive(Clone, Copy, PartialEq)]
>> +#[repr(transparent)]
>> +pub struct Attrs(u32);
>> +
>> +impl Attrs {
>> +    /// Get the raw representation of this attribute.
>> +    pub(crate) fn as_raw(self) -> crate::ffi::c_ulong {
>> +        self.0 as _
>> +    }
>> +
>> +    /// Check whether `flags` is contained in `self`.
>> +    pub fn contains(self, flags: Attrs) -> bool {
>> +        (self & flags) == flags
>> +    }
>> +}
>> +
>> +impl core::ops::BitOr for Attrs {
>> +    type Output = Self;
>> +    fn bitor(self, rhs: Self) -> Self::Output {
>> +        Self(self.0 | rhs.0)
>> +    }
>> +}
>> +
>> +impl core::ops::BitAnd for Attrs {
>> +    type Output = Self;
>> +    fn bitand(self, rhs: Self) -> Self::Output {
>> +        Self(self.0 & rhs.0)
>> +    }
>> +}
>> +
>> +impl core::ops::Not for Attrs {
>> +    type Output = Self;
>> +    fn not(self) -> Self::Output {
>> +        Self(!self.0)
>> +    }
>> +}
>> +
>> +/// DMA mapping attrributes.
>> +pub mod attrs {
>> +    use super::Attrs;
>> +
>> +    /// Specifies that reads and writes to the mapping may be weakly ordered, that is that reads
>> +    /// and writes may pass each other.
>> +    pub const DMA_ATTR_WEAK_ORDERING: Attrs = Attrs(bindings::DMA_ATTR_WEAK_ORDERING);
>> +
>> +    /// Specifies that writes to the mapping may be buffered to improve performance.
>> +    pub const DMA_ATTR_WRITE_COMBINE: Attrs = Attrs(bindings::DMA_ATTR_WRITE_COMBINE);
>> +
>> +    /// Lets the platform to avoid creating a kernel virtual mapping for the allocated buffer.
>> +    pub const DMA_ATTR_NO_KERNEL_MAPPING: Attrs = Attrs(bindings::DMA_ATTR_NO_KERNEL_MAPPING);
>> +
>> +    /// Allows platform code to skip synchronization of the CPU cache for the given buffer assuming
>> +    /// that it has been already transferred to 'device' domain.
>> +    pub const DMA_ATTR_SKIP_CPU_SYNC: Attrs = Attrs(bindings::DMA_ATTR_SKIP_CPU_SYNC);
>> +
>> +    /// Forces contiguous allocation of the buffer in physical memory.
>> +    pub const DMA_ATTR_FORCE_CONTIGUOUS: Attrs = Attrs(bindings::DMA_ATTR_FORCE_CONTIGUOUS);
>> +
>> +    /// This is a hint to the DMA-mapping subsystem that it's probably not worth the time to try
>> +    /// to allocate memory to in a way that gives better TLB efficiency.
>> +    pub const DMA_ATTR_ALLOC_SINGLE_PAGES: Attrs = Attrs(bindings::DMA_ATTR_ALLOC_SINGLE_PAGES);
>> +
>> +    /// This tells the DMA-mapping subsystem to suppress allocation failure reports (similarly to
>> +    /// __GFP_NOWARN).
>> +    pub const DMA_ATTR_NO_WARN: Attrs = Attrs(bindings::DMA_ATTR_NO_WARN);
>> +
>> +    /// Used to indicate that the buffer is fully accessible at an elevated privilege level (and
>> +    /// ideally inaccessible or at least read-only at lesser-privileged levels).
>> +    pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
>> +}
>> +
>> +/// An abstraction of the `dma_alloc_coherent` API.
>> +///
>> +/// This is an abstraction around the `dma_alloc_coherent` API which is used to allocate and map
>> +/// large consistent DMA regions.
>> +///
>> +/// A [`CoherentAllocation`] instance contains a pointer to the allocated region (in the
>> +/// processor's virtual address space) and the device address which can be given to the device
>> +/// as the DMA address base of the region. The region is released once [`CoherentAllocation`]
>> +/// is dropped.
>> +///
>> +/// # Invariants
>> +///
>> +/// For the lifetime of an instance of [`CoherentAllocation`], the cpu address is a valid pointer
>> +/// to an allocated region of consistent memory and we hold a reference to the device.
>> +pub struct CoherentAllocation<T: AsBytes + FromBytes> {
>> +    dev: ARef<Device>,
>> +    dma_handle: bindings::dma_addr_t,
>> +    count: usize,
>> +    cpu_addr: *mut T,
>> +    dma_attrs: Attrs,
>> +}
>> +
>> +impl<T: AsBytes + FromBytes> CoherentAllocation<T> {
>> +    /// Allocates a region of `size_of::<T> * count` of consistent memory.
>> +    ///
>> +    /// # Examples
>> +    ///
>> +    /// ```
>> +    /// use kernel::device::Device;
>> +    /// use kernel::dma::{attrs::*, CoherentAllocation};
>> +    ///
>> +    /// # fn test(dev: &Device) -> Result {
>> +    /// let c: CoherentAllocation<u64> = CoherentAllocation::alloc_attrs(dev.into(), 4, GFP_KERNEL,
>> +    ///                                                                  DMA_ATTR_NO_WARN)?;
>> +    /// # Ok::<(), Error>(()) }
>> +    /// ```
>> +    pub fn alloc_attrs(
>> +        dev: ARef<Device>,
>> +        count: usize,
>> +        gfp_flags: kernel::alloc::Flags,
>> +        dma_attrs: Attrs,
>> +    ) -> Result<CoherentAllocation<T>> {
>> +        build_assert!(
>> +            core::mem::size_of::<T>() > 0,
>> +            "It doesn't make sense for the allocated type to be a ZST"
>> +        );
>> +
>> +        let size = count
>> +            .checked_mul(core::mem::size_of::<T>())
>> +            .ok_or(EOVERFLOW)?;
>> +        let mut dma_handle = 0;
>> +        // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>> +        // We ensure that we catch the failure on this function and throw an ENOMEM
>> +        let ret = unsafe {
>> +            bindings::dma_alloc_attrs(
>> +                dev.as_raw(),
>> +                size,
>> +                &mut dma_handle,
>> +                gfp_flags.as_raw(),
>> +                dma_attrs.as_raw(),
>> +            )
>> +        };
>> +        if ret.is_null() {
>> +            return Err(ENOMEM);
>> +        }
>> +        // INVARIANT: We just successfully allocated a coherent region which is accessible for
>> +        // `count` elements, hence the cpu address is valid. We also hold a refcounted reference
>> +        // to the device.
>> +        Ok(Self {
>> +            dev,
>> +            dma_handle,
>> +            count,
>> +            cpu_addr: ret as *mut T,
>> +            dma_attrs,
>> +        })
>> +    }
>> +
>> +    /// Performs the same functionality as `alloc_attrs`, except the `dma_attrs` is 0 by default.
>> +    pub fn alloc_coherent(
>> +        dev: ARef<Device>,
>> +        count: usize,
>> +        gfp_flags: kernel::alloc::Flags,
>> +    ) -> Result<CoherentAllocation<T>> {
>> +        CoherentAllocation::alloc_attrs(dev, count, gfp_flags, Attrs(0))
>> +    }
>> +
>> +    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.
>> +    pub fn skip_drop(self) -> CoherentAllocation<T> {
>> +        let me = core::mem::ManuallyDrop::new(self);
>> +        Self {
>> +            // SAFETY: The refcount of `dev` will not be decremented because this doesn't actually
>> +            // duplicafe `ARef` and the use of `ManuallyDrop` forgets the originals.
>> +            dev: unsafe { core::ptr::read(&me.dev) },
>> +            dma_handle: me.dma_handle,
>> +            count: me.count,
>> +            cpu_addr: me.cpu_addr,
>> +            dma_attrs: me.dma_attrs,
>> +        }
>> +    }
>> +
>> +    /// Returns the base address to the allocated region in the CPU's virtual address space.
>> +    pub fn start_ptr(&self) -> *const T {
>> +        self.cpu_addr
>> +    }
>> +
>> +    /// Returns the base address to the allocated region in the CPU's virtual address space as
>> +    /// a mutable pointer.
>> +    pub fn start_ptr_mut(&mut self) -> *mut T {
>> +        self.cpu_addr
>> +    }
>> +
>> +    /// Returns a DMA handle which may given to the device as the DMA address base of
>> +    /// the region.
>> +    pub fn dma_handle(&self) -> bindings::dma_addr_t {
>> +        self.dma_handle
>> +    }
>> +
>> +    /// Returns the data from the region starting from `offset` as a slice.
>> +    /// `offset` and `count` are in units of `T`, not the number of bytes.
>> +    ///
>> +    /// Due to the safety requirements of slice, the caller should consider that the region could
>> +    /// be modified by the device at anytime (see the safety block below). For ringbuffer type of
>> +    /// r/w access or use-cases where the pointer to the live data is needed, `start_ptr()` or
>> +    /// `start_ptr_mut()` could be used instead.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// Callers must ensure that no hardware operations that involve the buffer are currently
>> +    /// taking place while the returned slice is live.
>> +    pub unsafe fn as_slice(&self, offset: usize, count: usize) -> Result<&[T]> {
>> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
>> +        // we've just checked that the range and index is within bounds. The immutability of the
>> +        // of data is also guaranteed by the safety requirements of the function.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        Ok(unsafe { core::slice::from_raw_parts(self.cpu_addr.add(offset), count) })
>> +    }
>> +
>> +    /// Performs the same functionality as `as_slice`, except that a mutable slice is returned.
>> +    /// See that method for documentation and safety requirements.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// It is the callers responsibility to avoid separate read and write accesses to the region
>> +    /// while the returned slice is live.
>> +    pub unsafe fn as_slice_mut(&self, offset: usize, count: usize) -> Result<&mut [T]> {
>> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
>> +        // we've just checked that the range and index is within bounds. The immutability of the
>> +        // of data is also guaranteed by the safety requirements of the function.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        Ok(unsafe { core::slice::from_raw_parts_mut(self.cpu_addr.add(offset), count) })
>> +    }
>> +
>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>> +    /// number of bytes.
>> +    ///
>> +    /// # Examples
>> +    ///
>> +    /// ```
>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>> +    /// let somedata: [u8; 4] = [0xf; 4];
>> +    /// let buf: &[u8] = &somedata;
>> +    /// alloc.write(buf, 0)?;
>> +    /// # Ok::<(), Error>(()) }
>> +    /// ```
>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> +        // and we've just checked that the range and index is within bounds.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        unsafe {
>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
>> +        };
>> +        Ok(())
>> +    }
>> +
>> +    /// Retrieve a single entry from the region with bounds checking. `offset` is in units of `T`,
>> +    /// not the number of bytes.
>> +    pub fn item_from_index(&self, offset: usize) -> Result<*mut T> {
>> +        if offset >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> +        // and we've just checked that the range and index is within bounds.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        Ok(unsafe { &mut *self.cpu_addr.add(offset) })
>> +    }
>> +
>> +    /// Reads the value of `field` and ensures that its type is `FromBytes`
>> +    ///
>> +    /// # Safety:
>> +    ///
>> +    /// This must be called from the `dma_read` macro which ensures that the `field` pointer is
>> +    /// validated beforehand.
>> +    ///
>> +    /// Public but hidden since it should only be used from `dma_read` macro.
>> +    #[doc(hidden)]
>> +    pub unsafe fn field_read<F: FromBytes>(&self, field: *const F) -> F {
>> +        // SAFETY: By the safety requirements field is valid
>> +        unsafe { field.read() }
>> +    }
>> +
>> +    /// Writes a value to `field` and ensures that its type is `AsBytes`
>> +    ///
>> +    /// # Safety:
>> +    ///
>> +    /// This must be called from the `dma_write` macro which ensures that the `field` pointer is
>> +    /// validated beforehand.
>> +    ///
>> +    /// Public but hidden since it should only be used from `dma_write` macro.
>> +    #[doc(hidden)]
>> +    pub unsafe fn field_write<F: AsBytes>(&self, field: *mut F, val: F) {
>> +        // SAFETY: By the safety requirements field is valid
>> +        unsafe { field.write(val) }
>> +    }
>> +}
>> +
>> +/// Reads a field of an item from an allocated region of structs.
>> +/// # Examples
>> +///
>> +/// ```
>> +/// struct MyStruct { field: u32, }
>> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.
>> +/// unsafe impl kernel::transmute::FromBytes for MyStruct{};
>> +/// // SAFETY: Instances of MyStruct have no uninitialized portions.
>> +/// unsafe impl kernel::transmute::AsBytes for MyStruct{};
>> +///
>> +/// # fn test(alloc: &kernel::dma::CoherentAllocation<MyStruct>) -> Result {
>> +/// let whole = kernel::dma_read!(alloc[2]);
>> +/// let field = kernel::dma_read!(alloc[1].field);
>> +/// # Ok::<(), Error>(()) }
>> +/// ```
>> +#[macro_export]
>> +macro_rules! dma_read {
>> +    ($dma:ident [ $idx:expr ] $($field:tt)* ) => {{
>> +        let item = $dma.item_from_index($idx)?;
>> +        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
>> +        // dereferenced. The compiler also further validates the expression on whether `field`
>> +        // is a member of `item` when expanded by the macro.
>> +        unsafe {
>> +            let ptr_field = ::core::ptr::addr_of!((*item) $($field)*);
>> +            $dma.field_read(ptr_field)
>> +        }
>> +    }};
>> +}
>> +
>> +/// Writes to a field of an item from an allocated region of structs.
>> +/// # Examples
>> +///
>> +/// ```
>> +/// struct MyStruct { member: u32, }
>> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.
>> +/// unsafe impl kernel::transmute::FromBytes for MyStruct{};
>> +/// // SAFETY: Instances of MyStruct have no uninitialized portions.
>> +/// unsafe impl kernel::transmute::AsBytes for MyStruct{};
>> +///
>> +/// # fn test(alloc: &mut kernel::dma::CoherentAllocation<MyStruct>) -> Result {
>> +/// kernel::dma_write!(alloc[2].member = 0xf);
>> +/// kernel::dma_write!(alloc[1] = MyStruct { member: 0xf });
>> +/// # Ok::<(), Error>(()) }
>> +/// ```
>> +#[macro_export]
>> +macro_rules! dma_write {
>> +    ($dma:ident [ $idx:expr ] $($field:tt)*) => {{
>> +        kernel::dma_write!($dma, $idx, $($field)*);
>> +    }};
>> +    ($dma:ident, $idx: expr, = $val:expr) => {
>> +        let item = $dma.item_from_index($idx)?;
>> +        // SAFETY: `item_from_index` ensures that `item` is always a valid item.
>> +        unsafe { $dma.field_write(item, $val) }
>> +    };
>> +    ($dma:ident, $idx: expr, $(.$field:ident)* = $val:expr) => {
>> +        let item = $dma.item_from_index($idx)?;
>> +        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
>> +        // dereferenced. The compiler also further validates the expression on whether `field`
>> +        // is a member of `item` when expanded by the macro.
>> +        unsafe {
>> +            let ptr_field = ::core::ptr::addr_of_mut!((*item) $(.$field)*);
>> +            $dma.field_write(ptr_field, $val)
>> +        }
>> +    };
>> +}
>> +
>> +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
>> +    fn drop(&mut self) {
>> +        let size = self.count * core::mem::size_of::<T>();
>> +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
>> +        // type invariants on `CoherentAllocation`.
>> +        unsafe {
>> +            bindings::dma_free_attrs(
>> +                self.dev.as_raw(),
>> +                size,
>> +                self.cpu_addr as _,
>> +                self.dma_handle,
>> +                self.dma_attrs.as_raw(),
>> +            )
>> +        }
>> +    }
>> +}
>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>> index 496ed32b0911..5081cb66b2f9 100644
>> --- a/rust/kernel/lib.rs
>> +++ b/rust/kernel/lib.rs
>> @@ -44,6 +44,7 @@
>>   pub mod device;
>>   pub mod device_id;
>>   pub mod devres;
>> +pub mod dma;
>>   pub mod driver;
>>   pub mod error;
>>   #[cfg(CONFIG_RUST_FW_LOADER_ABSTRACTIONS)]
>> --
>> 2.43.0


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
                     ` (2 preceding siblings ...)
  2025-02-24 14:40   ` Andreas Hindborg
@ 2025-02-24 20:07   ` Benno Lossin
  2025-02-24 21:40     ` Miguel Ojeda
                       ` (2 more replies)
  2025-02-24 22:05   ` Miguel Ojeda
                     ` (3 subsequent siblings)
  7 siblings, 3 replies; 70+ messages in thread
From: Benno Lossin @ 2025-02-24 20:07 UTC (permalink / raw)
  To: Abdiel Janulgue, aliceryhl, dakr, robin.murphy, daniel.almeida,
	rust-for-linux
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu

On 24.02.25 12:49, Abdiel Janulgue wrote:
> Add a simple dma coherent allocator rust abstraction. Based on
> Andreas Hindborg's dma abstractions from the rnvme driver, which
> was also based on earlier work by Wedson Almeida Filho.
> 
> Nacked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
> ---
>  rust/bindings/bindings_helper.h |   1 +
>  rust/helpers/dma.c              |  13 +
>  rust/helpers/helpers.c          |   1 +
>  rust/kernel/dma.rs              | 421 ++++++++++++++++++++++++++++++++
>  rust/kernel/lib.rs              |   1 +
>  5 files changed, 437 insertions(+)
>  create mode 100644 rust/helpers/dma.c
>  create mode 100644 rust/kernel/dma.rs
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 55354e4dec14..f69b05025e52 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -11,6 +11,7 @@
>  #include <linux/blk_types.h>
>  #include <linux/blkdev.h>
>  #include <linux/cred.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/errname.h>
>  #include <linux/ethtool.h>
>  #include <linux/file.h>
> diff --git a/rust/helpers/dma.c b/rust/helpers/dma.c
> new file mode 100644
> index 000000000000..30da079d366c
> --- /dev/null
> +++ b/rust/helpers/dma.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/dma-mapping.h>
> +
> +int rust_helper_dma_set_mask_and_coherent(struct device *dev, u64 mask)
> +{
> +	return dma_set_mask_and_coherent(dev, mask);
> +}
> +
> +int rust_helper_dma_set_mask(struct device *dev, u64 mask)
> +{
> +	return dma_set_mask(dev, mask);
> +}
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 0640b7e115be..8f3808c8b7fe 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -13,6 +13,7 @@
>  #include "build_bug.c"
>  #include "cred.c"
>  #include "device.c"
> +#include "dma.c"
>  #include "err.c"
>  #include "fs.c"
>  #include "io.c"
> diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
> new file mode 100644
> index 000000000000..b4dd5d411711
> --- /dev/null
> +++ b/rust/kernel/dma.rs
> @@ -0,0 +1,421 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct memory access (DMA).
> +//!
> +//! C header: [`include/linux/dma-mapping.h`](srctree/include/linux/dma-mapping.h)
> +
> +use crate::{
> +    bindings, build_assert,
> +    device::Device,
> +    error::code::*,
> +    error::Result,
> +    transmute::{AsBytes, FromBytes},
> +    types::ARef,
> +};
> +
> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
> +/// both streaming and coherent APIs together.
> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
> +}
> +
> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
> +}

Why aren't these methods on `Device`? (i.e. inside of an `impl Device`
block)

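One possible shape of this suggestion, sketched standalone: the mask setter becomes a method in an `impl Device` block, so callers write `dev.dma_set_mask(..)`. The `Device` type and the C helper are stubbed out below, and the method name and `Result` shape are illustrative, not the eventual kernel API.

```rust
// Hypothetical sketch of moving the DMA mask helper onto `Device`.
struct Device; // stand-in for kernel::device::Device

// Stand-in for the `rust_helper_dma_set_mask` C helper; always succeeds here.
fn raw_dma_set_mask(_dev: &Device, _mask: u64) -> i32 {
    0
}

impl Device {
    /// Inform the kernel of the device's DMA addressing capabilities
    /// (streaming mappings only, mirroring `dma_set_mask`).
    fn dma_set_mask(&self, mask: u64) -> Result<(), i32> {
        match raw_dma_set_mask(self, mask) {
            0 => Ok(()),
            e => Err(e),
        }
    }
}

fn main() {
    let dev = Device;
    // Callers now reach the helper through the device itself.
    assert!(dev.dma_set_mask(u64::MAX).is_ok());
}
```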
> +
> +/// Possible attributes associated with a DMA mapping.
> +///
> +/// They can be combined with the operators `|`, `&`, and `!`.
> +///
> +/// Values can be used from the [`attrs`] module.
> +#[derive(Clone, Copy, PartialEq)]
> +#[repr(transparent)]
> +pub struct Attrs(u32);
> +
> +impl Attrs {
> +    /// Get the raw representation of this attribute.
> +    pub(crate) fn as_raw(self) -> crate::ffi::c_ulong {
> +        self.0 as _
> +    }
> +
> +    /// Check whether `flags` is contained in `self`.
> +    pub fn contains(self, flags: Attrs) -> bool {
> +        (self & flags) == flags
> +    }
> +}
> +
> +impl core::ops::BitOr for Attrs {
> +    type Output = Self;
> +    fn bitor(self, rhs: Self) -> Self::Output {
> +        Self(self.0 | rhs.0)
> +    }
> +}
> +
> +impl core::ops::BitAnd for Attrs {
> +    type Output = Self;
> +    fn bitand(self, rhs: Self) -> Self::Output {
> +        Self(self.0 & rhs.0)
> +    }
> +}
> +
> +impl core::ops::Not for Attrs {
> +    type Output = Self;
> +    fn not(self) -> Self::Output {
> +        Self(!self.0)
> +    }
> +}
> +
> +/// DMA mapping attrributes.
> +pub mod attrs {
> +    use super::Attrs;
> +
> +    /// Specifies that reads and writes to the mapping may be weakly ordered, that is that reads
> +    /// and writes may pass each other.
> +    pub const DMA_ATTR_WEAK_ORDERING: Attrs = Attrs(bindings::DMA_ATTR_WEAK_ORDERING);
> +
> +    /// Specifies that writes to the mapping may be buffered to improve performance.
> +    pub const DMA_ATTR_WRITE_COMBINE: Attrs = Attrs(bindings::DMA_ATTR_WRITE_COMBINE);
> +
> +    /// Lets the platform to avoid creating a kernel virtual mapping for the allocated buffer.
> +    pub const DMA_ATTR_NO_KERNEL_MAPPING: Attrs = Attrs(bindings::DMA_ATTR_NO_KERNEL_MAPPING);
> +
> +    /// Allows platform code to skip synchronization of the CPU cache for the given buffer assuming
> +    /// that it has been already transferred to 'device' domain.
> +    pub const DMA_ATTR_SKIP_CPU_SYNC: Attrs = Attrs(bindings::DMA_ATTR_SKIP_CPU_SYNC);
> +
> +    /// Forces contiguous allocation of the buffer in physical memory.
> +    pub const DMA_ATTR_FORCE_CONTIGUOUS: Attrs = Attrs(bindings::DMA_ATTR_FORCE_CONTIGUOUS);
> +
> +    /// This is a hint to the DMA-mapping subsystem that it's probably not worth the time to try
> +    /// to allocate memory to in a way that gives better TLB efficiency.
> +    pub const DMA_ATTR_ALLOC_SINGLE_PAGES: Attrs = Attrs(bindings::DMA_ATTR_ALLOC_SINGLE_PAGES);
> +
> +    /// This tells the DMA-mapping subsystem to suppress allocation failure reports (similarly to
> +    /// __GFP_NOWARN).
> +    pub const DMA_ATTR_NO_WARN: Attrs = Attrs(bindings::DMA_ATTR_NO_WARN);
> +
> +    /// Used to indicate that the buffer is fully accessible at an elevated privilege level (and
> +    /// ideally inaccessible or at least read-only at lesser-privileged levels).
> +    pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
> +}
> +
> +/// An abstraction of the `dma_alloc_coherent` API.
> +///
> +/// This is an abstraction around the `dma_alloc_coherent` API which is used to allocate and map
> +/// large consistent DMA regions.
> +///
> +/// A [`CoherentAllocation`] instance contains a pointer to the allocated region (in the
> +/// processor's virtual address space) and the device address which can be given to the device
> +/// as the DMA address base of the region. The region is released once [`CoherentAllocation`]
> +/// is dropped.
> +///
> +/// # Invariants
> +///
> +/// For the lifetime of an instance of [`CoherentAllocation`], the cpu address is a valid pointer

"the cpu address" -> "`cpu_addr`"

You can shorten this to "`cpu_addr` is a valid pointer to [...]".

> +/// to an allocated region of consistent memory and we hold a reference to the device.

Isn't the "we hold a reference to the device" part ensured by the
`ARef<Device>`? Or did you want to specify that `cpu_addr` must come
from `dev`?

> +pub struct CoherentAllocation<T: AsBytes + FromBytes> {
> +    dev: ARef<Device>,
> +    dma_handle: bindings::dma_addr_t,
> +    count: usize,
> +    cpu_addr: *mut T,
> +    dma_attrs: Attrs,
> +}
> +
> +impl<T: AsBytes + FromBytes> CoherentAllocation<T> {
> +    /// Allocates a region of `size_of::<T> * count` of consistent memory.
> +    ///
> +    /// # Examples
> +    ///
> +    /// ```
> +    /// use kernel::device::Device;
> +    /// use kernel::dma::{attrs::*, CoherentAllocation};
> +    ///
> +    /// # fn test(dev: &Device) -> Result {
> +    /// let c: CoherentAllocation<u64> = CoherentAllocation::alloc_attrs(dev.into(), 4, GFP_KERNEL,
> +    ///                                                                  DMA_ATTR_NO_WARN)?;
> +    /// # Ok::<(), Error>(()) }
> +    /// ```
> +    pub fn alloc_attrs(
> +        dev: ARef<Device>,
> +        count: usize,
> +        gfp_flags: kernel::alloc::Flags,
> +        dma_attrs: Attrs,
> +    ) -> Result<CoherentAllocation<T>> {
> +        build_assert!(
> +            core::mem::size_of::<T>() > 0,
> +            "It doesn't make sense for the allocated type to be a ZST"
> +        );

Is this a safety requirement? I.e. the `dma_alloc_attrs` function cannot
handle a size of 0?

> +
> +        let size = count
> +            .checked_mul(core::mem::size_of::<T>())
> +            .ok_or(EOVERFLOW)?;
> +        let mut dma_handle = 0;
> +        // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +        // We ensure that we catch the failure on this function and throw an ENOMEM

The second sentence is not part of a safety requirement.

> +        let ret = unsafe {
> +            bindings::dma_alloc_attrs(
> +                dev.as_raw(),
> +                size,
> +                &mut dma_handle,
> +                gfp_flags.as_raw(),
> +                dma_attrs.as_raw(),
> +            )
> +        };
> +        if ret.is_null() {
> +            return Err(ENOMEM);
> +        }
> +        // INVARIANT: We just successfully allocated a coherent region which is accessible for
> +        // `count` elements, hence the cpu address is valid. We also hold a refcounted reference
> +        // to the device.
> +        Ok(Self {
> +            dev,
> +            dma_handle,
> +            count,
> +            cpu_addr: ret as *mut T,
> +            dma_attrs,
> +        })
> +    }
> +
> +    /// Performs the same functionality as `alloc_attrs`, except the `dma_attrs` is 0 by default.
> +    pub fn alloc_coherent(
> +        dev: ARef<Device>,
> +        count: usize,
> +        gfp_flags: kernel::alloc::Flags,
> +    ) -> Result<CoherentAllocation<T>> {
> +        CoherentAllocation::alloc_attrs(dev, count, gfp_flags, Attrs(0))
> +    }
> +
> +    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.
> +    pub fn skip_drop(self) -> CoherentAllocation<T> {

Why does this method exist? It doesn't do anything.

> +        let me = core::mem::ManuallyDrop::new(self);
> +        Self {
> +            // SAFETY: The refcount of `dev` will not be decremented because this doesn't actually
> +            // duplicate `ARef` and the use of `ManuallyDrop` forgets the originals.
> +            dev: unsafe { core::ptr::read(&me.dev) },
> +            dma_handle: me.dma_handle,
> +            count: me.count,
> +            cpu_addr: me.cpu_addr,
> +            dma_attrs: me.dma_attrs,
> +        }
> +    }
> +
> +    /// Returns the base address to the allocated region in the CPU's virtual address space.
> +    pub fn start_ptr(&self) -> *const T {
> +        self.cpu_addr
> +    }
> +
> +    /// Returns the base address to the allocated region in the CPU's virtual address space as
> +    /// a mutable pointer.
> +    pub fn start_ptr_mut(&mut self) -> *mut T {
> +        self.cpu_addr
> +    }
> +
> +    /// Returns a DMA handle which may be given to the device as the DMA address base of
> +    /// the region.
> +    pub fn dma_handle(&self) -> bindings::dma_addr_t {
> +        self.dma_handle
> +    }
> +
> +    /// Returns the data from the region starting from `offset` as a slice.
> +    /// `offset` and `count` are in units of `T`, not the number of bytes.
> +    ///
> +    /// Due to the safety requirements of slice, the caller should consider that the region could
> +    /// be modified by the device at any time (see the safety block below). For ringbuffer type of

What is a safety block?

> +    /// r/w access or use-cases where the pointer to the live data is needed, `start_ptr()` or
> +    /// `start_ptr_mut()` could be used instead.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that no hardware operations that involve the buffer are currently
> +    /// taking place while the returned slice is live.
> +    pub unsafe fn as_slice(&self, offset: usize, count: usize) -> Result<&[T]> {
> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
> +        if end >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
> +        // we've just checked that the range and index is within bounds. The immutability
> +        // of the data is also guaranteed by the safety requirements of the function.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        Ok(unsafe { core::slice::from_raw_parts(self.cpu_addr.add(offset), count) })
> +    }
> +
> +    /// Performs the same functionality as `as_slice`, except that a mutable slice is returned.
> +    /// See that method for documentation and safety requirements.

I don't think this is good documentation style. I think copy-pasting the
first line and second paragraph is better.

> +    ///
> +    /// # Safety
> +    ///
> +    /// It is the caller's responsibility to avoid separate read and write accesses to the region
> +    /// while the returned slice is live.

This safety requirement is worded quite differently compared to the one
on `as_slice`, why?

> +    pub unsafe fn as_slice_mut(&self, offset: usize, count: usize) -> Result<&mut [T]> {
> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
> +        if end >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
> +        // we've just checked that the range and index is within bounds. The immutability
> +        // of the data is also guaranteed by the safety requirements of the function.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        Ok(unsafe { core::slice::from_raw_parts_mut(self.cpu_addr.add(offset), count) })
> +    }
> +
> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
> +    /// number of bytes.
> +    ///
> +    /// # Examples
> +    ///
> +    /// ```
> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
> +    /// let somedata: [u8; 4] = [0xf; 4];
> +    /// let buf: &[u8] = &somedata;
> +    /// alloc.write(buf, 0)?;
> +    /// # Ok::<(), Error>(()) }
> +    /// ```
> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
> +        if end >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> +        // and we've just checked that the range and index is within bounds.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        unsafe {
> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())

Why are there no concurrent write or read operations on `cpu_addr`?

> +        };
> +        Ok(())
> +    }
> +
> +    /// Retrieve a single entry from the region with bounds checking. `offset` is in units of `T`,
> +    /// not the number of bytes.

Please add some information on the returned raw pointer, i.e. when it can be accessed.

> +    pub fn item_from_index(&self, offset: usize) -> Result<*mut T> {
> +        if offset >= self.count {
> +            return Err(EINVAL);
> +        }
> +        // SAFETY:
> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> +        // and we've just checked that the range and index is within bounds.
> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> +        // that `self.count` won't overflow early in the constructor.
> +        Ok(unsafe { &mut *self.cpu_addr.add(offset) })

Why do you take `&mut` here? The return type is a `*mut T`.

> +    }
> +
> +    /// Reads the value of `field` and ensures that its type is `FromBytes`
> +    ///
> +    /// # Safety:
> +    ///
> +    /// This must be called from the `dma_read` macro which ensures that the `field` pointer is
> +    /// validated beforehand.
> +    ///
> +    /// Public but hidden since it should only be used from `dma_read` macro.
> +    #[doc(hidden)]
> +    pub unsafe fn field_read<F: FromBytes>(&self, field: *const F) -> F {
> +        // SAFETY: By the safety requirements field is valid
> +        unsafe { field.read() }
> +    }
> +
> +    /// Writes a value to `field` and ensures that its type is `AsBytes`
> +    ///
> +    /// # Safety:
> +    ///
> +    /// This must be called from the `dma_write` macro which ensures that the `field` pointer is
> +    /// validated beforehand.
> +    ///
> +    /// Public but hidden since it should only be used from `dma_write` macro.
> +    #[doc(hidden)]
> +    pub unsafe fn field_write<F: AsBytes>(&self, field: *mut F, val: F) {
> +        // SAFETY: By the safety requirements field is valid
> +        unsafe { field.write(val) }
> +    }
> +}
> +
> +/// Reads a field of an item from an allocated region of structs.
> +/// # Examples
> +///
> +/// ```
> +/// struct MyStruct { field: u32, }
> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.
> +/// unsafe impl kernel::transmute::FromBytes for MyStruct{};
> +/// // SAFETY: Instances of MyStruct have no uninitialized portions.
> +/// unsafe impl kernel::transmute::AsBytes for MyStruct{};
> +///
> +/// # fn test(alloc: &kernel::dma::CoherentAllocation<MyStruct>) -> Result {
> +/// let whole = kernel::dma_read!(alloc[2]);
> +/// let field = kernel::dma_read!(alloc[1].field);
> +/// # Ok::<(), Error>(()) }
> +/// ```
> +#[macro_export]
> +macro_rules! dma_read {
> +    ($dma:ident [ $idx:expr ] $($field:tt)* ) => {{
> +        let item = $dma.item_from_index($idx)?;

Please replace this line with:

    let item = $crate::dma::CoherentAllocation::item_from_index(&$dma, $idx)?;

This ensures that you're actually calling the `item_from_index` function
of the `CoherentAllocation` type and not some other user-defined type.
(this is very important for the `unsafe` block and safety comment
below!)
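To make the hazard concrete, here is a stand-alone sketch (plain Rust, the type
and method names are just invented for illustration) of how `$dma.item_from_index($idx)`
can silently resolve to a method on some other type, while the fully qualified
form pins the call to the intended type:

```rust
// Sketch only: shows why macros should use fully qualified paths.
struct CoherentAllocation;

impl CoherentAllocation {
    fn item_from_index(&self, idx: usize) -> usize {
        idx // the "trusted" implementation
    }
}

// A user-defined type that happens to have a method of the same name.
struct Impostor;

impl Impostor {
    fn item_from_index(&self, _idx: usize) -> usize {
        42 // an arbitrary, unvetted implementation
    }
}

// Unhygienic: calls whatever `item_from_index` the receiver's type has.
macro_rules! get_unqualified {
    ($dma:ident, $idx:expr) => {
        $dma.item_from_index($idx)
    };
}

// Hygienic: pinned to the intended type, so passing an `Impostor`
// fails to compile instead of silently running foreign code.
macro_rules! get_qualified {
    ($dma:ident, $idx:expr) => {
        CoherentAllocation::item_from_index(&$dma, $idx)
    };
}

fn main() {
    let real = CoherentAllocation;
    let fake = Impostor;
    assert_eq!(get_unqualified!(real, 7), 7);
    assert_eq!(get_unqualified!(fake, 7), 42); // macro trusted the wrong type
    assert_eq!(get_qualified!(real, 7), 7);
    // `get_qualified!(fake, 7)` would not compile.
}
```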

> +        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
> +        // dereferenced. The compiler also further validates the expression on whether `field`
> +        // is a member of `item` when expanded by the macro.
> +        unsafe {
> +            let ptr_field = ::core::ptr::addr_of!((*item) $($field)*);
> +            $dma.field_read(ptr_field)

$crate::dma::CoherentAllocation::field_read(&$dma, ptr_field)

> +        }
> +    }};
> +}
> +
> +/// Writes to a field of an item from an allocated region of structs.
> +/// # Examples
> +///
> +/// ```
> +/// struct MyStruct { member: u32, }
> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.
> +/// unsafe impl kernel::transmute::FromBytes for MyStruct{};
> +/// // SAFETY: Instances of MyStruct have no uninitialized portions.
> +/// unsafe impl kernel::transmute::AsBytes for MyStruct{};
> +///
> +/// # fn test(alloc: &mut kernel::dma::CoherentAllocation<MyStruct>) -> Result {
> +/// kernel::dma_write!(alloc[2].member = 0xf);
> +/// kernel::dma_write!(alloc[1] = MyStruct { member: 0xf });
> +/// # Ok::<(), Error>(()) }
> +/// ```
> +#[macro_export]
> +macro_rules! dma_write {
> +    ($dma:ident [ $idx:expr ] $($field:tt)*) => {{
> +        kernel::dma_write!($dma, $idx, $($field)*);

Please use `$crate::` instead of `kernel::`.

> +    }};
> +    ($dma:ident, $idx: expr, = $val:expr) => {
> +        let item = $dma.item_from_index($idx)?;

Same here as with `dma_read`.

> +        // SAFETY: `item_from_index` ensures that `item` is always a valid item.
> +        unsafe { $dma.field_write(item, $val) }

Again (and below).

> +    };
> +    ($dma:ident, $idx: expr, $(.$field:ident)* = $val:expr) => {
> +        let item = $dma.item_from_index($idx)?;
> +        // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
> +        // dereferenced. The compiler also further validates the expression on whether `field`
> +        // is a member of `item` when expanded by the macro.
> +        unsafe {
> +            let ptr_field = ::core::ptr::addr_of_mut!((*item) $(.$field)*);
> +            $dma.field_write(ptr_field, $val)
> +        }
> +    };
> +}
> +
> +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
> +    fn drop(&mut self) {
> +        let size = self.count * core::mem::size_of::<T>();
> +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
> +        // type invariants on `CoherentAllocation`.
> +        unsafe {
> +            bindings::dma_free_attrs(
> +                self.dev.as_raw(),
> +                size,
> +                self.cpu_addr as _,
> +                self.dma_handle,
> +                self.dma_attrs.as_raw(),
> +            )
> +        }
> +    }
> +}

Can you move this drop impl directly below the struct definition?

---
Cheers,
Benno

> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 496ed32b0911..5081cb66b2f9 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -44,6 +44,7 @@
>  pub mod device;
>  pub mod device_id;
>  pub mod devres;
> +pub mod dma;
>  pub mod driver;
>  pub mod error;
>  #[cfg(CONFIG_RUST_FW_LOADER_ABSTRACTIONS)]
> --
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 20:07   ` Benno Lossin
@ 2025-02-24 21:40     ` Miguel Ojeda
  2025-02-24 23:12     ` Daniel Almeida
  2025-02-25  8:15     ` Abdiel Janulgue
  2 siblings, 0 replies; 70+ messages in thread
From: Miguel Ojeda @ 2025-02-24 21:40 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Abdiel Janulgue, aliceryhl, dakr, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu

On Mon, Feb 24, 2025 at 9:07 PM Benno Lossin <benno.lossin@proton.me> wrote:
>
> I don't think this is good documentation style. I think copy-pasting the
> first line and second paragraph is better.

Yeah, sometimes a bit of duplication is OK.

Now, even if we wanted to do something like this, then at the very
least intra-doc links should be used properly so that people can
actually jump to the right place in the generated ones.

Abdiel: in general, please use intra-doc links everywhere where they
may work -- the patch is missing almost all of them. It helps not just
readers, but also to keep docs in sync with each other, since you will
get diagnostics if `rustdoc` finds broken links.
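For reference, a minimal stub (the items and field below are placeholders, not
the actual patch code) showing the intra-doc link syntax; `rustdoc` resolves
the link and warns if the target ever goes away:

```rust
pub struct CoherentAllocation {
    dma_attrs: u32,
}

impl CoherentAllocation {
    /// Allocates a region with the given attributes (stub for illustration).
    pub fn alloc_attrs(dma_attrs: u32) -> Self {
        Self { dma_attrs }
    }

    /// Performs the same functionality as [`alloc_attrs`](Self::alloc_attrs),
    /// except that `dma_attrs` is 0 by default.
    ///
    /// Because this is an intra-doc link, readers can jump straight to the
    /// target, and `rustdoc` reports a broken link if it is renamed.
    pub fn alloc_coherent() -> Self {
        Self::alloc_attrs(0)
    }
}

fn main() {
    assert_eq!(CoherentAllocation::alloc_coherent().dma_attrs, 0);
}
```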

Thanks!

Cheers,
Miguel

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
                     ` (3 preceding siblings ...)
  2025-02-24 20:07   ` Benno Lossin
@ 2025-02-24 22:05   ` Miguel Ojeda
  2025-02-25  8:15     ` Abdiel Janulgue
  2025-03-03 11:30   ` Andreas Hindborg
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 70+ messages in thread
From: Miguel Ojeda @ 2025-02-24 22:05 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

Hi Abdiel,

Some quick doc-related nits -- please take them as a general guide for
potential improvements in newer versions etc., given there are still
other comments that could change the contents.

On Mon, Feb 24, 2025 at 12:50 PM Abdiel Janulgue
<abdiel.janulgue@gmail.com> wrote:
>
> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
> +/// both streaming and coherent APIs together.

This comment differs from the C side one -- that is OK, but just
wondering if there was a strong reason for that.

> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {

This returns `i32` -- I have not read the users of this, but should we
take the chance to have a `Result` already here? Same below for the
other one.
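Something along these lines (sketch only -- the stand-ins below are not the
actual kernel items, though the kernel crate does carry a `to_result`-style
helper in its `error` module):

```rust
// Illustrative errno-to-Result wrapper; names and values are made up.
#[derive(Debug, PartialEq)]
struct Error(i32);

type Result<T = ()> = core::result::Result<T, Error>;

fn to_result(ret: i32) -> Result {
    if ret < 0 { Err(Error(ret)) } else { Ok(()) }
}

// Stand-in for the C binding `bindings::dma_set_mask_and_coherent`.
fn ffi_dma_set_mask_and_coherent(mask: u64) -> i32 {
    if mask == 0 { -5 /* say, -EIO */ } else { 0 }
}

// The Rust-facing function returns `Result` instead of a raw `i32`,
// so callers can use `?` instead of checking the integer by hand.
fn dma_set_mask_and_coherent(mask: u64) -> Result {
    to_result(ffi_dma_set_mask_and_coherent(mask))
}

fn main() {
    assert_eq!(dma_set_mask_and_coherent(u64::MAX), Ok(()));
    assert_eq!(dma_set_mask_and_coherent(0), Err(Error(-5)));
}
```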

> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.

To keep things consistent, please start comments with uppercase, i.e.
"SAFETY: Device pointer ..."

It may also be clearer to say "by the type invariant on".

> +/// Possible attributes associated with a DMA mapping.
> +///
> +/// They can be combined with the operators `|`, `&`, and `!`.

Even if it may be trivial, a small example could be nice here (when I
see a sentence like "This can be used ...", I typically consider
whether it is a good place to show how).
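e.g. something like this toy model (the constants and their bit values here
are made up for the sketch; the real ones come from the DMA bindings):

```rust
use core::ops::{BitAnd, BitOr, Not};

/// Toy stand-in for the patch's `Attrs` newtype.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Attrs(u32);

const DMA_ATTR_NO_WARN: Attrs = Attrs(1 << 0);
const DMA_ATTR_WEAK_ORDERING: Attrs = Attrs(1 << 1);

impl BitOr for Attrs {
    type Output = Self;
    fn bitor(self, rhs: Self) -> Self { Attrs(self.0 | rhs.0) }
}
impl BitAnd for Attrs {
    type Output = Self;
    fn bitand(self, rhs: Self) -> Self { Attrs(self.0 & rhs.0) }
}
impl Not for Attrs {
    type Output = Self;
    fn not(self) -> Self { Attrs(!self.0) }
}

fn main() {
    // The kind of one-liner the doc comment could show:
    let attrs = DMA_ATTR_NO_WARN | DMA_ATTR_WEAK_ORDERING;
    assert_eq!(attrs & DMA_ATTR_NO_WARN, DMA_ATTR_NO_WARN);
    // Clearing a flag with `&` and `!`:
    assert_eq!(attrs & !DMA_ATTR_NO_WARN, DMA_ATTR_WEAK_ORDERING);
}
```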

> +/// DMA mapping attrributes.

Typo: attributes.

> +    /// let c: CoherentAllocation<u64> = CoherentAllocation::alloc_attrs(dev.into(), 4, GFP_KERNEL,
> +    ///                                                                  DMA_ATTR_NO_WARN)?;

Please try to format the code as `rustfmt` would normally do it. I
know it is a pain to do it manually -- hopefully
`format_code_in_doc_comments` will eventually be stable.

> +        // We ensure that we catch the failure on this function and throw an ENOMEM

Apart from what Benno said, please try to use Markdown in all comments.

> +    /// Performs the same functionality as `alloc_attrs`, except the `dma_attrs` is 0 by default.

Intra-doc links (I will mark a few more that I think may work).

> +    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.

Intra-doc link.

> +    /// r/w access or use-cases where the pointer to the live data is needed, `start_ptr()` or
> +    /// `start_ptr_mut()` could be used instead.

Intra-doc links.

> +    /// Performs the same functionality as `as_slice`, except that a mutable slice is returned.

Intra-doc link.

> +    /// Reads the value of `field` and ensures that its type is `FromBytes`

Intra-doc link.

> +    /// # Safety:

Typo: no colon. Also another one below.

> +    /// This must be called from the `dma_read` macro which ensures that the `field` pointer is
> +    /// validated beforehand.
> +    ///
> +    /// Public but hidden since it should only be used from `dma_read` macro.

Intra-doc links -- even if they are not rendered because it is hidden
(also even if it were a private item).

> +    #[doc(hidden)]
> +    pub unsafe fn field_read<F: FromBytes>(&self, field: *const F) -> F {
> +        // SAFETY: By the safety requirements field is valid

Markdown; and please end the sentence with a period for consistency.

> +    /// Writes a value to `field` and ensures that its type is `AsBytes`

Intra-doc link, and period at the end (same below too).

> +/// Reads a field of an item from an allocated region of structs.
> +/// # Examples

Newline between these two lines. Also for the write equivalent one below.

> +/// struct MyStruct { field: u32, }
> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.

Newline between these two, also Markdown. Same below and in the write
equivalent.

I think it is fairly important to have clean examples, since people
will learn from and follow them!

Thanks!

Cheers,
Miguel

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 16:27     ` Abdiel Janulgue
@ 2025-02-24 22:35       ` Daniel Almeida
  2025-02-28  8:35       ` Alexandre Courbot
  1 sibling, 0 replies; 70+ messages in thread
From: Daniel Almeida @ 2025-02-24 22:35 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: Andreas Hindborg, aliceryhl, dakr, robin.murphy, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu

Hi Abdiel

> On 24 Feb 2025, at 13:27, Abdiel Janulgue <abdiel.janulgue@gmail.com> wrote:
> 
> 
> On 24/02/2025 16:40, Andreas Hindborg wrote:
>> "Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:
>> [...]
>>> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
>>> +/// both streaming and coherent APIs together.
>>> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
>>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>>> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
>>> +}
>>> +
>>> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
>>> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
>>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>>> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
>>> +}
>> Sorry if it was asked before, I am late to the party. But would it make
>> sense to put these two functions on `Device` and make them take `&self`?
> 
> Thanks for checking this. The API is about the dma addressing capabilities of the device, so my thought would be to group them with the rest of the dma API. But either way, I don't have a strong preference. I'll let others comment.
> 
> Daniel, Danilo?
> 
> Regards,
> Abdiel

IIRC, that was already suggested by either Alice or someone else previously. Also (and again IIRC), you were going to
split that part into a separate patch? 

— Daniel

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 20:07   ` Benno Lossin
  2025-02-24 21:40     ` Miguel Ojeda
@ 2025-02-24 23:12     ` Daniel Almeida
  2025-03-03 13:00       ` Andreas Hindborg
  2025-02-25  8:15     ` Abdiel Janulgue
  2 siblings, 1 reply; 70+ messages in thread
From: Daniel Almeida @ 2025-02-24 23:12 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Abdiel Janulgue, aliceryhl, dakr, robin.murphy, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu

Hi Benno,

>> +pub struct CoherentAllocation<T: AsBytes + FromBytes> {
>> +    dev: ARef<Device>,
>> +    dma_handle: bindings::dma_addr_t,
>> +    count: usize,
>> +    cpu_addr: *mut T,
>> +    dma_attrs: Attrs,
>> +}
>> +
>> +impl<T: AsBytes + FromBytes> CoherentAllocation<T> {
>> +    /// Allocates a region of `size_of::<T> * count` of consistent memory.
>> +    ///
>> +    /// # Examples
>> +    ///
>> +    /// ```
>> +    /// use kernel::device::Device;
>> +    /// use kernel::dma::{attrs::*, CoherentAllocation};
>> +    ///
>> +    /// # fn test(dev: &Device) -> Result {
>> +    /// let c: CoherentAllocation<u64> = CoherentAllocation::alloc_attrs(dev.into(), 4, GFP_KERNEL,
>> +    ///                                                                  DMA_ATTR_NO_WARN)?;
>> +    /// # Ok::<(), Error>(()) }
>> +    /// ```
>> +    pub fn alloc_attrs(
>> +        dev: ARef<Device>,
>> +        count: usize,
>> +        gfp_flags: kernel::alloc::Flags,
>> +        dma_attrs: Attrs,
>> +    ) -> Result<CoherentAllocation<T>> {
>> +        build_assert!(
>> +            core::mem::size_of::<T>() > 0,
>> +            "It doesn't make sense for the allocated type to be a ZST"
>> +        );
> 
> Is this a safety requirement? I.e. the `dma_alloc_attrs` function cannot
> handle a size of 0?

It doesn’t make any sense to have a ZST here. At the very minimum we want to be able to read and
write bytes using this code, or preferably some larger T if applicable. The region also has to be allocated and we
need a size for that too.

This was discussed in an early iteration of this patch. I think a build failure is warranted.
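In plain Rust (outside the kernel, so without `build_assert!`), the same
monomorphization-time rejection can be sketched with an associated const --
this is only an approximation of what the kernel macro does:

```rust
// Sketch: a compile-time check, evaluated per instantiation of `T`,
// that rejects zero-sized types before any allocation math happens.
use core::marker::PhantomData;
use core::mem::size_of;

struct CoherentAllocation<T> {
    _marker: PhantomData<T>,
}

impl<T> CoherentAllocation<T> {
    // Evaluated lazily, only when referenced for a concrete `T`.
    const NOT_ZST: () = assert!(
        size_of::<T>() > 0,
        "It doesn't make sense for the allocated type to be a ZST"
    );

    fn alloc(count: usize) -> (Self, usize) {
        // Force the const to be evaluated for this `T`.
        let () = Self::NOT_ZST;
        let size = count * size_of::<T>();
        (Self { _marker: PhantomData }, size)
    }
}

fn main() {
    let (_a, size) = CoherentAllocation::<u64>::alloc(4);
    assert_eq!(size, 32);
    // `CoherentAllocation::<()>::alloc(4)` would fail to compile.
}
```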

> 
>> +    /// r/w access or use-cases where the pointer to the live data is needed, `start_ptr()` or
>> +    /// `start_ptr_mut()` could be used instead.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// Callers must ensure that no hardware operations that involve the buffer are currently
>> +    /// taking place while the returned slice is live.
>> +    pub unsafe fn as_slice(&self, offset: usize, count: usize) -> Result<&[T]> {
>> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
>> +        // we've just checked that the range and index is within bounds. The immutability
>> +        // of the data is also guaranteed by the safety requirements of the function.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        Ok(unsafe { core::slice::from_raw_parts(self.cpu_addr.add(offset), count) })
>> +    }
>> +
>> +    /// Performs the same functionality as `as_slice`, except that a mutable slice is returned.
>> +    /// See that method for documentation and safety requirements.
> 
> I don't think this is good documentation style. I think copy-pasting the
> first line and second paragraph is better.
> 
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// It is the caller's responsibility to avoid separate read and write accesses to the region
>> +    /// while the returned slice is live.
> 
> This safety requirement is worded quite differently compared to the one
> on `as_slice`, why?

This was discussed in an earlier iteration of this patch too. If you call this function, you must make
sure that some hw doesn’t change the memory contents while the slice is alive.

This is device-specific. For example, I know that for video codecs this is
possible, and therefore this API can be used there.

On the other hand, some people may use the API to share ring buffers with the hw or even implement
some polling logic where the CPU is waiting for a given memory location to be written by the HW. 

If you’re trying to do this, you cannot use this API, that’s what the safety requirement is about.

Although I’d word this differently to be honest, i.e.:

 /// It is the caller's responsibility to avoid concurrent access to the region by the CPU and any other device
 /// while the slice is alive.

This also needs a bit of work, but at least it makes the point clearer.

> 
>> +    pub unsafe fn as_slice_mut(&self, offset: usize, count: usize) -> Result<&mut [T]> {
>> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
>> +        // we've just checked that the range and index is within bounds. The immutability of the
>> +        // of data is also guaranteed by the safety requirements of the function.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        Ok(unsafe { core::slice::from_raw_parts_mut(self.cpu_addr.add(offset), count) })
>> +    }
>> +
>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>> +    /// number of bytes.
>> +    ///
>> +    /// # Examples
>> +    ///
>> +    /// ```
>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>> +    /// let somedata: [u8; 4] = [0xf; 4];
>> +    /// let buf: &[u8] = &somedata;
>> +    /// alloc.write(buf, 0)?;
>> +    /// # Ok::<(), Error>(()) }
>> +    /// ```
>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> +        // and we've just checked that the range and index is within bounds.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        unsafe {
>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> 
> Why are there no concurrent write or read operations on `cpu_addr`?

Sorry, can you rephrase this question?

— Daniel

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 22:05   ` Miguel Ojeda
@ 2025-02-25  8:15     ` Abdiel Janulgue
  0 siblings, 0 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-25  8:15 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On 25/02/2025 00:05, Miguel Ojeda wrote:
> Hi Abdiel,
> 
> Some quick doc-related nits -- please take them as a general guide for
> potential improvements in newer versions etc., given there are still
> other comments that could change the contents.
> 
> On Mon, Feb 24, 2025 at 12:50 PM Abdiel Janulgue
> <abdiel.janulgue@gmail.com> wrote:
>>
>> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
>> +/// both streaming and coherent APIs together.
> 
> This comment differs from the C side one -- that is OK, but just
> wondering if there was a strong reason for that.
> 
>> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
> 
> This returns `i32` -- I have not read the users of this, but should we
> take the chance to have a `Result` already here? Same below for the
> other one.
> 
>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> 
> To keep things consistent, please start comments with uppercase, i.e.
> "SAFETY: Device pointer ..."
> 
> It may also be clearer to say "by the type invariant on".
> 
>> +/// Possible attributes associated with a DMA mapping.
>> +///
>> +/// They can be combined with the operators `|`, `&`, and `!`.
> 
> Even if it may be trivial, a small example could be nice here (when I
> see a sentence like "This can be used ...", I typically consider
> whether it is a good place to show how).
> 
>> +/// DMA mapping attrributes.
> 
> Typo: attributes.
> 
>> +    /// let c: CoherentAllocation<u64> = CoherentAllocation::alloc_attrs(dev.into(), 4, GFP_KERNEL,
>> +    ///                                                                  DMA_ATTR_NO_WARN)?;
> 
> Please try to format the code as `rustfmt` would normally do it. I
> know it is a pain to do it manually -- hopefully
> `format_code_in_doc_comments` will eventually be stable.
> 
>> +        // We ensure that we catch the failure on this function and throw an ENOMEM
> 
> Apart from what Benno said, please try to use Markdown in all comments.
> 
>> +    /// Performs the same functionality as `alloc_attrs`, except the `dma_attrs` is 0 by default.
> 
> Intra-doc links (I will mark a few more that I think may work).
> 
>> +    /// Create a duplicate of the `CoherentAllocation` object but prevent it from being dropped.
> 
> Intra-doc link.
> 
>> +    /// r/w access or use-cases where the pointer to the live data is needed, `start_ptr()` or
>> +    /// `start_ptr_mut()` could be used instead.
> 
> Intra-doc links.
> 
>> +    /// Performs the same functionality as `as_slice`, except that a mutable slice is returned.
> 
> Intra-doc link.
> 
>> +    /// Reads the value of `field` and ensures that its type is `FromBytes`
> 
> Intra-doc link.
> 
>> +    /// # Safety:
> 
> Typo: no colon. Also another one below.
> 
>> +    /// This must be called from the `dma_read` macro which ensures that the `field` pointer is
>> +    /// validated beforehand.
>> +    ///
>> +    /// Public but hidden since it should only be used from `dma_read` macro.
> 
> Intra-doc links -- even if they are not rendered because it is hidden
> (also even if it were a private item).
> 
>> +    #[doc(hidden)]
>> +    pub unsafe fn field_read<F: FromBytes>(&self, field: *const F) -> F {
>> +        // SAFETY: By the safety requirements field is valid
> 
> Markdown; and please end the sentence with a period for consistency.
> 
>> +    /// Writes a value to `field` and ensures that its type is `AsBytes`
> 
> Intra-doc link, and period at the end (same below too).
> 
>> +/// Reads a field of an item from an allocated region of structs.
>> +/// # Examples
> 
> Newline between these two lines. Also for the write equivalent one below.
> 
>> +/// struct MyStruct { field: u32, }
>> +/// // SAFETY: All bit patterns are acceptable values for MyStruct.
> 
> Newline between these two, also Markdown. Same below and in the write
> equivalent.
> 
> I think it is fairly important to have clean examples, since people
> will learn from and follow them!

Hi Miguel,

Thanks for the valuable feedback. Still learning the ropes at this 
point, but will further improve this. :)

Regards,
Abdiel

> 
> Thanks!
> 
> Cheers,
> Miguel


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 20:07   ` Benno Lossin
  2025-02-24 21:40     ` Miguel Ojeda
  2025-02-24 23:12     ` Daniel Almeida
@ 2025-02-25  8:15     ` Abdiel Janulgue
  2025-02-25  9:09       ` Alice Ryhl
  2 siblings, 1 reply; 70+ messages in thread
From: Abdiel Janulgue @ 2025-02-25  8:15 UTC (permalink / raw)
  To: Benno Lossin, aliceryhl, dakr, robin.murphy, daniel.almeida,
	rust-for-linux
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu

Hi,

On 24/02/2025 22:07, Benno Lossin wrote:
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// It is the callers responsibility to avoid separate read and write accesses to the region
>> +    /// while the returned slice is live.
> This safety requirement is worded quite differently compared to the one
> on `as_slice`, why?
> 
>> +    pub unsafe fn as_slice_mut(&self, offset: usize, count: usize) -> Result<&mut [T]> {
>> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
>> +        // we've just checked that the range and index is within bounds. The immutability of the
>> +        // data is also guaranteed by the safety requirements of the function.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        Ok(unsafe { core::slice::from_raw_parts_mut(self.cpu_addr.add(offset), count) })
>> +    }
>> +
>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>> +    /// number of bytes.
>> +    ///
>> +    /// # Examples
>> +    ///
>> +    /// ```
>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>> +    /// let somedata: [u8; 4] = [0xf; 4];
>> +    /// let buf: &[u8] = &somedata;
>> +    /// alloc.write(buf, 0)?;
>> +    /// # Ok::<(), Error>(()) }
>> +    /// ```
>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>> +        if end >= self.count {
>> +            return Err(EINVAL);
>> +        }
>> +        // SAFETY:
>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> +        // and we've just checked that the range and index is within bounds.
>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> +        // that `self.count` won't overflow early in the constructor.
>> +        unsafe {
>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> Why are there no concurrent write or read operations on `cpu_addr`?

Thanks for the feedback! I noticed an additional safety requirement in 
slice::from_raw_parts_mut:

"The memory referenced by the returned slice must not be accessed 
through any other pointer (not derived from the return value) for the 
duration of lifetime 'a. Both read and write accesses are forbidden."

I can see now though why both the as_slice and as_slice_mut docs need 
more clarity, i.e. they could be worded similarly and pick up the 
additional safety requirement from slice::from_raw_parts_mut of having 
no other r/w access to the region while the slice is live?

Regards,
Abdiel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-25  8:15     ` Abdiel Janulgue
@ 2025-02-25  9:09       ` Alice Ryhl
  0 siblings, 0 replies; 70+ messages in thread
From: Alice Ryhl @ 2025-02-25  9:09 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: Benno Lossin, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu

On Tue, Feb 25, 2025 at 9:15 AM Abdiel Janulgue
<abdiel.janulgue@gmail.com> wrote:
>
> Hi,
>
> On 24/02/2025 22:07, Benno Lossin wrote:
> >> +    ///
> >> +    /// # Safety
> >> +    ///
> >> +    /// It is the callers responsibility to avoid separate read and write accesses to the region
> >> +    /// while the returned slice is live.
> > This safety requirement is worded quite differently compared to the one
> > on `as_slice`, why?
> >
> >> +    pub unsafe fn as_slice_mut(&self, offset: usize, count: usize) -> Result<&mut [T]> {
> >> +        let end = offset.checked_add(count).ok_or(EOVERFLOW)?;
> >> +        if end >= self.count {
> >> +            return Err(EINVAL);
> >> +        }
> >> +        // SAFETY:
> >> +        // - The pointer is valid due to type invariant on `CoherentAllocation`,
> >> +        // we've just checked that the range and index is within bounds. The immutability of the
> >> +        // data is also guaranteed by the safety requirements of the function.
> >> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> >> +        // that `self.count` won't overflow early in the constructor.
> >> +        Ok(unsafe { core::slice::from_raw_parts_mut(self.cpu_addr.add(offset), count) })
> >> +    }
> >> +
> >> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
> >> +    /// number of bytes.
> >> +    ///
> >> +    /// # Examples
> >> +    ///
> >> +    /// ```
> >> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
> >> +    /// let somedata: [u8; 4] = [0xf; 4];
> >> +    /// let buf: &[u8] = &somedata;
> >> +    /// alloc.write(buf, 0)?;
> >> +    /// # Ok::<(), Error>(()) }
> >> +    /// ```
> >> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
> >> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
> >> +        if end >= self.count {
> >> +            return Err(EINVAL);
> >> +        }
> >> +        // SAFETY:
> >> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> >> +        // and we've just checked that the range and index is within bounds.
> >> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> >> +        // that `self.count` won't overflow early in the constructor.
> >> +        unsafe {
> >> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> > Why are there no concurrent write or read operations on `cpu_addr`?
>
> Thanks for the feedback! I noticed an additional safety requirement in
> slice::from_raw_parts_mut:
>
> "The memory referenced by the returned slice must not be accessed
> through any other pointer (not derived from the return value) for the
> duration of lifetime 'a. Both read and write accesses are forbidden."
>
> I can see now though why both as_slice and as_slice_mut docs needs more
> clarity. i.e., they could be worded similarly and add the additional
> safety requirement of slice::from_raw_parts_mut of having no other r/w
> access while the slice is live?

You can use the same wording as `Page::read_raw` and `Page::write_raw`
for your as_slice[_mut] methods. See rust/kernel/page.rs.

Alice

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 16:27     ` Abdiel Janulgue
  2025-02-24 22:35       ` Daniel Almeida
@ 2025-02-28  8:35       ` Alexandre Courbot
  2025-02-28 10:01         ` Danilo Krummrich
  1 sibling, 1 reply; 70+ messages in thread
From: Alexandre Courbot @ 2025-02-28  8:35 UTC (permalink / raw)
  To: Abdiel Janulgue, Andreas Hindborg
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu

On Tue Feb 25, 2025 at 1:27 AM JST, Abdiel Janulgue wrote:
>
> On 24/02/2025 16:40, Andreas Hindborg wrote:
>> "Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:
>> 
>> [...]
>> 
>>> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
>>> +/// both streaming and coherent APIs together.
>>> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
>>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>>> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
>>> +}
>>> +
>>> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
>>> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
>>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>>> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
>>> +}
>> 
>> Sorry if it was asked before, I am late to the party. But would it make
>> sense to put these two functions on `Device` and make them take `&self`?
>
> Thanks for checking this. The API is about the DMA addressing 
> capabilities of the device; my thought was to group them with the 
> rest of the DMA API. But either way, I don't have a strong preference. 
> I'll let others comment.

FWIW I was about to make the same comment as Andreas. The mask is set on
a Device, it should thus be part of its implementation. You can still
keep them with the rest of the DMA API in this file by just adding an
`impl Device` block here - since Device resides in the same crate, it is
allowed.
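For illustration only, here is a standalone sketch of that shape -- plain Rust, not the kernel crate: `Error`, `Device` and the `bindings` module below are stand-ins, and the stubbed C call fails for a zero mask purely so the example is checkable:

```rust
// Standalone sketch: stand-in types model moving the DMA-mask setter onto
// `Device` and converting the C errno return into a `Result`.
use core::ffi::c_void;

#[derive(Debug, PartialEq)]
struct Error(i32);
type Result<T = ()> = core::result::Result<T, Error>;

mod bindings {
    // Stub standing in for the real C `dma_set_mask()`; a zero mask is
    // rejected here purely for demonstration.
    pub unsafe fn dma_set_mask(_dev: *mut core::ffi::c_void, mask: u64) -> i32 {
        if mask == 0 { -5 } else { 0 }
    }
}

struct Device(*mut c_void);

impl Device {
    fn as_raw(&self) -> *mut c_void {
        self.0
    }

    /// Set the streaming DMA mask, surfacing failure as `Err` rather than
    /// a raw C integer.
    fn dma_set_mask(&self, mask: u64) -> Result {
        // SAFETY: stubbed here; in the kernel the pointer would be valid
        // by the type invariant on `Device`.
        let ret = unsafe { bindings::dma_set_mask(self.as_raw(), mask) };
        if ret != 0 { Err(Error(ret)) } else { Ok(()) }
    }
}

fn main() {
    let dev = Device(core::ptr::null_mut());
    assert!(dev.dma_set_mask(u64::MAX).is_ok());
    assert!(dev.dma_set_mask(0).is_err());
}
```

The real method would of course live in rust/kernel/dma.rs (or device.rs) against the actual `bindings` crate and `kernel::error::Error`.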

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-28  8:35       ` Alexandre Courbot
@ 2025-02-28 10:01         ` Danilo Krummrich
  0 siblings, 0 replies; 70+ messages in thread
From: Danilo Krummrich @ 2025-02-28 10:01 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Abdiel Janulgue, Andreas Hindborg, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, Valentin Obst, linux-kernel, Christoph Hellwig,
	Marek Szyprowski, airlied, iommu

On Fri, Feb 28, 2025 at 05:35:26PM +0900, Alexandre Courbot wrote:
> On Tue Feb 25, 2025 at 1:27 AM JST, Abdiel Janulgue wrote:
> >
> > On 24/02/2025 16:40, Andreas Hindborg wrote:
> >> "Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:
> >> 
> >> [...]
> >> 
> >>> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
> >>> +/// both streaming and coherent APIs together.
> >>> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
> >>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> >>> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
> >>> +}
> >>> +
> >>> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
> >>> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
> >>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> >>> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
> >>> +}
> >> 
> >> Sorry if it was asked before, I am late to the party. But would it make
> >> sense to put these two functions on `Device` and make them take `&self`?
> >
> > > Thanks for checking this. The API is about the DMA addressing 
> > > capabilities of the device; my thought was to group them with the 
> > > rest of the DMA API. But either way, I don't have a strong preference. 
> > > I'll let others comment.
> 
> FWIW I was about to make the same comment as Andreas. The mask is set on
> a Device, it should thus be part of its implementation.

Yes, this should be Device methods. Please also add them in a separate commit.

> You can still
> keep them with the rest of the DMA API in this file by just adding an
> `impl Device` block here - since Device resides in the same crate, it is
> allowed.

Eventually, the build system will support moving Rust code to the
corresponding subsystem entries in separate crates.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
                     ` (4 preceding siblings ...)
  2025-02-24 22:05   ` Miguel Ojeda
@ 2025-03-03 11:30   ` Andreas Hindborg
  2025-03-04  8:58     ` Abdiel Janulgue
  2025-03-03 13:08   ` Robin Murphy
  2025-03-05 17:41   ` Jason Gunthorpe
  7 siblings, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-03 11:30 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu

"Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:

> Add a simple dma coherent allocator rust abstraction. Based on
> Andreas Hindborg's dma abstractions from the rnvme driver, which
> was also based on earlier work by Wedson Almeida Filho.
>
> Nacked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
[...]

> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 0640b7e115be..8f3808c8b7fe 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -13,6 +13,7 @@
>  #include "build_bug.c"
>  #include "cred.c"
>  #include "device.c"
> +#include "dma.c"
>  #include "err.c"
>  #include "fs.c"
>  #include "io.c"
> diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
> new file mode 100644
> index 000000000000..b4dd5d411711
> --- /dev/null
> +++ b/rust/kernel/dma.rs
> @@ -0,0 +1,421 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct memory access (DMA).
> +//!
> +//! C header: [`include/linux/dma-mapping.h`](srctree/include/linux/dma-mapping.h)
> +
> +use crate::{
> +    bindings, build_assert,
> +    device::Device,
> +    error::code::*,
> +    error::Result,
> +    transmute::{AsBytes, FromBytes},
> +    types::ARef,
> +};
> +
> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
> +/// both streaming and coherent APIs together.
> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
> +}
> +
> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
> +}

I'm rebasing some of the dma pool code I'm using for NVMe on top of
these patches, and I notice that in the original code from way back
these methods (besides being on `Device`) returned `Result`:

    pub fn dma_set_mask(&self, mask: u64) -> Result {
        let dev = self.as_raw();
        let ret = unsafe { bindings::dma_set_mask(dev as _, mask) };
        if ret != 0 {
            Err(Error::from_errno(ret))
        } else {
            Ok(())
        }
    }

Is there a reason for not returning a `Result` in this series?


Best regards,
Andreas Hindborg



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 23:12     ` Daniel Almeida
@ 2025-03-03 13:00       ` Andreas Hindborg
  2025-03-03 13:13         ` Alice Ryhl
  0 siblings, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-03 13:00 UTC (permalink / raw)
  To: Daniel Almeida
  Cc: Benno Lossin, Abdiel Janulgue, aliceryhl, dakr, robin.murphy,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu

"Daniel Almeida" <daniel.almeida@collabora.com> writes:

> Hi Benno,
>

[...]

>>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>>> +    /// number of bytes.
>>> +    ///
>>> +    /// # Examples
>>> +    ///
>>> +    /// ```
>>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>>> +    /// let somedata: [u8; 4] = [0xf; 4];
>>> +    /// let buf: &[u8] = &somedata;
>>> +    /// alloc.write(buf, 0)?;
>>> +    /// # Ok::<(), Error>(()) }
>>> +    /// ```
>>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>>> +        if end >= self.count {
>>> +            return Err(EINVAL);
>>> +        }
>>> +        // SAFETY:
>>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>>> +        // and we've just checked that the range and index is within bounds.
>>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>>> +        // that `self.count` won't overflow early in the constructor.
>>> +        unsafe {
>>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
>>
>> Why are there no concurrent write or read operations on `cpu_addr`?
>
> Sorry, can you rephrase this question?

This write is suffering the same complications as discussed here [1].
There are multiple issues with this implementation.

1) `write` takes a shared reference and thus may be called concurrently.
There is no synchronization, so `copy_nonoverlapping` could be called
concurrently on the same address. The safety requirements for
`copy_nonoverlapping` state that the destination must be valid for
write. Alice claims in [1] that any memory area that experiences data
races is not valid for writes. So the safety requirement of
`copy_nonoverlapping` is violated and this call is potentially UB.

2) The destination of this write is DMA memory. It could be concurrently
modified by hardware, leading to the same issues as 1). Thus the
function cannot be safe if we cannot guarantee hardware will not write
to the region while this function is executing.

Now, I don't think that these _should_ be issues, but according to our
Rust language experts they _are_.

I really think that copying data through a raw pointer to or from a
place that experiences data races should _not_ be UB if the data is not
interpreted in any way, other than being moved.
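For concreteness, a standalone toy sketch (plain Rust, no kernel crate; the `volatile_copy` name is made up) of the per-element volatile copy that would be accepted in place of `copy_nonoverlapping` for racy memory:

```rust
// Toy sketch: move `len` elements one volatile access at a time, so the
// compiler may not assume the memory is free of concurrent writers.
use core::ptr;

/// Copy `len` elements from `src` to `dst` with volatile accesses.
///
/// # Safety
/// `src` and `dst` must be valid for `len` reads/writes respectively,
/// properly aligned, and non-overlapping.
unsafe fn volatile_copy<T: Copy>(dst: *mut T, src: *const T, len: usize) {
    for i in 0..len {
        // SAFETY: the caller guarantees both pointers are in bounds for
        // `len` elements.
        unsafe { ptr::write_volatile(dst.add(i), ptr::read_volatile(src.add(i))) };
    }
}

fn main() {
    let src = [0xf_u8; 4];
    let mut dst = [0_u8; 4];
    // SAFETY: both arrays are live, aligned, disjoint and of length 4.
    unsafe { volatile_copy(dst.as_mut_ptr(), src.as_ptr(), src.len()) };
    assert_eq!(dst, src);
}
```

Whether the per-element form costs anything in practice versus a memcpy is a separate question, of course.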


Best regards,
Andreas Hindborg



[1] https://rust-for-linux.zulipchat.com/#narrow/channel/291565-Help/topic/Interacting.20with.20user.20space.20pages



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
                     ` (5 preceding siblings ...)
  2025-03-03 11:30   ` Andreas Hindborg
@ 2025-03-03 13:08   ` Robin Murphy
  2025-03-05 17:41   ` Jason Gunthorpe
  7 siblings, 0 replies; 70+ messages in thread
From: Robin Murphy @ 2025-03-03 13:08 UTC (permalink / raw)
  To: Abdiel Janulgue, aliceryhl, dakr, daniel.almeida, rust-for-linux
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On 24/02/2025 11:49 am, Abdiel Janulgue wrote:
> Add a simple dma coherent allocator rust abstraction. Based on
> Andreas Hindborg's dma abstractions from the rnvme driver, which
> was also based on earlier work by Wedson Almeida Filho.
> 
> Nacked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
> ---
>   rust/bindings/bindings_helper.h |   1 +
>   rust/helpers/dma.c              |  13 +
>   rust/helpers/helpers.c          |   1 +
>   rust/kernel/dma.rs              | 421 ++++++++++++++++++++++++++++++++
>   rust/kernel/lib.rs              |   1 +
>   5 files changed, 437 insertions(+)
>   create mode 100644 rust/helpers/dma.c
>   create mode 100644 rust/kernel/dma.rs
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 55354e4dec14..f69b05025e52 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -11,6 +11,7 @@
>   #include <linux/blk_types.h>
>   #include <linux/blkdev.h>
>   #include <linux/cred.h>
> +#include <linux/dma-mapping.h>
>   #include <linux/errname.h>
>   #include <linux/ethtool.h>
>   #include <linux/file.h>
> diff --git a/rust/helpers/dma.c b/rust/helpers/dma.c
> new file mode 100644
> index 000000000000..30da079d366c
> --- /dev/null
> +++ b/rust/helpers/dma.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/dma-mapping.h>
> +
> +int rust_helper_dma_set_mask_and_coherent(struct device *dev, u64 mask)
> +{
> +	return dma_set_mask_and_coherent(dev, mask);
> +}
> +
> +int rust_helper_dma_set_mask(struct device *dev, u64 mask)
> +{
> +	return dma_set_mask(dev, mask);
> +}
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 0640b7e115be..8f3808c8b7fe 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -13,6 +13,7 @@
>   #include "build_bug.c"
>   #include "cred.c"
>   #include "device.c"
> +#include "dma.c"
>   #include "err.c"
>   #include "fs.c"
>   #include "io.c"
> diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
> new file mode 100644
> index 000000000000..b4dd5d411711
> --- /dev/null
> +++ b/rust/kernel/dma.rs
> @@ -0,0 +1,421 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct memory access (DMA).
> +//!
> +//! C header: [`include/linux/dma-mapping.h`](srctree/include/linux/dma-mapping.h)
> +
> +use crate::{
> +    bindings, build_assert,
> +    device::Device,
> +    error::code::*,
> +    error::Result,
> +    transmute::{AsBytes, FromBytes},
> +    types::ARef,
> +};
> +
> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
> +/// both streaming and coherent APIs together.
> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
> +}
> +
> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
> +}

It's a bit funny that dma_set_coherent_mask() is the only one you really 
need at this point (with only the coherent API), yet it's the one that's 
been left out.

FWIW, also +1 to the consensus that the DMA masks are, both logically 
and functionally, properties of the device, so making these Device 
methods seems sensible.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-03 13:00       ` Andreas Hindborg
@ 2025-03-03 13:13         ` Alice Ryhl
  2025-03-03 15:21           ` Andreas Hindborg
  2025-03-04  8:28           ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
  0 siblings, 2 replies; 70+ messages in thread
From: Alice Ryhl @ 2025-03-03 13:13 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Daniel Almeida, Benno Lossin, Abdiel Janulgue, dakr, robin.murphy,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu

On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Daniel Almeida" <daniel.almeida@collabora.com> writes:
>
> > Hi Benno,
> >
>
> [...]
>
> >>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
> >>> +    /// number of bytes.
> >>> +    ///
> >>> +    /// # Examples
> >>> +    ///
> >>> +    /// ```
> >>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
> >>> +    /// let somedata: [u8; 4] = [0xf; 4];
> >>> +    /// let buf: &[u8] = &somedata;
> >>> +    /// alloc.write(buf, 0)?;
> >>> +    /// # Ok::<(), Error>(()) }
> >>> +    /// ```
> >>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
> >>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
> >>> +        if end >= self.count {
> >>> +            return Err(EINVAL);
> >>> +        }
> >>> +        // SAFETY:
> >>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> >>> +        // and we've just checked that the range and index is within bounds.
> >>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> >>> +        // that `self.count` won't overflow early in the constructor.
> >>> +        unsafe {
> >>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> >>
> >> Why are there no concurrent write or read operations on `cpu_addr`?
> >
> > Sorry, can you rephrase this question?
>
> This write is suffering the same complications as discussed here [1].
> There are multiple issues with this implementation.
>
> 1) `write` takes a shared reference and thus may be called concurrently.
> There is no synchronization, so `copy_nonoverlapping` could be called
> concurrently on the same address. The safety requirements for
> `copy_nonoverlapping` state that the destination must be valid for
> write. Alice claims in [1] that any memory area that experiences data
> races is not valid for writes. So the safety requirement of
> `copy_nonoverlapping` is violated and this call is potentially UB.
>
> 2) The destination of this write is DMA memory. It could be concurrently
> modified by hardware, leading to the same issues as 1). Thus the
> function cannot be safe if we cannot guarantee hardware will not write
> to the region while this function is executing.
>
> Now, I don't think that these _should_ be issues, but according to our
> Rust language experts they _are_.
>
> I really think that copying data through a raw pointer to or from a
> place that experiences data races, should _not_ be UB if the data is not
> interpreted in any way, other than moving it.
>
>
> Best regards,
> Andreas Hindborg

We need to make progress on this series, and it's starting to get late
in the cycle. I suggest we:

1. Delete as_slice, as_slice_mut, write, and skip_drop.
2. Change field_read/field_write to use a volatile read/write.

This will let us make progress now and sidestep this discussion. The
deleted methods can happen in a follow-up.
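For illustration, a standalone sketch (plain Rust, not against this series; `MyStruct` and the free functions stand in for the `CoherentAllocation` internals) of what volatile-based accessors could look like:

```rust
// Standalone sketch of volatile field accessors; in the series these
// would be methods on `CoherentAllocation` guarded by FromBytes/AsBytes.
use core::ptr;

#[repr(C)]
#[derive(Clone, Copy)]
struct MyStruct {
    field: u32,
}

/// # Safety
/// `field` must point to a live, properly aligned `F`.
unsafe fn field_read<F: Copy>(field: *const F) -> F {
    // SAFETY: per the function's contract the pointer is valid.
    unsafe { ptr::read_volatile(field) }
}

/// # Safety
/// `field` must point to a live, properly aligned `F`.
unsafe fn field_write<F: Copy>(field: *mut F, val: F) {
    // SAFETY: per the function's contract the pointer is valid.
    unsafe { ptr::write_volatile(field, val) }
}

fn main() {
    let mut s = MyStruct { field: 0 };
    // SAFETY: `s.field` is a live, aligned u32.
    unsafe { field_write(&mut s.field, 42u32) };
    // SAFETY: as above.
    assert_eq!(unsafe { field_read(&s.field) }, 42);
}
```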

Similarly for the dma mask methods, let's either drop them to a
follow-up patch or just put them anywhere and move them later.

Alice

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-03 13:13         ` Alice Ryhl
@ 2025-03-03 15:21           ` Andreas Hindborg
  2025-03-03 15:44             ` Alice Ryhl
  2025-03-04  8:28           ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
  1 sibling, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-03 15:21 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Daniel Almeida, Benno Lossin, Abdiel Janulgue, dakr, robin.murphy,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu

"Alice Ryhl" <aliceryhl@google.com> writes:

> On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Daniel Almeida" <daniel.almeida@collabora.com> writes:
>>
>> > Hi Benno,
>> >
>>
>> [...]
>>
>> >>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>> >>> +    /// number of bytes.
>> >>> +    ///
>> >>> +    /// # Examples
>> >>> +    ///
>> >>> +    /// ```
>> >>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>> >>> +    /// let somedata: [u8; 4] = [0xf; 4];
>> >>> +    /// let buf: &[u8] = &somedata;
>> >>> +    /// alloc.write(buf, 0)?;
>> >>> +    /// # Ok::<(), Error>(()) }
>> >>> +    /// ```
>> >>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>> >>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>> >>> +        if end >= self.count {
>> >>> +            return Err(EINVAL);
>> >>> +        }
>> >>> +        // SAFETY:
>> >>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> >>> +        // and we've just checked that the range and index is within bounds.
>> >>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> >>> +        // that `self.count` won't overflow early in the constructor.
>> >>> +        unsafe {
>> >>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
>> >>
>> >> Why are there no concurrent write or read operations on `cpu_addr`?
>> >
>> > Sorry, can you rephrase this question?
>>
>> This write is suffering the same complications as discussed here [1].
>> There are multiple issues with this implementation.
>>
>> 1) `write` takes a shared reference and thus may be called concurrently.
>> There is no synchronization, so `copy_nonoverlapping` could be called
>> concurrently on the same address. The safety requirements for
>> `copy_nonoverlapping` state that the destination must be valid for
>> write. Alice claims in [1] that any memory area that experiences data
>> races is not valid for writes. So the safety requirement of
>> `copy_nonoverlapping` is violated and this call is potentially UB.
>>
>> 2) The destination of this write is DMA memory. It could be concurrently
>> modified by hardware, leading to the same issues as 1). Thus the
>> function cannot be safe if we cannot guarantee hardware will not write
>> to the region while this function is executing.
>>
>> Now, I don't think that these _should_ be issues, but according to our
>> Rust language experts they _are_.
>>
>> I really think that copying data through a raw pointer to or from a
>> place that experiences data races, should _not_ be UB if the data is not
>> interpreted in any way, other than moving it.
>>
>>
>> Best regards,
>> Andreas Hindborg
>
> We need to make progress on this series, and it's starting to get late
> in the cycle. I suggest we:

There is always another cycle.

>
> 1. Delete as_slice, as_slice_mut, write, and skip_drop.
> 2. Change field_read/field_write to use a volatile read/write.

Volatile reads/writes that race are OK?

>
> This will let us make progress now and sidestep this discussion. The
> deleted methods can happen in a follow-up.

`item_from_index`, the `dma_read` and `dma_write` macros as well, I would think?

>
> Similarly for the dma mask methods, let's either drop them to a
> follow-up patch or just put them anywhere and move them later.

Sure.


Best regards,
Andreas Hindborg



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-03 15:21           ` Andreas Hindborg
@ 2025-03-03 15:44             ` Alice Ryhl
  2025-03-03 18:45               ` Andreas Hindborg
  2025-03-03 19:00               ` Allow data races on some read/write operations Andreas Hindborg
  0 siblings, 2 replies; 70+ messages in thread
From: Alice Ryhl @ 2025-03-03 15:44 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Daniel Almeida, Benno Lossin, Abdiel Janulgue, dakr, robin.murphy,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu

On Mon, Mar 3, 2025 at 4:21 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >>
> >> "Daniel Almeida" <daniel.almeida@collabora.com> writes:
> >>
> >> > Hi Benno,
> >> >
> >>
> >> [...]
> >>
> >> >>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
> >> >>> +    /// number of bytes.
> >> >>> +    ///
> >> >>> +    /// # Examples
> >> >>> +    ///
> >> >>> +    /// ```
> >> >>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
> >> >>> +    /// let somedata: [u8; 4] = [0xf; 4];
> >> >>> +    /// let buf: &[u8] = &somedata;
> >> >>> +    /// alloc.write(buf, 0)?;
> >> >>> +    /// # Ok::<(), Error>(()) }
> >> >>> +    /// ```
> >> >>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
> >> >>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
> >> >>> +        if end >= self.count {
> >> >>> +            return Err(EINVAL);
> >> >>> +        }
> >> >>> +        // SAFETY:
> >> >>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> >>> +        // and we've just checked that the range and index are within bounds.
>> >>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> >> >>> +        // that `self.count` won't overflow early in the constructor.
> >> >>> +        unsafe {
> >> >>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> >> >>
> >> >> Why are there no concurrent write or read operations on `cpu_addr`?
> >> >
> >> > Sorry, can you rephrase this question?
> >>
> >> This write is suffering the same complications as discussed here [1].
> >> There are multiple issues with this implementation.
> >>
> >> 1) `write` takes a shared reference and thus may be called concurrently.
> >> There is no synchronization, so `copy_nonoverlapping` could be called
> >> concurrently on the same address. The safety requirements for
> >> `copy_nonoverlapping` state that the destination must be valid for
> >> write. Alice claims in [1] that any memory area that experiences data
> >> races is not valid for writes. So the safety requirement of
> >> `copy_nonoverlapping` is violated and this call is potentially UB.
> >>
> >> 2) The destination of this write is DMA memory. It could be concurrently
> >> modified by hardware, leading to the same issues as 1). Thus the
> >> function cannot be safe if we cannot guarantee hardware will not write
> >> to the region while this function is executing.
> >>
> >> Now, I don't think that these _should_ be issues, but according to our
> >> Rust language experts they _are_.
> >>
> >> I really think that copying data through a raw pointer to or from a
> >> place that experiences data races, should _not_ be UB if the data is not
> >> interpreted in any way, other than moving it.
> >>
> >>
> >> Best regards,
> >> Andreas Hindborg
> >
> > We need to make progress on this series, and it's starting to get late
> > in the cycle. I suggest we:
>
> There is always another cycle.
>
> >
> > 1. Delete as_slice, as_slice_mut, write, and skip_drop.
> > 2. Change field_read/field_write to use a volatile read/write.
>
> Volatile reads/writes that race are OK?

I will not give a blanket yes to that. If you read their docs, you
will find that they claim to not allow it. But they are the correct
choice for DMA memory, and there's no way in practice to get
miscompilations on memory locations that are only accessed with
volatile operations, and never have references to them created.

In general, this will fall into the exception that we've been given
from the Rust people. In cases such as this where the Rust language
does not give us the operation we want, do it like you do in C. Since
Rust uses LLVM which does not miscompile the C part of the kernel, it
should not miscompile the Rust part either.

> > This will let us make progress now and sidestep this discussion. The
> > deleted methods can happen in a follow-up.
>
> `item_from_index`, the `dma_read` and `dma_write` macros as well, I would think?

Those are necessary to use field_read/field_write, so I think it's
fine to keep those.

Alice

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-03 15:44             ` Alice Ryhl
@ 2025-03-03 18:45               ` Andreas Hindborg
  2025-03-03 19:00               ` Allow data races on some read/write operations Andreas Hindborg
  1 sibling, 0 replies; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-03 18:45 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Daniel Almeida, Benno Lossin, Abdiel Janulgue, dakr, robin.murphy,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu

"Alice Ryhl" <aliceryhl@google.com> writes:

> On Mon, Mar 3, 2025 at 4:21 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>> >>
>> >> "Daniel Almeida" <daniel.almeida@collabora.com> writes:
>> >>
>> >> > Hi Benno,
>> >> >
>> >>
>> >> [...]
>> >>
>> >> >>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>> >> >>> +    /// number of bytes.
>> >> >>> +    ///
>> >> >>> +    /// # Examples
>> >> >>> +    ///
>> >> >>> +    /// ```
>> >> >>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>> >> >>> +    /// let somedata: [u8; 4] = [0xf; 4];
>> >> >>> +    /// let buf: &[u8] = &somedata;
>> >> >>> +    /// alloc.write(buf, 0)?;
>> >> >>> +    /// # Ok::<(), Error>(()) }
>> >> >>> +    /// ```
>> >> >>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>> >> >>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>> >> >>> +        if end >= self.count {
>> >> >>> +            return Err(EINVAL);
>> >> >>> +        }
>> >> >>> +        // SAFETY:
>> >> >>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> >> >>> +        // and we've just checked that the range and index are within bounds.
>> >> >>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> >> >>> +        // that `self.count` won't overflow early in the constructor.
>> >> >>> +        unsafe {
>> >> >>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
>> >> >>
>> >> >> Why are there no concurrent write or read operations on `cpu_addr`?
>> >> >
>> >> > Sorry, can you rephrase this question?
>> >>
>> >> This write is suffering the same complications as discussed here [1].
>> >> There are multiple issues with this implementation.
>> >>
>> >> 1) `write` takes a shared reference and thus may be called concurrently.
>> >> There is no synchronization, so `copy_nonoverlapping` could be called
>> >> concurrently on the same address. The safety requirements for
>> >> `copy_nonoverlapping` state that the destination must be valid for
>> >> write. Alice claims in [1] that any memory area that experiences data
>> >> races is not valid for writes. So the safety requirement of
>> >> `copy_nonoverlapping` is violated and this call is potentially UB.
>> >>
>> >> 2) The destination of this write is DMA memory. It could be concurrently
>> >> modified by hardware, leading to the same issues as 1). Thus the
>> >> function cannot be safe if we cannot guarantee hardware will not write
>> >> to the region while this function is executing.
>> >>
>> >> Now, I don't think that these _should_ be issues, but according to our
>> >> Rust language experts they _are_.
>> >>
>> >> I really think that copying data through a raw pointer to or from a
>> >> place that experiences data races, should _not_ be UB if the data is not
>> >> interpreted in any way, other than moving it.
>> >>
>> >>
>> >> Best regards,
>> >> Andreas Hindborg
>> >
>> > We need to make progress on this series, and it's starting to get late
>> > in the cycle. I suggest we:
>>
>> There is always another cycle.
>>
>> >
>> > 1. Delete as_slice, as_slice_mut, write, and skip_drop.
>> > 2. Change field_read/field_write to use a volatile read/write.
>>
>> Volatile reads/writes that race are OK?
>
> I will not give a blanket yes to that. If you read their docs, you
> will find that they claim to not allow it. But they are the correct
> choice for DMA memory, and there's no way in practice to get
> miscompilations on memory locations that are only accessed with
> volatile operations, and never have references to them created.
>
> In general, this will fall into the exception that we've been given
> from the Rust people. In cases such as this where the Rust language
> does not give us the operation we want, do it like you do in C. Since
> Rust uses LLVM which does not miscompile the C part of the kernel, it
> should not miscompile the Rust part either.
>
>> > This will let us make progress now and sidestep this discussion. The
>> > deleted methods can happen in a follow-up.
>>
>> `item_from_index`, the `dma_read` and `dma_write` macros as well, I would think?
>
> Those are necessary to use field_read/field_write, so I think it's
> fine to keep those.

I misread `item_from_index` as returning a value, but it's just pointer
arithmetic, so that is fine 👍
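(For onlookers: the kind of bounds-checked pointer arithmetic meant here
could look roughly like the following. This is a hypothetical stand-in
for the patch's `item_from_index`; the real signature and error handling
may differ.)

```rust
/// Returns a raw pointer to item `index` of an allocation of `count`
/// items starting at `cpu_addr`, without reading through the pointer.
/// Hypothetical stand-in for the patch's `item_from_index`.
fn item_from_index<T>(cpu_addr: *mut T, count: usize, index: usize) -> Option<*mut T> {
    if index >= count {
        return None; // the kernel version would return EINVAL here
    }
    // SAFETY: `index < count` and the allocation holds `count` items, so
    // the offset pointer stays inside the allocation.
    Some(unsafe { cpu_addr.add(index) })
}

fn main() {
    let mut buf = [10u32, 20, 30];
    let p = item_from_index(buf.as_mut_ptr(), buf.len(), 2).unwrap();
    // SAFETY: `p` points at `buf[2]`, which is valid and unaliased here.
    unsafe { assert_eq!(*p, 30) };
    assert!(item_from_index(buf.as_mut_ptr(), buf.len(), 3).is_none());
}
```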



Best regards,
Andreas Hindborg



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Allow data races on some read/write operations
  2025-03-03 15:44             ` Alice Ryhl
  2025-03-03 18:45               ` Andreas Hindborg
@ 2025-03-03 19:00               ` Andreas Hindborg
  2025-03-03 20:08                 ` Boqun Feng
  1 sibling, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-03 19:00 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Daniel Almeida, Benno Lossin, Abdiel Janulgue, dakr, robin.murphy,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu


[New subject, was: Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction]

"Alice Ryhl" <aliceryhl@google.com> writes:

> On Mon, Mar 3, 2025 at 4:21 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>> >>
>> >> "Daniel Almeida" <daniel.almeida@collabora.com> writes:
>> >>
>> >> > Hi Benno,
>> >> >
>> >>
>> >> [...]
>> >>
>> >> >>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>> >> >>> +    /// number of bytes.
>> >> >>> +    ///
>> >> >>> +    /// # Examples
>> >> >>> +    ///
>> >> >>> +    /// ```
>> >> >>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>> >> >>> +    /// let somedata: [u8; 4] = [0xf; 4];
>> >> >>> +    /// let buf: &[u8] = &somedata;
>> >> >>> +    /// alloc.write(buf, 0)?;
>> >> >>> +    /// # Ok::<(), Error>(()) }
>> >> >>> +    /// ```
>> >> >>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>> >> >>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>> >> >>> +        if end >= self.count {
>> >> >>> +            return Err(EINVAL);
>> >> >>> +        }
>> >> >>> +        // SAFETY:
>> >> >>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>> >> >>> +        // and we've just checked that the range and index are within bounds.
>> >> >>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>> >> >>> +        // that `self.count` won't overflow early in the constructor.
>> >> >>> +        unsafe {
>> >> >>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
>> >> >>
>> >> >> Why are there no concurrent write or read operations on `cpu_addr`?
>> >> >
>> >> > Sorry, can you rephrase this question?
>> >>
>> >> This write is suffering the same complications as discussed here [1].
>> >> There are multiple issues with this implementation.
>> >>
>> >> 1) `write` takes a shared reference and thus may be called concurrently.
>> >> There is no synchronization, so `copy_nonoverlapping` could be called
>> >> concurrently on the same address. The safety requirements for
>> >> `copy_nonoverlapping` state that the destination must be valid for
>> >> write. Alice claims in [1] that any memory area that experiences data
>> >> races is not valid for writes. So the safety requirement of
>> >> `copy_nonoverlapping` is violated and this call is potentially UB.
>> >>
>> >> 2) The destination of this write is DMA memory. It could be concurrently
>> >> modified by hardware, leading to the same issues as 1). Thus the
>> >> function cannot be safe if we cannot guarantee hardware will not write
>> >> to the region while this function is executing.
>> >>
>> >> Now, I don't think that these _should_ be issues, but according to our
>> >> Rust language experts they _are_.
>> >>
>> >> I really think that copying data through a raw pointer to or from a
>> >> place that experiences data races, should _not_ be UB if the data is not
>> >> interpreted in any way, other than moving it.
>> >>
>> >>
>> >> Best regards,
>> >> Andreas Hindborg
>> >
>> > We need to make progress on this series, and it's starting to get late
>> > in the cycle. I suggest we:
>>
>> There is always another cycle.
>>
>> >
>> > 1. Delete as_slice, as_slice_mut, write, and skip_drop.
>> > 2. Change field_read/field_write to use a volatile read/write.
>>
>> Volatile reads/writes that race are OK?
>
> I will not give a blanket yes to that. If you read their docs, you
> will find that they claim to not allow it. But they are the correct
> choice for DMA memory, and there's no way in practice to get
> miscompilations on memory locations that are only accessed with
> volatile operations, and never have references to them created.
>
> In general, this will fall into the exception that we've been given
> from the Rust people. In cases such as this where the Rust language
> does not give us the operation we want, do it like you do in C. Since
> Rust uses LLVM which does not miscompile the C part of the kernel, it
> should not miscompile the Rust part either.

This exception we got for `core::ptr::{read,write}_volatile`, did we
document that somewhere?

I feel slightly lost when trying to figure out what fits under this
exception and what is UB. I think that the first step to making this more
straightforward is having clear documentation.

For cases where we need to do the equivalent of `memmove`/`memcpy`, what
are our options?

In case we have no options, do you know who would be the right people on
the Rust Project side to contact about getting an exception for this
case?


Best regards,
Andreas Hindborg



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Allow data races on some read/write operations
  2025-03-03 19:00               ` Allow data races on some read/write operations Andreas Hindborg
@ 2025-03-03 20:08                 ` Boqun Feng
  2025-03-04 19:03                   ` Ralf Jung
  0 siblings, 1 reply; 70+ messages in thread
From: Boqun Feng @ 2025-03-03 20:08 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Alice Ryhl, Daniel Almeida, Benno Lossin, Abdiel Janulgue, dakr,
	robin.murphy, rust-for-linux, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu, Ralf Jung,
	comex, lkmm

On Mon, Mar 03, 2025 at 08:00:03PM +0100, Andreas Hindborg wrote:
> 
> [New subject, was: Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction]
> 
> "Alice Ryhl" <aliceryhl@google.com> writes:
> 
> > On Mon, Mar 3, 2025 at 4:21 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >>
> >> "Alice Ryhl" <aliceryhl@google.com> writes:
> >>
> >> > On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >> >>
> >> >> "Daniel Almeida" <daniel.almeida@collabora.com> writes:
> >> >>
> >> >> > Hi Benno,
> >> >> >
> >> >>
> >> >> [...]
> >> >>
> >> >> >>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
> >> >> >>> +    /// number of bytes.
> >> >> >>> +    ///
> >> >> >>> +    /// # Examples
> >> >> >>> +    ///
> >> >> >>> +    /// ```
> >> >> >>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
> >> >> >>> +    /// let somedata: [u8; 4] = [0xf; 4];
> >> >> >>> +    /// let buf: &[u8] = &somedata;
> >> >> >>> +    /// alloc.write(buf, 0)?;
> >> >> >>> +    /// # Ok::<(), Error>(()) }
> >> >> >>> +    /// ```
> >> >> >>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
> >> >> >>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
> >> >> >>> +        if end >= self.count {
> >> >> >>> +            return Err(EINVAL);
> >> >> >>> +        }
> >> >> >>> +        // SAFETY:
> >> >> >>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
> >> >> >>> +        // and we've just checked that the range and index are within bounds.
> >> >> >>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
> >> >> >>> +        // that `self.count` won't overflow early in the constructor.
> >> >> >>> +        unsafe {
> >> >> >>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
> >> >> >>
> >> >> >> Why are there no concurrent write or read operations on `cpu_addr`?
> >> >> >
> >> >> > Sorry, can you rephrase this question?
> >> >>
> >> >> This write is suffering the same complications as discussed here [1].
> >> >> There are multiple issues with this implementation.
> >> >>
> >> >> 1) `write` takes a shared reference and thus may be called concurrently.
> >> >> There is no synchronization, so `copy_nonoverlapping` could be called
> >> >> concurrently on the same address. The safety requirements for
> >> >> `copy_nonoverlapping` state that the destination must be valid for
> >> >> write. Alice claims in [1] that any memory area that experiences data
> >> >> races is not valid for writes. So the safety requirement of
> >> >> `copy_nonoverlapping` is violated and this call is potentially UB.
> >> >>
> >> >> 2) The destination of this write is DMA memory. It could be concurrently
> >> >> modified by hardware, leading to the same issues as 1). Thus the
> >> >> function cannot be safe if we cannot guarantee hardware will not write
> >> >> to the region while this function is executing.
> >> >>
> >> >> Now, I don't think that these _should_ be issues, but according to our
> >> >> Rust language experts they _are_.
> >> >>
> >> >> I really think that copying data through a raw pointer to or from a
> >> >> place that experiences data races, should _not_ be UB if the data is not
> >> >> interpreted in any way, other than moving it.
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Andreas Hindborg
> >> >
> >> > We need to make progress on this series, and it's starting to get late
> >> > in the cycle. I suggest we:
> >>
> >> There is always another cycle.
> >>
> >> >
> >> > 1. Delete as_slice, as_slice_mut, write, and skip_drop.
> >> > 2. Change field_read/field_write to use a volatile read/write.
> >>
> >> Volatile reads/writes that race are OK?
> >
> > I will not give a blanket yes to that. If you read their docs, you
> > will find that they claim to not allow it. But they are the correct
> > choice for DMA memory, and there's no way in practice to get
> > miscompilations on memory locations that are only accessed with
> > volatile operations, and never have references to them created.
> >
> > In general, this will fall into the exception that we've been given
> > from the Rust people. In cases such as this where the Rust language
> > does not give us the operation we want, do it like you do in C. Since
> > Rust uses LLVM which does not miscompile the C part of the kernel, it
> > should not miscompile the Rust part either.
> 
> This exception we got for `core::ptr::{read,write}_volatile`, did we
> document that somewhere?
> 

[Cc Ralf, comex and LKMM list]

Some related discussions:

* https://github.com/rust-lang/unsafe-code-guidelines/issues/476
* https://github.com/rust-lang/unsafe-code-guidelines/issues/348#issuecomment-1221376388
  
  particularly Ralf's comment on comex's message:

  """
  @comex

  > First, keep in mind that you could simply transliterate the C
  > versions of READ_ONCE/WRITE_ONCE, barriers, etc. directly to Rust,
  > using ptr::read_volatile/ptr::write_volatile in place of C volatile
  > loads and stores, and asm! in place of C asm blocks. If you do,
  > you'll end up with the same LLVM IR instructions (or GCC equivalent
  > with rustc_codegen_gcc), which will get passed to the same
  > optimizer, and which ultimately will work or not work to the same
  > extent as the C versions.

  Indeed I think that is probably the best approach.
  """

* A LONG thread of the discussion:

  https://rust-lang.zulipchat.com/#narrow/channel/136281-t-opsem/topic/UB.20caused.20by.20races.20on.20.60.7Bread.2Cwrite.7D_volatile.60/near/399343771

In general, the rationale is that if Rust code generates the same LLVM
IR as C code, then whatever is not a data race per the LKMM is not
treated as a data race in Rust either. But this is not a
"get-out-of-UB-free" card IMO:

* If both sides of the racing are Rust code, we should avoid using
  {read,write}_volatile(), and use proper synchronization.

* If atomicity is also required, we should use Atomic::from_ptr()
  instead of {read,write}_volatile().
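  As an illustration of that second point only (the kernel will have its
  own LKMM atomics; this uses the std counterpart), stable Rust since
  1.75 offers `AtomicU32::from_ptr` to perform atomic accesses through a
  raw pointer:

  ```rust
  use std::sync::atomic::{AtomicU32, Ordering};

  fn main() {
      let mut word: u32 = 0;
      let p: *mut u32 = &mut word;
      // SAFETY: `p` is valid and aligned for `AtomicU32`, and for the
      // lifetime of `a` this location is only accessed atomically.
      let a: &AtomicU32 = unsafe { AtomicU32::from_ptr(p) };
      a.store(42, Ordering::Relaxed);
      assert_eq!(a.load(Ordering::Relaxed), 42);
  }
  ```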

> I feel slightly lost when trying to figure out what fits under this
> exception and what is UB. I think that the first step to making this more
> straightforward is having clear documentation.
> 

I agree, and I'm happy to help on this.

> For cases where we need to do the equivalent of `memmove`/`memcpy`, what
> are our options?
> 

Seems we need "volatile" memmove and memcpy in Rust?

> In case we have no options, do you know who would be the right people on
> the Rust Project side to contact about getting an exception for this
> case?
> 

I will say it'll be t-opsem.

Regards,
Boqun

> 
> Best regards,
> Andreas Hindborg
> 
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-03 13:13         ` Alice Ryhl
  2025-03-03 15:21           ` Andreas Hindborg
@ 2025-03-04  8:28           ` Abdiel Janulgue
  1 sibling, 0 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-03-04  8:28 UTC (permalink / raw)
  To: Alice Ryhl, Andreas Hindborg
  Cc: Daniel Almeida, Benno Lossin, dakr, robin.murphy, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu



On 03/03/2025 15:13, Alice Ryhl wrote:
> On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Daniel Almeida" <daniel.almeida@collabora.com> writes:
>>
>>> Hi Benno,
>>>
>>
>> [...]
>>
>>>>> +    /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>>>>> +    /// number of bytes.
>>>>> +    ///
>>>>> +    /// # Examples
>>>>> +    ///
>>>>> +    /// ```
>>>>> +    /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>>>>> +    /// let somedata: [u8; 4] = [0xf; 4];
>>>>> +    /// let buf: &[u8] = &somedata;
>>>>> +    /// alloc.write(buf, 0)?;
>>>>> +    /// # Ok::<(), Error>(()) }
>>>>> +    /// ```
>>>>> +    pub fn write(&self, src: &[T], offset: usize) -> Result {
>>>>> +        let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>>>>> +        if end >= self.count {
>>>>> +            return Err(EINVAL);
>>>>> +        }
>>>>> +        // SAFETY:
>>>>> +        // - The pointer is valid due to type invariant on `CoherentAllocation`
>>>>> +        // and we've just checked that the range and index are within bounds.
>>>>> +        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
>>>>> +        // that `self.count` won't overflow early in the constructor.
>>>>> +        unsafe {
>>>>> +            core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
>>>>
>>>> Why are there no concurrent write or read operations on `cpu_addr`?
>>>
>>> Sorry, can you rephrase this question?
>>
>> This write is suffering the same complications as discussed here [1].
>> There are multiple issues with this implementation.
>>
>> 1) `write` takes a shared reference and thus may be called concurrently.
>> There is no synchronization, so `copy_nonoverlapping` could be called
>> concurrently on the same address. The safety requirements for
>> `copy_nonoverlapping` state that the destination must be valid for
>> write. Alice claims in [1] that any memory area that experiences data
>> races is not valid for writes. So the safety requirement of
>> `copy_nonoverlapping` is violated and this call is potentially UB.
>>
>> 2) The destination of this write is DMA memory. It could be concurrently
>> modified by hardware, leading to the same issues as 1). Thus the
>> function cannot be safe if we cannot guarantee hardware will not write
>> to the region while this function is executing.
>>
>> Now, I don't think that these _should_ be issues, but according to our
>> Rust language experts they _are_.
>>
>> I really think that copying data through a raw pointer to or from a
>> place that experiences data races, should _not_ be UB if the data is not
>> interpreted in any way, other than moving it.
>>
>>
>> Best regards,
>> Andreas Hindborg
> 
> We need to make progress on this series, and it's starting to get late
> in the cycle. I suggest we:
> 
> 1. Delete as_slice, as_slice_mut, write, and skip_drop.
> 2. Change field_read/field_write to use a volatile read/write.
> 
> This will let us make progress now and sidestep this discussion. The
> deleted methods can happen in a follow-up.
> 
> Similarly for the dma mask methods, let's either drop them to a
> follow-up patch or just put them anywhere and move them later.
> 
> Alice

Thanks Alice. Yeah, will follow up with those other patches and move 
forward with the basic implementation for now.

Regards,
Abdiel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-03 11:30   ` Andreas Hindborg
@ 2025-03-04  8:58     ` Abdiel Janulgue
  0 siblings, 0 replies; 70+ messages in thread
From: Abdiel Janulgue @ 2025-03-04  8:58 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu



On 03/03/2025 13:30, Andreas Hindborg wrote:
> "Abdiel Janulgue" <abdiel.janulgue@gmail.com> writes:
> 
>> Add a simple dma coherent allocator rust abstraction. Based on
>> Andreas Hindborg's dma abstractions from the rnvme driver, which
>> was also based on earlier work by Wedson Almeida Filho.
>>
>> Nacked-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
> [...]
> 
>> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
>> index 0640b7e115be..8f3808c8b7fe 100644
>> --- a/rust/helpers/helpers.c
>> +++ b/rust/helpers/helpers.c
>> @@ -13,6 +13,7 @@
>>   #include "build_bug.c"
>>   #include "cred.c"
>>   #include "device.c"
>> +#include "dma.c"
>>   #include "err.c"
>>   #include "fs.c"
>>   #include "io.c"
>> diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
>> new file mode 100644
>> index 000000000000..b4dd5d411711
>> --- /dev/null
>> +++ b/rust/kernel/dma.rs
>> @@ -0,0 +1,421 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Direct memory access (DMA).
>> +//!
>> +//! C header: [`include/linux/dma-mapping.h`](srctree/include/linux/dma-mapping.h)
>> +
>> +use crate::{
>> +    bindings, build_assert,
>> +    device::Device,
>> +    error::code::*,
>> +    error::Result,
>> +    transmute::{AsBytes, FromBytes},
>> +    types::ARef,
>> +};
>> +
>> +/// Inform the kernel about the device's DMA addressing capabilities. This will set the mask for
>> +/// both streaming and coherent APIs together.
>> +pub fn dma_set_mask_and_coherent(dev: &Device, mask: u64) -> i32 {
>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>> +    unsafe { bindings::dma_set_mask_and_coherent(dev.as_raw(), mask) }
>> +}
>> +
>> +/// Same as `dma_set_mask_and_coherent`, but set the mask only for streaming mappings.
>> +pub fn dma_set_mask(dev: &Device, mask: u64) -> i32 {
>> +    // SAFETY: device pointer is guaranteed as valid by invariant on `Device`.
>> +    unsafe { bindings::dma_set_mask(dev.as_raw(), mask) }
>> +}
> 
> I'm rebasing some of the dma pool code I'm using for NVMe on top of
> these patches, and I notice that these methods in the original code from
> way back (besides being on Device) has these methods return `Result`:
> 
>      pub fn dma_set_mask(&self, mask: u64) -> Result {
>          let dev = self.as_raw();
>          let ret = unsafe { bindings::dma_set_mask(dev as _, mask) };
>          if ret != 0 {
>              Err(Error::from_errno(ret))
>          } else {
>              Ok(())
>          }
>      }
> 
> Is there a reason for not returning a `Result` in this series?
> 

Hi Andreas, the original dma_set_mask function got lost to me for some 
reason. But yes, this is indeed a better approach!

Regards,
Abdiel



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Allow data races on some read/write operations
  2025-03-03 20:08                 ` Boqun Feng
@ 2025-03-04 19:03                   ` Ralf Jung
  2025-03-04 20:18                     ` comex
  0 siblings, 1 reply; 70+ messages in thread
From: Ralf Jung @ 2025-03-04 19:03 UTC (permalink / raw)
  To: Boqun Feng, Andreas Hindborg
  Cc: Alice Ryhl, Daniel Almeida, Benno Lossin, Abdiel Janulgue, dakr,
	robin.murphy, rust-for-linux, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Trevor Gross, Valentin Obst, linux-kernel,
	Christoph Hellwig, Marek Szyprowski, airlied, iommu, comex, lkmm

Hi all,

Ah, the never-ending LKMM vs Rust concurrency story. :)

> In general, this will fall into the exception that we've been given
> from the Rust people. In cases such as this where the Rust language
> does not give us the operation we want, do it like you do in C. Since
> Rust uses LLVM which does not miscompile the C part of the kernel, it
> should not miscompile the Rust part either.

Right, so to be clear this is all a compromise. :)  We don't want Rust to have 
two official memory models. We certainly don't want random Rust programs out 
there experimenting with non-standard memory models and then expecting that to 
work. (They can't all have a Paul McKenney keeping things together. ;) That's why 
in all official docs, you'll only find the one Rust concurrency memory model, 
which is the C++ model.
But meanwhile, the kernel is not some random Rust program out there (and you do 
have Paul ;). The LKMM works quite well in practice, and it is a critical part 
for Rust-for-Linux. Also, we have pretty good contacts with the Rust-for-Linux 
folks, so when there are issues we can just talk.

Given that the LKMM is at the edge (or, arguably, somewhat beyond the edge) of 
what compiler backends can reliably support, I think the lowest-risk strategy is 
to make LKMM usage from Rust look to the compiler just like LKMM usage in C. 
Since it works in C, it should then hopefully also work in Rust. To do this, you 
have to bend (really, break) some of the usual rules for Rust, and that's the 
special exception mentioned above.

>> For cases where we need to do the equivalent of `memmove`/`memcpy`, what
>> are our options?
>>
> 
> Seems we need "volatile" memmove and memcpy in Rust?

Those already exist in Rust, albeit only unstably: 
<https://doc.rust-lang.org/nightly/std/intrinsics/fn.volatile_copy_memory.html>. 
However, I am not sure how you'd even generate such a call in C? The standard 
memcpy function is not doing volatile accesses, to my knowledge.

Kind regards,
Ralf

>> In case we have no options, do you know who would be the right people on
>> the Rust Project side to contact about getting an exception for this
>> case?
>>
> 
> I will say it'll be t-opsem.
> 
> Regards,
> Boqun
> 
>>
>> Best regards,
>> Andreas Hindborg
>>
>>



* Re: Allow data races on some read/write operations
  2025-03-04 19:03                   ` Ralf Jung
@ 2025-03-04 20:18                     ` comex
  2025-03-05  3:24                       ` Boqun Feng
  0 siblings, 1 reply; 70+ messages in thread
From: comex @ 2025-03-04 20:18 UTC (permalink / raw)
  To: Ralf Jung
  Cc: Boqun Feng, Andreas Hindborg, Alice Ryhl, Daniel Almeida,
	Benno Lossin, Abdiel Janulgue, dakr, robin.murphy, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
	Trevor Gross, Valentin Obst, linux-kernel, Christoph Hellwig,
	Marek Szyprowski, airlied, iommu, lkmm


> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
> 
> Those already exist in Rust, albeit only unstably: <https://doc.rust-lang.org/nightly/std/intrinsics/fn.volatile_copy_memory.html>. However, I am not sure how you'd even generate such a call in C? The standard memcpy function is not doing volatile accesses, to my knowledge.

The actual memcpy symbol that exists at runtime is written in assembly, and should be valid to treat as performing volatile accesses.

But both GCC and Clang special-case the memcpy function.  For example, if you call memcpy with a small constant as the size, the optimizer will transform the call into one or more regular loads/stores, which can then be optimized mostly like any other loads/stores (except for opting out of alignment and type-based aliasing assumptions).  Even if the call isn’t transformed, the optimizer will still make assumptions.  LLVM will automatically mark memcpy `nosync`, which makes it undefined behavior if the function “communicate[s] (synchronize[s]) with another thread”, including through “volatile accesses”. [1]

However, these optimizations should rarely trigger misbehavior in practice, so I wouldn’t be surprised if Linux had some code that expected memcpy to act volatile…

But I’m not familiar enough with the codebase to know whether such code actually exists, or where.

(Incidentally, there is a compiler flag to turn the memcpy special-casing off, -fno-builtin.  I pretty much expected that Linux used it.  But I just checked, and it doesn’t.)

For Rust, I don’t know why we haven’t exposed volatile_copy_memory yet.  All I can find are some years-old discussions with no obvious blockers.  I guess nobody has cared enough.  There is also a somewhat stagnant RFC for *atomic* memcpy. [2]

[1] https://llvm.org/docs/LangRef.html, search for 'nosync'
[2] https://github.com/rust-lang/rfcs/pull/3301


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Allow data races on some read/write operations
  2025-03-04 20:18                     ` comex
@ 2025-03-05  3:24                       ` Boqun Feng
  2025-03-05 13:10                         ` Ralf Jung
  0 siblings, 1 reply; 70+ messages in thread
From: Boqun Feng @ 2025-03-05  3:24 UTC (permalink / raw)
  To: comex
  Cc: Ralf Jung, Andreas Hindborg, Alice Ryhl, Daniel Almeida,
	Benno Lossin, Abdiel Janulgue, dakr, robin.murphy, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
	Trevor Gross, Valentin Obst, linux-kernel, Christoph Hellwig,
	Marek Szyprowski, airlied, iommu, lkmm

On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
> 
> > On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
> > 
> > Those already exist in Rust, albeit only unstably:
> > <https://doc.rust-lang.org/nightly/std/intrinsics/fn.volatile_copy_memory.html>.
> > However, I am not sure how you'd even generate such a call in C? The
> > standard memcpy function is not doing volatile accesses, to my
> > knowledge.
> 
> The actual memcpy symbol that exists at runtime is written in
> assembly, and should be valid to treat as performing volatile
> accesses.
> 
> But both GCC and Clang special-case the memcpy function.  For example,
> if you call memcpy with a small constant as the size, the optimizer
> will transform the call into one or more regular loads/stores, which
> can then be optimized mostly like any other loads/stores (except for
> opting out of alignment and type-based aliasing assumptions).  Even if
> the call isn’t transformed, the optimizer will still make assumptions.
> LLVM will automatically mark memcpy `nosync`, which makes it undefined
> behavior if the function “communicate[s] (synchronize[s]) with another
> thread”, including through “volatile accesses”. [1]
> 
> However, these optimizations should rarely trigger misbehavior in
> practice, so I wouldn’t be surprised if Linux had some code that
> expected memcpy to act volatile…
> 

Also in this particular case we are discussing [1], it's a memcpy (from
or to) a DMA buffer, which means the device can also read or write the
memory, therefore the content of the memory may be altered outside the
program (the kernel), so we cannot use copy_nonoverlapping() I believe.

[1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
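
For illustration, a per-byte volatile copy is one shape such a binding could
take instead of copy_nonoverlapping (a userspace sketch, not code from this
series; a real implementation would use wider, aligned accesses for speed):

```rust
use std::ptr;

/// Copy `len` bytes using volatile loads and stores, so the compiler
/// cannot elide, split, or reorder the accesses the way it is allowed
/// to for `memcpy`/`copy_nonoverlapping`.
///
/// # Safety
/// `src` and `dst` must each be valid for `len` bytes and must not
/// overlap.
unsafe fn volatile_copy(dst: *mut u8, src: *const u8, len: usize) {
    for i in 0..len {
        // SAFETY: both offsets are in bounds by the caller's contract.
        unsafe { ptr::write_volatile(dst.add(i), ptr::read_volatile(src.add(i))) };
    }
}
```

Note this only makes each individual byte access untearable; it is not an
atomic memcpy of the whole range.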

Regards,
Boqun

> But I’m not familiar enough with the codebase to know whether such
> code actually exists, or where.
> 
> (Incidentally, there is a compiler flag to turn the memcpy
> special-casing off, -fno-builtin.  I pretty much expected that Linux
> used it.  But I just checked, and it doesn’t.)
> 
> For Rust, I don’t know why we haven’t exposed volatile_copy_memory
> yet.  All I can find are some years-old discussions with no obvious
> blockers.  I guess nobody has cared enough.  There is also a somewhat
> stagnant RFC for *atomic* memcpy. [2]
> 
> [1] https://llvm.org/docs/LangRef.html, search for 'nosync'
> [2] https://github.com/rust-lang/rfcs/pull/3301
> 


* Re: Allow data races on some read/write operations
  2025-03-05  3:24                       ` Boqun Feng
@ 2025-03-05 13:10                         ` Ralf Jung
  2025-03-05 13:23                           ` Alice Ryhl
                                             ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Ralf Jung @ 2025-03-05 13:10 UTC (permalink / raw)
  To: Boqun Feng, comex
  Cc: Andreas Hindborg, Alice Ryhl, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

Hi,

On 05.03.25 04:24, Boqun Feng wrote:
> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>>
>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>>>
>>> Those already exist in Rust, albeit only unstably:
>>> <https://doc.rust-lang.org/nightly/std/intrinsics/fn.volatile_copy_memory.html>.
>>> However, I am not sure how you'd even generate such a call in C? The
>>> standard memcpy function is not doing volatile accesses, to my
>>> knowledge.
>>
>> The actual memcpy symbol that exists at runtime is written in
>> assembly, and should be valid to treat as performing volatile
>> accesses.

memcpy is often written in C... and AFAIK compilers understand what that 
function does and will, for instance, happily eliminate the call if they can 
prove that the destination memory is not being read from again. So, it doesn't 
behave like a volatile access at all.

>> But both GCC and Clang special-case the memcpy function.  For example,
>> if you call memcpy with a small constant as the size, the optimizer
>> will transform the call into one or more regular loads/stores, which
>> can then be optimized mostly like any other loads/stores (except for
>> opting out of alignment and type-based aliasing assumptions).  Even if
>> the call isn’t transformed, the optimizer will still make assumptions.
>> LLVM will automatically mark memcpy `nosync`, which makes it undefined
>> behavior if the function “communicate[s] (synchronize[s]) with another
>> thread”, including through “volatile accesses”. [1]

The question is more: what do clang and GCC document / guarantee in a stable 
way regarding memcpy? I have not seen any indication so far that a memcpy call 
would ever be considered volatile, so we have to treat it like a non-volatile 
non-atomic operation.

>> However, these optimizations should rarely trigger misbehavior in
>> practice, so I wouldn’t be surprised if Linux had some code that
>> expected memcpy to act volatile…
>>
> 
> Also in this particular case we are discussing [1], it's a memcpy (from
> or to) a DMA buffer, which means the device can also read or write the
> memory, therefore the content of the memory may be altered outside the
> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
> 
> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/

Is there actually a potential for races (with reads by hardware, not other 
threads) on the memcpy'd memory? Or is this the pattern where you copy some data 
somewhere and then set a flag in an MMIO register to indicate that the data is 
ready and the device can start reading it? In the latter case, the actual data 
copy does not race with anything, so it can be a regular non-atomic non-volatile 
memcpy. The flag write *should* be a release write, and release volatile writes 
do not exist, so that is a problem, but it's a separate problem from volatile 
memcpy. One can use a release fence followed by a relaxed write instead. 
Volatile writes do not currently act like relaxed writes, but you need that 
anyway for WRITE_ONCE to make sense so it seems fine to rely on that here as well.
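
Concretely, the fence-then-relaxed-write pattern can be sketched in userspace
Rust like this (names are illustrative; in a driver the final store would be
the volatile MMIO doorbell write rather than an atomic store):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{fence, AtomicBool, Ordering};

// Stand-in for a DMA descriptor: a payload plus a "ready" doorbell.
struct Descriptor {
    payload: UnsafeCell<[u8; 4]>,
    ready: AtomicBool,
}

// SAFETY: the protocol below only writes `payload` before the flag is
// set, so the payload is never accessed concurrently.
unsafe impl Sync for Descriptor {}

fn publish(d: &Descriptor, data: [u8; 4]) {
    // The copy itself races with nothing yet, so it can be a plain
    // non-atomic, non-volatile write.
    unsafe { *d.payload.get() = data };
    // The release fence orders the copy before the flag store below...
    fence(Ordering::Release);
    // ...so the store itself only needs to be relaxed. In a driver this
    // would be the volatile write to the MMIO register.
    d.ready.store(true, Ordering::Relaxed);
}
```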

Rust should have atomic volatile accesses, and various ideas have been proposed 
over the years, but sadly nobody has shown up to try and push this through.

If the memcpy itself can indeed race, you need an atomic volatile memcpy -- 
which neither C nor Rust have, though there are proposals for atomic memcpy (and 
arguably, there should be a way to interact with a device using non-volatile 
atomics... but anyway in the LKMM, atomics are modeled with volatile, so things 
are even more entangled than usual ;).

Kind regards,
Ralf



* Re: Allow data races on some read/write operations
  2025-03-05 13:10                         ` Ralf Jung
@ 2025-03-05 13:23                           ` Alice Ryhl
  2025-03-05 13:27                             ` Ralf Jung
  2025-03-05 18:41                             ` Andreas Hindborg
  2025-03-05 14:25                           ` Daniel Almeida
  2025-03-05 18:38                           ` Andreas Hindborg
  2 siblings, 2 replies; 70+ messages in thread
From: Alice Ryhl @ 2025-03-05 13:23 UTC (permalink / raw)
  To: Ralf Jung
  Cc: Boqun Feng, comex, Andreas Hindborg, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

On Wed, Mar 5, 2025 at 2:10 PM Ralf Jung <post@ralfj.de> wrote:
>
> Hi,
>
> On 05.03.25 04:24, Boqun Feng wrote:
> > On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
> >>
> >>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
> >> However, these optimizations should rarely trigger misbehavior in
> >> practice, so I wouldn’t be surprised if Linux had some code that
> >> expected memcpy to act volatile…
> >>
> >
> > Also in this particular case we are discussing [1], it's a memcpy (from
> > or to) a DMA buffer, which means the device can also read or write the
> > memory, therefore the content of the memory may be altered outside the
> > program (the kernel), so we cannot use copy_nonoverlapping() I believe.
> >
> > [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
>
> Is there actually a potential for races (with reads by hardware, not other
> threads) on the memcpy'd memory? Or is this the pattern where you copy some data
> somewhere and then set a flag in an MMIO register to indicate that the data is
> ready and the device can start reading it? In the latter case, the actual data
> copy does not race with anything, so it can be a regular non-atomic non-volatile
> memcpy. The flag write *should* be a release write, and release volatile writes
> do not exist, so that is a problem, but it's a separate problem from volatile
> memcpy. One can use a release fence followed by a relaxed write instead.
> Volatile writes do not currently act like relaxed writes, but you need that
> anyway for WRITE_ONCE to make sense so it seems fine to rely on that here as well.
>
> Rust should have atomic volatile accesses, and various ideas have been proposed
> over the years, but sadly nobody has shown up to try and push this through.
>
> If the memcpy itself can indeed race, you need an atomic volatile memcpy --
> which neither C nor Rust have, though there are proposals for atomic memcpy (and
> arguably, there should be a way to interact with a device using non-volatile
> atomics... but anyway in the LKMM, atomics are modeled with volatile, so things
> are even more entangled than usual ;).

For some kinds of hardware, we might not want to trust the hardware.
I.e., there is no race under normal operation, but the hardware could
have a bug or be malicious and we might not want that to result in UB.
This is pretty similar to syscalls that take a pointer into userspace
memory and read it - userspace shouldn't modify that memory during the
syscall, but it can and if it does, that should be well-defined.
(Though in the case of userspace, the copy happens in asm since it
also needs to deal with virtual memory and so on.)

Another thing is that it can be pretty inconvenient if writing to the
DMA memory has to take &mut self. We might need to write to disjoint
regions in parallel, but ownership-wise it behaves like a big Vec<u8>.
Being able to have a &self method for writing is just a lot more
convenient API-wise.
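
The shape I mean might look something like this (an illustrative userspace
sketch, not the in-tree API; the type, field, and method names are made up):

```rust
use std::ptr;

// Illustrative only: a coherent buffer whose writes go through
// raw-pointer volatile stores, so a shared `&self` method suffices and
// disjoint regions can be written in parallel without `&mut self`.
struct DmaBuf {
    cpu_addr: *mut u8,
    len: usize,
}

impl DmaBuf {
    /// Write `data` at byte `offset` through a shared reference.
    fn write(&self, offset: usize, data: &[u8]) -> Result<(), ()> {
        // Overflow-checked bounds test, as discussed for v12.
        let end = offset.checked_add(data.len()).ok_or(())?;
        if end > self.len {
            return Err(());
        }
        for (i, &b) in data.iter().enumerate() {
            // SAFETY: in bounds per the check above; volatile keeps the
            // compiler from assuming it has exclusive access.
            unsafe { ptr::write_volatile(self.cpu_addr.add(offset + i), b) };
        }
        Ok(())
    }
}
```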

Alice


* Re: Allow data races on some read/write operations
  2025-03-05 13:23                           ` Alice Ryhl
@ 2025-03-05 13:27                             ` Ralf Jung
  2025-03-05 14:40                               ` Robin Murphy
  2025-03-05 18:43                               ` Andreas Hindborg
  2025-03-05 18:41                             ` Andreas Hindborg
  1 sibling, 2 replies; 70+ messages in thread
From: Ralf Jung @ 2025-03-05 13:27 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Boqun Feng, comex, Andreas Hindborg, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

Hi,

On 05.03.25 14:23, Alice Ryhl wrote:
> On Wed, Mar 5, 2025 at 2:10 PM Ralf Jung <post@ralfj.de> wrote:
>>
>> Hi,
>>
>> On 05.03.25 04:24, Boqun Feng wrote:
>>> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>>>>
>>>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>>>> However, these optimizations should rarely trigger misbehavior in
>>>> practice, so I wouldn’t be surprised if Linux had some code that
>>>> expected memcpy to act volatile…
>>>>
>>>
>>> Also in this particular case we are discussing [1], it's a memcpy (from
>>> or to) a DMA buffer, which means the device can also read or write the
>>> memory, therefore the content of the memory may be altered outside the
>>> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
>>>
>>> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
>>
>> Is there actually a potential for races (with reads by hardware, not other
>> threads) on the memcpy'd memory? Or is this the pattern where you copy some data
>> somewhere and then set a flag in an MMIO register to indicate that the data is
>> ready and the device can start reading it? In the latter case, the actual data
>> copy does not race with anything, so it can be a regular non-atomic non-volatile
>> memcpy. The flag write *should* be a release write, and release volatile writes
>> do not exist, so that is a problem, but it's a separate problem from volatile
>> memcpy. One can use a release fence followed by a relaxed write instead.
>> Volatile writes do not currently act like relaxed writes, but you need that
>> anyway for WRITE_ONCE to make sense so it seems fine to rely on that here as well.
>>
>> Rust should have atomic volatile accesses, and various ideas have been proposed
>> over the years, but sadly nobody has shown up to try and push this through.
>>
>> If the memcpy itself can indeed race, you need an atomic volatile memcpy --
>> which neither C nor Rust have, though there are proposals for atomic memcpy (and
>> arguably, there should be a way to interact with a device using non-volatile
>> atomics... but anyway in the LKMM, atomics are modeled with volatile, so things
>> are even more entangled than usual ;).
> 
> For some kinds of hardware, we might not want to trust the hardware.
> I.e., there is no race under normal operation, but the hardware could
> have a bug or be malicious and we might not want that to result in UB.
> This is pretty similar to syscalls that take a pointer into userspace
> memory and read it - userspace shouldn't modify that memory during the
> syscall, but it can and if it does, that should be well-defined.
> (Though in the case of userspace, the copy happens in asm since it
> also needs to deal with virtual memory and so on.)

Wow you are really doing your best to combine all the hard problems at the same 
time. ;)
Sharing memory with untrusted parties is another tricky issue, and even leaving 
aside all the theoretical trouble, practically speaking you'll want to 
exclusively use atomic accesses to interact with such memory. So doing this 
properly requires atomic memcpy. I don't know what that is blocked on, but it is 
good to know that it would help the kernel.

Kind regards,
Ralf



* Re: Allow data races on some read/write operations
  2025-03-05 13:10                         ` Ralf Jung
  2025-03-05 13:23                           ` Alice Ryhl
@ 2025-03-05 14:25                           ` Daniel Almeida
  2025-03-05 18:38                           ` Andreas Hindborg
  2 siblings, 0 replies; 70+ messages in thread
From: Daniel Almeida @ 2025-03-05 14:25 UTC (permalink / raw)
  To: Ralf Jung
  Cc: Boqun Feng, comex, Andreas Hindborg, Alice Ryhl, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

Hi Ralf,

> On 5 Mar 2025, at 10:10, Ralf Jung <post@ralfj.de> wrote:
> 
> Hi,
> 
> On 05.03.25 04:24, Boqun Feng wrote:
>> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>>> 
>>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>>>> 
>>>> Those already exist in Rust, albeit only unstably:
>>>> <https://doc.rust-lang.org/nightly/std/intrinsics/fn.volatile_copy_memory.html>.
>>>> However, I am not sure how you'd even generate such a call in C? The
>>>> standard memcpy function is not doing volatile accesses, to my
>>>> knowledge.
>>> 
>>> The actual memcpy symbol that exists at runtime is written in
>>> assembly, and should be valid to treat as performing volatile
>>> accesses.
> 
> memcpy is often written in C... and AFAIK compilers understand what that function does and will, for instance, happily eliminate the call if they can prove that the destination memory is not being read from again. So, it doesn't behave like a volatile access at all.
> 
>>> But both GCC and Clang special-case the memcpy function.  For example,
>>> if you call memcpy with a small constant as the size, the optimizer
>>> will transform the call into one or more regular loads/stores, which
>>> can then be optimized mostly like any other loads/stores (except for
>>> opting out of alignment and type-based aliasing assumptions).  Even if
>>> the call isn’t transformed, the optimizer will still make assumptions.
>>> LLVM will automatically mark memcpy `nosync`, which makes it undefined
>>> behavior if the function “communicate[s] (synchronize[s]) with another
>>> thread”, including through “volatile accesses”. [1]
> 
> The question is more,  what do clang and GCC document / guarantee in a stable way regarding memcpy? I have not seen any indication so far that a memcpy call would ever be considered volatile, so we have to treat it like a non-volatile non-atomic operation.
> 
>>> However, these optimizations should rarely trigger misbehavior in
>>> practice, so I wouldn’t be surprised if Linux had some code that
>>> expected memcpy to act volatile…
>>> 
>> Also in this particular case we are discussing [1], it's a memcpy (from
>> or to) a DMA buffer, which means the device can also read or write the
>> memory, therefore the content of the memory may be altered outside the
>> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
>> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
> 
> Is there actually a potential for races (with reads by hardware, not other threads) on the memcpy'd memory? Or is this the pattern where you copy some data somewhere and then set a flag in an MMIO register to indicate that the data is ready and the device can start reading it? In the latter case, the actual data copy does not race with anything, so it can be a

This is device-specific.

e.g.: for video codecs, if you don’t set a bit that starts the decode or
encode process for the current frame, everything remains as-is, i.e.: untouched
by the hardware. I can’t vouch for all other devices, and people have
already chimed in to say this is not necessarily the case for some of them.

> regular non-atomic non-volatile memcpy. The flag write *should* be a release write, and release volatile writes do not exist, so that is a problem, but it's a separate problem from volatile memcpy. One can use a release fence followed by a relaxed write instead. Volatile writes do not currently act like relaxed writes, but you need that anyway for WRITE_ONCE to make sense so it seems fine to rely on that here as well.
> 
> Rust should have atomic volatile accesses, and various ideas have been proposed over the years, but sadly nobody has shown up to try and push this through.
> 
> If the memcpy itself can indeed race, you need an atomic volatile memcpy -- which neither C nor Rust have, though there are proposals for atomic memcpy (and arguably, there should be a way to interact with a device using non-volatile atomics... but anyway in the LKMM, atomics are modeled with volatile, so things are even more entangled than usual ;).
> 
> Kind regards,
> Ralf
> 

— Daniel



* Re: Allow data races on some read/write operations
  2025-03-05 13:27                             ` Ralf Jung
@ 2025-03-05 14:40                               ` Robin Murphy
  2025-03-05 18:43                               ` Andreas Hindborg
  1 sibling, 0 replies; 70+ messages in thread
From: Robin Murphy @ 2025-03-05 14:40 UTC (permalink / raw)
  To: Ralf Jung, Alice Ryhl
  Cc: Boqun Feng, comex, Andreas Hindborg, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Gary Guo, Björn Roy Baron, Trevor Gross, Valentin Obst,
	linux-kernel, Christoph Hellwig, Marek Szyprowski, airlied, iommu,
	lkmm

On 05/03/2025 1:27 pm, Ralf Jung wrote:
> Hi,
> 
> On 05.03.25 14:23, Alice Ryhl wrote:
>> On Wed, Mar 5, 2025 at 2:10 PM Ralf Jung <post@ralfj.de> wrote:
>>>
>>> Hi,
>>>
>>> On 05.03.25 04:24, Boqun Feng wrote:
>>>> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>>>>>
>>>>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>>>>> However, these optimizations should rarely trigger misbehavior in
>>>>> practice, so I wouldn’t be surprised if Linux had some code that
>>>>> expected memcpy to act volatile…
>>>>>
>>>>
>>>> Also in this particular case we are discussing [1], it's a memcpy (from
>>>> or to) a DMA buffer, which means the device can also read or write the
>>>> memory, therefore the content of the memory may be altered outside the
>>>> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
>>>>
>>>> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
>>>
>>> Is there actually a potential for races (with reads by hardware, not 
>>> other
>>> threads) on the memcpy'd memory? Or is this the pattern where you 
>>> copy some data
>>> somewhere and then set a flag in an MMIO register to indicate that 
>>> the data is
>>> ready and the device can start reading it? In the latter case, the 
>>> actual data
>>> copy does not race with anything, so it can be a regular non-atomic 
>>> non-volatile
>>> memcpy. The flag write *should* be a release write, and release 
>>> volatile writes
>>> do not exist, so that is a problem, but it's a separate problem from 
>>> volatile
>>> memcpy. One can use a release fence followed by a relaxed write instead.
>>> Volatile writes do not currently act like relaxed writes, but you 
>>> need that
>>> anyway for WRITE_ONCE to make sense so it seems fine to rely on that 
>>> here as well.
>>>
>>> Rust should have atomic volatile accesses, and various ideas have 
>>> been proposed
>>> over the years, but sadly nobody has shown up to try and push this 
>>> through.
>>>
>>> If the memcpy itself can indeed race, you need an atomic volatile 
>>> memcpy --
>>> which neither C nor Rust have, though there are proposals for atomic 
>>> memcpy (and
>>> arguably, there should be a way to interact with a device using 
>>> non-volatile
>>> atomics... but anyway in the LKMM, atomics are modeled with volatile, 
>>> so things
>>> are even more entangled than usual ;).
>>
>> For some kinds of hardware, we might not want to trust the hardware.
>> I.e., there is no race under normal operation, but the hardware could
>> have a bug or be malicious and we might not want that to result in UB.
>> This is pretty similar to syscalls that take a pointer into userspace
>> memory and read it - userspace shouldn't modify that memory during the
>> syscall, but it can and if it does, that should be well-defined.
>> (Though in the case of userspace, the copy happens in asm since it
>> also needs to deal with virtual memory and so on.)
> 
> Wow you are really doing your best to combine all the hard problems at 
> the same time. ;)
> Sharing memory with untrusted parties is another tricky issue, and even 
> leaving aside all the theoretical trouble, practically speaking you'll 
> want to exclusively use atomic accesses to interact with such memory. So 
> doing this properly requires atomic memcpy. I don't know what that is 
> blocked on, but it is good to know that it would help the kernel.

If you don't trust the device then I wouldn't think it actually matters 
what happens at this level - the higher-level driver is already going to 
have to carefully check and sanitise whatever data it reads back from 
the buffer before consuming it, at which point reading a torn value due 
to a race would be essentially indistinguishable from if the device had 
gone wrong and simply written that nonsense value itself.

I think the more significant case is when polling for the device to 
write back some kind of status word, where in C code the driver would 
use READ_ONCE() to ensure a single-copy-atomic read of the same size the 
device is going to write - sticking a regular memcpy() into the middle 
of that can't necessarily be trusted to work correctly (even if it may 
appear to 99% of the time).
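
That READ_ONCE() pattern corresponds roughly to a single volatile load of the
status word in Rust (sketch with illustrative names; a real driver would go
through the kernel's accessors, not bare `read_volatile`):

```rust
use std::ptr;

// Poll a 32-bit status word the device writes back. An aligned
// `read_volatile` gives one untorn load and stops the compiler from
// caching or splitting the access, unlike a memcpy into a local.
fn poll_status(status: *const u32, done_bit: u32) -> bool {
    // SAFETY (in this sketch): `status` points to a live, aligned u32.
    let s = unsafe { ptr::read_volatile(status) };
    (s & done_bit) != 0
}
```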

Thanks,
Robin.


* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
                     ` (6 preceding siblings ...)
  2025-03-03 13:08   ` Robin Murphy
@ 2025-03-05 17:41   ` Jason Gunthorpe
  2025-03-06 13:37     ` Danilo Krummrich
  7 siblings, 1 reply; 70+ messages in thread
From: Jason Gunthorpe @ 2025-03-05 17:41 UTC (permalink / raw)
  To: Abdiel Janulgue
  Cc: aliceryhl, dakr, robin.murphy, daniel.almeida, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Mon, Feb 24, 2025 at 01:49:06PM +0200, Abdiel Janulgue wrote:

> +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
> +    fn drop(&mut self) {
> +        let size = self.count * core::mem::size_of::<T>();
> +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
> +        // type invariants on `CoherentAllocation`.
> +        unsafe {
> +            bindings::dma_free_attrs(
> +                self.dev.as_raw(),
> +                size,
> +                self.cpu_addr as _,
> +                self.dma_handle,
> +                self.dma_attrs.as_raw(),
> +            )

I mentioned this in another thread..

There is an additional C API restriction here that the DMA API
functions may only be called by a driver after probe() starts and
before remove() completes. This applies to dma_free_attrs().

It is not enough that a refcount is held on device.

Otherwise the kernel may crash as the driver core allows resources
used by the DMA API to be changed once the driver is removed.

See the related discussion here, with an example of what the crash can
look like:

https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c

 > a device with no driver bound should not be passed to the DMA API,
 > much less a dead device that's already been removed from its parent
 > bus.

My Rust is non-existent, but I did not see anything about this
point.

Also note that any HW configured to do DMA must be halted before the
free is allowed, otherwise it is a UAF bug. It is worth mentioning that
in the documentation.

Jason


* Re: Allow data races on some read/write operations
  2025-03-05 13:10                         ` Ralf Jung
  2025-03-05 13:23                           ` Alice Ryhl
  2025-03-05 14:25                           ` Daniel Almeida
@ 2025-03-05 18:38                           ` Andreas Hindborg
  2025-03-05 22:01                             ` Ralf Jung
  2 siblings, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-05 18:38 UTC (permalink / raw)
  To: Ralf Jung
  Cc: Boqun Feng, comex, Alice Ryhl, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

"Ralf Jung" <post@ralfj.de> writes:

> Hi,
>
> On 05.03.25 04:24, Boqun Feng wrote:
>> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>>>
>>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>>>>
>>>> Those already exist in Rust, albeit only unstably:
>>>> <https://doc.rust-lang.org/nightly/std/intrinsics/fn.volatile_copy_memory.html>.
>>>> However, I am not sure how you'd even generate such a call in C? The
>>>> standard memcpy function is not doing volatile accesses, to my
>>>> knowledge.
>>>
>>> The actual memcpy symbol that exists at runtime is written in
>>> assembly, and should be valid to treat as performing volatile
>>> accesses.
>
> memcpy is often written in C... and AFAIK compilers understand what that
> function does and will, for instance, happily eliminate the call if they can
> prove that the destination memory is not being read from again. So, it doesn't
> behave like a volatile access at all.
>
>>> But both GCC and Clang special-case the memcpy function.  For example,
>>> if you call memcpy with a small constant as the size, the optimizer
>>> will transform the call into one or more regular loads/stores, which
>>> can then be optimized mostly like any other loads/stores (except for
>>> opting out of alignment and type-based aliasing assumptions).  Even if
>>> the call isn’t transformed, the optimizer will still make assumptions.
>>> LLVM will automatically mark memcpy `nosync`, which makes it undefined
>>> behavior if the function “communicate[s] (synchronize[s]) with another
>>> thread”, including through “volatile accesses”. [1]
>
> The question is more,  what do clang and GCC document / guarantee in a stable
> way regarding memcpy? I have not seen any indication so far that a memcpy call
> would ever be considered volatile, so we have to treat it like a non-volatile
> non-atomic operation.
>
>>> However, these optimizations should rarely trigger misbehavior in
>>> practice, so I wouldn’t be surprised if Linux had some code that
>>> expected memcpy to act volatile…
>>>
>>
>> Also in this particular case we are discussing [1], it's a memcpy (from
>> or to) a DMA buffer, which means the device can also read or write the
>> memory, therefore the content of the memory may be altered outside the
>> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
>>
>> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
>
> Is there actually a potential for races (with reads by hardware, not other
> threads) on the memcpy'd memory?

There is another use case for this: copying data to/from a page that is
mapped into user space. In this case, a user space process can
potentially modify the data in the mapped page while we are
reading/writing that data. This would be a misbehaved user space
process, but it should not be able to cause UB in the kernel anyway.

The C kernel just calls memcpy directly for this use case.

For this use case, we do not interpret or make control flow decisions
based on the data we read/write. And _if_ user space decides to do
concurrent writes to the page, we don't care if the data becomes
garbage. We just need the UB to be confined to the data moved from that
page, and not leak into the rest of the kernel.


Best regards,
Andreas Hindborg




* Re: Allow data races on some read/write operations
  2025-03-05 13:23                           ` Alice Ryhl
  2025-03-05 13:27                             ` Ralf Jung
@ 2025-03-05 18:41                             ` Andreas Hindborg
  1 sibling, 0 replies; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-05 18:41 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Ralf Jung, Boqun Feng, comex, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

"Alice Ryhl" <aliceryhl@google.com> writes:

> On Wed, Mar 5, 2025 at 2:10 PM Ralf Jung <post@ralfj.de> wrote:
>>
>> Hi,
>>
>> On 05.03.25 04:24, Boqun Feng wrote:
>> > On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>> >>
>> >>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>> >> However, these optimizations should rarely trigger misbehavior in
>> >> practice, so I wouldn’t be surprised if Linux had some code that
>> >> expected memcpy to act volatile…
>> >>
>> >
>> > Also in this particular case we are discussing [1], it's a memcpy (from
>> > or to) a DMA buffer, which means the device can also read or write the
>> > memory, therefore the content of the memory may be altered outside the
>> > program (the kernel), so we cannot use copy_nonoverlapping() I believe.
>> >
>> > [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
>>
>> Is there actually a potential for races (with reads by hardware, not other
>> threads) on the memcpy'd memory? Or is this the pattern where you copy some data
>> somewhere and then set a flag in an MMIO register to indicate that the data is
>> ready and the device can start reading it? In the latter case, the actual data
>> copy does not race with anything, so it can be a regular non-atomic non-volatile
>> memcpy. The flag write *should* be a release write, and release volatile writes
>> do not exist, so that is a problem, but it's a separate problem from volatile
>> memcpy. One can use a release fence followed by a relaxed write instead.
>> Volatile writes do not currently act like relaxed writes, but you need that
>> anyway for WRITE_ONCE to make sense so it seems fine to rely on that here as well.
>>
>> Rust should have atomic volatile accesses, and various ideas have been proposed
>> over the years, but sadly nobody has shown up to try and push this through.
>>
>> If the memcpy itself can indeed race, you need an atomic volatile memcpy --
>> which neither C nor Rust have, though there are proposals for atomic memcpy (and
>> arguably, there should be a way to interact with a device using non-volatile
>> atomics... but anyway in the LKMM, atomics are modeled with volatile, so things
>> are even more entangled than usual ;).
>
> For some kinds of hardware, we might not want to trust the hardware.
> I.e., there is no race under normal operation, but the hardware could
> have a bug or be malicious and we might not want that to result in UB.
> This is pretty similar to syscalls that take a pointer into userspace
> memory and read it - userspace shouldn't modify that memory during the
> syscall, but it can and if it does, that should be well-defined.
> (Though in the case of userspace, the copy happens in asm since it
> also needs to deal with virtual memory and so on.)

Could you point me to this code? As mentioned in a parallel email in
this thread, zero-copy file I/O has this property. User space pages are
mapped into the kernel and read from / written to. C just calls `memcpy`
for this.


Best regards,
Andreas Hindborg




* Re: Allow data races on some read/write operations
  2025-03-05 13:27                             ` Ralf Jung
  2025-03-05 14:40                               ` Robin Murphy
@ 2025-03-05 18:43                               ` Andreas Hindborg
  2025-03-05 19:30                                 ` Alan Stern
  2025-03-05 19:42                                 ` Ralf Jung
  1 sibling, 2 replies; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-05 18:43 UTC (permalink / raw)
  To: Ralf Jung
  Cc: Alice Ryhl, Boqun Feng, comex, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

"Ralf Jung" <post@ralfj.de> writes:

> Hi,
>
> On 05.03.25 14:23, Alice Ryhl wrote:
>> On Wed, Mar 5, 2025 at 2:10 PM Ralf Jung <post@ralfj.de> wrote:
>>>
>>> Hi,
>>>
>>> On 05.03.25 04:24, Boqun Feng wrote:
>>>> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>>>>>
>>>>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>>>>> However, these optimizations should rarely trigger misbehavior in
>>>>> practice, so I wouldn’t be surprised if Linux had some code that
>>>>> expected memcpy to act volatile…
>>>>>
>>>>
>>>> Also in this particular case we are discussing [1], it's a memcpy (from
>>>> or to) a DMA buffer, which means the device can also read or write the
>>>> memory, therefore the content of the memory may be altered outside the
>>>> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
>>>>
>>>> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
>>>
>>> Is there actually a potential for races (with reads by hardware, not other
>>> threads) on the memcpy'd memory? Or is this the pattern where you copy some data
>>> somewhere and then set a flag in an MMIO register to indicate that the data is
>>> ready and the device can start reading it? In the latter case, the actual data
>>> copy does not race with anything, so it can be a regular non-atomic non-volatile
>>> memcpy. The flag write *should* be a release write, and release volatile writes
>>> do not exist, so that is a problem, but it's a separate problem from volatile
>>> memcpy. One can use a release fence followed by a relaxed write instead.
>>> Volatile writes do not currently act like relaxed writes, but you need that
>>> anyway for WRITE_ONCE to make sense so it seems fine to rely on that here as well.
>>>
>>> Rust should have atomic volatile accesses, and various ideas have been proposed
>>> over the years, but sadly nobody has shown up to try and push this through.
>>>
>>> If the memcpy itself can indeed race, you need an atomic volatile memcpy --
>>> which neither C nor Rust have, though there are proposals for atomic memcpy (and
>>> arguably, there should be a way to interact with a device using non-volatile
>>> atomics... but anyway in the LKMM, atomics are modeled with volatile, so things
>>> are even more entangled than usual ;).
>>
>> For some kinds of hardware, we might not want to trust the hardware.
>> I.e., there is no race under normal operation, but the hardware could
>> have a bug or be malicious and we might not want that to result in UB.
>> This is pretty similar to syscalls that take a pointer into userspace
>> memory and read it - userspace shouldn't modify that memory during the
>> syscall, but it can and if it does, that should be well-defined.
>> (Though in the case of userspace, the copy happens in asm since it
>> also needs to deal with virtual memory and so on.)
>
> Wow you are really doing your best to combine all the hard problems at the same
> time. ;)
> Sharing memory with untrusted parties is another tricky issue, and even leaving
> aside all the theoretical trouble, practically speaking you'll want to
> exclusively use atomic accesses to interact with such memory. So doing this
> properly requires atomic memcpy. I don't know what that is blocked on, but it is
> good to know that it would help the kernel.

I am sort of baffled by this, since the C kernel has no such thing and
has worked fine for a few years. Is it a property of Rust that causes us
to need atomic memcpy, or is what the C kernel is doing potentially dangerous?


Best regards,
Andreas Hindborg



* Re: Allow data races on some read/write operations
  2025-03-05 18:43                               ` Andreas Hindborg
@ 2025-03-05 19:30                                 ` Alan Stern
  2025-03-05 19:42                                 ` Ralf Jung
  1 sibling, 0 replies; 70+ messages in thread
From: Alan Stern @ 2025-03-05 19:30 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Ralf Jung, Alice Ryhl, Boqun Feng, comex, Daniel Almeida,
	Benno Lossin, Abdiel Janulgue, dakr, robin.murphy, rust-for-linux,
	Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
	Trevor Gross, Valentin Obst, linux-kernel, Christoph Hellwig,
	Marek Szyprowski, airlied, iommu, lkmm

On Wed, Mar 05, 2025 at 07:43:59PM +0100, Andreas Hindborg wrote:
> "Ralf Jung" <post@ralfj.de> writes:
> 
> > Hi,
> >
> > On 05.03.25 14:23, Alice Ryhl wrote:
> >> On Wed, Mar 5, 2025 at 2:10 PM Ralf Jung <post@ralfj.de> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 05.03.25 04:24, Boqun Feng wrote:
> >>>> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
> >>>>>
> >>>>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
> >>>>> However, these optimizations should rarely trigger misbehavior in
> >>>>> practice, so I wouldn’t be surprised if Linux had some code that
> >>>>> expected memcpy to act volatile…
> >>>>>
> >>>>
> >>>> Also in this particular case we are discussing [1], it's a memcpy (from
> >>>> or to) a DMA buffer, which means the device can also read or write the
> >>>> memory, therefore the content of the memory may be altered outside the
> >>>> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
> >>>>
> >>>> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
> >>>
> >>> Is there actually a potential for races (with reads by hardware, not other
> >>> threads) on the memcpy'd memory? Or is this the pattern where you copy some data
> >>> somewhere and then set a flag in an MMIO register to indicate that the data is
> >>> ready and the device can start reading it? In the latter case, the actual data
> >>> copy does not race with anything, so it can be a regular non-atomic non-volatile
> >>> memcpy. The flag write *should* be a release write, and release volatile writes
> >>> do not exist, so that is a problem, but it's a separate problem from volatile
> >>> memcpy. One can use a release fence followed by a relaxed write instead.
> >>> Volatile writes do not currently act like relaxed writes, but you need that
> >>> anyway for WRITE_ONCE to make sense so it seems fine to rely on that here as well.
> >>>
> >>> Rust should have atomic volatile accesses, and various ideas have been proposed
> >>> over the years, but sadly nobody has shown up to try and push this through.
> >>>
> >>> If the memcpy itself can indeed race, you need an atomic volatile memcpy --
> >>> which neither C nor Rust have, though there are proposals for atomic memcpy (and
> >>> arguably, there should be a way to interact with a device using non-volatile
> >>> atomics... but anyway in the LKMM, atomics are modeled with volatile, so things
> >>> are even more entangled than usual ;).
> >>
> >> For some kinds of hardware, we might not want to trust the hardware.
> >> I.e., there is no race under normal operation, but the hardware could
> >> have a bug or be malicious and we might not want that to result in UB.
> >> This is pretty similar to syscalls that take a pointer into userspace
> >> memory and read it - userspace shouldn't modify that memory during the
> >> syscall, but it can and if it does, that should be well-defined.
> >> (Though in the case of userspace, the copy happens in asm since it
> >> also needs to deal with virtual memory and so on.)
> >
> > Wow you are really doing your best to combine all the hard problems at the same
> > time. ;)
> > Sharing memory with untrusted parties is another tricky issue, and even leaving
> > aside all the theoretical trouble, practically speaking you'll want to
> > exclusively use atomic accesses to interact with such memory. So doing this
> > properly requires atomic memcpy. I don't know what that is blocked on, but it is
> > good to know that it would help the kernel.
> 
> I am sort of baffled by this, since the C kernel has no such thing and
> has worked fine for a few years. Is it a property of Rust that causes us
> to need atomic memcpy, or is what the C kernel is doing potentially dangerous?

I agree; this is a strange discussion.

What is it that people want to protect against?  If the issue is 
undefined behavior caused by a second party modifying the source of a 
memcpy() while the copy is in progress -- well, there's no way to 
protect against that.  You just have to make sure either that it cannot 
happen or else that you can cope with potentially torn values in the 
copy's destination.

Is the issue a matter of informing verifiers or sanitizers that a data 
race during a memcpy() shouldn't count as undefined behavior?  Surely 
the way to do this depends on the verifier/sanitizer in question.  As 
far as I know, there is no version of memcpy() whose arguments are 
declared to be pointers to atomics.  (And if such a thing did exist, it 
would be part of C++, not of C.)

Alan Stern


* Re: Allow data races on some read/write operations
  2025-03-05 18:43                               ` Andreas Hindborg
  2025-03-05 19:30                                 ` Alan Stern
@ 2025-03-05 19:42                                 ` Ralf Jung
  2025-03-05 21:26                                   ` Andreas Hindborg
  1 sibling, 1 reply; 70+ messages in thread
From: Ralf Jung @ 2025-03-05 19:42 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Alice Ryhl, Boqun Feng, comex, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

Hi all,

>>> For some kinds of hardware, we might not want to trust the hardware.
>>> I.e., there is no race under normal operation, but the hardware could
>>> have a bug or be malicious and we might not want that to result in UB.
>>> This is pretty similar to syscalls that take a pointer into userspace
>>> memory and read it - userspace shouldn't modify that memory during the
>>> syscall, but it can and if it does, that should be well-defined.
>>> (Though in the case of userspace, the copy happens in asm since it
>>> also needs to deal with virtual memory and so on.)
>>
>> Wow you are really doing your best to combine all the hard problems at the same
>> time. ;)
>> Sharing memory with untrusted parties is another tricky issue, and even leaving
>> aside all the theoretical trouble, practically speaking you'll want to
>> exclusively use atomic accesses to interact with such memory. So doing this
>> properly requires atomic memcpy. I don't know what that is blocked on, but it is
>> good to know that it would help the kernel.
> 
> I am sort of baffled by this, since the C kernel has no such thing and
> has worked fine for a few years. Is it a property of Rust that causes us
> to need atomic memcpy, or is what the C kernel is doing potentially dangerous?

It's the same in C: a memcpy is a non-atomic access. If something else 
concurrently mutates the memory you are copying from, or something else 
concurrently reads/writes the memory you are copying to, that is UB.
This is not specific to memcpy; it's the same for regular pointer loads/stores. 
That's why you need READ_ONCE and WRITE_ONCE to specifically indicate to the 
compiler that these are special accesses that need to be treated differently. 
Something similar is needed for memcpy.

Kind regards,
Ralf



* Re: Allow data races on some read/write operations
  2025-03-05 19:42                                 ` Ralf Jung
@ 2025-03-05 21:26                                   ` Andreas Hindborg
  2025-03-05 21:53                                     ` Ralf Jung
  0 siblings, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-05 21:26 UTC (permalink / raw)
  To: Ralf Jung
  Cc: Alice Ryhl, Boqun Feng, comex, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

"Ralf Jung" <post@ralfj.de> writes:

> Hi all,
>
>>>> For some kinds of hardware, we might not want to trust the hardware.
>>>> I.e., there is no race under normal operation, but the hardware could
>>>> have a bug or be malicious and we might not want that to result in UB.
>>>> This is pretty similar to syscalls that take a pointer into userspace
>>>> memory and read it - userspace shouldn't modify that memory during the
>>>> syscall, but it can and if it does, that should be well-defined.
>>>> (Though in the case of userspace, the copy happens in asm since it
>>>> also needs to deal with virtual memory and so on.)
>>>
>>> Wow you are really doing your best to combine all the hard problems at the same
>>> time. ;)
>>> Sharing memory with untrusted parties is another tricky issue, and even leaving
>>> aside all the theoretical trouble, practically speaking you'll want to
>>> exclusively use atomic accesses to interact with such memory. So doing this
>>> properly requires atomic memcpy. I don't know what that is blocked on, but it is
>>> good to know that it would help the kernel.
>>
>> I am sort of baffled by this, since the C kernel has no such thing and
>> has worked fine for a few years. Is it a property of Rust that causes us
>> to need atomic memcpy, or is what the C kernel is doing potentially dangerous?
>
> It's the same in C: a memcpy is a non-atomic access. If something else
> concurrently mutates the memory you are copying from, or something else
> concurrently reads/writes the memory you are copying to, that is UB.
> This is not specific to memcpy; it's the same for regular pointer loads/stores.
> That's why you need READ_ONCE and WRITE_ONCE to specifically indicate to the
> compiler that these are special accesses that need to be treated differently.
> Something similar is needed for memcpy.

I'm not a compiler engineer, so I might be wrong about this, but: if I
do a C `memcpy` from place A to place B where A is experiencing racy
writes, if I don't interpret the data at place B after the copy
operation, the rest of my C program is fine and will work as expected. I
may even later copy the data at place B to place C where C might have
concurrent reads and/or writes, and the kernel will not experience UB
because of this. The data may be garbage, but that is fine. I am not
interpreting the data, or making control flow decisions based on it. I
am just moving the data.

My understanding is: in Rust, this program would be illegal and might
experience UB in unpredictable ways, not limited to just the data that
is being moved.

One option I have explored is just calling C memcpy directly, but
because of LTO, that is no different than doing the operation in Rust.

I don't think I need atomic memcpy, I just need my program not to
explode if I move some data to or from a place that is experiencing
concurrent writes without synchronization. Not in general, but for some
special cases where I promise not to look at the data outside of moving
it.


Best regards,
Andreas Hindborg




* Re: Allow data races on some read/write operations
  2025-03-05 21:26                                   ` Andreas Hindborg
@ 2025-03-05 21:53                                     ` Ralf Jung
  2025-03-07  8:43                                       ` Andreas Hindborg
  0 siblings, 1 reply; 70+ messages in thread
From: Ralf Jung @ 2025-03-05 21:53 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Alice Ryhl, Boqun Feng, comex, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

Hi all,

On 05.03.25 22:26, Andreas Hindborg wrote:
> "Ralf Jung" <post@ralfj.de> writes:
> 
>> Hi all,
>>
>>>>> For some kinds of hardware, we might not want to trust the hardware.
>>>>> I.e., there is no race under normal operation, but the hardware could
>>>>> have a bug or be malicious and we might not want that to result in UB.
>>>>> This is pretty similar to syscalls that take a pointer into userspace
>>>>> memory and read it - userspace shouldn't modify that memory during the
>>>>> syscall, but it can and if it does, that should be well-defined.
>>>>> (Though in the case of userspace, the copy happens in asm since it
>>>>> also needs to deal with virtual memory and so on.)
>>>>
>>>> Wow you are really doing your best to combine all the hard problems at the same
>>>> time. ;)
>>>> Sharing memory with untrusted parties is another tricky issue, and even leaving
>>>> aside all the theoretical trouble, practically speaking you'll want to
>>>> exclusively use atomic accesses to interact with such memory. So doing this
>>>> properly requires atomic memcpy. I don't know what that is blocked on, but it is
>>>> good to know that it would help the kernel.
>>>
>>> I am sort of baffled by this, since the C kernel has no such thing and
>>> has worked fine for a few years. Is it a property of Rust that causes us
>>> to need atomic memcpy, or is what the C kernel is doing potentially dangerous?
>>
>> It's the same in C: a memcpy is a non-atomic access. If something else
>> concurrently mutates the memory you are copying from, or something else
>> concurrently reads/writes the memory you are copying to, that is UB.
>> This is not specific to memcpy; it's the same for regular pointer loads/stores.
>> That's why you need READ_ONCE and WRITE_ONCE to specifically indicate to the
>> compiler that these are special accesses that need to be treated differently.
>> Something similar is needed for memcpy.
> 
> I'm not a compiler engineer, so I might be wrong about this, but: if I
> do a C `memcpy` from place A to place B where A is experiencing racy
> writes, if I don't interpret the data at place B after the copy
> operation, the rest of my C program is fine and will work as expected.

The program has UB in that case. A program that has UB may work as expected 
today, but that changes nothing about it having UB.
The C standard is abundantly clear here:
"The execution of a program contains a data race if it contains two conflicting 
actions in different threads, at least one of which is not atomic, and neither 
happens before the other. Any such data race results in undefined behavior."
(C23, §5.1.2.4)

You are describing a hypothetical language that treats data races in a different 
way. Is such a language *possible*? Definitely. For the specific case you 
describe here, one "just" has to declare read-write races to be not UB, but to 
return "poison data" on the read side (poison data is a bit like uninitialized 
memory or padding), which the memcpy would then store on the target side. Any 
future interpretation of the target memory would be UB ("poison data" is not the 
same as "random data"). Such a model has actually been studied [1], though not
a lot, and not as a proposal for a semantics of a user-facing language. (Rather,
that was a proposal for an internal compiler IR.) The extra complications 
incurred by this choice are significant -- there is no free lunch here.

[1]: https://sf.snu.ac.kr/publications/promising-ir-full.pdf

However, C is not that language, and neither is Rust. Defining a concurrency 
memory model is extremely non-trivial (there's literally hundreds of papers 
proposing various different models, and there are still some unsolved problems). 
The route the C++ model took was to strictly rule out all data races, and since 
they were the first to actually undertake the effort of defining a model at this 
level of rigor (for a language not willing to pay the cost that would be 
incurred by the Java concurrency memory model), that has been the standard ever 
since. There's a lot of subtle trade-offs here, and I am far from an expert on 
the exact consequences each different choice would have. I just want to caution 
against the obvious reaction of "why don't they just". :)


> I
> may even later copy the data at place B to place C where C might have
> concurrent reads and/or writes, and the kernel will not experience UB
> because of this. The data may be garbage, but that is fine. I am not
> interpreting the data, or making control flow decisions based on it. I
> am just moving the data.
> 
> My understanding is: in Rust, this program would be illegal and might
> experience UB in unpredictable ways, not limited to just the data that
> is being moved.

That is correct. C and Rust behave the same here.

> One option I have explored is just calling C memcpy directly, but
> because of LTO, that is no different than doing the operation in Rust.
> 
> I don't think I need atomic memcpy, I just need my program not to
> explode if I move some data to or from a place that is experiencing
> concurrent writes without synchronization. Not in general, but for some
> special cases where I promise not to look at the data outside of moving
> it.

I'm afraid I do not know of a language, other than assembly, that can provide this.

Atomic memcpy, however, should be able to cover your use-case, so it seems like 
a reasonable solution to me? Marking things as atomic is literally how you tell 
the compiler "don't blow up if there are concurrent accesses".

Kind regards,
Ralf



* Re: Allow data races on some read/write operations
  2025-03-05 18:38                           ` Andreas Hindborg
@ 2025-03-05 22:01                             ` Ralf Jung
  0 siblings, 0 replies; 70+ messages in thread
From: Ralf Jung @ 2025-03-05 22:01 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Boqun Feng, comex, Alice Ryhl, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

Hi,

On 05.03.25 19:38, Andreas Hindborg wrote:
> "Ralf Jung" <post@ralfj.de> writes:
> 
>> Hi,
>>
>> On 05.03.25 04:24, Boqun Feng wrote:
>>> On Tue, Mar 04, 2025 at 12:18:28PM -0800, comex wrote:
>>>>
>>>>> On Mar 4, 2025, at 11:03 AM, Ralf Jung <post@ralfj.de> wrote:
>>>>>
>>>>> Those already exist in Rust, albeit only unstably:
>>>>> <https://doc.rust-lang.org/nightly/std/intrinsics/fn.volatile_copy_memory.html>.
>>>>> However, I am not sure how you'd even generate such a call in C? The
>>>>> standard memcpy function is not doing volatile accesses, to my
>>>>> knowledge.
>>>>
>>>> The actual memcpy symbol that exists at runtime is written in
>>>> assembly, and should be valid to treat as performing volatile
>>>> accesses.
>>
>> memcpy is often written in C... and AFAIK compilers understand what that
>> function does and will, for instance, happily eliminate the call if they can
>> prove that the destination memory is not being read from again. So, it doesn't
>> behave like a volatile access at all.
>>
>>>> But both GCC and Clang special-case the memcpy function.  For example,
>>>> if you call memcpy with a small constant as the size, the optimizer
>>>> will transform the call into one or more regular loads/stores, which
>>>> can then be optimized mostly like any other loads/stores (except for
>>>> opting out of alignment and type-based aliasing assumptions).  Even if
>>>> the call isn’t transformed, the optimizer will still make assumptions.
>>>> LLVM will automatically mark memcpy `nosync`, which makes it undefined
>>>> behavior if the function “communicate[s] (synchronize[s]) with another
>>>> thread”, including through “volatile accesses”. [1]
>>
>> The question is more, what do clang and GCC document / guarantee in a stable
>> way regarding memcpy? I have not seen any indication so far that a memcpy call
>> would ever be considered volatile, so we have to treat it like a non-volatile
>> non-atomic operation.
>>
>>>> However, these optimizations should rarely trigger misbehavior in
>>>> practice, so I wouldn’t be surprised if Linux had some code that
>>>> expected memcpy to act volatile…
>>>>
>>>
>>> Also in this particular case we are discussing [1], it's a memcpy (from
>>> or to) a DMA buffer, which means the device can also read or write the
>>> memory, therefore the content of the memory may be altered outside the
>>> program (the kernel), so we cannot use copy_nonoverlapping() I believe.
>>>
>>> [1]: https://lore.kernel.org/rust-for-linux/87bjuil15w.fsf@kernel.org/
>>
>> Is there actually a potential for races (with reads by hardware, not other
>> threads) on the memcpy'd memory?
> 
> There is another use case for this: copying data to/from a page that is
> mapped into user space. In this case, a user space process can
> potentially modify the data in the mapped page while we are
> reading/writing that data. This would be a misbehaved user space
> process, but it should not be able to cause UB in the kernel anyway.

Yeah that sounds like *the* prototypical case of sharing memory with an 
untrusted third party.

> 
> The C kernel just calls memcpy directly for this use case.
> 
> For this use case, we do not interpret or make control flow decisions
> based on the data we read/write. And _if_ user space decides to do
> concurrent writes to the page, we don't care if the data becomes
> garbage. We just need the UB to be confined to the data moved from that
> page, and not leak into the rest of the kernel.

There is no such thing as "confined UB". Well, there is "poison data", which can 
act a bit like that, but sadly the C standard is extremely ambiguous on that 
subject and has been for decades despite repeated requests for clarifications, 
so it is entirely unclear whether and how "poison data" could exist in C. clang, 
for one, has decided that "poison data" is UB in most situations (including
just copying it to / returning it from another function), and this is consistent 
with some of the messaging of the standards committee. I don't know enough about 
the internals of gcc to comment on what they do.

Personally, I think that's a mistake; there needs to be some clear way to deal 
with uninitialized memory (which is the typical example of "poison data").

In Rust we have a fairly clear idea of what our rules should be here, and you 
can have "poison data" inside the `MaybeUninit` type. However, neither Rust nor 
C have a way to do reads where data races cause "poison data" rather than UB. 
See my other email I just sent for the rest of this line of discussion.
(I'm not used to the sprawling tree of a discussion that is this mailing list, 
so not sure how to best deal with replies that want to "merge" things said in 
different emails.)

Kind regards,
Ralf


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-05 17:41   ` Jason Gunthorpe
@ 2025-03-06 13:37     ` Danilo Krummrich
  2025-03-06 15:21       ` Simona Vetter
  0 siblings, 1 reply; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-06 13:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Abdiel Janulgue, aliceryhl, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Wed, Mar 05, 2025 at 01:41:19PM -0400, Jason Gunthorpe wrote:
> On Mon, Feb 24, 2025 at 01:49:06PM +0200, Abdiel Janulgue wrote:
> 
> > +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
> > +    fn drop(&mut self) {
> > +        let size = self.count * core::mem::size_of::<T>();
> > +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
> > +        // type invariants on `CoherentAllocation`.
> > +        unsafe {
> > +            bindings::dma_free_attrs(
> > +                self.dev.as_raw(),
> > +                size,
> > +                self.cpu_addr as _,
> > +                self.dma_handle,
> > +                self.dma_attrs.as_raw(),
> > +            )
> 
> I mentioned this in another thread..
> 
> There is an additional C API restriction here that the DMA API
> functions may only be called by a driver after probe() starts and
> before remove() completes. This applies to dma_free_attrs().
> 
> It is not enough that a refcount is held on device.
> 
> Otherwise the kernel may crash as the driver core allows resources
> used by the DMA API to be changed once the driver is removed.
> 
> See the related discussion here, with an example of what the crash can
> look like:
> 
> https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c
> 
>  > a device with no driver bound should not be passed to the DMA API,
>  > much less a dead device that's already been removed from its parent
>  > bus.

Thanks for bringing this up!

I assume that's because of potential iommu mappings, the memory itself should
not be critical.

> 
> My rust is non-existent, but I did not see anything about this
> point.

Indeed, this needs to be fixed. It means that a CoherentAllocation also needs to
be embedded in a Devres container.

> 
> Also note that any HW configured to do DMA must be halted before the
> free is allowed otherwise it is a UAF bug. It is worth mentioning that
> in the documentation.

Agreed, makes sense to document. For embedding the CoherentAllocation into
Devres this shouldn't be an issue, since a driver must stop operating the device
in remove() by definition.
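The shape of the Devres embedding being proposed can be pictured, outside the kernel, as a revocable cell. The following is a hypothetical standalone sketch only: the kernel's real `Devres`/`Revocable` abstractions differ (they use RCU-style synchronization rather than an `RwLock`), but the access/revoke structure is roughly this.

```rust
use std::sync::RwLock;

/// A resource the bus/unbind path can revoke; accessors afterwards see
/// `None` instead of a dangling handle.
struct Revocable<T> {
    inner: RwLock<Option<T>>,
}

impl<T> Revocable<T> {
    fn new(data: T) -> Self {
        Self { inner: RwLock::new(Some(data)) }
    }

    /// Run `f` with access to the resource, if it is still alive.
    fn try_access<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        self.inner.read().unwrap().as_ref().map(f)
    }

    /// Called from the unbind path: drop the resource early so that no
    /// late user can touch it (in the DMA case, this is where the
    /// allocation would be freed while the device is still bound).
    fn revoke(&self) {
        self.inner.write().unwrap().take();
    }
}

fn main() {
    let alloc = Revocable::new(vec![0u8; 16]); // stand-in for a CoherentAllocation
    assert_eq!(alloc.try_access(|b| b.len()), Some(16));
    alloc.revoke(); // device unbound
    assert_eq!(alloc.try_access(|b| b.len()), None);
}
```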

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 13:37     ` Danilo Krummrich
@ 2025-03-06 15:21       ` Simona Vetter
  2025-03-06 15:49         ` Danilo Krummrich
                           ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Simona Vetter @ 2025-03-06 15:21 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Jason Gunthorpe, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Thu, Mar 06, 2025 at 02:37:07PM +0100, Danilo Krummrich wrote:
> On Wed, Mar 05, 2025 at 01:41:19PM -0400, Jason Gunthorpe wrote:
> > On Mon, Feb 24, 2025 at 01:49:06PM +0200, Abdiel Janulgue wrote:
> > 
> > > +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
> > > +    fn drop(&mut self) {
> > > +        let size = self.count * core::mem::size_of::<T>();
> > > +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
> > > +        // type invariants on `CoherentAllocation`.
> > > +        unsafe {
> > > +            bindings::dma_free_attrs(
> > > +                self.dev.as_raw(),
> > > +                size,
> > > +                self.cpu_addr as _,
> > > +                self.dma_handle,
> > > +                self.dma_attrs.as_raw(),
> > > +            )
> > 
> > I mentioned this in another thread..
> > 
> > There is an additional C API restriction here that the DMA API
> > functions may only be called by a driver after probe() starts and
> > before remove() completes. This applies to dma_free_attrs().
> > 
> > It is not enough that a refcount is held on device.
> > 
> > Otherwise the kernel may crash as the driver core allows resources
> > used by the DMA API to be changed once the driver is removed.
> > 
> > See the related discussion here, with an example of what the crash can
> > look like:
> > 
> > https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c
> > 
> >  > a device with no driver bound should not be passed to the DMA API,
> >  > much less a dead device that's already been removed from its parent
> >  > bus.
> 
> Thanks for bringing this up!
> 
> I assume that's because of potential iommu mappings, the memory itself should
> not be critical.
> 
> > 
> > My rust is non-existent, but I did not see anything about this
> > point.
> 
> Indeed, this needs to be fixed. It means that a CoherentAllocation also needs to
> be embedded in a Devres container.
> 
> > 
> > Also note that any HW configured to do DMA must be halted before the
> > free is allowed otherwise it is a UAF bug. It is worth mentioning that
> > in the documentation.
> 
> Agreed, makes sense to document. For embedding the CoherentAllocation into
> Devres this shouldn't be an issue, since a driver must stop operating the device
> in remove() by definition.

I think for basic driver allocations that you just need to run the device,
stuffing it all into devres is ok. But for dma mappings at runtime this
will be too slow, so I guess we'll need subsystem specific abstractions
which guarantee that all dma-api mappings have disappeared when device
removal finishes. For drm I guess this means the gpuvm bindings would need
to take care of dma-api mapping (at least as an optional extension), and
you can only get at the dma-api addresses within revocable critical
sections. Similarly for any other subsystem that shovels substantial amounts
of data around. For some this might already be solved entirely at the C
level, if the subsystem already tracks all buffers allocated to a device
(media might work like that at least if you use videobuf helpers, but not
sure).

So lots of good fun here, but I think not unsurmountable.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 15:21       ` Simona Vetter
@ 2025-03-06 15:49         ` Danilo Krummrich
  2025-03-06 15:54         ` Danilo Krummrich
  2025-03-06 16:09         ` Jason Gunthorpe
  2 siblings, 0 replies; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-06 15:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Thu, Mar 06, 2025 at 04:21:51PM +0100, Simona Vetter wrote:
> On Thu, Mar 06, 2025 at 02:37:07PM +0100, Danilo Krummrich wrote:
> > On Wed, Mar 05, 2025 at 01:41:19PM -0400, Jason Gunthorpe wrote:
> > > On Mon, Feb 24, 2025 at 01:49:06PM +0200, Abdiel Janulgue wrote:
> > > 
> > > > +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
> > > > +    fn drop(&mut self) {
> > > > +        let size = self.count * core::mem::size_of::<T>();
> > > > +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
> > > > +        // type invariants on `CoherentAllocation`.
> > > > +        unsafe {
> > > > +            bindings::dma_free_attrs(
> > > > +                self.dev.as_raw(),
> > > > +                size,
> > > > +                self.cpu_addr as _,
> > > > +                self.dma_handle,
> > > > +                self.dma_attrs.as_raw(),
> > > > +            )
> > > 
> > > I mentioned this in another thread..
> > > 
> > > There is an additional C API restriction here that the DMA API
> > > functions may only be called by a driver after probe() starts and
> > > before remove() completes. This applies to dma_free_attrs().
> > > 
> > > It is not enough that a refcount is held on device.
> > > 
> > > Otherwise the kernel may crash as the driver core allows resources
> > > used by the DMA API to be changed once the driver is removed.
> > > 
> > > See the related discussion here, with an example of what the crash can
> > > look like:
> > > 
> > > https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c
> > > 
> > >  > a device with no driver bound should not be passed to the DMA API,
> > >  > much less a dead device that's already been removed from its parent
> > >  > bus.
> > 
> > Thanks for bringing this up!
> > 
> > I assume that's because of potential iommu mappings, the memory itself should
> > not be critical.
> > 
> > > 
> > > My rust is non-existent, but I did not see anything about this
> > > point.
> > 
> > Indeed, this needs to be fixed. It means that a CoherentAllocation also needs to
> > be embedded in a Devres container.
> > 
> > > 
> > > Also note that any HW configured to do DMA must be halted before the
> > > free is allowed otherwise it is a UAF bug. It is worth mentioning that
> > > in the documentation.
> > 
> > Agreed, makes sense to document. For embedding the CoherentAllocation into
> > Devres this shouldn't be an issue, since a driver must stop operating the device
> > in remove() by definition.
> 
> I think for basic driver allocations that you just need to run the device
> stuffing it all into devres is ok.

What exactly do you mean by that? DMA memory allocations or "normal" memory
allocations?

The latter should never be in a Devres container. The Devres container should
only hold things that, for safety reasons, are not allowed to out-live device
/ driver unbind.

> But for dma mappings at runtime this will be too slow.

What exactly do you mean by "DMA mappings at runtime"? What do you think
is slow in this aspect?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 15:21       ` Simona Vetter
  2025-03-06 15:49         ` Danilo Krummrich
@ 2025-03-06 15:54         ` Danilo Krummrich
  2025-03-06 16:18           ` Jason Gunthorpe
  2025-03-06 16:09         ` Jason Gunthorpe
  2 siblings, 1 reply; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-06 15:54 UTC (permalink / raw)
  To: Simona Vetter
  Cc: Jason Gunthorpe, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

(For some reason, when replying to this mail, mutt removed Sima from To: and
instead switched Cc: to To:, hence resending.)

On Thu, Mar 06, 2025 at 04:21:51PM +0100, Simona Vetter wrote:
> On Thu, Mar 06, 2025 at 02:37:07PM +0100, Danilo Krummrich wrote:
> > On Wed, Mar 05, 2025 at 01:41:19PM -0400, Jason Gunthorpe wrote:
> > > On Mon, Feb 24, 2025 at 01:49:06PM +0200, Abdiel Janulgue wrote:
> > > 
> > > > +impl<T: AsBytes + FromBytes> Drop for CoherentAllocation<T> {
> > > > +    fn drop(&mut self) {
> > > > +        let size = self.count * core::mem::size_of::<T>();
> > > > +        // SAFETY: the device, cpu address, and the dma handle is valid due to the
> > > > +        // type invariants on `CoherentAllocation`.
> > > > +        unsafe {
> > > > +            bindings::dma_free_attrs(
> > > > +                self.dev.as_raw(),
> > > > +                size,
> > > > +                self.cpu_addr as _,
> > > > +                self.dma_handle,
> > > > +                self.dma_attrs.as_raw(),
> > > > +            )
> > > 
> > > I mentioned this in another thread..
> > > 
> > > There is an additional C API restriction here that the DMA API
> > > functions may only be called by a driver after probe() starts and
> > > before remove() completes. This applies to dma_free_attrs().
> > > 
> > > It is not enough that a refcount is held on device.
> > > 
> > > Otherwise the kernel may crash as the driver core allows resources
> > > used by the DMA API to be changed once the driver is removed.
> > > 
> > > See the related discussion here, with an example of what the crash can
> > > look like:
> > > 
> > > https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c
> > > 
> > >  > a device with no driver bound should not be passed to the DMA API,
> > >  > much less a dead device that's already been removed from its parent
> > >  > bus.
> > 
> > Thanks for bringing this up!
> > 
> > I assume that's because of potential iommu mappings, the memory itself should
> > not be critical.
> > 
> > > 
> > > My rust is non-existent, but I did not see anything about this
> > > point.
> > 
> > Indeed, this needs to be fixed. It means that a CoherentAllocation also needs to
> > be embedded in a Devres container.
> > 
> > > 
> > > Also note that any HW configured to do DMA must be halted before the
> > > free is allowed otherwise it is a UAF bug. It is worth mentioning that
> > > in the documentation.
> > 
> > Agreed, makes sense to document. For embedding the CoherentAllocation into
> > Devres this shouldn't be an issue, since a driver must stop operating the device
> > in remove() by definition.
> 
> I think for basic driver allocations that you just need to run the device
> stuffing it all into devres is ok.

What exactly do you mean by that? DMA memory allocations or "normal" memory
allocations?

The latter should never be in a Devres container. The Devres container should
only hold things that, for safety reasons, are not allowed to out-live device
/ driver unbind.

> But for dma mappings at runtime this will be too slow.

What exactly do you mean by "DMA mappings at runtime"? What do you think
is slow in this aspect?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 15:21       ` Simona Vetter
  2025-03-06 15:49         ` Danilo Krummrich
  2025-03-06 15:54         ` Danilo Krummrich
@ 2025-03-06 16:09         ` Jason Gunthorpe
  2025-03-07  8:50           ` Danilo Krummrich
  2 siblings, 1 reply; 70+ messages in thread
From: Jason Gunthorpe @ 2025-03-06 16:09 UTC (permalink / raw)
  To: Danilo Krummrich, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Thu, Mar 06, 2025 at 04:21:51PM +0100, Simona Vetter wrote:
> > >  > a device with no driver bound should not be passed to the DMA API,
> > >  > much less a dead device that's already been removed from its parent
> > >  > bus.
> > 
> > Thanks for bringing this up!
> > 
> > I assume that's because of potential iommu mappings, the memory itself should
> > not be critical.

There is a lot of state tied to the struct device lifecycle that the
DMA API and iommu implicitly manages. It is not just iommu mappings.

It is incorrect to view the struct device as a simple refcount object
where holding the refcount means it is alive and safe to use. There
are three broad substates (No Driver, Driver Attached, Zombie) that
the struct device can be in that are relevant.

Technically it is unsafe and oopsable to call the allocation API as
well on a device that has no driver. This issue is also ignored in
these bindings and cannot be solved with revoke.

IOW I do not believe you can create bindings here that are truly safe
without also teaching rust to understand the concept of a scope
guaranteed to be within a probed driver's lifetime.

> > > Also note that any HW configured to do DMA must be halted before the
> > > free is allowed otherwise it is a UAF bug. It is worth mentioning that
> > > in the documentation.
> > 
> > Agreed, makes sense to document. For embedding the CoherentAllocation into
> > Devres this shouldn't be an issue, since a driver must stop operating the device
> > in remove() by definition.
>
> I think for basic driver allocations that you just need to run the device
> stuffing it all into devres is ok. 

What exactly will this revokable critical region protect?

The actual critical region extends into the HW itself, it is not
simple to model this with a pure SW construct of bracketing some
allocation. You need to bracket the *entire lifecycle* of the
dma_addr_t that has been returned and passed into HW, until the
dma_addr_t is removed from HW.

You cannot exit your critical region until the HW has ended its DMA.

Is that possible? I think not, at least not as you imagine it with a
non-sleepable RCU.

Further forcing driver writers to make critical regions that do not
bracket the lifetime of the actual thing being protected (the
dma_addr_t) is a complete farce, IMHO.

All this does is confuse people about what is actually required to
write a correct driver and increases the chance of incorrect removal
sequencing.

My dislike of revoke has only increased as this discussion has gone
on; I think it should not be part of the generic Rust bindings at
all.

> But for dma mappings at runtime this will be too slow, so I guess
> we'll need subsystem specific abstractions which guarantee that all
> dma-api mappings have disappeared when device removal finishes.

Yes, it will be way too slow.

I don't know about "subsystem level abstractions". That seems to be a
very big and invasive ask.

> sections. Similarly for any other subsystem that shovels substantial amounts
> of data around. For some this might already be solved entirely at the C
> level, if the subsystem already tracks all buffers allocated to a device
> (media might work like that at least if you use videobuf helpers, but not
> sure).
> 
> So lots of good fun here, but I think not unsurmountable.

I disagree. I think this is unsurmountable. The idea of fine grained
revoke is dead in my mind. If DRM really wants to attempt it, I think
that should be DRM's mess to figure out alone.

Revoke should not be part of the generic rust bindings and this idea
should not be exported to other subsystems.

We should be encouraging drivers and subsystems to follow the RDMA model
for lifecycle which is proven to work.

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 15:54         ` Danilo Krummrich
@ 2025-03-06 16:18           ` Jason Gunthorpe
  2025-03-06 16:34             ` Danilo Krummrich
  0 siblings, 1 reply; 70+ messages in thread
From: Jason Gunthorpe @ 2025-03-06 16:18 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Simona Vetter, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Thu, Mar 06, 2025 at 04:54:14PM +0100, Danilo Krummrich wrote:
> (For some reason, when replying to this mail, mutt removed Sima from To: and
> instead switched Cc: to To:, hence resending.)

It is normal; Simona's mail client is set up to do that.

> > I think for basic driver allocations that you just need to run the device
> > stuffing it all into devres is ok.
> 
> What exactly do you mean by that? DMA memory allocations or "normal" memory
> allocations?

Simona means things like a coherent allocation backing something
allocated once like a global queue for talking to the device.

Ie DMA API usage that is not on the performance path.

> > But for dma mappings at runtime this will be too slow.
> 
> What exactly do you mean by "DMA mappings at runtime"? What do you think
> is slow in this aspect?

Things like dma_map_sg(), dma_map_page(), etc, etc.

You cannot propose to add any runtime overhead to those paths and get
any support from the kernel community. They are performance paths
optimized to be fast.

For example: proposing to wrap their allocation in a devm container -
allocate a tracking structure, acquire the per-device spinlock and
thread the tracking into the devm linked list. For every single IO on
the performance path.

That idea would get a very hard NAK.

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 16:18           ` Jason Gunthorpe
@ 2025-03-06 16:34             ` Danilo Krummrich
  2025-03-07 10:20               ` Simona Vetter
  0 siblings, 1 reply; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-06 16:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Simona Vetter, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Thu, Mar 06, 2025 at 12:18:18PM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 06, 2025 at 04:54:14PM +0100, Danilo Krummrich wrote:
> > (For some reason, when replying to this mail, mutt removed Sima from To: and
> > instead switched Cc: to To:, hence resending.)
> 
> It is normal, Simona's mail client is setup to do that.

Huh! Never noticed that in the past.

> 
> > > I think for basic driver allocations that you just need to run the device
> > > stuffing it all into devres is ok.
> > 
> > What exactly do you mean by that? DMA memory allocations or "normal" memory
> > allocations?
> 
> Simona means things like a coherent allocation backing something
> allocated once like a global queue for talking to the device.

Yeah, that's what I propose then.

> 
> Ie DMA API usage that is not on the performance path.
> 
> > > But for dma mappings at runtime this will be too slow.
> > 
> > What exactly do you mean by "DMA mappings at runtime"? What do you think
> > is slow in this aspect?
> 
> Things like dma_map_sg(), dma_map_page(), etc, etc.
> 
> You cannot propose to add any runtime overhead to those paths and get
> any support from the kernel community. They are performance paths
> optimized to be fast.

Oh, I didn't do that. How could I, since I did not know what was being
referred to? :-)

Quite the opposite, I fully agree with that.

I think for this we need a higher level abstraction (which, now that I know
what was meant, I see Sima already proposed), or maybe provide an API that
can consolidate single operations in a single Devres container, etc.

But that's out of scope for this series.

- Danilo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Allow data races on some read/write operations
  2025-03-05 21:53                                     ` Ralf Jung
@ 2025-03-07  8:43                                       ` Andreas Hindborg
  2025-03-18 14:44                                         ` Ralf Jung
  0 siblings, 1 reply; 70+ messages in thread
From: Andreas Hindborg @ 2025-03-07  8:43 UTC (permalink / raw)
  To: Ralf Jung
  Cc: Alice Ryhl, Boqun Feng, comex, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

"Ralf Jung" <post@ralfj.de> writes:

> Hi all,
>
> On 05.03.25 22:26, Andreas Hindborg wrote:
>> "Ralf Jung" <post@ralfj.de> writes:
>>
>>> Hi all,
>>>
>>>>>> For some kinds of hardware, we might not want to trust the hardware.
>>>>>> I.e., there is no race under normal operation, but the hardware could
>>>>>> have a bug or be malicious and we might not want that to result in UB.
>>>>>> This is pretty similar to syscalls that take a pointer into userspace
>>>>>> memory and read it - userspace shouldn't modify that memory during the
>>>>>> syscall, but it can and if it does, that should be well-defined.
>>>>>> (Though in the case of userspace, the copy happens in asm since it
>>>>>> also needs to deal with virtual memory and so on.)
>>>>>
>>>>> Wow you are really doing your best to combine all the hard problems at the same
>>>>> time. ;)
>>>>> Sharing memory with untrusted parties is another tricky issue, and even leaving
>>>>> aside all the theoretical trouble, practically speaking you'll want to
>>>>> exclusively use atomic accesses to interact with such memory. So doing this
>>>>> properly requires atomic memcpy. I don't know what that is blocked on, but it is
>>>>> good to know that it would help the kernel.
>>>>
>>>> I am sort of baffled by this, since the C kernel has no such thing and
>>>> has worked fine for a few years. Is it a property of Rust that causes us
>>>> to need atomic memcpy, or is what the C kernel is doing potentially dangerous?
>>>
>>> It's the same in C: a memcpy is a non-atomic access. If something else
>>> concurrently mutates the memory you are copying from, or something else
>>> concurrently reads/writes the memory you are copying to, that is UB.
>>> This is not specific to memcpy; it's the same for regular pointer loads/stores.
>>> That's why you need READ_ONCE and WRITE_ONCE to specifically indicate to the
>>> compiler that these are special accesses that need to be treated differently.
>>> Something similar is needed for memcpy.
>>
>> I'm not a compiler engineer, so I might be wrong about this, but. If I
>> do a C `memcpy` from place A to place B where A is experiencing racy
>> writes, if I don't interpret the data at place B after the copy
>> operation, the rest of my C program is fine and will work as expected.
>
> The program has UB in that case. A program that has UB may work as expected
> today, but that changes nothing about it having UB.
> The C standard is abundantly clear here:
> "The execution of a program contains a data race if it contains two conflicting
> actions in different threads, at least one of which is not atomic, and neither
> happens before the other. Any such data race results in undefined behavior."
> (C23, §5.1.2.4)
>
> You are describing a hypothetical language that treats data races in a different
> way. Is such a language *possible*? Definitely. For the specific case you
> describe here, one "just" has to declare read-write races to be not UB, but to
> return "poison data" on the read side (poison data is a bit like uninitialized
> memory or padding), which the memcpy would then store on the target side. Any
> future interpretation of the target memory would be UB ("poison data" is not the
> same as "random data"). Such a model has actually been studied [1], though not a
> lot, and not as a proposal for a semantics of a user-facing language. (Rather,
> that was a proposal for an internal compiler IR.) The extra complications
> incurred by this choice are significant -- there is no free lunch here.
>
> [1]: https://sf.snu.ac.kr/publications/promising-ir-full.pdf
>
> However, C is not that language, and neither is Rust. Defining a concurrency
> memory model is extremely non-trivial (there's literally hundreds of papers
> proposing various different models, and there are still some unsolved problems).
> The route the C++ model took was to strictly rule out all data races, and since
> they were the first to actually undertake the effort of defining a model at this
> level of rigor (for a language not willing to pay the cost that would be
> incurred by the Java concurrency memory model), that has been the standard ever
> since. There's a lot of subtle trade-offs here, and I am far from an expert on
> the exact consequences each different choice would have. I just want to caution
> against the obvious reaction of "why don't they just". :)
>

Thanks for the elaborate explanation.

>
>> I
>> may even later copy the data at place B to place C where C might have
>> concurrent reads and/or writes, and the kernel will not experience UB
>> because of this. The data may be garbage, but that is fine. I am not
>> interpreting the data, or making control flow decisions based on it. I
>> am just moving the data.
>>
>> My understanding is: In Rust, this program would be illegal and might
>> experience UB in unpredictable ways, not limited to just the data that
>> is being moved.
>
> That is correct. C and Rust behave the same here.

Is there a difference between formal models of the languages and
practical implementations of the languages here? I'm asking this because
C kernel developers seem to be writing these programs that are illegal
under the formal spec of the C language, but work well in practice.
Could it be the same in Rust?

That is, can I do this copy and get away with it in practice under the
circumstances outlined earlier?

>
>> One option I have explored is just calling C memcpy directly, but
>> because of LTO, that is no different than doing the operation in Rust.
>>
>> I don't think I need atomic memcpy, I just need my program not to
>> explode if I move some data to or from a place that is experiencing
>> concurrent writes without synchronization. Not in general, but for some
>> special cases where I promise not to look at the data outside of moving
>> it.
>
> I'm afraid I do not know of a language, other than assembly, that can provide this.
>
> Atomic memcpy, however, should be able to cover your use-case, so it seems like
> a reasonable solution to me? Marking things as atomic is literally how you tell
> the compiler "don't blow up if there are concurrent accesses".
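For what it's worth, the core idea of an "atomic memcpy" can be sketched in (userspace) Rust as a byte-wise copy using relaxed atomic loads. This is only an illustrative sketch, not an existing kernel or std API:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Byte-wise "atomic memcpy" sketch: every byte of `src` may be
/// written concurrently by another thread, but because each read is
/// a relaxed atomic load, the copy itself is not a data race. The
/// copied bytes may still be torn or stale; the caller promises not
/// to interpret them until some external synchronization says they
/// are stable.
fn atomic_copy_from(dst: &mut [u8], src: &[AtomicU8]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s.load(Ordering::Relaxed);
    }
}

fn main() {
    let src: Vec<AtomicU8> = (0u8..4).map(AtomicU8::new).collect();
    let mut dst = [0u8; 4];
    atomic_copy_from(&mut dst, &src);
    assert_eq!(dst, [0, 1, 2, 3]);
}
```

The per-byte loads are what make the compiler treat concurrent writes as tolerated rather than UB; whether this is as fast as a plain memcpy depends on how well the compiler can merge the accesses.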

If atomic memcpy is what we really need to write these kinds of programs in
Rust, what would be the next steps to get this in the language?

Also, would there be a performance price to pay for this?


Best regards,
Andreas Hindborg




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 16:09         ` Jason Gunthorpe
@ 2025-03-07  8:50           ` Danilo Krummrich
  2025-03-07 10:18             ` Simona Vetter
  2025-03-07 12:48             ` Jason Gunthorpe
  0 siblings, 2 replies; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-07  8:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Abdiel Janulgue, aliceryhl, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Thu, Mar 06, 2025 at 12:09:07PM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 06, 2025 at 04:21:51PM +0100, Simona Vetter wrote:
> > > >  > a device with no driver bound should not be passed to the DMA API,
> > > >  > much less a dead device that's already been removed from its parent
> > > >  > bus.
> > > 
> > > Thanks for bringing this up!
> > > 
> > > I assume that's because of potential iommu mappings, the memory itself should
> > > not be critical.
> 
> There is a lot of state tied to the struct device lifecycle that the
> DMA API and iommu implicitly manages. It is not just iommu mappings.
> 
> It is incorrect to view the struct device as a simple refcount object
> where holding the refcount means it is alive and safe to use. There
> are three broad substates (No Driver, Driver Attached, Zombie) that
> the struct device can be in that are relevant.
> 
> Technically it is unsafe and oopsable to call the allocation API as
> well on a device that has no driver. This issue is also ignored in
> these bindings and cannot be solved with revoke.

This is correct, and I am well aware of it. I brought this up once when working
on the initial device / driver, devres and I/O abstractions.

It's on my list to make the creation of the Devres container fallible in this
aspect, which would prevent this issue.

For now it's probably not too critical; we never hand out device references
before probe(). The only source of error is when a driver tries to create new
device resources after the device has been unbound.

> IOW I do not believe you can create bindings here that are truly safe
> without also teaching rust to understand the concept of a scope
> guaranteed to be within a probed driver's lifetime.
> 
> > > > Also note that any HW configured to do DMA must be halted before the
> > > > free is allowed otherwise it is a UAF bug. It is worth mentioning that
> > > > in the documentation.
> > > 
> > > Agreed, makes sense to document. For embedding the CoherentAllocation into
> > > Devres this shouldn't be an issue, since a driver must stop operating the device
> > > in remove() by definition.
> >
> > I think for basic driver allocations that you need just to run the device,
> > stuffing it all into devres is ok.
> 
> What exactly will this revokable critical region protect?
> 
> The actual critical region extends into the HW itself, it is not
> simple to model this with a pure SW construct of bracketing some
> allocation. You need to bracket the *entire lifecycle* of the
> dma_addr_t that has been returned and passed into HW, until the
> dma_addr_t is removed from HW.

Devres callbacks run after remove(). It's the driver's job to stop operating the
device at the latest in remove(), which means that the design is correct.

Now, you ask for a step further, i.e. make it that we can enforce that a driver
actually stopped the device in remove().

But that's just impossible, because obviously no one other than the driver knows
the semantics of the device; that's the whole purpose of the driver. So, this is
one of the exceptions where we just have to trust the driver to do the correct
thing.

Having that said, it doesn't need to be an "all or nothing", let's catch the
ones we can actually catch.
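As a rough illustration of the revoke pattern under discussion (all names invented; the kernel's actual Devres/Revocable abstraction uses RCU on the read side, not an RwLock):

```rust
use std::sync::RwLock;

/// Toy stand-in for the Devres/Revocable pattern: after revoke()
/// (i.e. driver unbind), the resource is dropped and every further
/// access attempt fails instead of touching an invalid pointer.
struct Revocable<T> {
    inner: RwLock<Option<T>>,
}

impl<T> Revocable<T> {
    fn new(data: T) -> Self {
        Self { inner: RwLock::new(Some(data)) }
    }

    /// Access the resource if it has not been revoked yet.
    fn try_access<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        self.inner.read().unwrap().as_ref().map(f)
    }

    /// Revoke the resource, e.g. from the driver-unbind callback.
    fn revoke(&self) {
        *self.inner.write().unwrap() = None;
    }
}

fn main() {
    let buf = Revocable::new(vec![0u8; 16]);
    assert_eq!(buf.try_access(|b| b.len()), Some(16));
    buf.revoke(); // driver unbind: later accesses are refused
    assert_eq!(buf.try_access(|b| b.len()), None);
}
```

This shows the access-revocation idea only; it says nothing about the hardware side of the lifecycle, which is exactly the gap being debated in this thread.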

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07  8:50           ` Danilo Krummrich
@ 2025-03-07 10:18             ` Simona Vetter
  2025-03-07 12:48             ` Jason Gunthorpe
  1 sibling, 0 replies; 70+ messages in thread
From: Simona Vetter @ 2025-03-07 10:18 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Jason Gunthorpe, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 09:50:07AM +0100, Danilo Krummrich wrote:
> On Thu, Mar 06, 2025 at 12:09:07PM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 06, 2025 at 04:21:51PM +0100, Simona Vetter wrote:
> > > > >  > a device with no driver bound should not be passed to the DMA API,
> > > > >  > much less a dead device that's already been removed from its parent
> > > > >  > bus.
> > > > 
> > > > Thanks for bringing this up!
> > > > 
> > > > I assume that's because of potential iommu mappings, the memory itself should
> > > > not be critical.
> > 
> > There is a lot of state tied to the struct device lifecycle that the
> > DMA API and iommu implicitly manages. It is not just iommu mappings.
> > 
> > It is incorrect to view the struct device as a simple refcount object
> > where holding the refcount means it is alive and safe to use. There
> > are three broad substates (No Driver, Driver Attached, Zombie) that
> > the struct device can be in that are relevant.
> > 
> > Technically it is unsafe and oopsable to call the allocation API as
> > well on a device that has no driver. This issue is also ignored in
> > these bindings and cannot be solved with revoke.
> 
> This is correct, and I am well aware of it. I brought this up once when working
> on the initial device / driver, devres and I/O abstractions.
> 
> It's on my list to make the creation of the Devres container fallible in this
> aspect, which would prevent this issue.
> 
> For now it's probably not too critical; we never hand out device references
> before probe(). The only source of error is when a driver tries to create new
> device resources after the device has been unbound.
> 
> > IOW I do not belive you can create bindings here that are truely safe
> > without also teaching rust to understand the concept of a scope
> > guaranteed to be within a probed driver's lifetime.
> > 
> > > > > Also note that any HW configured to do DMA must be halted before the
> > > > > free is allowed otherwise it is a UAF bug. It is worth mentioning that
> > > > > in the documentation.
> > > > 
> > > > Agreed, makes sense to document. For embedding the CoherentAllocation into
> > > > Devres this shouldn't be an issue, since a driver must stop operating the device
> > > > in remove() by definition.
> > >
> > > I think for basic driver allocations that you just need to run the device
> > > stuffing it all into devres is ok. 
> > 
> > What exactly will this revokable critical region protect?
> > 
> > The actual critical region extends into the HW itself, it is not
> > simple to model this with a pure SW construct of bracketing some
> > allocation. You need to bracket the *entire lifecycle* of the
> > dma_addr_t that has been returned and passed into HW, until the
> > dma_addr_t is removed from HW.
> 
> Devres callbacks run after remove(). It's the drivers job to stop operating the
> device latest in remove(). Which means that the design is correct.
> 
> Now, you ask for a step further, i.e. make it that we can enforce that a driver
> actually stopped the device in remove().
> 
> But that's just impossible, because obviously no one other than the driver knows
> the semantics of the device; that's the whole purpose of the driver. So, this is
> one of the exceptions where we just have to trust the driver to do the correct
> thing.

In general it's impossible, but I think for specific cases like pci we can
enforce that bus mastering/interrupt generation/whatever else might cause
havoc is force-disabled after ->remove finishes. For platform devices this
is more annoying, but then it's much harder to physically yank a platform
devices. So I'm less worried about that being a practical concern there.

> Having that said, it doesn't need to be an "all or nothing", let's catch the
> ones we can actually catch.

Yeah, agreed.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-06 16:34             ` Danilo Krummrich
@ 2025-03-07 10:20               ` Simona Vetter
  0 siblings, 0 replies; 70+ messages in thread
From: Simona Vetter @ 2025-03-07 10:20 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Jason Gunthorpe, Simona Vetter, Abdiel Janulgue, aliceryhl,
	robin.murphy, daniel.almeida, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Valentin Obst,
	open list, Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Thu, Mar 06, 2025 at 05:34:21PM +0100, Danilo Krummrich wrote:
> On Thu, Mar 06, 2025 at 12:18:18PM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 06, 2025 at 04:54:14PM +0100, Danilo Krummrich wrote:
> > > (For some reason, when replying to this mail, mutt removed Sima from To: and
> > > instead switched Cc: to To:, hence resending.)
> > 
> > It is normal, Simona's mail client is setup to do that.
> 
> Huh! Never noticed that in the past.
> 
> > 
> > > > I think for basic driver allocations that you just need to run the device
> > > > stuffing it all into devres is ok.
> > > 
> > > What exactly do you mean with that? DMA memory allocations or "normal" memory
> > > allocations?
> > 
> > Simona means things like a coherent allocation backing something
> > allocated once like a global queue for talking to the device.
> 
> Yeah, that's what I propose then.
> 
> > 
> > Ie DMA API usage that is not on the performance path.
> > 
> > > > But for dma mappings at runtime this will be too slow.
> > > 
> > > What exactly do you mean with "DMA mappings at runtime"? What to you think is
> > > is slow in this aspect?
> > 
> > Things like dma_map_sg(), dma_map_page(), etc, etc.
> > 
> > You cannot propose to add any runtime overhead to those paths and get
> > any support from the kernel community. They are performance paths
> > optimized to be fast.
> 
> Oh, I didn't do that. How could I, since I did not know what was referred
> to? :-)
> 
> Quite the opposite, I fully agree with that.
> 
> I think for this we need higher level abstraction (which now that I know what
> was meant I know Sima proposed already), or maybe provide an API that can
> consolidate single operations for a single Devres container, etc.

Apologies for creating confusion here, and thanks to Jason for
explaining what I really meant :-)

> But that's out of scope for this series.

Yeah I think for coherent allocations we can get away for now by making
sure you can only get it in safe rust code wrapped in a DevRes. I think
that's solid enough, everything else is for later.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07  8:50           ` Danilo Krummrich
  2025-03-07 10:18             ` Simona Vetter
@ 2025-03-07 12:48             ` Jason Gunthorpe
  2025-03-07 13:16               ` Simona Vetter
  2025-03-07 16:09               ` Danilo Krummrich
  1 sibling, 2 replies; 70+ messages in thread
From: Jason Gunthorpe @ 2025-03-07 12:48 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Abdiel Janulgue, aliceryhl, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 09:50:07AM +0100, Danilo Krummrich wrote:
> > Technically it is unsafe and oopsable to call the allocation API as
> > well on a device that has no driver. This issue is also ignored in
> > these bindings and cannot be solved with revoke.
> 
> This is correct, and I am well aware of it. I brought this up once when working
> on the initial device / driver, devres and I/O abstractions.

Yes it is also incorrect to call any devm function on an unprobed
driver.

> It's on my list to make the creation of the Devres container fallible in this
> aspect, which would prevent this issue.

I expect that will require new locking.

> > The actual critical region extends into the HW itself, it is not
> > simple to model this with a pure SW construct of bracketing some
> > allocation. You need to bracket the *entire lifecycle* of the
> > dma_addr_t that has been returned and passed into HW, until the
> > dma_addr_t is removed from HW.
> 
> Devres callbacks run after remove(). It's the drivers job to stop operating the
> device latest in remove(). Which means that the design is correct.

It could be the driver's job to unmap the dma as well if you take that
logic.

You still didn't answer the question, what is the critical region of
the DevRes for a dma_alloc_coherent() actually going to protect?

You also have to urgently fix the synchronize_rcu() repetition if you
plan to do this.

> Now, you ask for a step further, i.e. make it that we can enforce that a driver
> actually stopped the device in remove().

So where do you draw the line on bugs Rust should prevent and bugs
Rust requires the programmer to fix?

Allow UAF from forgetting to shut down DMA, but try to mitigate UAF
from failing to call a dma unmap function. It is the *very same*
driver bug: incorrect shutdown of DMA activity.

I said this for MMIO, and I say it more strongly here. The correct
thing is to throw a warning if the driver has malfunctioned and leaked
a DMA Mapping. This indicates a driver bug. Silently fixing the issue
does nothing to help driver writers make correct drivers. It may even
confuse authors as to what their responsibilities are since so much is
handled "magically".

> Having that said, it doesn't need to be an "all or nothing", let's catch the
> ones we can actually catch.

Well, that's refreshing. Maybe it would be nice to have an agreed
binding design policy on what is worthwhile to catch with runtime
overhead and what is not.

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07 12:48             ` Jason Gunthorpe
@ 2025-03-07 13:16               ` Simona Vetter
  2025-03-07 14:38                 ` Jason Gunthorpe
  2025-03-07 16:09               ` Danilo Krummrich
  1 sibling, 1 reply; 70+ messages in thread
From: Simona Vetter @ 2025-03-07 13:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Danilo Krummrich, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 08:48:09AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 07, 2025 at 09:50:07AM +0100, Danilo Krummrich wrote:
> > > Technically it is unsafe and oopsable to call the allocation API as
> > > well on a device that has no driver. This issue is also ignored in
> > > these bindings and cannot be solved with revoke.
> > 
> > This is correct, and I am well aware of it. I brought this up once when working
> > on the initial device / driver, devres and I/O abstractions.
> 
> Yes it is also incorrect to call any devm function on an unprobed
> driver.

You can, devm groups nest, and I think by default (might misremember)
there are three:

- probe/remove i.e. driver bound lifetime
- up to device_del in case you've called devm outside of a driver being
  bound.
- Final kref_put on the device.

For added fun, you can create your own nested groups within this and nuke
them as needed. component.c does some of that, which is why I regretfully
know about this stuff.

> > It's on my list to make the creation of the Devres container fallible in this
> > aspect, which would prevent this issue.
> 
> I expect that will require new locking.
> 
> > > The actual critical region extends into the HW itself, it is not
> > > simple to model this with a pure SW construct of bracketing some
> > > allocation. You need to bracket the *entire lifecycle* of the
> > > dma_addr_t that has been returned and passed into HW, until the
> > > dma_addr_t is removed from HW.
> > 
> > Devres callbacks run after remove(). It's the drivers job to stop operating the
> > device latest in remove(). Which means that the design is correct.
> 
> It could be the drivers job to unmap the dma as well if you take that
> logic.
> 
> You still didn't answer the question, what is the critical region of
> the DevRes for a dma_alloc_coherent() actually going to protect?
> 
> You also have to urgently fix the synchronize_rcu() repetition if you
> plan to do this.
> 
> > Now, you ask for a step further, i.e. make it that we can enforce that a driver
> > actually stopped the device in remove().
> 
> So where do you draw the line on bugs Rust should prevent and bugs
> Rust requires the programmer to fix?
> 
> Allow UAF from forgetting to shutdown DMA, but try to mitigate UAF
> from failing to call a dma unmap function. It is the *very same*
> driver bug: incorrect shutdown of DMA activity.
> 
> I said this for MMIO, and I say it more strongly here. The correct
> thing is to throw a warning if the driver has malfunctioned and leaked
> a DMA Mapping. This indicates a driver bug. Silently fixing the issue
> does nothing to help driver writers make correct drivers. It may even
> confuse authors as to what their responsiblities are since so much is
> handled "magically".

I think thus far the guideline is that software uaf should be impossible.
So calling dma_* functions on deleted devices should not be doable (or
result in runtime failures on the rust side).

I think for the actual hw uaf if you leak dma_addr_t that are unmapped in
hw device tables that would need to be outside of the scope of what rust
can prevent. Simply because rust doesn't know about how the hw works.

Wrt magically cleaning up, that is generally the preferred rust approach
with the Drop trait. But there are special traits where you _must_
manually clean up an object with a call that consumes its reference, and
the compiler will fail if you just leak a reference somewhere because it
knows it's not allowed to automatically drop that object. So both patterns
are possible, but rust has a very strong preference for the automagic
approach, unlike C. So there will be somewhat of a style difference here in
what a native-feeling api looks like in C or rust.
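A sketch of the "must manually clean up" style in Rust (all names here are invented; Rust has no true linear types, so a drop without the consuming call can only be surfaced at runtime, e.g. with a warning in Drop):

```rust
/// Hypothetical DMA-mapping handle approximating consume-on-cleanup:
/// the only intended way to dispose of it is the consuming unmap()
/// call. Leaking it past its lifecycle is reported loudly instead of
/// being silently "fixed".
struct DmaMapping {
    dma_addr: u64,
    unmapped: bool,
}

impl DmaMapping {
    fn map(dma_addr: u64) -> Self {
        Self { dma_addr, unmapped: false }
    }

    /// Consumes the handle; stands in for an explicit dma unmap call.
    fn unmap(mut self) {
        self.unmapped = true;
        // the real unmap of self.dma_addr would happen here
    }
}

impl Drop for DmaMapping {
    fn drop(&mut self) {
        if !self.unmapped {
            // driver bug: the mapping leaked past its lifecycle
            eprintln!("WARN: leaked DMA mapping {:#x}", self.dma_addr);
        }
    }
}

fn main() {
    let m = DmaMapping::map(0x1000);
    m.unmap(); // explicit teardown, no warning emitted
}
```

In upstream Rust the closest compile-time tool is `#[must_use]` plus a consuming method; a type the compiler outright refuses to drop is not expressible today.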

> > Having that said, it doesn't need to be an "all or nothing", let's catch the
> > ones we can actually catch.
> 
> Well, that's refreshing. Maybe it would be nice to have an agreed
> binding design policy on what is wortwhile to catch with runtime
> overhead and what is not.

Yeah clarifying this stuff is probably a good idea.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07 13:16               ` Simona Vetter
@ 2025-03-07 14:38                 ` Jason Gunthorpe
  2025-03-07 17:30                   ` Danilo Krummrich
  0 siblings, 1 reply; 70+ messages in thread
From: Jason Gunthorpe @ 2025-03-07 14:38 UTC (permalink / raw)
  To: Danilo Krummrich, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 02:16:11PM +0100, Simona Vetter wrote:
> On Fri, Mar 07, 2025 at 08:48:09AM -0400, Jason Gunthorpe wrote:
> > On Fri, Mar 07, 2025 at 09:50:07AM +0100, Danilo Krummrich wrote:
> > > > Technically it is unsafe and oopsable to call the allocation API as
> > > > well on a device that has no driver. This issue is also ignored in
> > > > these bindings and cannot be solved with revoke.
> > > 
> > > This is correct, and I am well aware of it. I brought this up once when working
> > > on the initial device / driver, devres and I/O abstractions.
> > 
> > Yes it is also incorrect to call any devm function on an unprobed
> > driver.
> 
> You can, devm groups nest, and I think by default (might misremember)
> there's 3:

This isn't about groups and the group nesting, it is about what
devres_release_all() does and when it is called. Yes it is called in
all of these places:

> - probe/remove i.e. driver bound lifetime
> - up to device_del in case you've called devm outside of a driver being
>   bound.
> - Final kref_put on the device.

However, AFAICT it always wipes out everything. You don't get to pick
which of the above scopes your devm is associated with.

IIRC there are two different use cases at play here:
 - struct devices that are expected to use drivers
 - struct devices that are known to never use drivers

devm can serve both, but with different rules and lifecycle. Remember
if a driver is attached and probe fails (eg even with EPROBE_DEFER)
then devres_release_all() is called and *all* the resources are
gone. It would be functionally wrong to attach non-driver resources to
the devm in this case since they would not reliably exist across the
device lifetime.

However, if a driver is never bound then the device owner can use devm
over the device's own lifecycle ending in either device_del on a
success path or possibly kref_put on pre-add failure paths.

From a rust perspective I think it should focus only on driver bound
cases.

> > I said this for MMIO, and I say it more strongly here. The correct
> > thing is to throw a warning if the driver has malfunctioned and leaked
> > a DMA Mapping. This indicates a driver bug. Silently fixing the issue
> > does nothing to help driver writers make correct drivers. It may even
> > confuse authors as to what their responsiblities are since so much is
> > handled "magically".
> 
> I think thus far the guideline is that software uaf should be impossible.
> So calling dma_* functions on deleted devices should not be doable (or
> result in runtime failures on the rust side).

I'm not advocating for UAF, I am arguing about what the correct
response is to a driver leaking a HW linked resource like DMA past
remove. I think it should trigger a runtime error, and maybe a sleep
wait to prevent UAF, not be silently partially "fixed".

The goal is to help people write correct drivers, and correct drivers
should be written in a style where mapping the DMA, programming the HW,
deprogramming the HW, and then unmapping the DMA are clearly written.

That pattern should be explicitly written in the code, so that the lack
of language-provided safety has clear markers and expectations and can
be reviewed and audited.
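That bracketing, as a hypothetical sketch (all names invented) where unmap is only legal once the hardware no longer holds the address:

```rust
/// Toy model of the hardware side of the DMA lifecycle.
struct Hw {
    programmed: Option<u64>, // dma_addr the device currently holds
}

impl Hw {
    fn program_dma(&mut self, dma_addr: u64) {
        self.programmed = Some(dma_addr);
    }
    fn deprogram_dma(&mut self) {
        self.programmed = None;
    }
}

/// Stand-in for a dma unmap call; unmapping while the HW still holds
/// the address would be a use-after-free, so assert it does not.
fn dma_unmap(hw: &Hw, dma_addr: u64) {
    assert_ne!(hw.programmed, Some(dma_addr));
}

fn main() {
    let mut hw = Hw { programmed: None };
    let dma_addr = 0x1000u64; // stand-in for a dma_map_*() result
    hw.program_dma(dma_addr); // 1. hand the address to the HW
    // ... DMA runs ...
    hw.deprogram_dma();       // 2. withdraw the address from the HW
    dma_unmap(&hw, dma_addr); // 3. only now is unmap safe
}
```

Nothing software-only can verify step 2 actually happened on real hardware; the sketch just makes the required ordering visible in the code.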

> I think for the actual hw uaf if you leak dma_addr_t that are unmapped in
> hw device tables that would need to be outside of the scope of what rust
> can prevent. Simply because rust doesn't know about how the hw works.

I accept that, but I think you need to force this issue into the face
of the driver writers and demand they deal with it.

Silently partially fixing the leak of dma mappings is far too quiet
and encourages dangerous driver design.

> knows it's not allowed to automatically drop that object. So both patterns
> are possible, but rust has a very strong preference for the automagic
> approach, unlike C. So there will be somewhat of a style different here in
> what a native-feeling api looks like in C or rust.

I don't think it is a simple "style difference" :( This Rust direction
is radically transforming how the driver lifecycle model of the kernel
works with almost no review from any kernel experts.

Certainly I object to it. I think others will too if they understood
what was going on here.

Write a position paper in Documentation/ on how you imagine lifecycle
will work with Rust driver bindings and get some acks from experienced
people??

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07 12:48             ` Jason Gunthorpe
  2025-03-07 13:16               ` Simona Vetter
@ 2025-03-07 16:09               ` Danilo Krummrich
  2025-03-07 16:57                 ` Jason Gunthorpe
  1 sibling, 1 reply; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-07 16:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Abdiel Janulgue, aliceryhl, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 08:48:09AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 07, 2025 at 09:50:07AM +0100, Danilo Krummrich wrote:
> > > The actual critical region extends into the HW itself, it is not
> > > simple to model this with a pure SW construct of bracketing some
> > > allocation. You need to bracket the *entire lifecycle* of the
> > > dma_addr_t that has been returned and passed into HW, until the
> > > dma_addr_t is removed from HW.
> > 
> > Devres callbacks run after remove(). It's the drivers job to stop operating the
> > device latest in remove(). Which means that the design is correct.
> 
> It could be the drivers job to unmap the dma as well if you take that
> logic.

I really don't understand what you want: *You* brought up that the
CoherentAllocation is not allowed to out-live driver unbind.

We agreed and provided a way that solves this. But then you point out the
unsolvable problem of malicious (or wrongly programmed) hardware and use it to
question why we even bother solving the problem you just pointed out, which
is solvable.

So, what do you ask for?

> You still didn't answer the question, what is the critical region of
> the DevRes for a dma_alloc_coherent() actually going to protect?

Devres, just like in C, ensures that an object can't out-live driver unbind. The
RCU read side critical section is to revoke access to the then invalid pointer
of the object.

C leaves you with an invalid pointer, whereas Rust revokes the access to the
invalid pointer for safety reasons. The pointer is never written to, except
on driver unbind, hence RCU.

We discussed all this in other threads already.

> 
> You also have to urgently fix the synchronize_rcu() repetition if you
> plan to do this.

I mentioned this a few days ago, and I did not forget it. :-)

> 
> > Now, you ask for a step further, i.e. make it that we can enforce that a driver
> > actually stopped the device in remove().
> 
> So where do you draw the line on bugs Rust should prevent and bugs
> Rust requires the programmer to fix?

It should prevent all safety-related bugs, but the one above is impossible to
solve, so we have to live with it. But that doesn't mean it's a justification to
stop preventing bugs we can actually prevent? Do you disagree?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07 16:09               ` Danilo Krummrich
@ 2025-03-07 16:57                 ` Jason Gunthorpe
  2025-03-07 19:03                   ` Danilo Krummrich
  0 siblings, 1 reply; 70+ messages in thread
From: Jason Gunthorpe @ 2025-03-07 16:57 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Abdiel Janulgue, aliceryhl, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 05:09:17PM +0100, Danilo Krummrich wrote:
> On Fri, Mar 07, 2025 at 08:48:09AM -0400, Jason Gunthorpe wrote:
> > On Fri, Mar 07, 2025 at 09:50:07AM +0100, Danilo Krummrich wrote:
> > > > The actual critical region extends into the HW itself, it is not
> > > > simple to model this with a pure SW construct of bracketing some
> > > > allocation. You need to bracket the *entire lifecycle* of the
> > > > dma_addr_t that has been returned and passed into HW, until the
> > > > dma_addr_t is removed from HW.
> > > 
> > > Devres callbacks run after remove(). It's the drivers job to stop operating the
> > > device latest in remove(). Which means that the design is correct.
> > 
> > It could be the drivers job to unmap the dma as well if you take that
> > logic.
> 
> I really don't understand what you want: *You* brought up that the
> CoherentAllocation is not allowed to out-live driver unbind.

Really? I don't want you to use revoke to solve these problems when
the kernel design pattern is fence.

I thought that was clear.

> > You still didn't answer the question, what is the critical region of
> > the DevRes for a dma_alloc_coherent() actually going to protect?
> 
> Devres, just like in C, ensures that an object can't out-live driver unbind. The
> RCU read side critical section is to revoke access to the then invalid pointer
> of the object.
> 
> C leaves you with an invalid pointer, whereas Rust revokes the access to the
> invalid pointer for safety reasons. The pointer is never written to, except for
> on driver unbind, hence RCU.
> 
> We discussed all this in other threads already.

Why are you explaining very simple concepts as though I do not
understand how RCU or devm works?

I asked you what you intend to protect with the critical region.

I believe you intend to wrap every memcpy/etc. of the allocated
coherent memory in an RCU critical section, correct?

Meaning something like:

  mem.ptr = dma_alloc_coherent(&handle)
  make_hw_do_dma(handle)

  start RCU critical section on mem:
      copy_to_user(mem.ptr) // Sleeps! Can't do it!
  dma_free_coherent(mem, handle)

Right?

Further, if the critical section ever fails to obtain mem.ptr, the
above code is *BUGGY* because it has left a HW DMA running, UAF'd the
now-free'd buffer *and the driver author cannot fix it*.

This is an API design that is impossible for a driver author to use
correctly.

Even worse, it actively discourages the driver author from thinking
about the lifetime issues at work here, because it has this magical
critical section that is advertised to provide safety but actually has
a great big hole in it that the driver author has to understand and
mitigate.

I don't care one bit if the HW UAF issue is in scope or out for Rust -
I *EXPECT* driver authors to prevent it regardless.

> It should prevent all safety-related bugs, but the one above is impossible to
> solve, so we have to live with it.

You have to live with it, but you should not *ignore* it and should
try to make the problem visible to the driver author and provide
assistance to implement the correct design patterns that do address
it.

Revoke is doing the opposite in my opinion.

In any event, I have to leave the keyboard for some travel, so this
will probably be my last posting on this topic.

Regards,
Jason


* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07 14:38                 ` Jason Gunthorpe
@ 2025-03-07 17:30                   ` Danilo Krummrich
  2025-03-07 18:02                     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-07 17:30 UTC (permalink / raw)
  To: Jason Gunthorpe, Greg Kroah-Hartman
  Cc: Abdiel Janulgue, aliceryhl, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 10:38:21AM -0400, Jason Gunthorpe wrote:
> 
> This Rust direction is radically transforming how the driver lifecycle model
> of the kernel works

I already explained in other threads that the Rust abstractions follow the device
/ driver model. There is no "radical change" to it. Since I assume you still
refer to the Devres aspect of it, please find another explanation in [1].

But please also show in detail where you see the "radical change" of the driver
model, maybe there's some misunderstanding?

> with almost no review from any kernel experts.

Do you question Greg being an expert on the driver model? Who do you think
applied the patches after months of discussions on the mailing list and at
conferences?

> Certainly I object to it. I think others will too if they understood
> what was going on here.

Again, this stuff was on the corresponding mailing lists for months; I haven't
seen such objections.

I'm absolutely willing to address yours though. And I'm happy to be questioned
on this stuff by you and in general.

But please start doing so in a constructive way, i.e. if you find issues, help
solve them instead of just trying to blame people. If you think there is no
issue, but still think another approach would be better, please send patches.

> Write a position paper in Documentation/ on how you imagine lifecycle
> will work with Rust driver bindings

Rust abstractions follow the driver model, just like the C code does.

What - and please provide details - should this position paper describe?
The driver model itself, implementation details of the Rust abstractions or
something else?

> and get some acks from experienced people??

Again, this stuff was on the list for months; I can't force people to review it.

Do you have a list of those experienced people? Maybe you can get them to
revisit things and contribute improvements?

However, let me also say that experienced people *did* review it and work on it.

---

[1] Devres - C vs. Rust
-----------------------

Starting with C, let's pick the following two functions.

	pcim_iomap()
	pcim_request_region()

Those two are called from probe() and the pointer returned from pcim_iomap() is
stored in some driver specific structure, which, depending on the subsystem and
driver, may out-live driver unbind.

If the driver is unbound the following functions are called automatically after
remove() from the corresponding devres callbacks.

	pci_iounmap()
	pci_release_region()

The pointer in the driver-specific structure (if it still exists) becomes
invalid.

In Rust the lifecycle of the I/O memory mapping and the resource region are
bound to a structure called pci::Bar.

Creating a new pci::Bar calls pci_iomap() and pci_request_region(). Dropping the
object calls pci_iounmap() and pci_release_region(). The pointer to the memory
mapping is embedded in the pci::Bar object.

The driver model prescribes that device resources must be released when the
driver is unbound. Hence, we can't hand out a raw pci::Bar object to drivers,
because the object lifetime could be extended across the "driver is unbound"
boundary.

This is why we only ever give out the pci::Bar object in a Devres container,
i.e. Devres<pci::Bar>.

Devres puts the pci::Bar in a Revocable and sets up the devres callback. Once
the devres callback is called the embedded pci::Bar is dropped, which calls
pci_iounmap() and pci_release_region().

Subsequently, the access to the pci::Bar for the owner of the Devres<pci::Bar>
object is revoked, since the pointer to the memory mapping within the pci::Bar
just became invalid.

The latter is the only additional step the Rust abstraction does, in order to
not leave drivers with an invalid pointer. However, this additional safety
aspect is *not* a change of the driver model itself.
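[Editorial note] The Devres/Revocable pattern described above can be sketched in plain userspace Rust. This is only an analogy, assuming a simplified `Revocable` built on `RwLock` in place of the kernel's RCU-based implementation; the type and method names mirror the kernel abstractions but are not the actual kernel API.

```rust
use std::sync::RwLock;

// Userspace analogy (not the kernel's actual Devres/Revocable types):
// wraps a value so access can be revoked once, standing in for the
// RCU-protected kernel implementation.
struct Revocable<T> {
    inner: RwLock<Option<T>>,
}

impl<T> Revocable<T> {
    fn new(value: T) -> Self {
        Revocable { inner: RwLock::new(Some(value)) }
    }

    // Analogous to try_access(): yields None once revoked.
    fn try_access<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        self.inner.read().unwrap().as_ref().map(f)
    }

    // Analogous to the devres callback on driver unbind: drops the
    // wrapped value, so all future accesses fail.
    fn revoke(&self) {
        self.inner.write().unwrap().take();
    }
}

struct Bar { mapping: usize } // stand-in for pci::Bar

fn main() {
    let bar = Revocable::new(Bar { mapping: 0xdead_beef });
    assert_eq!(bar.try_access(|b| b.mapping), Some(0xdead_beef));
    bar.revoke(); // driver unbound: pci_iounmap()/pci_release_region()
    assert_eq!(bar.try_access(|b| b.mapping), None);
}
```

The key property is the last line: once revoked, a driver holding the container can no longer reach the invalid pointer, instead of being left with a dangling one as in the C case.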


* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07 17:30                   ` Danilo Krummrich
@ 2025-03-07 18:02                     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 70+ messages in thread
From: Greg Kroah-Hartman @ 2025-03-07 18:02 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Jason Gunthorpe, Abdiel Janulgue, aliceryhl, robin.murphy,
	daniel.almeida, rust-for-linux, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Valentin Obst, open list,
	Christoph Hellwig, Marek Szyprowski, airlied,
	open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 06:30:13PM +0100, Danilo Krummrich wrote:
> On Fri, Mar 07, 2025 at 10:38:21AM -0400, Jason Gunthorpe wrote:
> Devres puts the pci::Bar in a Revocable and sets up the devres callback. Once
> the devres callback is called the embedded pci::Bar is dropped, which calls
> pci_iounmap() and pci_release_region().
> 
> Subsequently, the access to the pci::Bar for the owner of the Devres<pci::Bar>
> object is revoked, since the pointer to the memory mapping within the pci::Bar
> just became invalid.
> 
> The latter is the only additional step the Rust abstraction does, in order to
> not leave drivers with an invalid pointer. However, this additional safety
> aspect is *not* a change of the driver model itself.

Correct, it's one that the C driver model code wishes it could have :)

Danilo is totally right here.  There are many things "wrong" with the
driver model that we have today when dealing with resources that can
span multiple reference-counted devices.  It's always been that way;
heck, the original coding style document warned about that very thing
way back in the early 2000s!  The Rust code here is a step forward in
making those issues go away, which is a good thing, and should not be
chastised because somehow it doesn't solve _all_ of our problems.

Let's try this out for now.  If it doesn't work out, great, we'll fix it
and update the drivers and move on.  It's not like we haven't been doing
that for the past 30+ years now, nothing different here at all.

thanks,

greg k-h


* Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.
  2025-03-07 16:57                 ` Jason Gunthorpe
@ 2025-03-07 19:03                   ` Danilo Krummrich
  0 siblings, 0 replies; 70+ messages in thread
From: Danilo Krummrich @ 2025-03-07 19:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Abdiel Janulgue, aliceryhl, robin.murphy, daniel.almeida,
	rust-for-linux, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Valentin Obst, open list, Christoph Hellwig,
	Marek Szyprowski, airlied, open list:DMA MAPPING HELPERS

On Fri, Mar 07, 2025 at 12:57:51PM -0400, Jason Gunthorpe wrote:
> 
> Why are you explaining very simple concepts as though I do not
> understand how RCU or devm works?
> 
> I asked you what you intend to protect with the critical region.

When you asked what the critical region protects, I read the question as
what it guards, i.e. that it protects the resource pointer from changing.

I did not read it as "what's meant to be within the critical region".

> 
> I believe you intend to wrap every memcpy/etc. of the allocated
> coherent memory in an RCU critical section, correct?
> 
> Meaning something like:
> 
>   mem.ptr = dma_alloc_coherent(&handle)
>   make_hw_do_dma(handle)
> 
>   start RCU critical section on mem:
>       copy_to_user(mem.ptr) // Sleeps! Can't do it!
>   dma_free_coherent(mem, handle)
> 
> Right?

Yes, that would indeed be a problem. Thanks for pointing it out.

While we could do an SRCU variant, provide separate try_access() methods, etc.,
I think we should do something more efficient:

There is no reason to revoke the *whole* CoherentAllocation object, only the
parts that are critical to actually clean up on driver unbind, i.e. IOMMU
mappings, etc.

The actual memory allocation itself is not an issue in terms of living beyond
driver unbind and hence doesn't need to be revoked.

With this, you would not need any critical section to access the
CoherentAllocation's memory from the driver.
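[Editorial note] The finer-grained split proposed here can be sketched in userspace Rust. The names (`DmaMapping`, `dma_handle`, `revoke_mapping`) are illustrative assumptions, not the kernel API: only the device-facing mapping half is revocable, while the CPU-visible buffer remains a plain allocation that needs no critical section.

```rust
use std::sync::RwLock;

// Stand-in for the dma_addr_t handed to the device; this half is
// revoked on driver unbind.
struct DmaMapping {
    dma_handle: u64,
}

struct CoherentAllocation {
    cpu_buf: Vec<u8>,                    // never revoked: plain memory
    mapping: RwLock<Option<DmaMapping>>, // revocable part
}

impl CoherentAllocation {
    fn new(size: usize, dma_handle: u64) -> Self {
        CoherentAllocation {
            cpu_buf: vec![0; size],
            mapping: RwLock::new(Some(DmaMapping { dma_handle })),
        }
    }

    // CPU access needs no critical section at all.
    fn as_slice(&self) -> &[u8] {
        &self.cpu_buf
    }

    // Device-facing handle is only available until revoked.
    fn dma_handle(&self) -> Option<u64> {
        self.mapping.read().unwrap().as_ref().map(|m| m.dma_handle)
    }

    // Devres callback on driver unbind tears down only the mapping.
    fn revoke_mapping(&self) {
        self.mapping.write().unwrap().take();
    }
}

fn main() {
    let alloc = CoherentAllocation::new(16, 0x1000);
    assert_eq!(alloc.dma_handle(), Some(0x1000));
    alloc.revoke_mapping(); // driver unbound
    assert_eq!(alloc.dma_handle(), None);
    assert_eq!(alloc.as_slice().len(), 16); // memory still safely usable
}
```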

> Further, if the critical section ever fails to obtain mem.ptr the
> above code is *BUGGY* because it has left a HW DMA running, UAF'd the
> now free'd buffer *and the driver author cannot fix it*.

I don't think that'd be the case in a Rust driver; your example is in C and
hence doesn't do the Rust-style error and cleanup handling that the
corresponding Rust code would do.

But as mentioned above, putting the whole CoherentAllocation in a Devres
container seems wrong anyway. We need to do it at a finer granularity.


* Re: Allow data races on some read/write operations
  2025-03-07  8:43                                       ` Andreas Hindborg
@ 2025-03-18 14:44                                         ` Ralf Jung
  0 siblings, 0 replies; 70+ messages in thread
From: Ralf Jung @ 2025-03-18 14:44 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Alice Ryhl, Boqun Feng, comex, Daniel Almeida, Benno Lossin,
	Abdiel Janulgue, dakr, robin.murphy, rust-for-linux, Miguel Ojeda,
	Alex Gaynor, Gary Guo, Björn Roy Baron, Trevor Gross,
	Valentin Obst, linux-kernel, Christoph Hellwig, Marek Szyprowski,
	airlied, iommu, lkmm

Hi all,

>>> I
>>> may even later copy the data at place B to place C where C might have
>>> concurrent reads and/or writes, and the kernel will not experience UB
>>> because of this. The data may be garbage, but that is fine. I am not
>>> interpreting the data, or making control flow decisions based on it. I
>>> am just moving the data.
>>>
>>> My understanding is: In Rust, this program would be illegal and might
>>> experience UB in unpredictable ways, not limited to just the data that
>>> is being moved.
>>
>> That is correct. C and Rust behave the same here.
> 
> Is there a difference between formal models of the languages and
> practical implementations of the languages here? I'm asking this because
> C kernel developers seem to be writing these programs that are illegal
> under the formal spec of the C language, but work well in practice.
> Could it be the same in Rust?
> 
> That is, can I do this copy and get away with it in practice under the
> circumstances outlined earlier?

As with off-label drug usage, things can of course go well even if you 
deliberately leave the range of well-defined usage defined by the manufacturer.
However, answering your question conclusively requires intimate knowledge of the 
entire compilation chain. I'm not even sure if there's a single person that has 
everything from front-end transformations to back-end lowering in their head...
At the scale that compilers have reached, I think we have to compartmentalize by 
establishing abstractions (such as the Rust / C language specs, and the LLVM IR 
language spec). This enables each part of the compiler to locally ensure their 
consistency with the spec (hopefully that one part still fits in one person's 
head), and as long as everyone uses the same spec and interprets it the same 
way, we achieve a consistent end-to-end result from many individually consistent 
pieces.

Personally my goal has always been to identify the cases where programmers 
deliberately reach for such off-label usage, figure out the missing parts in the 
language that motivate them to do this, and add them, so that we can move on 
having everything on solid footing. :)   I did not realize that atomic memcpy is 
so crucial for the kernel, but it makes sense in hindsight. So IMO that is where 
we should spend our effort, rather than digging through the entire compilation 
pipeline to determine some works-in-practice off-label alternative.

>>> One option I have explored is just calling C memcpy directly, but
>>> because of LTO, that is no different than doing the operation in Rust.
>>>
>>> I don't think I need atomic memcpy, I just need my program not to
>>> explode if I move some data to or from a place that is experiencing
>>> concurrent writes without synchronization. Not in general, but for some
>>> special cases where I promise not to look at the data outside of moving
>>> it.
>>
>> I'm afraid I do not know of a language, other than assembly, that can provide this.
>>
>> Atomic memcpy, however, should be able to cover your use-case, so it seems like
>> a reasonable solution to me? Marking things as atomic is literally how you tell
>> the compiler "don't blow up if there are concurrent accesses".
> 
> If atomic memcpy is what we really need to write these kinds of programs in
> Rust, what would be the next steps to get this in the language?

There is an RFC, but it has been stalled for a while: 
<https://github.com/rust-lang/rfcs/pull/3301>. I do not know its exact status. 
It might be blocked on having this in the C++ model, though at least unstable 
experimentation should be possible before C++ has fully standardized the way 
this will look. (We'll want to ensure consistency of the C++ and Rust models 
here to ensure that C, C++, and Rust can interop on shared memory in a coherent 
way.)
On the C++ side (where the atomic memcpy would likely be added to the 
concurrency memory model, to be then adopted by C and Rust), I heard there was a 
lot of non-technical trouble due to ISO changing their procedural rules for how
they want changes to the standard to look. I don't know any further
details here as I am not directly involved.

> Also, would there be a performance price to pay for this?

I know little about evaluating performance at the low-level architectural or 
even microarchitectural level. However, I would think in the end the memcpy
itself (when using the "relaxed" atomic ordering) would be the same existing 
operation, the same assembly, it is just treated differently by optimizations 
before reaching the assembly stage.
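[Editorial note] The per-element emulation that is possible today can be sketched as follows. This is an assumption-laden stand-in for the proposed first-class atomic memcpy from the RFC above: each byte is copied with relaxed atomic loads and stores, so concurrent writers can at worst produce garbage values, never data-race UB, and there are no ordering guarantees between elements.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Element-wise "relaxed atomic memcpy" emulation on stable Rust.
fn atomic_copy(dst: &[AtomicU8], src: &[AtomicU8]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter().zip(src) {
        // Each byte access is an atomic relaxed load/store: no data-race
        // UB even with concurrent writers, merely possibly-torn data.
        d.store(s.load(Ordering::Relaxed), Ordering::Relaxed);
    }
}

fn main() {
    let src: Vec<AtomicU8> = (0u8..4).map(AtomicU8::new).collect();
    let dst: Vec<AtomicU8> = (0..4).map(|_| AtomicU8::new(0)).collect();
    atomic_copy(&dst, &src);
    let out: Vec<u8> = dst.iter().map(|a| a.load(Ordering::Relaxed)).collect();
    assert_eq!(out, vec![0, 1, 2, 3]);
}
```

On common architectures the relaxed byte accesses lower to the same plain loads and stores a memcpy would use, which matches the expectation above that the cost is in forgone optimizations, not in the emitted copy loop itself.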

Kind regards,
Ralf

> 
> 
> Best regards,
> Andreas Hindborg
> 



end of thread, other threads:[~2025-03-18 14:53 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-24 11:49 [PATCH v12 0/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
2025-02-24 11:49 ` [PATCH v12 1/3] rust: error: Add EOVERFLOW Abdiel Janulgue
2025-02-24 13:11   ` Andreas Hindborg
2025-02-24 11:49 ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
2025-02-24 13:21   ` Alice Ryhl
2025-02-24 16:27     ` Abdiel Janulgue
2025-02-24 13:30   ` QUENTIN BOYER
2025-02-24 16:30     ` Abdiel Janulgue
2025-02-24 14:40   ` Andreas Hindborg
2025-02-24 16:27     ` Abdiel Janulgue
2025-02-24 22:35       ` Daniel Almeida
2025-02-28  8:35       ` Alexandre Courbot
2025-02-28 10:01         ` Danilo Krummrich
2025-02-24 20:07   ` Benno Lossin
2025-02-24 21:40     ` Miguel Ojeda
2025-02-24 23:12     ` Daniel Almeida
2025-03-03 13:00       ` Andreas Hindborg
2025-03-03 13:13         ` Alice Ryhl
2025-03-03 15:21           ` Andreas Hindborg
2025-03-03 15:44             ` Alice Ryhl
2025-03-03 18:45               ` Andreas Hindborg
2025-03-03 19:00               ` Allow data races on some read/write operations Andreas Hindborg
2025-03-03 20:08                 ` Boqun Feng
2025-03-04 19:03                   ` Ralf Jung
2025-03-04 20:18                     ` comex
2025-03-05  3:24                       ` Boqun Feng
2025-03-05 13:10                         ` Ralf Jung
2025-03-05 13:23                           ` Alice Ryhl
2025-03-05 13:27                             ` Ralf Jung
2025-03-05 14:40                               ` Robin Murphy
2025-03-05 18:43                               ` Andreas Hindborg
2025-03-05 19:30                                 ` Alan Stern
2025-03-05 19:42                                 ` Ralf Jung
2025-03-05 21:26                                   ` Andreas Hindborg
2025-03-05 21:53                                     ` Ralf Jung
2025-03-07  8:43                                       ` Andreas Hindborg
2025-03-18 14:44                                         ` Ralf Jung
2025-03-05 18:41                             ` Andreas Hindborg
2025-03-05 14:25                           ` Daniel Almeida
2025-03-05 18:38                           ` Andreas Hindborg
2025-03-05 22:01                             ` Ralf Jung
2025-03-04  8:28           ` [PATCH v12 2/3] rust: add dma coherent allocator abstraction Abdiel Janulgue
2025-02-25  8:15     ` Abdiel Janulgue
2025-02-25  9:09       ` Alice Ryhl
2025-02-24 22:05   ` Miguel Ojeda
2025-02-25  8:15     ` Abdiel Janulgue
2025-03-03 11:30   ` Andreas Hindborg
2025-03-04  8:58     ` Abdiel Janulgue
2025-03-03 13:08   ` Robin Murphy
2025-03-05 17:41   ` Jason Gunthorpe
2025-03-06 13:37     ` Danilo Krummrich
2025-03-06 15:21       ` Simona Vetter
2025-03-06 15:49         ` Danilo Krummrich
2025-03-06 15:54         ` Danilo Krummrich
2025-03-06 16:18           ` Jason Gunthorpe
2025-03-06 16:34             ` Danilo Krummrich
2025-03-07 10:20               ` Simona Vetter
2025-03-06 16:09         ` Jason Gunthorpe
2025-03-07  8:50           ` Danilo Krummrich
2025-03-07 10:18             ` Simona Vetter
2025-03-07 12:48             ` Jason Gunthorpe
2025-03-07 13:16               ` Simona Vetter
2025-03-07 14:38                 ` Jason Gunthorpe
2025-03-07 17:30                   ` Danilo Krummrich
2025-03-07 18:02                     ` Greg Kroah-Hartman
2025-03-07 16:09               ` Danilo Krummrich
2025-03-07 16:57                 ` Jason Gunthorpe
2025-03-07 19:03                   ` Danilo Krummrich
2025-02-24 11:49 ` [PATCH v12 3/3] MAINTAINERS: add entry for Rust dma mapping helpers device driver API Abdiel Janulgue
2025-02-24 13:10   ` Andreas Hindborg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).