linux-fsdevel.vger.kernel.org archive mirror
* [RFC PATCH 00/19] Rust abstractions for VFS
@ 2023-10-18 12:24 Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
                   ` (20 more replies)
  0 siblings, 21 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:24 UTC
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

This series introduces Rust abstractions that allow page-cache-backed read-only
file systems to be written in Rust.

There are two file systems that are built on top of these abstractions: tarfs
and puzzlefs. The former has zero unsafe blocks and is included as a patch in
this series; the latter is described elsewhere [1]. We limit the functionality
to the bare minimum needed to implement them.

Rust file system modules can be declared with the `module_fs` macro and are
required to implement the following functions (which are part of the
`FileSystem` trait):

impl FileSystem for MyFS {
    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
    fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
    fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
    fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
    fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
}

They can optionally implement the following:

fn read_xattr(inode: &INode<Self>, name: &CStr, outbuf: &mut [u8]) -> Result<usize>;
fn statfs(sb: &SuperBlock<Self>) -> Result<Stat>;

They may also choose the type of the data they can attach to superblocks and/or
inodes.
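
To make the split between required and optional functions concrete, here is a
rough userspace sketch. It is not the kernel API: `ReadOnlyFs`, `MemFs` and
`FsError` are hypothetical stand-ins, and plain inode numbers replace
`INode<Self>` and `LockedFolio`. It only shows the shape: required trait
methods have no body, while optional ones carry a default that reports "not
supported".

```rust
use std::collections::BTreeMap;

/// Stand-in error type; the real abstractions return the kernel's `Result`.
#[derive(Debug, PartialEq)]
pub enum FsError {
    NotFound,
    NotSupported,
}

/// Simplified model of the trait: two required methods and one optional one.
pub trait ReadOnlyFs {
    /// Required: resolve `name` inside directory `parent` to an inode number.
    fn lookup(&self, parent: u64, name: &[u8]) -> Result<u64, FsError>;

    /// Required: copy file contents starting at `offset` into `buf`.
    fn read(&self, ino: u64, offset: usize, buf: &mut [u8]) -> Result<usize, FsError>;

    /// Optional: implementations without extended attributes keep this default.
    fn read_xattr(&self, _ino: u64, _name: &str, _out: &mut [u8]) -> Result<usize, FsError> {
        Err(FsError::NotSupported)
    }
}

/// A toy in-memory file system with a single flat root directory (inode 1).
pub struct MemFs {
    files: BTreeMap<Vec<u8>, (u64, Vec<u8>)>,
}

impl MemFs {
    pub fn new(entries: &[(&str, &str)]) -> Self {
        let files = entries
            .iter()
            .enumerate()
            .map(|(i, (name, data))| {
                // Inode 1 is the root, so files are numbered from 2.
                (name.as_bytes().to_vec(), (i as u64 + 2, data.as_bytes().to_vec()))
            })
            .collect();
        MemFs { files }
    }
}

impl ReadOnlyFs for MemFs {
    fn lookup(&self, parent: u64, name: &[u8]) -> Result<u64, FsError> {
        if parent != 1 {
            return Err(FsError::NotFound);
        }
        self.files.get(name).map(|(ino, _)| *ino).ok_or(FsError::NotFound)
    }

    fn read(&self, ino: u64, offset: usize, buf: &mut [u8]) -> Result<usize, FsError> {
        let (_, data) = self
            .files
            .values()
            .find(|(i, _)| *i == ino)
            .ok_or(FsError::NotFound)?;
        // Reads past the end of the file return zero bytes.
        let remaining = data.get(offset..).unwrap_or(&[]);
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        Ok(n)
    }
}
```

`MemFs` overrides only the required methods, so calling `read_xattr` on it
falls through to the default and returns `FsError::NotSupported`.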

There are a couple of issues that are likely to lead to unsoundness that have
to do with the unregistration of file systems. I will send separate emails
about them.

A git tree is available here:
    git://github.com/wedsonaf/linux.git vfs

Web:
    https://github.com/wedsonaf/linux/commits/vfs

[1]: The PuzzleFS container filesystem: https://lwn.net/Articles/945320/

Wedson Almeida Filho (19):
  rust: fs: add registration/unregistration of file systems
  rust: fs: introduce the `module_fs` macro
  samples: rust: add initial ro file system sample
  rust: fs: introduce `FileSystem::super_params`
  rust: fs: introduce `INode<T>`
  rust: fs: introduce `FileSystem::init_root`
  rust: fs: introduce `FileSystem::read_dir`
  rust: fs: introduce `FileSystem::lookup`
  rust: folio: introduce basic support for folios
  rust: fs: introduce `FileSystem::read_folio`
  rust: fs: introduce `FileSystem::read_xattr`
  rust: fs: introduce `FileSystem::statfs`
  rust: fs: introduce more inode types
  rust: fs: add per-superblock data
  rust: fs: add basic support for fs buffer heads
  rust: fs: allow file systems backed by a block device
  rust: fs: allow per-inode data
  rust: fs: export file type from mode constants
  tarfs: introduce tar fs

 fs/Kconfig                        |    1 +
 fs/Makefile                       |    1 +
 fs/tarfs/Kconfig                  |   16 +
 fs/tarfs/Makefile                 |    8 +
 fs/tarfs/defs.rs                  |   80 ++
 fs/tarfs/tar.rs                   |  322 +++++++
 rust/bindings/bindings_helper.h   |   13 +
 rust/bindings/lib.rs              |    6 +
 rust/helpers.c                    |  142 ++++
 rust/kernel/error.rs              |    6 +-
 rust/kernel/folio.rs              |  214 +++++
 rust/kernel/fs.rs                 | 1290 +++++++++++++++++++++++++++++
 rust/kernel/fs/buffer.rs          |   60 ++
 rust/kernel/lib.rs                |    2 +
 rust/kernel/mem_cache.rs          |    2 -
 samples/rust/Kconfig              |   10 +
 samples/rust/Makefile             |    1 +
 samples/rust/rust_rofs.rs         |  154 ++++
 scripts/generate_rust_analyzer.py |    2 +-
 19 files changed, 2324 insertions(+), 6 deletions(-)
 create mode 100644 fs/tarfs/Kconfig
 create mode 100644 fs/tarfs/Makefile
 create mode 100644 fs/tarfs/defs.rs
 create mode 100644 fs/tarfs/tar.rs
 create mode 100644 rust/kernel/folio.rs
 create mode 100644 rust/kernel/fs.rs
 create mode 100644 rust/kernel/fs/buffer.rs
 create mode 100644 samples/rust/rust_rofs.rs


base-commit: b0bc357ef7a98904600826dea3de79c0c67eb0a7
-- 
2.34.1



* [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 15:38   ` Benno Lossin
  2023-10-18 12:25 ` [RFC PATCH 02/19] rust: fs: introduce the `module_fs` macro Wedson Almeida Filho
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow basic registration and unregistration of Rust file system types.
Unregistration happens automatically when a registration variable is
dropped (e.g., when it goes out of scope).

File systems registered this way are visible in `/proc/filesystems` but
cannot be mounted yet because `init_fs_context` fails.
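
The drop-based unregistration can be illustrated outside the kernel with a
small sketch. Everything here is hypothetical: `Registry` stands in for the
kernel's list of file system types, and this `Registration` is a plain
borrowed guard rather than the pinned type added by this patch.

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;

/// Hypothetical registry standing in for the kernel's list of file system types.
#[derive(Default)]
pub struct Registry {
    names: RefCell<BTreeSet<&'static str>>,
}

/// Guard whose lifetime is the lifetime of the registration.
pub struct Registration<'a> {
    registry: &'a Registry,
    name: &'static str,
}

impl Registry {
    /// Registers `name` and returns a guard; fails if the name is already taken.
    pub fn register(&self, name: &'static str) -> Result<Registration<'_>, &'static str> {
        if !self.names.borrow_mut().insert(name) {
            return Err("already registered");
        }
        Ok(Registration { registry: self, name })
    }

    /// Reports whether `name` is currently registered.
    pub fn is_registered(&self, name: &str) -> bool {
        self.names.borrow().contains(name)
    }
}

impl Drop for Registration<'_> {
    fn drop(&mut self) {
        // Unregistration needs no explicit call: it runs when the guard is dropped.
        self.registry.names.borrow_mut().remove(self.name);
    }
}
```

A failed `register` call never constructs a guard, so it cannot remove the
entry owned by the earlier successful registration.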

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/bindings/bindings_helper.h |  1 +
 rust/kernel/error.rs            |  2 -
 rust/kernel/fs.rs               | 80 +++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs              |  1 +
 4 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 rust/kernel/fs.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 3b620ae07021..9c23037b33d0 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -8,6 +8,7 @@
 
 #include <kunit/test.h>
 #include <linux/errname.h>
+#include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/refcount.h>
 #include <linux/wait.h>
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 05fcab6abfe6..e6d7ce46be55 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -320,8 +320,6 @@ pub(crate) fn from_err_ptr<T>(ptr: *mut T) -> Result<*mut T> {
 ///     })
 /// }
 /// ```
-// TODO: Remove `dead_code` marker once an in-kernel client is available.
-#[allow(dead_code)]
 pub(crate) fn from_result<T, F>(f: F) -> T
 where
     T: From<i16>,
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
new file mode 100644
index 000000000000..f3fb09db41ba
--- /dev/null
+++ b/rust/kernel/fs.rs
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Kernel file systems.
+//!
+//! This module allows Rust code to register new kernel file systems.
+//!
+//! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
+
+use crate::error::{code::*, from_result, to_result, Error};
+use crate::types::Opaque;
+use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
+use core::{marker::PhantomPinned, pin::Pin};
+use macros::{pin_data, pinned_drop};
+
+/// A file system type.
+pub trait FileSystem {
+    /// The name of the file system type.
+    const NAME: &'static CStr;
+}
+
+/// A registration of a file system.
+#[pin_data(PinnedDrop)]
+pub struct Registration {
+    #[pin]
+    fs: Opaque<bindings::file_system_type>,
+    #[pin]
+    _pin: PhantomPinned,
+}
+
+// SAFETY: `Registration` doesn't provide any `&self` methods, so it is safe to pass references
+// to it around.
+unsafe impl Sync for Registration {}
+
+// SAFETY: Both registration and unregistration are implemented in C and safe to be performed
+// from any thread, so `Registration` is `Send`.
+unsafe impl Send for Registration {}
+
+impl Registration {
+    /// Creates the initialiser of a new file system registration.
+    pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<Self, Error> {
+        try_pin_init!(Self {
+            _pin: PhantomPinned,
+            fs <- Opaque::try_ffi_init(|fs_ptr: *mut bindings::file_system_type| {
+                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write.
+                unsafe { fs_ptr.write(bindings::file_system_type::default()) };
+
+                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write, and it has
+                // just been initialised above, so it's also valid for read.
+                let fs = unsafe { &mut *fs_ptr };
+                fs.owner = module.0;
+                fs.name = T::NAME.as_char_ptr();
+                fs.init_fs_context = Some(Self::init_fs_context_callback);
+                fs.kill_sb = Some(Self::kill_sb_callback);
+                fs.fs_flags = 0;
+
+                // SAFETY: Pointers stored in `fs` are static so will live for as long as the
+                // registration is active (it is undone in `drop`).
+                to_result(unsafe { bindings::register_filesystem(fs_ptr) })
+            }),
+        })
+    }
+
+    unsafe extern "C" fn init_fs_context_callback(
+        _fc_ptr: *mut bindings::fs_context,
+    ) -> core::ffi::c_int {
+        from_result(|| Err(ENOTSUPP))
+    }
+
+    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
+}
+
+#[pinned_drop]
+impl PinnedDrop for Registration {
+    fn drop(self: Pin<&mut Self>) {
+        // SAFETY: If an instance of `Self` has been successfully created, a call to
+        // `register_filesystem` has necessarily succeeded. So it's ok to call
+        // `unregister_filesystem` on the previously registered fs.
+        unsafe { bindings::unregister_filesystem(self.fs.get()) };
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 187d58f906a5..00059b80c240 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -34,6 +34,7 @@
 mod allocator;
 mod build_assert;
 pub mod error;
+pub mod fs;
 pub mod init;
 pub mod ioctl;
 #[cfg(CONFIG_KUNIT)]
-- 
2.34.1



* [RFC PATCH 02/19] rust: fs: introduce the `module_fs` macro
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 03/19] samples: rust: add initial ro file system sample Wedson Almeida Filho
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Simplify the declaration of modules that only expose a file system type.
They can now do it using the `module_fs` macro.
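
The forwarding idea behind the macro can be sketched in plain Rust. The names
below are hypothetical: `declare_module!` is only a stand-in for the kernel's
`module!` macro (it records metadata as constants), and `FsModule<T>` plays
the role of `fs::Module<T>`. The sketch shows how the `type:` argument gets
wrapped while the remaining fields are passed through untouched.

```rust
use std::marker::PhantomData;

/// Hypothetical wrapper playing the role of `fs::Module<T>`.
pub struct FsModule<T>(pub PhantomData<T>);

/// Hypothetical stand-in for `module!`: it only records the metadata.
macro_rules! declare_module {
    (type: $type:ty, name: $name:expr, license: $license:expr $(,)?) => {
        pub type DeclaredModule = $type;
        pub const MODULE_NAME: &str = $name;
        pub const MODULE_LICENSE: &str = $license;
    };
}

/// Wraps the file system type and forwards the remaining fields as raw tokens.
macro_rules! declare_fs_module {
    (type: $type:ty, $($rest:tt)*) => {
        declare_module! {
            type: FsModule<$type>,
            $($rest)*
        }
    };
}

/// A file system type with no behaviour; only its name matters here.
pub struct MyFs;

declare_fs_module! {
    type: MyFs,
    name: "myfs",
    license: "GPL",
}
```

After expansion, `DeclaredModule` is an alias for `FsModule<MyFs>`, so the
caller names only the file system type and never the wrapper.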

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs.rs | 56 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index f3fb09db41ba..1df54c234101 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -9,7 +9,7 @@
 use crate::error::{code::*, from_result, to_result, Error};
 use crate::types::Opaque;
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
-use core::{marker::PhantomPinned, pin::Pin};
+use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin};
 use macros::{pin_data, pinned_drop};
 
 /// A file system type.
@@ -78,3 +78,57 @@ fn drop(self: Pin<&mut Self>) {
         unsafe { bindings::unregister_filesystem(self.fs.get()) };
     }
 }
+
+/// Kernel module that exposes a single file system implemented by `T`.
+#[pin_data]
+pub struct Module<T: FileSystem + ?Sized> {
+    #[pin]
+    fs_reg: Registration,
+    _p: PhantomData<T>,
+}
+
+impl<T: FileSystem + ?Sized + Sync + Send> crate::InPlaceModule for Module<T> {
+    fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
+        try_pin_init!(Self {
+            fs_reg <- Registration::new::<T>(module),
+            _p: PhantomData,
+        })
+    }
+}
+
+/// Declares a kernel module that exposes a single file system.
+///
+/// The `type` argument must be a type which implements the [`FileSystem`] trait. Also accepts
+/// various forms of kernel metadata.
+///
+/// # Examples
+///
+/// ```
+/// # mod module_fs_sample {
+/// use kernel::prelude::*;
+/// use kernel::{c_str, fs};
+///
+/// kernel::module_fs! {
+///     type: MyFs,
+///     name: "myfs",
+///     author: "Rust for Linux Contributors",
+///     description: "My Rust fs",
+///     license: "GPL",
+/// }
+///
+/// struct MyFs;
+/// impl fs::FileSystem for MyFs {
+///     const NAME: &'static CStr = c_str!("myfs");
+/// }
+/// # }
+/// ```
+#[macro_export]
+macro_rules! module_fs {
+    (type: $type:ty, $($f:tt)*) => {
+        type ModuleType = $crate::fs::Module<$type>;
+        $crate::macros::module! {
+            type: ModuleType,
+            $($f)*
+        }
+    }
+}
-- 
2.34.1



* [RFC PATCH 03/19] samples: rust: add initial ro file system sample
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 02/19] rust: fs: introduce the `module_fs` macro Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-28 16:18   ` Alice Ryhl
  2023-10-18 12:25 ` [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params` Wedson Almeida Filho
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Introduce a basic sample that for now only registers the file system and
doesn't really provide any functionality beyond having it listed in
`/proc/filesystems`. New functionality will be added to the sample in
subsequent patches as their abstractions are introduced.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 samples/rust/Kconfig      | 10 ++++++++++
 samples/rust/Makefile     |  1 +
 samples/rust/rust_rofs.rs | 19 +++++++++++++++++++
 3 files changed, 30 insertions(+)
 create mode 100644 samples/rust/rust_rofs.rs

diff --git a/samples/rust/Kconfig b/samples/rust/Kconfig
index 59f44a8b6958..2f26c5c52813 100644
--- a/samples/rust/Kconfig
+++ b/samples/rust/Kconfig
@@ -41,6 +41,16 @@ config SAMPLE_RUST_PRINT
 
 	  If unsure, say N.
 
+config SAMPLE_RUST_ROFS
+	tristate "Read-only file system"
+	help
+	  This option builds the Rust read-only file system sample.
+
+	  To compile this as a module, choose M here:
+	  the module will be called rust_rofs.
+
+	  If unsure, say N.
+
 config SAMPLE_RUST_HOSTPROGS
 	bool "Host programs"
 	help
diff --git a/samples/rust/Makefile b/samples/rust/Makefile
index 791fc18180e9..df1e4341ae95 100644
--- a/samples/rust/Makefile
+++ b/samples/rust/Makefile
@@ -3,5 +3,6 @@
 obj-$(CONFIG_SAMPLE_RUST_MINIMAL)		+= rust_minimal.o
 obj-$(CONFIG_SAMPLE_RUST_INPLACE)		+= rust_inplace.o
 obj-$(CONFIG_SAMPLE_RUST_PRINT)			+= rust_print.o
+obj-$(CONFIG_SAMPLE_RUST_ROFS)			+= rust_rofs.o
 
 subdir-$(CONFIG_SAMPLE_RUST_HOSTPROGS)		+= hostprogs
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
new file mode 100644
index 000000000000..1c00b1da8b94
--- /dev/null
+++ b/samples/rust/rust_rofs.rs
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Rust read-only file system sample.
+
+use kernel::prelude::*;
+use kernel::{c_str, fs};
+
+kernel::module_fs! {
+    type: RoFs,
+    name: "rust_rofs",
+    author: "Rust for Linux Contributors",
+    description: "Rust read-only file system sample",
+    license: "GPL",
+}
+
+struct RoFs;
+impl fs::FileSystem for RoFs {
+    const NAME: &'static CStr = c_str!("rust-fs");
+}
-- 
2.34.1



* [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (2 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 03/19] samples: rust: add initial ro file system sample Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 16:34   ` Benno Lossin
                     ` (2 more replies)
  2023-10-18 12:25 ` [RFC PATCH 05/19] rust: fs: introduce `INode<T>` Wedson Almeida Filho
                   ` (16 subsequent siblings)
  20 siblings, 3 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to initialise superblocks, which allows them
to be mounted (though they are still empty).

Some scaffolding code is added to create an empty directory as the root.
It is replaced by proper inode creation in a subsequent patch in this
series.
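
The block size is derived from `blocksize_bits` with an overflow check before
the shift. A standalone sketch of that check follows, assuming a 64-bit block
size field; `block_size` is a hypothetical helper and not part of the patch.

```rust
/// Returns `2^bits`, or `None` if the result would not fit in a `u64`.
pub fn block_size(bits: u8) -> Option<u64> {
    let one: u64 = 1;
    // `one` has 63 leading zero bits, so a shift of more than 63 would overflow.
    if one.leading_zeros() < u32::from(bits) {
        return None;
    }
    Some(one << bits)
}
```

With the sample's `blocksize_bits: 12` this yields a 4096-byte block, and a
value of 64 or more is rejected, which corresponds to the `EINVAL` path in
`fill_super_callback`.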

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/bindings/bindings_helper.h |   5 +
 rust/bindings/lib.rs            |   4 +
 rust/kernel/fs.rs               | 176 ++++++++++++++++++++++++++++++--
 samples/rust/rust_rofs.rs       |  10 ++
 4 files changed, 189 insertions(+), 6 deletions(-)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 9c23037b33d0..ca1898ce9527 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -9,6 +9,7 @@
 #include <kunit/test.h>
 #include <linux/errname.h>
 #include <linux/fs.h>
+#include <linux/fs_context.h>
 #include <linux/slab.h>
 #include <linux/refcount.h>
 #include <linux/wait.h>
@@ -22,3 +23,7 @@ const gfp_t BINDINGS___GFP_ZERO = __GFP_ZERO;
 const slab_flags_t BINDINGS_SLAB_RECLAIM_ACCOUNT = SLAB_RECLAIM_ACCOUNT;
 const slab_flags_t BINDINGS_SLAB_MEM_SPREAD = SLAB_MEM_SPREAD;
 const slab_flags_t BINDINGS_SLAB_ACCOUNT = SLAB_ACCOUNT;
+
+const unsigned long BINDINGS_SB_RDONLY = SB_RDONLY;
+
+const loff_t BINDINGS_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
index 6a8c6cd17e45..426915d3fb57 100644
--- a/rust/bindings/lib.rs
+++ b/rust/bindings/lib.rs
@@ -55,3 +55,7 @@ mod bindings_helper {
 pub const SLAB_RECLAIM_ACCOUNT: slab_flags_t = BINDINGS_SLAB_RECLAIM_ACCOUNT;
 pub const SLAB_MEM_SPREAD: slab_flags_t = BINDINGS_SLAB_MEM_SPREAD;
 pub const SLAB_ACCOUNT: slab_flags_t = BINDINGS_SLAB_ACCOUNT;
+
+pub const SB_RDONLY: core::ffi::c_ulong = BINDINGS_SB_RDONLY;
+
+pub const MAX_LFS_FILESIZE: loff_t = BINDINGS_MAX_LFS_FILESIZE;
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 1df54c234101..31cf643aaded 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -6,16 +6,22 @@
 //!
 //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
 
-use crate::error::{code::*, from_result, to_result, Error};
+use crate::error::{code::*, from_result, to_result, Error, Result};
 use crate::types::Opaque;
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
 use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin};
 use macros::{pin_data, pinned_drop};
 
+/// Maximum size of an inode.
+pub const MAX_LFS_FILESIZE: i64 = bindings::MAX_LFS_FILESIZE;
+
 /// A file system type.
 pub trait FileSystem {
     /// The name of the file system type.
     const NAME: &'static CStr;
+
+    /// Returns the parameters to initialise a super block.
+    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
 }
 
 /// A registration of a file system.
@@ -49,7 +55,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
                 let fs = unsafe { &mut *fs_ptr };
                 fs.owner = module.0;
                 fs.name = T::NAME.as_char_ptr();
-                fs.init_fs_context = Some(Self::init_fs_context_callback);
+                fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
                 fs.kill_sb = Some(Self::kill_sb_callback);
                 fs.fs_flags = 0;
 
@@ -60,13 +66,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
         })
     }
 
-    unsafe extern "C" fn init_fs_context_callback(
-        _fc_ptr: *mut bindings::fs_context,
+    unsafe extern "C" fn init_fs_context_callback<T: FileSystem + ?Sized>(
+        fc_ptr: *mut bindings::fs_context,
     ) -> core::ffi::c_int {
-        from_result(|| Err(ENOTSUPP))
+        from_result(|| {
+            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
+            let fc = unsafe { &mut *fc_ptr };
+            fc.ops = &Tables::<T>::CONTEXT;
+            Ok(0)
+        })
     }
 
-    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
+    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
+        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
+        // the appropriate function to call for cleanup.
+        unsafe { bindings::kill_anon_super(sb_ptr) };
+    }
 }
 
 #[pinned_drop]
@@ -79,6 +94,151 @@ fn drop(self: Pin<&mut Self>) {
     }
 }
 
+/// A file system super block.
+///
+/// Wraps the kernel's `struct super_block`.
+#[repr(transparent)]
+pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
+
+/// Required superblock parameters.
+///
+/// This is returned by implementations of [`FileSystem::super_params`].
+pub struct SuperParams {
+    /// The magic number of the superblock.
+    pub magic: u32,
+
+    /// The size of a block in powers of 2 (i.e., for a value of `n`, the size is `2^n`).
+    pub blocksize_bits: u8,
+
+    /// Maximum size of a file.
+    ///
+    /// The maximum allowed value is [`MAX_LFS_FILESIZE`].
+    pub maxbytes: i64,
+
+    /// Granularity of c/m/atime in ns (cannot be worse than a second).
+    pub time_gran: u32,
+}
+
+/// A superblock that is still being initialised.
+///
+/// # Invariants
+///
+/// The superblock is a newly-created one and this is the only active pointer to it.
+#[repr(transparent)]
+pub struct NewSuperBlock<T: FileSystem + ?Sized>(bindings::super_block, PhantomData<T>);
+
+struct Tables<T: FileSystem + ?Sized>(T);
+impl<T: FileSystem + ?Sized> Tables<T> {
+    const CONTEXT: bindings::fs_context_operations = bindings::fs_context_operations {
+        free: None,
+        parse_param: None,
+        get_tree: Some(Self::get_tree_callback),
+        reconfigure: None,
+        parse_monolithic: None,
+        dup: None,
+    };
+
+    unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> core::ffi::c_int {
+        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
+        // the right type and is a valid callback.
+        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
+    }
+
+    unsafe extern "C" fn fill_super_callback(
+        sb_ptr: *mut bindings::super_block,
+        _fc: *mut bindings::fs_context,
+    ) -> core::ffi::c_int {
+        from_result(|| {
+            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
+            // newly-created superblock.
+            let sb = unsafe { &mut *sb_ptr.cast() };
+            let params = T::super_params(sb)?;
+
+            sb.0.s_magic = params.magic as _;
+            sb.0.s_op = &Tables::<T>::SUPER_BLOCK;
+            sb.0.s_maxbytes = params.maxbytes;
+            sb.0.s_time_gran = params.time_gran;
+            sb.0.s_blocksize_bits = params.blocksize_bits;
+            sb.0.s_blocksize = 1;
+            if sb.0.s_blocksize.leading_zeros() < params.blocksize_bits.into() {
+                return Err(EINVAL);
+            }
+            sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
+            sb.0.s_flags |= bindings::SB_RDONLY;
+
+            // The following is scaffolding code that will be removed in a subsequent patch. It is
+            // needed to build a root dentry, otherwise core code will BUG().
+            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
+            let inode = unsafe { bindings::new_inode(&mut sb.0) };
+            if inode.is_null() {
+                return Err(ENOMEM);
+            }
+
+            // SAFETY: `inode` is valid for write.
+            unsafe { bindings::set_nlink(inode, 2) };
+
+            {
+                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
+                // safe to mutably dereference it.
+                let inode = unsafe { &mut *inode };
+                inode.i_ino = 1;
+                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
+
+                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
+                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
+
+                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
+                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
+            }
+
+            // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
+            // case for this call.
+            //
+            // It takes over the inode, even on failure, so we don't need to clean it up.
+            let dentry = unsafe { bindings::d_make_root(inode) };
+            if dentry.is_null() {
+                return Err(ENOMEM);
+            }
+
+            sb.0.s_root = dentry;
+
+            Ok(0)
+        })
+    }
+
+    const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
+        alloc_inode: None,
+        destroy_inode: None,
+        free_inode: None,
+        dirty_inode: None,
+        write_inode: None,
+        drop_inode: None,
+        evict_inode: None,
+        put_super: None,
+        sync_fs: None,
+        freeze_super: None,
+        freeze_fs: None,
+        thaw_super: None,
+        unfreeze_fs: None,
+        statfs: None,
+        remount_fs: None,
+        umount_begin: None,
+        show_options: None,
+        show_devname: None,
+        show_path: None,
+        show_stats: None,
+        #[cfg(CONFIG_QUOTA)]
+        quota_read: None,
+        #[cfg(CONFIG_QUOTA)]
+        quota_write: None,
+        #[cfg(CONFIG_QUOTA)]
+        get_dquots: None,
+        nr_cached_objects: None,
+        free_cached_objects: None,
+        shutdown: None,
+    };
+}
+
 /// Kernel module that exposes a single file system implemented by `T`.
 #[pin_data]
 pub struct Module<T: FileSystem + ?Sized> {
@@ -105,6 +265,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// ```
 /// # mod module_fs_sample {
+/// use kernel::fs::{NewSuperBlock, SuperParams};
 /// use kernel::prelude::*;
 /// use kernel::{c_str, fs};
 ///
@@ -119,6 +280,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 /// struct MyFs;
 /// impl fs::FileSystem for MyFs {
 ///     const NAME: &'static CStr = c_str!("myfs");
+///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
+///         todo!()
+///     }
 /// }
 /// # }
 /// ```
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 1c00b1da8b94..9878bf88b991 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,6 +2,7 @@
 
 //! Rust read-only file system sample.
 
+use kernel::fs::{NewSuperBlock, SuperParams};
 use kernel::prelude::*;
 use kernel::{c_str, fs};
 
@@ -16,4 +17,13 @@
 struct RoFs;
 impl fs::FileSystem for RoFs {
     const NAME: &'static CStr = c_str!("rust-fs");
+
+    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
+        Ok(SuperParams {
+            magic: 0x52555354,
+            blocksize_bits: 12,
+            maxbytes: fs::MAX_LFS_FILESIZE,
+            time_gran: 1,
+        })
+    }
 }
-- 
2.34.1



* [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (3 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-28 18:00   ` Alice Ryhl
                     ` (2 more replies)
  2023-10-18 12:25 ` [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
                   ` (15 subsequent siblings)
  20 siblings, 3 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to handle typed and ref-counted inodes.

This is in preparation for creating new inodes (for example, to create
the root inode of a new superblock), which comes in the next patch in
the series.
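
As a rough userspace model of what "ref-counted" means here: every handle
takes a reference when it is created or cloned and gives it back when it is
dropped. The names are hypothetical (`RefCounted`, `Node`, `Handle`). The
kernel version pairs `ihold` with `iput` through `AlwaysRefCounted`, whereas
this sketch borrows the object and only counts.

```rust
use std::cell::Cell;
use std::ops::Deref;

/// Simplified model of an object that carries its own reference count.
pub trait RefCounted {
    fn inc_ref(&self);
    fn dec_ref(&self);
}

/// Toy inode: a number plus an embedded counter.
pub struct Node {
    pub ino: u64,
    refs: Cell<usize>,
}

impl Node {
    pub fn new(ino: u64) -> Self {
        Node { ino, refs: Cell::new(0) }
    }

    /// Number of live handles pointing at this node.
    pub fn ref_count(&self) -> usize {
        self.refs.get()
    }
}

impl RefCounted for Node {
    fn inc_ref(&self) {
        self.refs.set(self.refs.get() + 1);
    }

    fn dec_ref(&self) {
        self.refs.set(self.refs.get() - 1);
    }
}

/// Owning handle: holds exactly one reference for as long as it lives.
pub struct Handle<'a, T: RefCounted>(&'a T);

impl<'a, T: RefCounted> Handle<'a, T> {
    pub fn new(obj: &'a T) -> Self {
        obj.inc_ref();
        Handle(obj)
    }
}

impl<T: RefCounted> Clone for Handle<'_, T> {
    fn clone(&self) -> Self {
        // Cloning a handle takes an additional reference.
        Handle::new(self.0)
    }
}

impl<T: RefCounted> Drop for Handle<'_, T> {
    fn drop(&mut self) {
        // Dropping a handle releases the reference it took.
        self.0.dec_ref();
    }
}

impl<T: RefCounted> Deref for Handle<'_, T> {
    type Target = T;

    fn deref(&self) -> &T {
        self.0
    }
}
```

Because increments and decrements happen only in `new`, `clone` and `drop`,
the count always equals the number of live handles.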

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c    |  7 +++++++
 rust/kernel/fs.rs | 53 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index 4c86fe4a7e05..fe45f8ddb31f 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -25,6 +25,7 @@
 #include <linux/build_bug.h>
 #include <linux/err.h>
 #include <linux/errname.h>
+#include <linux/fs.h>
 #include <linux/mutex.h>
 #include <linux/refcount.h>
 #include <linux/sched/signal.h>
@@ -144,6 +145,12 @@ struct kunit *rust_helper_kunit_get_current_test(void)
 }
 EXPORT_SYMBOL_GPL(rust_helper_kunit_get_current_test);
 
+loff_t rust_helper_i_size_read(const struct inode *inode)
+{
+	return i_size_read(inode);
+}
+EXPORT_SYMBOL_GPL(rust_helper_i_size_read);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 31cf643aaded..30fa1f312f33 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -7,9 +7,9 @@
 //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
 
 use crate::error::{code::*, from_result, to_result, Error, Result};
-use crate::types::Opaque;
+use crate::types::{AlwaysRefCounted, Opaque};
 use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
-use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin};
+use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin, ptr};
 use macros::{pin_data, pinned_drop};
 
 /// Maximum size of an inode.
@@ -94,6 +94,55 @@ fn drop(self: Pin<&mut Self>) {
     }
 }
 
+/// The number of an inode.
+pub type Ino = u64;
+
+/// A node in the file system index (inode).
+///
+/// Wraps the kernel's `struct inode`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `ihold` ensures that the
+/// allocation remains valid at least until the matching call to `iput`.
+#[repr(transparent)]
+pub struct INode<T: FileSystem + ?Sized>(Opaque<bindings::inode>, PhantomData<T>);
+
+impl<T: FileSystem + ?Sized> INode<T> {
+    /// Returns the number of the inode.
+    pub fn ino(&self) -> Ino {
+        // SAFETY: `i_ino` is immutable, and `self` is guaranteed to be valid by the existence of a
+        // shared reference (&self) to it.
+        unsafe { (*self.0.get()).i_ino }
+    }
+
+    /// Returns the super-block that owns the inode.
+    pub fn super_block(&self) -> &SuperBlock<T> {
+        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
+        // shared reference (&self) to it.
+        unsafe { &*(*self.0.get()).i_sb.cast() }
+    }
+
+    /// Returns the size of the inode contents.
+    pub fn size(&self) -> i64 {
+        // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference.
+        unsafe { bindings::i_size_read(self.0.get()) }
+    }
+}
+
+// SAFETY: The type invariants guarantee that `INode` is always ref-counted.
+unsafe impl<T: FileSystem + ?Sized> AlwaysRefCounted for INode<T> {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::ihold(self.0.get()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::iput(obj.cast().as_ptr()) }
+    }
+}
+
 /// A file system super block.
 ///
 /// Wraps the kernel's `struct super_block`.
-- 
2.34.1



* [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (4 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 05/19] rust: fs: introduce `INode<T>` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-19 14:30   ` Benno Lossin
                     ` (2 more replies)
  2023-10-18 12:25 ` [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir` Wedson Almeida Filho
                   ` (14 subsequent siblings)
  20 siblings, 3 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to specify their root directory. Also allow them
to create (and do cache lookups of) directory inodes. (More types of
inodes are added in subsequent patches in the series.)

The `NewINode` type ensures that a new inode is fully initialised before
it is unlocked and handed out. It also simplifies error paths by
automatically marking an inode as failed if it is dropped before
initialisation completes.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
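As a rough userspace illustration of the guard pattern behind `NewINode`
(this is not kernel code: `State`, `NewNode` and the `Rc<Cell<State>>`
handle are hypothetical stand-ins for the inode state and `ARef<INode<T>>`),
the sketch below shows how consuming the guard in `init` suppresses the
failure path, while any other way of dropping it marks the object as failed:

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical stand-in for the state the kernel tracks for a new inode.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    New,
    Ready,
    Failed,
}

/// Plays the role of `NewINode`: owns an object that is not initialised yet.
pub struct NewNode(Rc<Cell<State>>);

impl NewNode {
    /// Returns the guard plus an observer handle used only to inspect the state.
    pub fn new() -> (Self, Rc<Cell<State>>) {
        let state = Rc::new(Cell::new(State::New));
        (NewNode(Rc::clone(&state)), state)
    }

    /// Consumes the guard. On success the `Drop` impl must not run.
    pub fn init(self, size: i64) -> Result<Rc<Cell<State>>, &'static str> {
        if size < 0 {
            // `self` is dropped on this early return, which marks the node as failed.
            return Err("negative size");
        }
        self.0.set(State::Ready);

        // Suppress `Drop` and move the inner handle out, as the patch does.
        let this = ManuallyDrop::new(self);
        // SAFETY: `this` is never used or dropped again, so the handle is moved out once.
        Ok(unsafe { std::ptr::read(&this.0) })
    }
}

impl Drop for NewNode {
    fn drop(&mut self) {
        // Reached only if `init` did not complete successfully.
        self.0.set(State::Failed);
    }
}
```

A caller that returns early with `?` between obtaining the guard and calling
`init` therefore gets the failure handling without writing any cleanup code.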
 rust/helpers.c            |  12 +++
 rust/kernel/fs.rs         | 178 +++++++++++++++++++++++++++++++-------
 samples/rust/rust_rofs.rs |  22 ++++-
 3 files changed, 181 insertions(+), 31 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index fe45f8ddb31f..c5a2bec6467d 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -145,6 +145,18 @@ struct kunit *rust_helper_kunit_get_current_test(void)
 }
 EXPORT_SYMBOL_GPL(rust_helper_kunit_get_current_test);
 
+void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
+{
+	i_uid_write(inode, uid);
+}
+EXPORT_SYMBOL_GPL(rust_helper_i_uid_write);
+
+void rust_helper_i_gid_write(struct inode *inode, gid_t gid)
+{
+	i_gid_write(inode, gid);
+}
+EXPORT_SYMBOL_GPL(rust_helper_i_gid_write);
+
 off_t rust_helper_i_size_read(const struct inode *inode)
 {
 	return i_size_read(inode);
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 30fa1f312f33..f3a41cf57502 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -7,9 +7,9 @@
 //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
 
 use crate::error::{code::*, from_result, to_result, Error, Result};
-use crate::types::{AlwaysRefCounted, Opaque};
-use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
-use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin, ptr};
+use crate::types::{ARef, AlwaysRefCounted, Either, Opaque};
+use crate::{bindings, init::PinInit, str::CStr, time::Timespec, try_pin_init, ThisModule};
+use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
 use macros::{pin_data, pinned_drop};
 
 /// Maximum size of an inode.
@@ -22,6 +22,12 @@ pub trait FileSystem {
 
     /// Returns the parameters to initialise a super block.
     fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
+
+    /// Initialises and returns the root inode of the given superblock.
+    ///
+    /// This is called during initialisation of a superblock after [`FileSystem::super_params`] has
+    /// completed successfully.
+    fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
 }
 
 /// A registration of a file system.
@@ -143,12 +149,136 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
     }
 }
 
+/// An inode that is locked and hasn't been initialised yet.
+#[repr(transparent)]
+pub struct NewINode<T: FileSystem + ?Sized>(ARef<INode<T>>);
+
+impl<T: FileSystem + ?Sized> NewINode<T> {
+    /// Initialises the new inode with the given parameters.
+    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
+        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
+        let inode = unsafe { &mut *self.0 .0.get() };
+
+        let mode = match params.typ {
+            INodeType::Dir => {
+                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
+                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
+
+                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
+                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
+                bindings::S_IFDIR
+            }
+        };
+
+        inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
+        inode.i_size = params.size;
+        inode.i_blocks = params.blocks;
+
+        inode.__i_ctime = params.ctime.into();
+        inode.i_mtime = params.mtime.into();
+        inode.i_atime = params.atime.into();
+
+        // SAFETY: inode is a new inode, so it is valid for write.
+        unsafe {
+            bindings::set_nlink(inode, params.nlink);
+            bindings::i_uid_write(inode, params.uid);
+            bindings::i_gid_write(inode, params.gid);
+            bindings::unlock_new_inode(inode);
+        }
+
+        // SAFETY: We are manually destructuring `self` and preventing `drop` from being called.
+        Ok(unsafe { (&ManuallyDrop::new(self).0 as *const ARef<INode<T>>).read() })
+    }
+}
+
+impl<T: FileSystem + ?Sized> Drop for NewINode<T> {
+    fn drop(&mut self) {
+        // SAFETY: The new inode failed to be turned into an initialised inode, so it's safe (and
+        // in fact required) to call `iget_failed` on it.
+        unsafe { bindings::iget_failed(self.0 .0.get()) };
+    }
+}
+
+/// The type of the inode.
+#[derive(Copy, Clone)]
+pub enum INodeType {
+    /// Directory type.
+    Dir,
+}
+
+/// Required inode parameters.
+///
+/// This is used when creating new inodes.
+pub struct INodeParams {
+    /// The access mode. It's a mask that grants execute (1), write (2) and read (4) access to
+    /// everyone, the owner group, and the owner.
+    pub mode: u16,
+
+    /// Type of inode.
+    ///
+    /// Also carries additional per-type data.
+    pub typ: INodeType,
+
+    /// Size of the contents of the inode.
+    ///
+    /// Its maximum value is [`MAX_LFS_FILESIZE`].
+    pub size: i64,
+
+    /// Number of blocks.
+    pub blocks: u64,
+
+    /// Number of links to the inode.
+    pub nlink: u32,
+
+    /// User id.
+    pub uid: u32,
+
+    /// Group id.
+    pub gid: u32,
+
+    /// Last status change time.
+    pub ctime: Timespec,
+
+    /// Last modification time.
+    pub mtime: Timespec,
+
+    /// Last access time.
+    pub atime: Timespec,
+}
+
 /// A file system super block.
 ///
 /// Wraps the kernel's `struct super_block`.
 #[repr(transparent)]
 pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
 
+impl<T: FileSystem + ?Sized> SuperBlock<T> {
+    /// Tries to get an existing inode or create a new one if it doesn't exist yet.
+    pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, NewINode<T>>> {
+        // SAFETY: The only initialisation missing from the superblock is the root, and this
+        // function is needed to create the root, so it's safe to call it.
+        let inode =
+            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;
+
+        // SAFETY: `inode` is valid for read, but there could be concurrent writers (e.g., if it's
+        // an already-initialised inode), so we use `read_volatile` to read its current state.
+        let state = unsafe { ptr::read_volatile(ptr::addr_of!((*inode.as_ptr()).i_state)) };
+        if state & u64::from(bindings::I_NEW) == 0 {
+            // The inode is cached. Just return it.
+            //
+            // SAFETY: `inode` had its refcount incremented by `iget_locked`; this increment is now
+            // owned by `ARef`.
+            Ok(Either::Left(unsafe { ARef::from_raw(inode.cast()) }))
+        } else {
+            // SAFETY: The new inode is valid but not fully initialised yet, so it's ok to create a
+            // `NewINode`.
+            Ok(Either::Right(NewINode(unsafe {
+                ARef::from_raw(inode.cast())
+            })))
+        }
+    }
+}
+
 /// Required superblock parameters.
 ///
 /// This is returned by implementations of [`FileSystem::super_params`].
@@ -215,41 +345,28 @@ impl<T: FileSystem + ?Sized> Tables<T> {
             sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
             sb.0.s_flags |= bindings::SB_RDONLY;
 
-            // The following is scaffolding code that will be removed in a subsequent patch. It is
-            // needed to build a root dentry, otherwise core code will BUG().
-            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
-            let inode = unsafe { bindings::new_inode(&mut sb.0) };
-            if inode.is_null() {
-                return Err(ENOMEM);
-            }
-
-            // SAFETY: `inode` is valid for write.
-            unsafe { bindings::set_nlink(inode, 2) };
-
-            {
-                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
-                // safe to mutably dereference it.
-                let inode = unsafe { &mut *inode };
-                inode.i_ino = 1;
-                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
-
-                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
-                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
+            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
+            // newly-created (and initialised above) superblock.
+            let sb = unsafe { &mut *sb_ptr.cast() };
+            let root = T::init_root(sb)?;
 
-                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
-                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
+            // Reject root inode if it belongs to a different superblock.
+            if !ptr::eq(root.super_block(), sb) {
+                return Err(EINVAL);
             }
 
             // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
             // case for this call.
             //
             // It takes over the inode, even on failure, so we don't need to clean it up.
-            let dentry = unsafe { bindings::d_make_root(inode) };
+            let dentry = unsafe { bindings::d_make_root(ManuallyDrop::new(root).0.get()) };
             if dentry.is_null() {
                 return Err(ENOMEM);
             }
 
-            sb.0.s_root = dentry;
+            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
+            // newly-created (and initialised above) superblock.
+            unsafe { (*sb_ptr).s_root = dentry };
 
             Ok(0)
         })
@@ -314,9 +431,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// ```
 /// # mod module_fs_sample {
-/// use kernel::fs::{NewSuperBlock, SuperParams};
+/// use kernel::fs::{INode, NewSuperBlock, SuperBlock, SuperParams};
 /// use kernel::prelude::*;
-/// use kernel::{c_str, fs};
+/// use kernel::{c_str, fs, types::ARef};
 ///
 /// kernel::module_fs! {
 ///     type: MyFs,
@@ -332,6 +449,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
 ///         todo!()
 ///     }
+///     fn init_root(_sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
+///         todo!()
+///     }
 /// }
 /// # }
 /// ```
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 9878bf88b991..9e5f4c7d1c06 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,9 +2,9 @@
 
 //! Rust read-only file system sample.
 
-use kernel::fs::{NewSuperBlock, SuperParams};
+use kernel::fs::{INode, INodeParams, INodeType, NewSuperBlock, SuperBlock, SuperParams};
 use kernel::prelude::*;
-use kernel::{c_str, fs};
+use kernel::{c_str, fs, time::UNIX_EPOCH, types::ARef, types::Either};
 
 kernel::module_fs! {
     type: RoFs,
@@ -26,4 +26,22 @@ fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
             time_gran: 1,
         })
     }
+
+    fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
+        match sb.get_or_create_inode(1)? {
+            Either::Left(existing) => Ok(existing),
+            Either::Right(new) => new.init(INodeParams {
+                typ: INodeType::Dir,
+                mode: 0o555,
+                size: 1,
+                blocks: 1,
+                nlink: 2,
+                uid: 0,
+                gid: 0,
+                atime: UNIX_EPOCH,
+                ctime: UNIX_EPOCH,
+                mtime: UNIX_EPOCH,
+            }),
+        }
+    }
 }
-- 
2.34.1



* [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (5 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-21  8:33   ` Benno Lossin
                     ` (2 more replies)
  2023-10-18 12:25 ` [RFC PATCH 08/19] rust: fs: introduce `FileSystem::lookup` Wedson Almeida Filho
                   ` (13 subsequent siblings)
  20 siblings, 3 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to report the contents of their directory
inodes. The reported entries cannot be opened yet.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
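The position and error handling used by the directory reader can be sketched
in plain userspace Rust. Everything here is a simplified assumption rather
than the kernel API: `Emitter` stands in for `DirEmitter`, its `capacity`
field models the user-provided buffer, errors are plain strings, and
`run_read_dir` is a hypothetical name for the rule applied by the callback
(an error is reported only when no entry was emitted):

```rust
use std::convert::TryFrom;

/// Hypothetical userspace stand-in for `DirEmitter`; `capacity` models the user buffer.
pub struct Emitter {
    pos: i64,
    capacity: usize,
    pub names: Vec<String>,
}

impl Emitter {
    pub fn new(pos: i64, capacity: usize) -> Self {
        Emitter { pos, capacity, names: Vec::new() }
    }

    pub fn pos(&self) -> i64 {
        self.pos
    }

    /// Records one entry and advances the position; returns `false` if it does not fit.
    pub fn emit(&mut self, pos_inc: i64, name: &str) -> bool {
        let Some(new_pos) = self.pos.checked_add(pos_inc) else {
            return false;
        };
        if self.names.len() >= self.capacity {
            return false;
        }
        self.names.push(name.to_string());
        self.pos = new_pos;
        true
    }
}

const ENTRIES: [&str; 3] = [".", "..", "subdir"];

/// Resumes from `emitter.pos()` and stops quietly once the buffer is full.
pub fn read_dir(emitter: &mut Emitter) -> Result<(), &'static str> {
    let start = usize::try_from(emitter.pos()).map_err(|_| "EINVAL")?;
    for name in ENTRIES.iter().skip(start) {
        if !emitter.emit(1, name) {
            break;
        }
    }
    Ok(())
}

/// Applies the callback's rule: an error is reported only if nothing was emitted.
pub fn run_read_dir<F>(emitter: &mut Emitter, f: F) -> Result<(), &'static str>
where
    F: FnOnce(&mut Emitter) -> Result<(), &'static str>,
{
    let orig_pos = emitter.pos();
    match f(emitter) {
        Err(e) if emitter.pos() == orig_pos => Err(e),
        _ => Ok(()),
    }
}
```

Because the position only advances on a successful `emit`, a second call that
starts from the saved position picks up exactly where a full buffer stopped
the first one.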
 rust/kernel/fs.rs         | 193 +++++++++++++++++++++++++++++++++++++-
 samples/rust/rust_rofs.rs |  49 +++++++++-
 2 files changed, 236 insertions(+), 6 deletions(-)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index f3a41cf57502..89611c44e4c5 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -28,6 +28,70 @@ pub trait FileSystem {
     /// This is called during initialisation of a superblock after [`FileSystem::super_params`] has
     /// completed successfully.
     fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
+
+    /// Reads directory entries from directory inodes.
+    ///
+    /// [`DirEmitter::pos`] holds the current position of the directory reader.
+    fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
+}
+
+/// The types of directory entries reported by [`FileSystem::read_dir`].
+#[repr(u32)]
+#[derive(Copy, Clone)]
+pub enum DirEntryType {
+    /// Unknown type.
+    Unknown = bindings::DT_UNKNOWN,
+
+    /// Named pipe (first-in, first-out) type.
+    Fifo = bindings::DT_FIFO,
+
+    /// Character device type.
+    Chr = bindings::DT_CHR,
+
+    /// Directory type.
+    Dir = bindings::DT_DIR,
+
+    /// Block device type.
+    Blk = bindings::DT_BLK,
+
+    /// Regular file type.
+    Reg = bindings::DT_REG,
+
+    /// Symbolic link type.
+    Lnk = bindings::DT_LNK,
+
+    /// Named unix-domain socket type.
+    Sock = bindings::DT_SOCK,
+
+    /// White-out type.
+    Wht = bindings::DT_WHT,
+}
+
+impl From<INodeType> for DirEntryType {
+    fn from(value: INodeType) -> Self {
+        match value {
+            INodeType::Dir => DirEntryType::Dir,
+        }
+    }
+}
+
+impl core::convert::TryFrom<u32> for DirEntryType {
+    type Error = crate::error::Error;
+
+    fn try_from(v: u32) -> Result<Self> {
+        match v {
+            v if v == Self::Unknown as u32 => Ok(Self::Unknown),
+            v if v == Self::Fifo as u32 => Ok(Self::Fifo),
+            v if v == Self::Chr as u32 => Ok(Self::Chr),
+            v if v == Self::Dir as u32 => Ok(Self::Dir),
+            v if v == Self::Blk as u32 => Ok(Self::Blk),
+            v if v == Self::Reg as u32 => Ok(Self::Reg),
+            v if v == Self::Lnk as u32 => Ok(Self::Lnk),
+            v if v == Self::Sock as u32 => Ok(Self::Sock),
+            v if v == Self::Wht as u32 => Ok(Self::Wht),
+            _ => Err(EDOM),
+        }
+    }
 }
 
 /// A registration of a file system.
@@ -161,9 +225,7 @@ pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
 
         let mode = match params.typ {
             INodeType::Dir => {
-                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
-                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
-
+                inode.__bindgen_anon_3.i_fop = &Tables::<T>::DIR_FILE_OPERATIONS;
                 // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
                 inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
                 bindings::S_IFDIR
@@ -403,6 +465,126 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         free_cached_objects: None,
         shutdown: None,
     };
+
+    const DIR_FILE_OPERATIONS: bindings::file_operations = bindings::file_operations {
+        owner: ptr::null_mut(),
+        llseek: Some(bindings::generic_file_llseek),
+        read: Some(bindings::generic_read_dir),
+        write: None,
+        read_iter: None,
+        write_iter: None,
+        iopoll: None,
+        iterate_shared: Some(Self::read_dir_callback),
+        poll: None,
+        unlocked_ioctl: None,
+        compat_ioctl: None,
+        mmap: None,
+        mmap_supported_flags: 0,
+        open: None,
+        flush: None,
+        release: None,
+        fsync: None,
+        fasync: None,
+        lock: None,
+        get_unmapped_area: None,
+        check_flags: None,
+        flock: None,
+        splice_write: None,
+        splice_read: None,
+        splice_eof: None,
+        setlease: None,
+        fallocate: None,
+        show_fdinfo: None,
+        copy_file_range: None,
+        remap_file_range: None,
+        fadvise: None,
+        uring_cmd: None,
+        uring_cmd_iopoll: None,
+    };
+
+    unsafe extern "C" fn read_dir_callback(
+        file: *mut bindings::file,
+        ctx_ptr: *mut bindings::dir_context,
+    ) -> core::ffi::c_int {
+        from_result(|| {
+            // SAFETY: The C API guarantees that `file` is valid for read. And since `f_inode` is
+            // immutable, we can read it directly.
+            let inode = unsafe { &*(*file).f_inode.cast::<INode<T>>() };
+
+            // SAFETY: The C API guarantees that this is the only reference to the `dir_context`
+            // instance.
+            let emitter = unsafe { &mut *ctx_ptr.cast::<DirEmitter>() };
+            let orig_pos = emitter.pos();
+
+            // Call the module implementation. We ignore errors if directory entries have been
+            // successfully emitted: this is because we want users to see them before the error.
+            match T::read_dir(inode, emitter) {
+                Ok(_) => Ok(0),
+                Err(e) => {
+                    if emitter.pos() == orig_pos {
+                        Err(e)
+                    } else {
+                        Ok(0)
+                    }
+                }
+            }
+        })
+    }
+}
+
+/// Directory entry emitter.
+///
+/// This is used in [`FileSystem::read_dir`] implementations to report directory entries.
+#[repr(transparent)]
+pub struct DirEmitter(bindings::dir_context);
+
+impl DirEmitter {
+    /// Returns the current position of the emitter.
+    pub fn pos(&self) -> i64 {
+        self.0.pos
+    }
+
+    /// Emits a directory entry.
+    ///
+    /// `pos_inc` is the number with which to increment the current position on success.
+    ///
+    /// `name` is the name of the entry.
+    ///
+    /// `ino` is the inode number of the entry.
+    ///
+    /// `etype` is the type of the entry.
+    ///
+    /// Returns `false` when the entry could not be emitted, possibly because the user-provided
+    /// buffer is full.
+    pub fn emit(&mut self, pos_inc: i64, name: &[u8], ino: Ino, etype: DirEntryType) -> bool {
+        let Ok(name_len) = i32::try_from(name.len()) else {
+            return false;
+        };
+
+        let Some(actor) = self.0.actor else {
+            return false;
+        };
+
+        let Some(new_pos) = self.0.pos.checked_add(pos_inc) else {
+            return false;
+        };
+
+        // SAFETY: `name` is valid at least for the duration of the `actor` call.
+        let ret = unsafe {
+            actor(
+                &mut self.0,
+                name.as_ptr().cast(),
+                name_len,
+                self.0.pos,
+                ino,
+                etype as _,
+            )
+        };
+        if ret {
+            self.0.pos = new_pos;
+        }
+        ret
+    }
 }
 
 /// Kernel module that exposes a single file system implemented by `T`.
@@ -431,7 +613,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// ```
 /// # mod module_fs_sample {
-/// use kernel::fs::{INode, NewSuperBlock, SuperBlock, SuperParams};
+/// use kernel::fs::{DirEmitter, INode, NewSuperBlock, SuperBlock, SuperParams};
 /// use kernel::prelude::*;
 /// use kernel::{c_str, fs, types::ARef};
 ///
@@ -452,6 +634,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///     fn init_root(_sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
 ///         todo!()
 ///     }
+///     fn read_dir(_: &INode<Self>, _: &mut DirEmitter) -> Result {
+///         todo!()
+///     }
 /// }
 /// # }
 /// ```
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 9e5f4c7d1c06..4e61a94afa70 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -2,7 +2,9 @@
 
 //! Rust read-only file system sample.
 
-use kernel::fs::{INode, INodeParams, INodeType, NewSuperBlock, SuperBlock, SuperParams};
+use kernel::fs::{
+    DirEmitter, INode, INodeParams, INodeType, NewSuperBlock, SuperBlock, SuperParams,
+};
 use kernel::prelude::*;
 use kernel::{c_str, fs, time::UNIX_EPOCH, types::ARef, types::Either};
 
@@ -14,6 +16,30 @@
     license: "GPL",
 }
 
+struct Entry {
+    name: &'static [u8],
+    ino: u64,
+    etype: INodeType,
+}
+
+const ENTRIES: [Entry; 3] = [
+    Entry {
+        name: b".",
+        ino: 1,
+        etype: INodeType::Dir,
+    },
+    Entry {
+        name: b"..",
+        ino: 1,
+        etype: INodeType::Dir,
+    },
+    Entry {
+        name: b"subdir",
+        ino: 2,
+        etype: INodeType::Dir,
+    },
+];
+
 struct RoFs;
 impl fs::FileSystem for RoFs {
     const NAME: &'static CStr = c_str!("rust-fs");
@@ -33,7 +59,7 @@ fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
             Either::Right(new) => new.init(INodeParams {
                 typ: INodeType::Dir,
                 mode: 0o555,
-                size: 1,
+                size: ENTRIES.len().try_into()?,
                 blocks: 1,
                 nlink: 2,
                 uid: 0,
@@ -44,4 +70,23 @@ fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
             }),
         }
     }
+
+    fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result {
+        if inode.ino() != 1 {
+            return Ok(());
+        }
+
+        let pos = emitter.pos();
+        if pos >= ENTRIES.len().try_into()? {
+            return Ok(());
+        }
+
+        for e in ENTRIES.iter().skip(pos.try_into()?) {
+            if !emitter.emit(1, e.name, e.ino, e.etype.into()) {
+                break;
+            }
+        }
+
+        Ok(())
+    }
 }
-- 
2.34.1



* [RFC PATCH 08/19] rust: fs: introduce `FileSystem::lookup`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (6 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 09/19] rust: folio: introduce basic support for folios Wedson Almeida Filho
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to create inodes that are children of a
directory inode when they're looked up by name.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
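The sample's `lookup` relies on a "get the cached inode or initialise a new
one" flow. A minimal userspace sketch of that flow follows, under loud
assumptions: `Cache`, `Node` and `NewNode` are hypothetical stand-ins (a
`HashMap` replaces the kernel inode cache, `Rc` replaces `ARef`), `Either` is
redefined locally, and errors are plain strings instead of kernel error
codes:

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Local stand-in for `kernel::types::Either`.
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Hypothetical initialised inode.
pub struct Node {
    pub ino: u64,
    pub mode: u16,
}

/// A reserved slot that has not been initialised yet.
pub struct NewNode<'a> {
    cache: &'a mut HashMap<u64, Rc<Node>>,
    ino: u64,
}

impl<'a> NewNode<'a> {
    /// Initialises the node and publishes it in the cache.
    pub fn init(self, mode: u16) -> Rc<Node> {
        let NewNode { cache, ino } = self;
        let node = Rc::new(Node { ino, mode });
        cache.insert(ino, Rc::clone(&node));
        node
    }
}

/// Hypothetical inode cache keyed by inode number.
#[derive(Default)]
pub struct Cache(HashMap<u64, Rc<Node>>);

impl Cache {
    /// Returns the cached node, or a slot the caller must initialise.
    pub fn get_or_create<'a>(&'a mut self, ino: u64) -> Either<Rc<Node>, NewNode<'a>> {
        if let Some(existing) = self.0.get(&ino) {
            return Either::Left(Rc::clone(existing));
        }
        Either::Right(NewNode { cache: &mut self.0, ino })
    }
}

/// Mirrors the shape of the sample's `lookup`: only "subdir" under inode 1 exists.
pub fn lookup(cache: &mut Cache, parent_ino: u64, name: &[u8]) -> Result<Rc<Node>, &'static str> {
    if parent_ino != 1 {
        return Err("ENOENT");
    }
    match name {
        b"subdir" => Ok(match cache.get_or_create(2) {
            Either::Left(existing) => existing,
            Either::Right(new) => new.init(0o555),
        }),
        _ => Err("ENOENT"),
    }
}
```

Looking up the same name twice yields the same node, so the initialisation
branch runs at most once per inode number for as long as the entry stays
cached.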
 rust/kernel/error.rs      |  1 -
 rust/kernel/fs.rs         | 65 +++++++++++++++++++++++++++++++++++++--
 samples/rust/rust_rofs.rs | 25 +++++++++++++++
 3 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index e6d7ce46be55..484fa7c11de1 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -131,7 +131,6 @@ pub fn to_errno(self) -> core::ffi::c_int {
     }
 
     /// Returns the error encoded as a pointer.
-    #[allow(dead_code)]
     pub(crate) fn to_ptr<T>(self) -> *mut T {
         // SAFETY: self.0 is a valid error due to its invariant.
         unsafe { bindings::ERR_PTR(self.0.into()) as *mut _ }
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 89611c44e4c5..681fef8e3af1 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -33,6 +33,9 @@ pub trait FileSystem {
     ///
     /// [`DirEmitter::pos`] holds the current position of the directory reader.
     fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
+
+    /// Returns the inode corresponding to the directory entry with the given name.
+    fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
 }
 
 /// The types of directory entries reported by [`FileSystem::read_dir`].
@@ -226,8 +229,7 @@ pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
         let mode = match params.typ {
             INodeType::Dir => {
                 inode.__bindgen_anon_3.i_fop = &Tables::<T>::DIR_FILE_OPERATIONS;
-                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
-                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
+                inode.i_op = &Tables::<T>::DIR_INODE_OPERATIONS;
                 bindings::S_IFDIR
             }
         };
@@ -530,6 +532,62 @@ impl<T: FileSystem + ?Sized> Tables<T> {
             }
         })
     }
+
+    const DIR_INODE_OPERATIONS: bindings::inode_operations = bindings::inode_operations {
+        lookup: Some(Self::lookup_callback),
+        get_link: None,
+        permission: None,
+        get_inode_acl: None,
+        readlink: None,
+        create: None,
+        link: None,
+        unlink: None,
+        symlink: None,
+        mkdir: None,
+        rmdir: None,
+        mknod: None,
+        rename: None,
+        setattr: None,
+        getattr: None,
+        listxattr: None,
+        fiemap: None,
+        update_time: None,
+        atomic_open: None,
+        tmpfile: None,
+        get_acl: None,
+        set_acl: None,
+        fileattr_set: None,
+        fileattr_get: None,
+        get_offset_ctx: None,
+    };
+
+    extern "C" fn lookup_callback(
+        parent_ptr: *mut bindings::inode,
+        dentry: *mut bindings::dentry,
+        _flags: u32,
+    ) -> *mut bindings::dentry {
+        // SAFETY: The C API guarantees that `parent_ptr` is a valid inode.
+        let parent = unsafe { &*parent_ptr.cast::<INode<T>>() };
+
+        // SAFETY: The C API guarantees that `dentry` is valid for read. Since the name is
+        // immutable, it's ok to read its length directly.
+        let len = unsafe { (*dentry).d_name.__bindgen_anon_1.__bindgen_anon_1.len };
+        let Ok(name_len) = usize::try_from(len) else {
+            return ENOENT.to_ptr();
+        };
+
+        // SAFETY: The C API guarantees that `dentry` is valid for read. Since the name is
+        // immutable, it's ok to read it directly.
+        let name = unsafe { core::slice::from_raw_parts((*dentry).d_name.name, name_len) };
+        match T::lookup(parent, name) {
+            Err(e) => e.to_ptr(),
+            // SAFETY: The returned inode is valid and referenced (by the type invariants), so
+            // it is ok to transfer this increment to `d_splice_alias`.
+            Ok(inode) => unsafe {
+                bindings::d_splice_alias(ManuallyDrop::new(inode).0.get(), dentry)
+            },
+        }
+    }
 }
 
 /// Directory entry emitter.
@@ -637,6 +695,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///     fn read_dir(_: &INode<Self>, _: &mut DirEmitter) -> Result {
 ///         todo!()
 ///     }
+///     fn lookup(_: &INode<Self>, _: &[u8]) -> Result<ARef<INode<Self>>> {
+///         todo!()
+///     }
 /// }
 /// # }
 /// ```
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 4e61a94afa70..4cc8525884a9 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -89,4 +89,29 @@ fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result {
 
         Ok(())
     }
+
+    fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
+        if parent.ino() != 1 {
+            return Err(ENOENT);
+        }
+
+        match name {
+            b"subdir" => match parent.super_block().get_or_create_inode(2)? {
+                Either::Left(existing) => Ok(existing),
+                Either::Right(new) => new.init(INodeParams {
+                    typ: INodeType::Dir,
+                    mode: 0o555,
+                    size: 0,
+                    blocks: 1,
+                    nlink: 2,
+                    uid: 0,
+                    gid: 0,
+                    atime: UNIX_EPOCH,
+                    ctime: UNIX_EPOCH,
+                    mtime: UNIX_EPOCH,
+                }),
+            },
+            _ => Err(ENOENT),
+        }
+    }
 }
-- 
2.34.1



* [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (7 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 08/19] rust: fs: introduce `FileSystem::lookup` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 17:17   ` Matthew Wilcox
  2023-10-21  9:21   ` Benno Lossin
  2023-10-18 12:25 ` [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio` Wedson Almeida Filho
                   ` (11 subsequent siblings)
  20 siblings, 2 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to handle ref-counted folios.

Provide the minimum needed to implement `read_folio` (part of `struct
address_space_operations`) in read-only file systems and to read
uncached blocks.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
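A minimal userspace sketch of the page-splitting arithmetic used by
`LockedFolio::for_each_page` in the diff below, assuming a 4 KiB page size.
`page_chunks` and its string errors are names made up for this illustration
and are not part of the kernel API; the real code maps each chunk with
`kmap_local_folio` instead of collecting offsets.

```rust
/// Page size assumed for this sketch; the kernel code uses `bindings::PAGE_SIZE`.
const PAGE_SIZE: usize = 4096;

/// Splits the byte range `[offset, offset + len)` of a folio that is
/// `folio_size` bytes long into `(start, length)` chunks, none of which
/// crosses a page boundary.
fn page_chunks(
    folio_size: usize,
    offset: usize,
    len: usize,
) -> Result<Vec<(usize, usize)>, &'static str> {
    // Reject ranges that overflow or extend past the end of the folio.
    let end = offset.checked_add(len).ok_or("offset + len overflows")?;
    if end > folio_size {
        return Err("range extends past the folio");
    }

    let mut chunks = Vec::new();
    let mut next_offset = offset;
    let mut remaining = len;

    while remaining > 0 {
        // Offset within the current page, and how much of that page is left.
        let page_offset = next_offset & (PAGE_SIZE - 1);
        let usable = remaining.min(PAGE_SIZE - page_offset);
        chunks.push((next_offset, usable));

        next_offset += usable;
        remaining -= usable;
    }

    Ok(chunks)
}
```

For example, a 200-byte write at offset 4000 of a two-page folio is split
into one chunk that ends at the first page boundary and one that starts on
the second page.
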
 rust/bindings/bindings_helper.h |   3 +
 rust/bindings/lib.rs            |   2 +
 rust/helpers.c                  |  81 ++++++++++++
 rust/kernel/folio.rs            | 215 ++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs              |   1 +
 5 files changed, 302 insertions(+)
 create mode 100644 rust/kernel/folio.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index ca1898ce9527..53a99ea512d1 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -11,6 +11,7 @@
 #include <linux/fs.h>
 #include <linux/fs_context.h>
 #include <linux/slab.h>
+#include <linux/pagemap.h>
 #include <linux/refcount.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
@@ -27,3 +28,5 @@ const slab_flags_t BINDINGS_SLAB_ACCOUNT = SLAB_ACCOUNT;
 const unsigned long BINDINGS_SB_RDONLY = SB_RDONLY;
 
 const loff_t BINDINGS_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
+
+const size_t BINDINGS_PAGE_SIZE = PAGE_SIZE;
diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
index 426915d3fb57..a96b7f08e57d 100644
--- a/rust/bindings/lib.rs
+++ b/rust/bindings/lib.rs
@@ -59,3 +59,5 @@ mod bindings_helper {
 pub const SB_RDONLY: core::ffi::c_ulong = BINDINGS_SB_RDONLY;
 
 pub const MAX_LFS_FILESIZE: loff_t = BINDINGS_MAX_LFS_FILESIZE;
+
+pub const PAGE_SIZE: usize = BINDINGS_PAGE_SIZE;
diff --git a/rust/helpers.c b/rust/helpers.c
index c5a2bec6467d..f2ce3e7b688c 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -23,10 +23,14 @@
 #include <kunit/test-bug.h>
 #include <linux/bug.h>
 #include <linux/build_bug.h>
+#include <linux/cacheflush.h>
 #include <linux/err.h>
 #include <linux/errname.h>
 #include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/pagemap.h>
 #include <linux/refcount.h>
 #include <linux/sched/signal.h>
 #include <linux/spinlock.h>
@@ -145,6 +149,77 @@ struct kunit *rust_helper_kunit_get_current_test(void)
 }
 EXPORT_SYMBOL_GPL(rust_helper_kunit_get_current_test);
 
+void *rust_helper_kmap(struct page *page)
+{
+	return kmap(page);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kmap);
+
+void rust_helper_kunmap(struct page *page)
+{
+	kunmap(page);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kunmap);
+
+void rust_helper_folio_get(struct folio *folio)
+{
+	folio_get(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_get);
+
+void rust_helper_folio_put(struct folio *folio)
+{
+	folio_put(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_put);
+
+struct page *rust_helper_folio_page(struct folio *folio, size_t n)
+{
+	return folio_page(folio, n);
+}
+
+loff_t rust_helper_folio_pos(struct folio *folio)
+{
+	return folio_pos(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_pos);
+
+size_t rust_helper_folio_size(struct folio *folio)
+{
+	return folio_size(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_size);
+
+void rust_helper_folio_mark_uptodate(struct folio *folio)
+{
+	folio_mark_uptodate(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_mark_uptodate);
+
+void rust_helper_folio_set_error(struct folio *folio)
+{
+	folio_set_error(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_set_error);
+
+void rust_helper_flush_dcache_folio(struct folio *folio)
+{
+	flush_dcache_folio(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_flush_dcache_folio);
+
+void *rust_helper_kmap_local_folio(struct folio *folio, size_t offset)
+{
+	return kmap_local_folio(folio, offset);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kmap_local_folio);
+
+void rust_helper_kunmap_local(const void *vaddr)
+{
+	kunmap_local(vaddr);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
+
 void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
 {
 	i_uid_write(inode, uid);
@@ -163,6 +238,12 @@ off_t rust_helper_i_size_read(const struct inode *inode)
 }
 EXPORT_SYMBOL_GPL(rust_helper_i_size_read);
 
+void rust_helper_mapping_set_large_folios(struct address_space *mapping)
+{
+	mapping_set_large_folios(mapping);
+}
+EXPORT_SYMBOL_GPL(rust_helper_mapping_set_large_folios);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/folio.rs b/rust/kernel/folio.rs
new file mode 100644
index 000000000000..ef8a08b97962
--- /dev/null
+++ b/rust/kernel/folio.rs
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Groups of contiguous pages, folios.
+//!
+//! C headers: [`include/linux/mm.h`](../../include/linux/mm.h)
+
+use crate::error::{code::*, Result};
+use crate::types::{ARef, AlwaysRefCounted, Opaque, ScopeGuard};
+use core::{cmp::min, ptr};
+
+/// Wraps the kernel's `struct folio`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `folio_get` ensures that the
+/// allocation remains valid at least until the matching call to `folio_put`.
+#[repr(transparent)]
+pub struct Folio(pub(crate) Opaque<bindings::folio>);
+
+// SAFETY: The type invariants guarantee that `Folio` is always ref-counted.
+unsafe impl AlwaysRefCounted for Folio {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::folio_get(self.0.get()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::folio_put(obj.cast().as_ptr()) }
+    }
+}
+
+impl Folio {
+    /// Tries to allocate a new folio.
+    ///
+    /// On success, returns a folio made up of 2^order pages.
+    pub fn try_new(order: u32) -> Result<UniqueFolio> {
+        if order > bindings::MAX_ORDER {
+            return Err(EDOM);
+        }
+
+        // SAFETY: We checked that `order` is within the max allowed value.
+        let f = ptr::NonNull::new(unsafe { bindings::folio_alloc(bindings::GFP_KERNEL, order) })
+            .ok_or(ENOMEM)?;
+
+        // SAFETY: The folio returned by `folio_alloc` is referenced. The ownership of the
+        // reference is transferred to the `ARef` instance.
+        Ok(UniqueFolio(unsafe { ARef::from_raw(f.cast()) }))
+    }
+
+    /// Returns the byte position of this folio in its file.
+    pub fn pos(&self) -> i64 {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_pos(self.0.get()) }
+    }
+
+    /// Returns the byte size of this folio.
+    pub fn size(&self) -> usize {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_size(self.0.get()) }
+    }
+
+    /// Flushes the data cache for the pages that make up the folio.
+    pub fn flush_dcache(&self) {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::flush_dcache_folio(self.0.get()) }
+    }
+}
+
+/// A [`Folio`] that has a single reference to it.
+pub struct UniqueFolio(pub(crate) ARef<Folio>);
+
+impl UniqueFolio {
+    /// Maps the contents of a folio page into a slice.
+    pub fn map_page(&self, page_index: usize) -> Result<MapGuard<'_>> {
+        if page_index >= self.0.size() / bindings::PAGE_SIZE {
+            return Err(EDOM);
+        }
+
+        // SAFETY: We just checked that the index is within bounds of the folio.
+        let page = unsafe { bindings::folio_page(self.0 .0.get(), page_index) };
+
+        // SAFETY: `page` is valid because it was returned by `folio_page` above.
+        let ptr = unsafe { bindings::kmap(page) };
+
+        // SAFETY: We just mapped `ptr`, so it's valid for read.
+        let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(), bindings::PAGE_SIZE) };
+
+        Ok(MapGuard { data, page })
+    }
+}
+
+/// A mapped [`UniqueFolio`].
+pub struct MapGuard<'a> {
+    data: &'a [u8],
+    page: *mut bindings::page,
+}
+
+impl core::ops::Deref for MapGuard<'_> {
+    type Target = [u8];
+
+    fn deref(&self) -> &Self::Target {
+        self.data
+    }
+}
+
+impl Drop for MapGuard<'_> {
+    fn drop(&mut self) {
+        // SAFETY: A `MapGuard` instance is only created when `kmap` succeeds, so it's ok to unmap
+        // it when the guard is dropped.
+        unsafe { bindings::kunmap(self.page) };
+    }
+}
+
+/// A locked [`Folio`].
+pub struct LockedFolio<'a>(&'a Folio);
+
+impl LockedFolio<'_> {
+    /// Creates a new locked folio from a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that the folio is valid and locked. Additionally, that the
+    /// responsibility of unlocking is transferred to the new instance of [`LockedFolio`]. Lastly,
+    /// that the returned [`LockedFolio`] doesn't outlive the refcount that keeps it alive.
+    #[allow(dead_code)]
+    pub(crate) unsafe fn from_raw(folio: *const bindings::folio) -> Self {
+        let ptr = folio.cast();
+        // SAFETY: The safety requirements ensure that `folio` (from which `ptr` is derived) is
+        // valid and will remain valid while the `LockedFolio` instance lives.
+        Self(unsafe { &*ptr })
+    }
+
+    /// Marks the folio as being up to date.
+    pub fn mark_uptodate(&mut self) {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_mark_uptodate(self.0 .0.get()) }
+    }
+
+    /// Sets the error flag on the folio.
+    pub fn set_error(&mut self) {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_set_error(self.0 .0.get()) }
+    }
+
+    fn for_each_page(
+        &mut self,
+        offset: usize,
+        len: usize,
+        mut cb: impl FnMut(&mut [u8]) -> Result,
+    ) -> Result {
+        let mut remaining = len;
+        let mut next_offset = offset;
+
+        // Check that we don't overflow the folio.
+        let end = offset.checked_add(len).ok_or(EDOM)?;
+        if end > self.size() {
+            return Err(EINVAL);
+        }
+
+        while remaining > 0 {
+            let page_offset = next_offset & (bindings::PAGE_SIZE - 1);
+            let usable = min(remaining, bindings::PAGE_SIZE - page_offset);
+            // SAFETY: The folio is valid because the shared reference implies a non-zero refcount;
+            // `next_offset` is also guaranteed to be less than the folio size.
+            let ptr = unsafe { bindings::kmap_local_folio(self.0 .0.get(), next_offset) };
+
+            // SAFETY: `ptr` was just returned by the `kmap_local_folio` above.
+            let _guard = ScopeGuard::new(|| unsafe { bindings::kunmap_local(ptr) });
+
+            // SAFETY: `kmap_local_folio` maps a whole page, so we know it's mapped for at least
+            // `usable` bytes.
+            let s = unsafe { core::slice::from_raw_parts_mut(ptr.cast::<u8>(), usable) };
+            cb(s)?;
+
+            next_offset += usable;
+            remaining -= usable;
+        }
+
+        Ok(())
+    }
+
+    /// Writes the given slice into the folio.
+    pub fn write(&mut self, offset: usize, data: &[u8]) -> Result {
+        let mut remaining = data;
+
+        self.for_each_page(offset, data.len(), |s| {
+            s.copy_from_slice(&remaining[..s.len()]);
+            remaining = &remaining[s.len()..];
+            Ok(())
+        })
+    }
+
+    /// Writes zeroes into the folio.
+    pub fn zero_out(&mut self, offset: usize, len: usize) -> Result {
+        self.for_each_page(offset, len, |s| {
+            s.fill(0);
+            Ok(())
+        })
+    }
+}
+
+impl core::ops::Deref for LockedFolio<'_> {
+    type Target = Folio;
+    fn deref(&self) -> &Self::Target {
+        self.0
+    }
+}
+
+impl Drop for LockedFolio<'_> {
+    fn drop(&mut self) {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_unlock(self.0 .0.get()) }
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 00059b80c240..0e85b380da64 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -34,6 +34,7 @@
 mod allocator;
 mod build_assert;
 pub mod error;
+pub mod folio;
 pub mod fs;
 pub mod init;
 pub mod ioctl;
-- 
2.34.1



* [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (8 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 09/19] rust: folio: introduce basic support for folios Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-11-07 22:18   ` Matthew Wilcox
  2023-10-18 12:25 ` [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to create regular file inodes backed by the page
cache. The contents of such files are read into folios via `read_folio`.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
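A minimal userspace sketch of the copy-then-zero pattern that the sample's
`read_folio` follows in the diff below. A plain byte slice stands in for
the locked folio here, and `fill_folio` is a hypothetical helper name; the
real code goes through `LockedFolio::write` and `LockedFolio::zero_out`.

```rust
/// Fills `folio` (a buffer standing in for the folio's bytes) with the part
/// of `data` that starts at file position `pos`, then zeroes the rest of the
/// buffer. Returns the number of bytes copied from `data`.
fn fill_folio(data: &[u8], pos: usize, folio: &mut [u8]) -> usize {
    let copied = if pos >= data.len() {
        // The folio lies entirely past the end of the file contents.
        0
    } else {
        let to_copy = (data.len() - pos).min(folio.len());
        folio[..to_copy].copy_from_slice(&data[pos..][..to_copy]);
        to_copy
    };

    // Bytes past the end of the file must read back as zeroes.
    folio[copied..].fill(0);
    copied
}
```

Zeroing the tail matters because the folio can be larger than the file,
and whatever it held before must not be exposed to readers.
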
 rust/kernel/folio.rs      |  1 -
 rust/kernel/fs.rs         | 75 +++++++++++++++++++++++++++++++++++++--
 samples/rust/rust_rofs.rs | 69 ++++++++++++++++++++++++-----------
 3 files changed, 122 insertions(+), 23 deletions(-)

diff --git a/rust/kernel/folio.rs b/rust/kernel/folio.rs
index ef8a08b97962..b7f80291b0e1 100644
--- a/rust/kernel/folio.rs
+++ b/rust/kernel/folio.rs
@@ -123,7 +123,6 @@ impl LockedFolio<'_> {
     /// Callers must ensure that the folio is valid and locked. Additionally, that the
     /// responsibility of unlocking is transferred to the new instance of [`LockedFolio`]. Lastly,
     /// that the returned [`LockedFolio`] doesn't outlive the refcount that keeps it alive.
-    #[allow(dead_code)]
     pub(crate) unsafe fn from_raw(folio: *const bindings::folio) -> Self {
         let ptr = folio.cast();
         // SAFETY: The safety requirements ensure that `folio` (from which `ptr` is derived) is
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 681fef8e3af1..ee3dce87032b 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -8,7 +8,10 @@
 
 use crate::error::{code::*, from_result, to_result, Error, Result};
 use crate::types::{ARef, AlwaysRefCounted, Either, Opaque};
-use crate::{bindings, init::PinInit, str::CStr, time::Timespec, try_pin_init, ThisModule};
+use crate::{
+    bindings, folio::LockedFolio, init::PinInit, str::CStr, time::Timespec, try_pin_init,
+    ThisModule,
+};
 use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
 use macros::{pin_data, pinned_drop};
 
@@ -36,6 +39,9 @@ pub trait FileSystem {
 
     /// Returns the inode corresponding to the directory entry with the given name.
     fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
+
+    /// Reads the contents of the inode into the given folio.
+    fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
 }
 
 /// The types of directory entries reported by [`FileSystem::read_dir`].
@@ -74,6 +80,7 @@ impl From<INodeType> for DirEntryType {
     fn from(value: INodeType) -> Self {
         match value {
             INodeType::Dir => DirEntryType::Dir,
+            INodeType::Reg => DirEntryType::Reg,
         }
     }
 }
@@ -232,6 +239,15 @@ pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
                 inode.i_op = &Tables::<T>::DIR_INODE_OPERATIONS;
                 bindings::S_IFDIR
             }
+            INodeType::Reg => {
+                // SAFETY: `generic_ro_fops` never changes, it's safe to reference it.
+                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::generic_ro_fops };
+                inode.i_data.a_ops = &Tables::<T>::FILE_ADDRESS_SPACE_OPERATIONS;
+
+                // SAFETY: The `i_mapping` pointer doesn't change and is valid.
+                unsafe { bindings::mapping_set_large_folios(inode.i_mapping) };
+                bindings::S_IFREG
+            }
         };
 
         inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
@@ -268,6 +284,9 @@ fn drop(&mut self) {
 pub enum INodeType {
     /// Directory type.
     Dir,
+
+    /// Regular file type.
+    Reg,
 }
 
 /// Required inode parameters.
@@ -588,6 +607,55 @@ extern "C" fn lookup_callback(
             },
         }
     }
+
+    const FILE_ADDRESS_SPACE_OPERATIONS: bindings::address_space_operations =
+        bindings::address_space_operations {
+            writepage: None,
+            read_folio: Some(Self::read_folio_callback),
+            writepages: None,
+            dirty_folio: None,
+            readahead: None,
+            write_begin: None,
+            write_end: None,
+            bmap: None,
+            invalidate_folio: None,
+            release_folio: None,
+            free_folio: None,
+            direct_IO: None,
+            migrate_folio: None,
+            launder_folio: None,
+            is_partially_uptodate: None,
+            is_dirty_writeback: None,
+            error_remove_page: None,
+            swap_activate: None,
+            swap_deactivate: None,
+            swap_rw: None,
+        };
+
+    extern "C" fn read_folio_callback(
+        _file: *mut bindings::file,
+        folio: *mut bindings::folio,
+    ) -> i32 {
+        from_result(|| {
+            // SAFETY: All pointers are valid and stable.
+            let inode = unsafe {
+                &*(*(*folio)
+                    .__bindgen_anon_1
+                    .page
+                    .__bindgen_anon_1
+                    .__bindgen_anon_1
+                    .mapping)
+                    .host
+                    .cast::<INode<T>>()
+            };
+
+            // SAFETY: The C contract guarantees that the folio is valid and locked, with ownership
+            // of the lock transferred to the callee (this function). The folio is also guaranteed
+            // not to outlive this function.
+            T::read_folio(inode, unsafe { LockedFolio::from_raw(folio) })?;
+            Ok(0)
+        })
+    }
 }
 
 /// Directory entry emitter.
@@ -673,7 +741,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 /// # mod module_fs_sample {
 /// use kernel::fs::{DirEmitter, INode, NewSuperBlock, SuperBlock, SuperParams};
 /// use kernel::prelude::*;
-/// use kernel::{c_str, fs, types::ARef};
+/// use kernel::{c_str, folio::LockedFolio, fs, types::ARef};
 ///
 /// kernel::module_fs! {
 ///     type: MyFs,
@@ -698,6 +766,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///     fn lookup(_: &INode<Self>, _: &[u8]) -> Result<ARef<INode<Self>>> {
 ///         todo!()
 ///     }
+///     fn read_folio(_: &INode<Self>, _: LockedFolio<'_>) -> Result {
+///         todo!()
+///     }
 /// }
 /// # }
 /// ```
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 4cc8525884a9..ef651ad38185 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -6,7 +6,7 @@
     DirEmitter, INode, INodeParams, INodeType, NewSuperBlock, SuperBlock, SuperParams,
 };
 use kernel::prelude::*;
-use kernel::{c_str, fs, time::UNIX_EPOCH, types::ARef, types::Either};
+use kernel::{c_str, folio::LockedFolio, fs, time::UNIX_EPOCH, types::ARef, types::Either};
 
 kernel::module_fs! {
     type: RoFs,
@@ -20,6 +20,7 @@ struct Entry {
     name: &'static [u8],
     ino: u64,
     etype: INodeType,
+    contents: &'static [u8],
 }
 
 const ENTRIES: [Entry; 3] = [
@@ -27,16 +28,19 @@ struct Entry {
         name: b".",
         ino: 1,
         etype: INodeType::Dir,
+        contents: b"",
     },
     Entry {
         name: b"..",
         ino: 1,
         etype: INodeType::Dir,
+        contents: b"",
     },
     Entry {
-        name: b"subdir",
+        name: b"test.txt",
         ino: 2,
-        etype: INodeType::Dir,
+        etype: INodeType::Reg,
+        contents: b"hello\n",
     },
 ];
 
@@ -95,23 +99,48 @@ fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
             return Err(ENOENT);
         }
 
-        match name {
-            b"subdir" => match parent.super_block().get_or_create_inode(2)? {
-                Either::Left(existing) => Ok(existing),
-                Either::Right(new) => new.init(INodeParams {
-                    typ: INodeType::Dir,
-                    mode: 0o555,
-                    size: 0,
-                    blocks: 1,
-                    nlink: 2,
-                    uid: 0,
-                    gid: 0,
-                    atime: UNIX_EPOCH,
-                    ctime: UNIX_EPOCH,
-                    mtime: UNIX_EPOCH,
-                }),
-            },
-            _ => Err(ENOENT),
+        for e in &ENTRIES {
+            if name == e.name {
+                return match parent.super_block().get_or_create_inode(e.ino)? {
+                    Either::Left(existing) => Ok(existing),
+                    Either::Right(new) => new.init(INodeParams {
+                        typ: e.etype,
+                        mode: 0o444,
+                        size: e.contents.len().try_into()?,
+                        blocks: 1,
+                        nlink: 1,
+                        uid: 0,
+                        gid: 0,
+                        atime: UNIX_EPOCH,
+                        ctime: UNIX_EPOCH,
+                        mtime: UNIX_EPOCH,
+                    }),
+                };
+            }
         }
+
+        Err(ENOENT)
+    }
+
+    fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
+        let data = match inode.ino() {
+            2 => ENTRIES[2].contents,
+            _ => return Err(EINVAL),
+        };
+
+        let pos = usize::try_from(folio.pos()).unwrap_or(usize::MAX);
+        let copied = if pos >= data.len() {
+            0
+        } else {
+            let to_copy = core::cmp::min(data.len() - pos, folio.size());
+            folio.write(0, &data[pos..][..to_copy])?;
+            to_copy
+        };
+
+        folio.zero_out(copied, folio.size() - copied)?;
+        folio.mark_uptodate();
+        folio.flush_dcache();
+
+        Ok(())
     }
 }
-- 
2.34.1



* [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (9 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 13:06   ` Ariel Miculas (amiculas)
  2023-10-18 12:25 ` [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to expose xattrs associated with inodes.
`overlayfs` uses an xattr to indicate that a directory is opaque (i.e.,
that lower layers should not be looked up). The planned file systems
need to support opaque directories, so they must be able to implement
this.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
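A minimal userspace sketch of the buffer-size convention described in the
`read_xattr` doc comment in the diff below: the function reports the size
of the attribute, and only copies it when the caller's buffer is large
enough. The `XattrError` type, the `lookup_value` helper and the single
hard-coded attribute are assumptions made for this illustration; a real
implementation would return `ENODATA` from `kernel::error::code` and read
attributes from its own storage.

```rust
/// Error type for this sketch only.
#[derive(Debug, PartialEq)]
enum XattrError {
    /// The attribute does not exist (the kernel code would use `ENODATA`).
    NoData,
}

/// Hypothetical attribute store: one overlayfs-style "opaque" marker.
fn lookup_value(name: &str) -> Option<&'static [u8]> {
    match name {
        "trusted.overlay.opaque" => Some(&b"y"[..]),
        _ => None,
    }
}

/// Copies the attribute into `outbuf` when it fits, and returns the number
/// of bytes the attribute occupies either way.
fn read_xattr(name: &str, outbuf: &mut [u8]) -> Result<usize, XattrError> {
    let value = lookup_value(name).ok_or(XattrError::NoData)?;

    if outbuf.len() >= value.len() {
        outbuf[..value.len()].copy_from_slice(value);
    }

    Ok(value.len())
}
```

Under this convention a caller can probe with an empty buffer to learn the
required size and then retry with a buffer of that length.
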
 rust/bindings/bindings_helper.h |  1 +
 rust/kernel/error.rs            |  2 ++
 rust/kernel/fs.rs               | 43 +++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 53a99ea512d1..fa754c5e85a2 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -15,6 +15,7 @@
 #include <linux/refcount.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
+#include <linux/xattr.h>
 
 /* `bindgen` gets confused at certain things. */
 const size_t BINDINGS_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 484fa7c11de1..6c167583b275 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -81,6 +81,8 @@ macro_rules! declare_err {
     declare_err!(EIOCBQUEUED, "iocb queued, will get completion event.");
     declare_err!(ERECALLCONFLICT, "Conflict with recalled state.");
     declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
+    declare_err!(ENODATA, "No data available.");
+    declare_err!(EOPNOTSUPP, "Operation not supported on transport endpoint.");
 }
 
 /// Generic integer kernel error.
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index ee3dce87032b..adf9cbee16d2 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -42,6 +42,14 @@ pub trait FileSystem {
 
     /// Reads the contents of the inode into the given folio.
     fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
+
+    /// Reads an xattr.
+    ///
+    /// Returns the number of bytes written to `outbuf`. If it is too small, returns the number of
+    /// bytes needed to hold the attribute.
+    fn read_xattr(_inode: &INode<Self>, _name: &CStr, _outbuf: &mut [u8]) -> Result<usize> {
+        Err(EOPNOTSUPP)
+    }
 }
 
 /// The types of directory entries reported by [`FileSystem::read_dir`].
@@ -418,6 +426,7 @@ impl<T: FileSystem + ?Sized> Tables<T> {
 
             sb.0.s_magic = params.magic as _;
             sb.0.s_op = &Tables::<T>::SUPER_BLOCK;
+            sb.0.s_xattr = &Tables::<T>::XATTR_HANDLERS[0];
             sb.0.s_maxbytes = params.maxbytes;
             sb.0.s_time_gran = params.time_gran;
             sb.0.s_blocksize_bits = params.blocksize_bits;
@@ -487,6 +496,40 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         shutdown: None,
     };
 
+    const XATTR_HANDLERS: [*const bindings::xattr_handler; 2] = [&Self::XATTR_HANDLER, ptr::null()];
+
+    const XATTR_HANDLER: bindings::xattr_handler = bindings::xattr_handler {
+        name: ptr::null(),
+        prefix: crate::c_str!("").as_char_ptr(),
+        flags: 0,
+        list: None,
+        get: Some(Self::xattr_get_callback),
+        set: None,
+    };
+
+    unsafe extern "C" fn xattr_get_callback(
+        _handler: *const bindings::xattr_handler,
+        _dentry: *mut bindings::dentry,
+        inode_ptr: *mut bindings::inode,
+        name: *const core::ffi::c_char,
+        buffer: *mut core::ffi::c_void,
+        size: usize,
+    ) -> core::ffi::c_int {
+        from_result(|| {
+            // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
+            let inode = unsafe { &*inode_ptr.cast::<INode<T>>() };
+
+            // SAFETY: The C API guarantees that `name` is a valid null-terminated string. It
+            // also guarantees that it's valid for the duration of the callback.
+            let name = unsafe { CStr::from_char_ptr(name) };
+
+            // SAFETY: The C API guarantees that `buffer` is at least `size` bytes in length.
+            let buf = unsafe { core::slice::from_raw_parts_mut(buffer.cast(), size) };
+            let len = T::read_xattr(inode, name, buf)?;
+            Ok(len.try_into()?)
+        })
+    }
+
     const DIR_FILE_OPERATIONS: bindings::file_operations = bindings::file_operations {
         owner: ptr::null_mut(),
         llseek: Some(bindings::generic_file_llseek),
-- 
2.34.1



* [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs`
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (10 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2024-01-03 14:13   ` Andreas Hindborg (Samsung)
  2024-01-04  5:33   ` Darrick J. Wong
  2023-10-18 12:25 ` [RFC PATCH 13/19] rust: fs: introduce more inode types Wedson Almeida Filho
                   ` (8 subsequent siblings)
  20 siblings, 2 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to expose their stats. `overlayfs` requires that
this be implemented by all file systems that are part of an overlay.
The planned file systems need to be overlaid with overlayfs, so they
must be able to implement this.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
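A minimal userspace sketch of how `statfs_callback` in the diff below
copies a `Stat` into the C buffer. `KStatFs` is a stand-in defined here
for the handful of `struct kstatfs` fields the callback touches, and
`fill_kstatfs` is a hypothetical helper name; the field types are chosen
for the sketch and may not match the generated bindings.

```rust
/// Mirrors the fields of the `Stat` struct added by this patch.
struct Stat {
    magic: u32,
    namelen: i64,
    bsize: i64,
    files: u64,
    blocks: u64,
}

/// Stand-in for the fields of C's `struct kstatfs` that the callback sets.
#[derive(Debug, Default, PartialEq)]
struct KStatFs {
    f_type: i64,
    f_namelen: i64,
    f_bsize: i64,
    f_files: u64,
    f_blocks: u64,
    f_bfree: u64,
    f_bavail: u64,
    f_ffree: u64,
}

/// Copies the reported stats and zeroes the "free" counters, since a
/// read-only file system has no free blocks or inodes to hand out.
fn fill_kstatfs(s: &Stat, buf: &mut KStatFs) {
    buf.f_type = s.magic.into();
    buf.f_namelen = s.namelen;
    buf.f_bsize = s.bsize;
    buf.f_files = s.files;
    buf.f_blocks = s.blocks;
    buf.f_bfree = 0;
    buf.f_bavail = 0;
    buf.f_ffree = 0;
}
```
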
 rust/bindings/bindings_helper.h |  1 +
 rust/kernel/error.rs            |  1 +
 rust/kernel/fs.rs               | 52 ++++++++++++++++++++++++++++++++-
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index fa754c5e85a2..e2b2ccc835e3 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -11,6 +11,7 @@
 #include <linux/fs.h>
 #include <linux/fs_context.h>
 #include <linux/slab.h>
+#include <linux/statfs.h>
 #include <linux/pagemap.h>
 #include <linux/refcount.h>
 #include <linux/wait.h>
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 6c167583b275..829756cf6c48 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -83,6 +83,7 @@ macro_rules! declare_err {
     declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
     declare_err!(ENODATA, "No data available.");
     declare_err!(EOPNOTSUPP, "Operation not supported on transport endpoint.");
+    declare_err!(ENOSYS, "Invalid system call number.");
 }
 
 /// Generic integer kernel error.
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index adf9cbee16d2..8f34da50e694 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -50,6 +50,31 @@ pub trait FileSystem {
     fn read_xattr(_inode: &INode<Self>, _name: &CStr, _outbuf: &mut [u8]) -> Result<usize> {
         Err(EOPNOTSUPP)
     }
+
+    /// Get filesystem statistics.
+    fn statfs(_sb: &SuperBlock<Self>) -> Result<Stat> {
+        Err(ENOSYS)
+    }
+}
+
+/// File system stats.
+///
+/// A subset of C's `kstatfs`.
+pub struct Stat {
+    /// Magic number of the file system.
+    pub magic: u32,
+
+    /// The maximum length of a file name.
+    pub namelen: i64,
+
+    /// Block size.
+    pub bsize: i64,
+
+    /// Number of files in the file system.
+    pub files: u64,
+
+    /// Number of blocks in the file system.
+    pub blocks: u64,
 }
 
 /// The types of directory entries reported by [`FileSystem::read_dir`].
@@ -478,7 +503,7 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         freeze_fs: None,
         thaw_super: None,
         unfreeze_fs: None,
-        statfs: None,
+        statfs: Some(Self::statfs_callback),
         remount_fs: None,
         umount_begin: None,
         show_options: None,
@@ -496,6 +521,31 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         shutdown: None,
     };
 
+    unsafe extern "C" fn statfs_callback(
+        dentry: *mut bindings::dentry,
+        buf: *mut bindings::kstatfs,
+    ) -> core::ffi::c_int {
+        from_result(|| {
+            // SAFETY: The C API guarantees that `dentry` is valid for read. `d_sb` is
+            // immutable, so it's safe to read it. The superblock is guaranteed to be valid for
+            // the duration of the call.
+            let sb = unsafe { &*(*dentry).d_sb.cast::<SuperBlock<T>>() };
+            let s = T::statfs(sb)?;
+
+            // SAFETY: The C API guarantees that `buf` is valid for read and write.
+            let buf = unsafe { &mut *buf };
+            buf.f_type = s.magic.into();
+            buf.f_namelen = s.namelen;
+            buf.f_bsize = s.bsize;
+            buf.f_files = s.files;
+            buf.f_blocks = s.blocks;
+            buf.f_bfree = 0;
+            buf.f_bavail = 0;
+            buf.f_ffree = 0;
+            Ok(0)
+        })
+    }
+
     const XATTR_HANDLERS: [*const bindings::xattr_handler; 2] = [&Self::XATTR_HANDLER, ptr::null()];
 
     const XATTR_HANDLER: bindings::xattr_handler = bindings::xattr_handler {
-- 
2.34.1


* [RFC PATCH 13/19] rust: fs: introduce more inode types
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (11 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 14/19] rust: fs: add per-superblock data Wedson Almeida Filho
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file system modules to create inodes that are symlinks,
pipes, sockets, char devices and block devices (in addition to the
already-supported directories and regular files).
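
For readers unfamiliar with the device-number encoding used for the `Chr` and
`Blk` variants, here is a standalone sketch of what `MKDEV` computes. It
assumes the in-kernel layout from `include/linux/kdev_t.h` (20 minor bits);
the function names are illustrative and are not part of this patch.

```rust
/// Number of bits reserved for the minor number (`MINORBITS` in the kernel).
pub const MINORBITS: u32 = 20;

/// Mask selecting the minor bits (`MINORMASK` in the kernel).
pub const MINORMASK: u32 = (1 << MINORBITS) - 1;

/// Equivalent of the C macro `MKDEV(ma, mi)`.
pub fn mkdev(major: u32, minor: u32) -> u32 {
    (major << MINORBITS) | minor
}

/// Extracts the major number from an encoded device number.
pub fn major(dev: u32) -> u32 {
    dev >> MINORBITS
}

/// Extracts the minor number from an encoded device number.
pub fn minor(dev: u32) -> u32 {
    dev & MINORMASK
}

/// Masks the minor number before encoding, as the patch does, so that an
/// oversized minor cannot spill into the major bits.
pub fn make_dev(major: u32, minor: u32) -> u32 {
    mkdev(major, minor & MINORMASK)
}
```

Without the mask, a minor number wider than 20 bits would silently change the
major number of the resulting device.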

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/helpers.c            |  6 +++
 rust/kernel/fs.rs         | 88 +++++++++++++++++++++++++++++++++++++++
 samples/rust/rust_rofs.rs |  9 +++-
 3 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index f2ce3e7b688c..af335d1912e7 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -244,6 +244,12 @@ void rust_helper_mapping_set_large_folios(struct address_space *mapping)
 }
 EXPORT_SYMBOL_GPL(rust_helper_mapping_set_large_folios);
 
+unsigned int rust_helper_MKDEV(unsigned int major, unsigned int minor)
+{
+	return MKDEV(major, minor);
+}
+EXPORT_SYMBOL_GPL(rust_helper_MKDEV);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 8f34da50e694..5b7eaa16d254 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -112,8 +112,13 @@ pub enum DirEntryType {
 impl From<INodeType> for DirEntryType {
     fn from(value: INodeType) -> Self {
         match value {
+            INodeType::Fifo => DirEntryType::Fifo,
+            INodeType::Chr(_, _) => DirEntryType::Chr,
             INodeType::Dir => DirEntryType::Dir,
+            INodeType::Blk(_, _) => DirEntryType::Blk,
             INodeType::Reg => DirEntryType::Reg,
+            INodeType::Lnk => DirEntryType::Lnk,
+            INodeType::Sock => DirEntryType::Sock,
         }
     }
 }
@@ -281,6 +286,46 @@ pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
                 unsafe { bindings::mapping_set_large_folios(inode.i_mapping) };
                 bindings::S_IFREG
             }
+            INodeType::Lnk => {
+                inode.i_op = &Tables::<T>::LNK_INODE_OPERATIONS;
+                inode.i_data.a_ops = &Tables::<T>::FILE_ADDRESS_SPACE_OPERATIONS;
+
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe { bindings::inode_nohighmem(inode) };
+                bindings::S_IFLNK
+            }
+            INodeType::Fifo => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe { bindings::init_special_inode(inode, bindings::S_IFIFO as _, 0) };
+                bindings::S_IFIFO
+            }
+            INodeType::Sock => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe { bindings::init_special_inode(inode, bindings::S_IFSOCK as _, 0) };
+                bindings::S_IFSOCK
+            }
+            INodeType::Chr(major, minor) => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe {
+                    bindings::init_special_inode(
+                        inode,
+                        bindings::S_IFCHR as _,
+                        bindings::MKDEV(major, minor & bindings::MINORMASK),
+                    )
+                };
+                bindings::S_IFCHR
+            }
+            INodeType::Blk(major, minor) => {
+                // SAFETY: `inode` is valid for write as it's a new inode.
+                unsafe {
+                    bindings::init_special_inode(
+                        inode,
+                        bindings::S_IFBLK as _,
+                        bindings::MKDEV(major, minor & bindings::MINORMASK),
+                    )
+                };
+                bindings::S_IFBLK
+            }
         };
 
         inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
@@ -315,11 +360,26 @@ fn drop(&mut self) {
 /// The type of the inode.
 #[derive(Copy, Clone)]
 pub enum INodeType {
+    /// Named pipe (first-in, first-out) type.
+    Fifo,
+
+    /// Character device type.
+    Chr(u32, u32),
+
     /// Directory type.
     Dir,
 
+    /// Block device type.
+    Blk(u32, u32),
+
     /// Regular file type.
     Reg,
+
+    /// Symbolic link type.
+    Lnk,
+
+    /// Named unix-domain socket type.
+    Sock,
 }
 
 /// Required inode parameters.
@@ -701,6 +761,34 @@ extern "C" fn lookup_callback(
         }
     }
 
+    const LNK_INODE_OPERATIONS: bindings::inode_operations = bindings::inode_operations {
+        lookup: None,
+        get_link: Some(bindings::page_get_link),
+        permission: None,
+        get_inode_acl: None,
+        readlink: None,
+        create: None,
+        link: None,
+        unlink: None,
+        symlink: None,
+        mkdir: None,
+        rmdir: None,
+        mknod: None,
+        rename: None,
+        setattr: None,
+        getattr: None,
+        listxattr: None,
+        fiemap: None,
+        update_time: None,
+        atomic_open: None,
+        tmpfile: None,
+        get_acl: None,
+        set_acl: None,
+        fileattr_set: None,
+        fileattr_get: None,
+        get_offset_ctx: None,
+    };
+
     const FILE_ADDRESS_SPACE_OPERATIONS: bindings::address_space_operations =
         bindings::address_space_operations {
             writepage: None,
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index ef651ad38185..95ce28efa1c3 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -23,7 +23,7 @@ struct Entry {
     contents: &'static [u8],
 }
 
-const ENTRIES: [Entry; 3] = [
+const ENTRIES: [Entry; 4] = [
     Entry {
         name: b".",
         ino: 1,
@@ -42,6 +42,12 @@ struct Entry {
         etype: INodeType::Reg,
         contents: b"hello\n",
     },
+    Entry {
+        name: b"link.txt",
+        ino: 3,
+        etype: INodeType::Lnk,
+        contents: b"./test.txt",
+    },
 ];
 
 struct RoFs;
@@ -125,6 +131,7 @@ fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
     fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
         let data = match inode.ino() {
             2 => ENTRIES[2].contents,
+            3 => ENTRIES[3].contents,
             _ => return Err(EINVAL),
         };
 
-- 
2.34.1


* [RFC PATCH 14/19] rust: fs: add per-superblock data
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (12 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 13/19] rust: fs: introduce more inode types Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-25 15:51   ` Ariel Miculas (amiculas)
                     ` (2 more replies)
  2023-10-18 12:25 ` [RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads Wedson Almeida Filho
                   ` (6 subsequent siblings)
  20 siblings, 3 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to associate [typed] data with super blocks when
they're created. Since we only have a pointer-sized field in which to
store the state, it must implement the `ForeignOwnable` trait.
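
The ownership hand-off can be sketched in standalone Rust as follows. This is
an assumption-laden illustration, not the kernel API: `MockSuperBlock` is a
hypothetical stand-in for the one field of `struct super_block` that matters
here, and `Box` plays the role of a `ForeignOwnable` type (`Box::into_raw` for
`into_foreign`, `Box::from_raw` for `from_foreign`).

```rust
use std::ffi::c_void;
use std::ptr;

/// Hypothetical stand-in for the C superblock; only `s_fs_info` is modelled.
pub struct MockSuperBlock {
    pub s_fs_info: *mut c_void,
}

impl MockSuperBlock {
    /// Creates a superblock with no data attached yet.
    pub fn new() -> Self {
        Self { s_fs_info: ptr::null_mut() }
    }

    /// Hands ownership of `data` to the pointer-sized slot (like `into_foreign`).
    pub fn init<T>(&mut self, data: Box<T>) {
        self.s_fs_info = Box::into_raw(data).cast();
    }

    /// Borrows the stored data without taking ownership (like `borrow`).
    ///
    /// # Safety
    ///
    /// `init::<T>` must have been called with the same `T`, and `kill` must
    /// not have been called since.
    pub unsafe fn data<T>(&self) -> &T {
        unsafe { &*self.s_fs_info.cast::<T>() }
    }

    /// Reclaims and drops the data if any was stored (like `from_foreign` in
    /// `kill_sb_callback`). Returns whether anything was freed.
    ///
    /// # Safety
    ///
    /// If data was stored, it must have been stored by `init::<T>` with the same `T`.
    pub unsafe fn kill<T>(&mut self) -> bool {
        if self.s_fs_info.is_null() {
            return false;
        }
        drop(unsafe { Box::from_raw(self.s_fs_info.cast::<T>()) });
        self.s_fs_info = ptr::null_mut();
        true
    }
}
```

The null check mirrors the one in `kill_sb_callback`: cleanup can run even when
initialisation failed before the data was stored.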

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/kernel/fs.rs         | 42 +++++++++++++++++++++++++++++++++------
 samples/rust/rust_rofs.rs |  4 +++-
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 5b7eaa16d254..e9a9362d2897 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -7,7 +7,7 @@
 //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
 
 use crate::error::{code::*, from_result, to_result, Error, Result};
-use crate::types::{ARef, AlwaysRefCounted, Either, Opaque};
+use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque};
 use crate::{
     bindings, folio::LockedFolio, init::PinInit, str::CStr, time::Timespec, try_pin_init,
     ThisModule,
@@ -20,11 +20,14 @@
 
 /// A file system type.
 pub trait FileSystem {
+    /// Data associated with each file system instance (super-block).
+    type Data: ForeignOwnable + Send + Sync;
+
     /// The name of the file system type.
     const NAME: &'static CStr;
 
     /// Returns the parameters to initialise a super block.
-    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
+    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
 
     /// Initialises and returns the root inode of the given superblock.
     ///
@@ -174,7 +177,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
                 fs.owner = module.0;
                 fs.name = T::NAME.as_char_ptr();
                 fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
-                fs.kill_sb = Some(Self::kill_sb_callback);
+                fs.kill_sb = Some(Self::kill_sb_callback::<T>);
                 fs.fs_flags = 0;
 
                 // SAFETY: Pointers stored in `fs` are static so will live for as long as the
@@ -195,10 +198,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
         })
     }
 
-    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
+    unsafe extern "C" fn kill_sb_callback<T: FileSystem + ?Sized>(
+        sb_ptr: *mut bindings::super_block,
+    ) {
         // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
         // the appropriate function to call for cleanup.
         unsafe { bindings::kill_anon_super(sb_ptr) };
+
+        // SAFETY: The C API contract guarantees that `sb_ptr` is valid for read.
+        let ptr = unsafe { (*sb_ptr).s_fs_info };
+        if !ptr.is_null() {
+            // SAFETY: The only place where `s_fs_info` is assigned is `NewSuperBlock::init`, where
+            // it's initialised with the result of an `into_foreign` call. We checked above that
+            // `ptr` is non-null because it would be null if we never reached the point where we
+            // init the field.
+            unsafe { T::Data::from_foreign(ptr) };
+        }
     }
 }
 
@@ -429,6 +444,14 @@ pub struct INodeParams {
 pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
 
 impl<T: FileSystem + ?Sized> SuperBlock<T> {
+    /// Returns the data associated with the superblock.
+    pub fn data(&self) -> <T::Data as ForeignOwnable>::Borrowed<'_> {
+        // SAFETY: This method is only available after the `NeedsData` typestate, so `s_fs_info`
+        // has been initialised with the result of a call to `T::Data::into_foreign`.
+        let ptr = unsafe { (*self.0.get()).s_fs_info };
+        unsafe { T::Data::borrow(ptr) }
+    }
+
     /// Tries to get an existing inode or create a new one if it doesn't exist yet.
     pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, NewINode<T>>> {
         // SAFETY: The only initialisation missing from the superblock is the root, and this
@@ -458,7 +481,7 @@ pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, New
 /// Required superblock parameters.
 ///
 /// This is returned by implementations of [`FileSystem::super_params`].
-pub struct SuperParams {
+pub struct SuperParams<T: ForeignOwnable + Send + Sync> {
     /// The magic number of the superblock.
     pub magic: u32,
 
@@ -472,6 +495,9 @@ pub struct SuperParams {
 
     /// Granularity of c/m/atime in ns (cannot be worse than a second).
     pub time_gran: u32,
+
+    /// Data to be associated with the superblock.
+    pub data: T,
 }
 
 /// A superblock that is still being initialised.
@@ -522,6 +548,9 @@ impl<T: FileSystem + ?Sized> Tables<T> {
             sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
             sb.0.s_flags |= bindings::SB_RDONLY;
 
+            // N.B.: Even on failure, `kill_sb` is called and frees the data.
+            sb.0.s_fs_info = params.data.into_foreign().cast_mut();
+
             // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
             // newly-created (and initialised above) superblock.
             let sb = unsafe { &mut *sb_ptr.cast() };
@@ -934,8 +963,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 ///
 /// struct MyFs;
 /// impl fs::FileSystem for MyFs {
+///     type Data = ();
 ///     const NAME: &'static CStr = c_str!("myfs");
-///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
+///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
 ///         todo!()
 ///     }
 ///     fn init_root(_sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 95ce28efa1c3..093425650f26 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -52,14 +52,16 @@ struct Entry {
 
 struct RoFs;
 impl fs::FileSystem for RoFs {
+    type Data = ();
     const NAME: &'static CStr = c_str!("rust-fs");
 
-    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
+    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
         Ok(SuperParams {
             magic: 0x52555354,
             blocksize_bits: 12,
             maxbytes: fs::MAX_LFS_FILESIZE,
             time_gran: 1,
+            data: (),
         })
     }
 
-- 
2.34.1


* [RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (13 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 14/19] rust: fs: add per-superblock data Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2024-01-03 14:17   ` Andreas Hindborg (Samsung)
  2023-10-18 12:25 ` [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Introduce the abstractions that will be used by modules to handle buffer
heads, which will be used to access cached blocks from block devices.

All dead-code annotations are removed in the next commit in the series.
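
To show how a view relates to its buffer, here is a standalone sketch in which
`Rc<[u8]>` is a hypothetical substitute for the ref-counted buffer head
(`ARef<Head>` in the patch). The slicing expression is the same one used by
`View::data` below.

```rust
use std::rc::Rc;

/// A view of a contiguous subset of a shared, immutable block of bytes.
pub struct View {
    head: Rc<[u8]>,
    offset: usize,
    size: usize,
}

impl View {
    /// Creates a view covering `size` bytes of `head`, starting at `offset`.
    pub fn new(head: Rc<[u8]>, offset: usize, size: usize) -> Self {
        Self { head, offset, size }
    }

    /// Returns the bytes covered by the view.
    ///
    /// Panics if `offset` or `offset + size` falls outside the block.
    pub fn data(&self) -> &[u8] {
        // Slice twice: first skip `offset` bytes, then keep `size` bytes.
        &self.head[self.offset..][..self.size]
    }
}
```

Holding a reference count in the view keeps the underlying block alive for as
long as the returned slice can be used.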

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/bindings/bindings_helper.h |  1 +
 rust/helpers.c                  | 15 ++++++++
 rust/kernel/fs.rs               |  3 ++
 rust/kernel/fs/buffer.rs        | 61 +++++++++++++++++++++++++++++++++
 4 files changed, 80 insertions(+)
 create mode 100644 rust/kernel/fs/buffer.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index e2b2ccc835e3..d328375f7cb7 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -7,6 +7,7 @@
  */
 
 #include <kunit/test.h>
+#include <linux/buffer_head.h>
 #include <linux/errname.h>
 #include <linux/fs.h>
 #include <linux/fs_context.h>
diff --git a/rust/helpers.c b/rust/helpers.c
index af335d1912e7..a5393c6b93f2 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -21,6 +21,7 @@
  */
 
 #include <kunit/test-bug.h>
+#include <linux/buffer_head.h>
 #include <linux/bug.h>
 #include <linux/build_bug.h>
 #include <linux/cacheflush.h>
@@ -250,6 +251,20 @@ unsigned int rust_helper_MKDEV(unsigned int major, unsigned int minor)
 }
 EXPORT_SYMBOL_GPL(rust_helper_MKDEV);
 
+#ifdef CONFIG_BUFFER_HEAD
+void rust_helper_get_bh(struct buffer_head *bh)
+{
+	get_bh(bh);
+}
+EXPORT_SYMBOL_GPL(rust_helper_get_bh);
+
+void rust_helper_put_bh(struct buffer_head *bh)
+{
+	put_bh(bh);
+}
+EXPORT_SYMBOL_GPL(rust_helper_put_bh);
+#endif
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index e9a9362d2897..4f04cb1d3c6f 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -15,6 +15,9 @@
 use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
 use macros::{pin_data, pinned_drop};
 
+#[cfg(CONFIG_BUFFER_HEAD)]
+pub mod buffer;
+
 /// Maximum size of an inode.
 pub const MAX_LFS_FILESIZE: i64 = bindings::MAX_LFS_FILESIZE;
 
diff --git a/rust/kernel/fs/buffer.rs b/rust/kernel/fs/buffer.rs
new file mode 100644
index 000000000000..6052af8822b3
--- /dev/null
+++ b/rust/kernel/fs/buffer.rs
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system buffers.
+//!
+//! C headers: [`include/linux/buffer_head.h`](../../../include/linux/buffer_head.h)
+
+use crate::types::{ARef, AlwaysRefCounted, Opaque};
+use core::ptr;
+
+/// Wraps the kernel's `struct buffer_head`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `get_bh` ensures that the
+/// allocation remains valid at least until the matching call to `put_bh`.
+#[repr(transparent)]
+pub struct Head(Opaque<bindings::buffer_head>);
+
+// SAFETY: The type invariants guarantee that `Head` is always ref-counted.
+unsafe impl AlwaysRefCounted for Head {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::get_bh(self.0.get()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::put_bh(obj.cast().as_ptr()) }
+    }
+}
+
+impl Head {
+    /// Returns the block data associated with the given buffer head.
+    pub fn data(&self) -> &[u8] {
+        let h = self.0.get();
+        // SAFETY: The existence of a shared reference guarantees that the buffer head is
+        // available and so we can access its contents.
+        unsafe { core::slice::from_raw_parts((*h).b_data.cast(), (*h).b_size) }
+    }
+}
+
+/// A view of a buffer.
+///
+/// It may contain just a contiguous subset of the buffer.
+pub struct View {
+    head: ARef<Head>,
+    offset: usize,
+    size: usize,
+}
+
+impl View {
+    #[allow(dead_code)]
+    pub(crate) fn new(head: ARef<Head>, offset: usize, size: usize) -> Self {
+        Self { head, size, offset }
+    }
+
+    /// Returns the view of the buffer head.
+    pub fn data(&self) -> &[u8] {
+        &self.head.data()[self.offset..][..self.size]
+    }
+}
-- 
2.34.1


* [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (14 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-21 13:39   ` Benno Lossin
  2024-01-03 14:38   ` Andreas Hindborg (Samsung)
  2023-10-18 12:25 ` [RFC PATCH 17/19] rust: fs: allow per-inode data Wedson Almeida Filho
                   ` (4 subsequent siblings)
  20 siblings, 2 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to be backed by block devices (in addition to
in-memory ones).
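
The arithmetic behind `SuperBlock::read` in this patch can be sketched in
standalone Rust. The sketch computes which block each step reads and which
bytes of that block it exposes, without touching a device. `Chunk` and
`split_range` are illustrative names, and the sketch assumes the block size is
a power of two, as the masking in the patch does.

```rust
/// One step of the iteration: the block to read and the bytes of it to expose.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Chunk {
    pub block: u64,
    pub offset_in_block: u64,
    pub len: u64,
}

/// Splits the byte range `offset..offset + size` into per-block chunks.
///
/// Returns `None` if `offset + size` overflows (the patch returns `ERANGE`).
/// Panics if `block_size` is not a power of two.
pub fn split_range(offset: u64, size: u64, block_size: u64) -> Option<Vec<Chunk>> {
    assert!(block_size.is_power_of_two());
    let end = offset.checked_add(size)?;
    let mut next = offset;
    let mut chunks = Vec::new();
    while next < end {
        // Position of `next` within its block.
        let boffset = next & (block_size - 1);
        // Stop at the end of the block or the end of the range, whichever is first.
        let len = core::cmp::min(end - next, block_size - boffset);
        chunks.push(Chunk {
            block: next / block_size,
            offset_in_block: boffset,
            len,
        });
        next += len;
    }
    Some(chunks)
}
```

For example, with 4096-byte blocks, a 200-byte read at offset 4000 yields 96
bytes from the end of block 0 followed by 104 bytes from the start of block 1.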

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/bindings/bindings_helper.h |   1 +
 rust/helpers.c                  |  14 +++
 rust/kernel/fs.rs               | 177 +++++++++++++++++++++++++++++---
 rust/kernel/fs/buffer.rs        |   1 -
 4 files changed, 180 insertions(+), 13 deletions(-)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index d328375f7cb7..8403f13d4d48 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -7,6 +7,7 @@
  */
 
 #include <kunit/test.h>
+#include <linux/bio.h>
 #include <linux/buffer_head.h>
 #include <linux/errname.h>
 #include <linux/fs.h>
diff --git a/rust/helpers.c b/rust/helpers.c
index a5393c6b93f2..bc19f3b7b93e 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -21,6 +21,7 @@
  */
 
 #include <kunit/test-bug.h>
+#include <linux/blkdev.h>
 #include <linux/buffer_head.h>
 #include <linux/bug.h>
 #include <linux/build_bug.h>
@@ -252,6 +253,13 @@ unsigned int rust_helper_MKDEV(unsigned int major, unsigned int minor)
 EXPORT_SYMBOL_GPL(rust_helper_MKDEV);
 
 #ifdef CONFIG_BUFFER_HEAD
+struct buffer_head *rust_helper_sb_bread(struct super_block *sb,
+		sector_t block)
+{
+	return sb_bread(sb, block);
+}
+EXPORT_SYMBOL_GPL(rust_helper_sb_bread);
+
 void rust_helper_get_bh(struct buffer_head *bh)
 {
 	get_bh(bh);
@@ -265,6 +273,12 @@ void rust_helper_put_bh(struct buffer_head *bh)
 EXPORT_SYMBOL_GPL(rust_helper_put_bh);
 #endif
 
+sector_t rust_helper_bdev_nr_sectors(struct block_device *bdev)
+{
+	return bdev_nr_sectors(bdev);
+}
+EXPORT_SYMBOL_GPL(rust_helper_bdev_nr_sectors);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index 4f04cb1d3c6f..b1ad5c110dbb 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -7,11 +7,9 @@
 //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
 
 use crate::error::{code::*, from_result, to_result, Error, Result};
-use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque};
-use crate::{
-    bindings, folio::LockedFolio, init::PinInit, str::CStr, time::Timespec, try_pin_init,
-    ThisModule,
-};
+use crate::folio::{LockedFolio, UniqueFolio};
+use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque, ScopeGuard};
+use crate::{bindings, init::PinInit, str::CStr, time::Timespec, try_pin_init, ThisModule};
 use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
 use macros::{pin_data, pinned_drop};
 
@@ -21,6 +19,17 @@
 /// Maximum size of an inode.
 pub const MAX_LFS_FILESIZE: i64 = bindings::MAX_LFS_FILESIZE;
 
+/// Type of superblock keying.
+///
+/// It determines how C's `fs_context_operations::get_tree` is implemented.
+pub enum Super {
+    /// Multiple independent superblocks may exist.
+    Independent,
+
+    /// Uses a block device.
+    BlockDev,
+}
+
 /// A file system type.
 pub trait FileSystem {
     /// Data associated with each file system instance (super-block).
@@ -29,6 +38,9 @@ pub trait FileSystem {
     /// The name of the file system type.
     const NAME: &'static CStr;
 
+    /// Determines how superblocks for this file system type are keyed.
+    const SUPER_TYPE: Super = Super::Independent;
+
     /// Returns the parameters to initialise a super block.
     fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
 
@@ -181,7 +193,9 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
                 fs.name = T::NAME.as_char_ptr();
                 fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
                 fs.kill_sb = Some(Self::kill_sb_callback::<T>);
-                fs.fs_flags = 0;
+                fs.fs_flags = if let Super::BlockDev = T::SUPER_TYPE {
+                    bindings::FS_REQUIRES_DEV as i32
+                } else { 0 };
 
                 // SAFETY: Pointers stored in `fs` are static so will live for as long as the
                 // registration is active (it is undone in `drop`).
@@ -204,9 +218,16 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
     unsafe extern "C" fn kill_sb_callback<T: FileSystem + ?Sized>(
         sb_ptr: *mut bindings::super_block,
     ) {
-        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
-        // the appropriate function to call for cleanup.
-        unsafe { bindings::kill_anon_super(sb_ptr) };
+        match T::SUPER_TYPE {
+            // SAFETY: In `get_tree_callback` we always call `get_tree_bdev` for
+            // `Super::BlockDev`, so `kill_block_super` is the appropriate function to call
+            // for cleanup.
+            Super::BlockDev => unsafe { bindings::kill_block_super(sb_ptr) },
+            // SAFETY: In `get_tree_callback` we always call `get_tree_nodev` for
+            // `Super::Independent`, so `kill_anon_super` is the appropriate function to call
+            // for cleanup.
+            Super::Independent => unsafe { bindings::kill_anon_super(sb_ptr) },
+        }
 
         // SAFETY: The C API contract guarantees that `sb_ptr` is valid for read.
         let ptr = unsafe { (*sb_ptr).s_fs_info };
@@ -479,6 +500,65 @@ pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, New
             })))
         }
     }
+
+    /// Reads a block from the block device.
+    #[cfg(CONFIG_BUFFER_HEAD)]
+    pub fn bread(&self, block: u64) -> Result<ARef<buffer::Head>> {
+        // Fail requests for non-blockdev file systems. This is a compile-time check.
+        match T::SUPER_TYPE {
+            Super::BlockDev => {}
+            _ => return Err(EIO),
+        }
+
+        // SAFETY: This function is only valid after the `NeedsInit` typestate, so the block size
+        // is known and the superblock can be used to read blocks.
+        let ptr =
+            ptr::NonNull::new(unsafe { bindings::sb_bread(self.0.get(), block) }).ok_or(EIO)?;
+        // SAFETY: `sb_bread` returns a referenced buffer head. Ownership of the increment is
+        // passed to the `ARef` instance.
+        Ok(unsafe { ARef::from_raw(ptr.cast()) })
+    }
+
+    /// Reads `size` bytes starting from `offset` bytes.
+    ///
+    /// Returns an iterator that returns slices based on blocks.
+    #[cfg(CONFIG_BUFFER_HEAD)]
+    pub fn read(
+        &self,
+        offset: u64,
+        size: u64,
+    ) -> Result<impl Iterator<Item = Result<buffer::View>> + '_> {
+        struct BlockIter<'a, T: FileSystem + ?Sized> {
+            sb: &'a SuperBlock<T>,
+            next_offset: u64,
+            end: u64,
+        }
+        impl<'a, T: FileSystem + ?Sized> Iterator for BlockIter<'a, T> {
+            type Item = Result<buffer::View>;
+
+            fn next(&mut self) -> Option<Self::Item> {
+                if self.next_offset >= self.end {
+                    return None;
+                }
+
+                // SAFETY: The superblock is valid and has had its block size initialised.
+                let block_size = unsafe { (*self.sb.0.get()).s_blocksize };
+                let bh = match self.sb.bread(self.next_offset / block_size) {
+                    Ok(bh) => bh,
+                    Err(e) => return Some(Err(e)),
+                };
+                let boffset = self.next_offset & (block_size - 1);
+                let bsize = core::cmp::min(self.end - self.next_offset, block_size - boffset);
+                self.next_offset += bsize;
+                Some(Ok(buffer::View::new(bh, boffset as usize, bsize as usize)))
+            }
+        }
+        Ok(BlockIter {
+            sb: self,
+            next_offset: offset,
+            end: offset.checked_add(size).ok_or(ERANGE)?,
+        })
+    }
 }
 
 /// Required superblock parameters.
@@ -511,6 +591,70 @@ pub struct SuperParams<T: ForeignOwnable + Send + Sync> {
 #[repr(transparent)]
 pub struct NewSuperBlock<T: FileSystem + ?Sized>(bindings::super_block, PhantomData<T>);
 
+impl<T: FileSystem + ?Sized> NewSuperBlock<T> {
+    /// Reads sectors.
+    ///
+    /// `count` must be such that the total size doesn't exceed a page.
+    pub fn sread(&self, sector: u64, count: usize, folio: &mut UniqueFolio) -> Result {
+        // Fail requests for non-blockdev file systems. This is a compile-time check.
+        match T::SUPER_TYPE {
+            // The superblock is valid and given that it's a blockdev superblock it must have a
+            // valid `s_bdev`.
+            Super::BlockDev => {}
+            _ => return Err(EIO),
+        }
+
+        crate::build_assert!(count * (bindings::SECTOR_SIZE as usize) <= bindings::PAGE_SIZE);
+
+        // Read the sectors.
+        let mut bio = bindings::bio::default();
+        let bvec = Opaque::<bindings::bio_vec>::uninit();
+
+        // SAFETY: `bio` and `bvec` are allocated on the stack, they're both valid.
+        unsafe {
+            bindings::bio_init(
+                &mut bio,
+                self.0.s_bdev,
+                bvec.get(),
+                1,
+                bindings::req_op_REQ_OP_READ,
+            )
+        };
+
+        // SAFETY: `bio` was just initialised with `bio_init` above, so it's safe to call
+        // `bio_uninit` on the way out.
+        let mut bio =
+            ScopeGuard::new_with_data(bio, |mut b| unsafe { bindings::bio_uninit(&mut b) });
+
+        // SAFETY: We have one free `bvec` (initialised above). We also know that size won't exceed
+        // a page size (build_assert above).
+        unsafe {
+            bindings::bio_add_folio_nofail(
+                &mut *bio,
+                folio.0 .0.get(),
+                count * (bindings::SECTOR_SIZE as usize),
+                0,
+            )
+        };
+        bio.bi_iter.bi_sector = sector;
+
+        // SAFETY: The bio was fully initialised above.
+        to_result(unsafe { bindings::submit_bio_wait(&mut *bio) })?;
+        Ok(())
+    }
+
+    /// Returns the number of sectors in the underlying block device.
+    pub fn sector_count(&self) -> Result<u64> {
+        // Fail requests for non-blockdev file systems. This is a compile-time check.
+        match T::SUPER_TYPE {
+            // The superblock is valid and given that it's a blockdev superblock it must have a
+            // valid `s_bdev`.
+            Super::BlockDev => Ok(unsafe { bindings::bdev_nr_sectors(self.0.s_bdev) }),
+            _ => Err(EIO),
+        }
+    }
+}
+
 struct Tables<T: FileSystem + ?Sized>(T);
 impl<T: FileSystem + ?Sized> Tables<T> {
     const CONTEXT: bindings::fs_context_operations = bindings::fs_context_operations {
@@ -523,9 +667,18 @@ impl<T: FileSystem + ?Sized> Tables<T> {
     };
 
     unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> core::ffi::c_int {
-        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
-        // the right type and is a valid callback.
-        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
+        match T::SUPER_TYPE {
+            // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
+            // the right type and is a valid callback.
+            Super::BlockDev => unsafe {
+                bindings::get_tree_bdev(fc, Some(Self::fill_super_callback))
+            },
+            // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
+            // the right type and is a valid callback.
+            Super::Independent => unsafe {
+                bindings::get_tree_nodev(fc, Some(Self::fill_super_callback))
+            },
+        }
     }
 
     unsafe extern "C" fn fill_super_callback(
diff --git a/rust/kernel/fs/buffer.rs b/rust/kernel/fs/buffer.rs
index 6052af8822b3..de23d0fee66c 100644
--- a/rust/kernel/fs/buffer.rs
+++ b/rust/kernel/fs/buffer.rs
@@ -49,7 +49,6 @@ pub struct View {
 }
 
 impl View {
-    #[allow(dead_code)]
     pub(crate) fn new(head: ARef<Head>, offset: usize, size: usize) -> Self {
         Self { head, size, offset }
     }
-- 
2.34.1
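
Note (illustration only, not part of the patch): the way the block
iterator above splits a byte range at block boundaries can be tried in
user space. The sketch below is plain Rust rather than kernel code.
`split_range` is a hypothetical helper, the block size is assumed to be a
power of two, and the block index (`next / block_size`) is an assumption
because that part of the iterator is not quoted here. It yields
(block, offset in block, length) tuples where the patch yields
`buffer::View`s.

```rust
/// Splits the byte range `[offset, offset + size)` into per-block pieces.
///
/// Hypothetical helper that mirrors the `boffset`/`bsize` arithmetic of the
/// iterator in the patch. Returns `None` if `offset + size` overflows.
fn split_range(offset: u64, size: u64, block_size: u64) -> Option<Vec<(u64, u64, u64)>> {
    assert!(block_size.is_power_of_two());
    let end = offset.checked_add(size)?;
    let mut next = offset;
    let mut pieces = Vec::new();
    while next < end {
        // Assumed block index; the patch reads the block through a buffer head.
        let block = next / block_size;
        // Offset of `next` inside its block (valid because of the power of two).
        let boffset = next & (block_size - 1);
        // Stop at the end of the range or at the end of the block, whichever is first.
        let bsize = core::cmp::min(end - next, block_size - boffset);
        pieces.push((block, boffset, bsize));
        next += bsize;
    }
    Some(pieces)
}

fn main() {
    // A 5000-byte read starting at byte 4000 with 4096-byte blocks.
    let pieces = split_range(4000, 5000, 4096).unwrap();
    println!("{pieces:?}");
}
```

Because each piece is capped at `block_size - boffset`, only the first
piece can start in the middle of a block, and the lengths add up to
`size`.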



* [RFC PATCH 17/19] rust: fs: allow per-inode data
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (15 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-21 13:57   ` Benno Lossin
  2024-01-03 14:39   ` Andreas Hindborg (Samsung)
  2023-10-18 12:25 ` [RFC PATCH 18/19] rust: fs: export file type from mode constants Wedson Almeida Filho
                   ` (3 subsequent siblings)
  20 siblings, 2 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to attach extra [typed] data to each inode. If
no data is needed, the regular inode kmem_cache is used; otherwise a
dedicated one is created for the file system.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
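Note (illustration only, not part of the patch): the layout trick behind
`INodeWithData` can be reproduced in user space. In the sketch below,
`RawInode` stands in for `bindings::inode`, `outer_of` plays the role of
`container_of!`, and `alloc`, `init`, `data` and `destroy` are
hypothetical names that loosely mirror `alloc_inode_callback`,
`NewINode::init`, `INode::data` and `destroy_inode_callback`. A `Box`
replaces the kmem_cache, so this is a sketch of the idea rather than of
the kernel code.

```rust
use core::mem::{offset_of, MaybeUninit};
use core::ptr;

/// Stand-in for `bindings::inode`; only here to keep the sketch self-contained.
struct RawInode {
    ino: u64,
}

/// Same shape as `INodeWithData` in the patch: the data sits next to the inode.
struct INodeWithData<T> {
    data: MaybeUninit<T>,
    inode: RawInode,
}

/// Recovers the outer struct from a pointer to its `inode` field.
///
/// # Safety
/// `inode` must point at the `inode` field of a live `INodeWithData<T>`.
unsafe fn outer_of<T>(inode: *mut RawInode) -> *mut INodeWithData<T> {
    // SAFETY: stepping back by the field offset stays inside the same allocation.
    unsafe { inode.byte_sub(offset_of!(INodeWithData<T>, inode)).cast() }
}

/// Allocates the outer object with uninitialised data and returns the inner inode.
fn alloc<T>(ino: u64) -> *mut RawInode {
    let outer = Box::into_raw(Box::new(INodeWithData::<T> {
        data: MaybeUninit::uninit(),
        inode: RawInode { ino },
    }));
    // SAFETY: `outer` was just allocated; this only computes a field address.
    unsafe { ptr::addr_of_mut!((*outer).inode) }
}

/// Writes the per-inode data. Safety: `inode` comes from `alloc::<T>` and is not shared yet.
unsafe fn init<T>(inode: *mut RawInode, value: T) {
    // SAFETY: per the contract above, no other reference to the outer object exists.
    let outer = unsafe { &mut *outer_of::<T>(inode) };
    outer.data.write(value);
}

/// Reads the per-inode data. Safety: `init::<T>` must have been called on `inode`.
unsafe fn data<'a, T>(inode: *mut RawInode) -> &'a T {
    // SAFETY: the outer object is live and its data was initialised by `init`.
    unsafe { (*outer_of::<T>(inode)).data.assume_init_ref() }
}

/// Drops the data and frees the object. Safety: initialised, and destroyed only once.
unsafe fn destroy<T>(inode: *mut RawInode) {
    // SAFETY: the allocation was created by `Box::new` in `alloc::<T>`.
    let mut outer = unsafe { Box::from_raw(outer_of::<T>(inode)) };
    // SAFETY: the data was initialised by `init`, so it is safe to drop.
    unsafe { outer.data.assume_init_drop() };
}

fn main() {
    let inode = alloc::<String>(7);
    // SAFETY: `inode` is initialised before it is read and destroyed exactly once.
    unsafe {
        init(inode, String::from("per-inode data"));
        let ino = (*inode).ino;
        println!("inode {ino} carries {:?}", data::<String>(inode));
        destroy::<String>(inode);
    }
}
```

The sketch always initialises the data before `destroy` runs; the patch
instead skips the drop for inodes marked bad, since their data was never
written.
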
 rust/helpers.c            |   7 +++
 rust/kernel/fs.rs         | 128 +++++++++++++++++++++++++++++++++++---
 rust/kernel/mem_cache.rs  |   2 -
 samples/rust/rust_rofs.rs |   9 ++-
 4 files changed, 131 insertions(+), 15 deletions(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index bc19f3b7b93e..7b12a6d4cf5c 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -222,6 +222,13 @@ void rust_helper_kunmap_local(const void *vaddr)
 }
 EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
 
+void *rust_helper_alloc_inode_sb(struct super_block *sb,
+				 struct kmem_cache *cache, gfp_t gfp)
+{
+	return alloc_inode_sb(sb, cache, gfp);
+}
+EXPORT_SYMBOL_GPL(rust_helper_alloc_inode_sb);
+
 void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
 {
 	i_uid_write(inode, uid);
diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index b1ad5c110dbb..b07203758674 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -9,8 +9,12 @@
 use crate::error::{code::*, from_result, to_result, Error, Result};
 use crate::folio::{LockedFolio, UniqueFolio};
 use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque, ScopeGuard};
-use crate::{bindings, init::PinInit, str::CStr, time::Timespec, try_pin_init, ThisModule};
-use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
+use crate::{
+    bindings, container_of, init::PinInit, mem_cache::MemCache, str::CStr, time::Timespec,
+    try_pin_init, ThisModule,
+};
+use core::mem::{size_of, ManuallyDrop, MaybeUninit};
+use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin, ptr};
 use macros::{pin_data, pinned_drop};
 
 #[cfg(CONFIG_BUFFER_HEAD)]
@@ -35,6 +39,9 @@ pub trait FileSystem {
     /// Data associated with each file system instance (super-block).
     type Data: ForeignOwnable + Send + Sync;
 
+    /// Type of data associated with each inode.
+    type INodeData: Send + Sync;
+
     /// The name of the file system type.
     const NAME: &'static CStr;
 
@@ -165,6 +172,7 @@ fn try_from(v: u32) -> Result<Self> {
 pub struct Registration {
     #[pin]
     fs: Opaque<bindings::file_system_type>,
+    inode_cache: Option<MemCache>,
     #[pin]
     _pin: PhantomPinned,
 }
@@ -182,6 +190,14 @@ impl Registration {
     pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<Self, Error> {
         try_pin_init!(Self {
             _pin: PhantomPinned,
+            inode_cache: if size_of::<T::INodeData>() == 0 {
+                None
+            } else {
+                Some(MemCache::try_new::<INodeWithData<T::INodeData>>(
+                    T::NAME,
+                    Some(Self::inode_init_once_callback::<T>),
+                )?)
+            },
             fs <- Opaque::try_ffi_init(|fs_ptr: *mut bindings::file_system_type| {
                 // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write.
                 unsafe { fs_ptr.write(bindings::file_system_type::default()) };
@@ -239,6 +255,16 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
             unsafe { T::Data::from_foreign(ptr) };
         }
     }
+
+    unsafe extern "C" fn inode_init_once_callback<T: FileSystem + ?Sized>(
+        outer_inode: *mut core::ffi::c_void,
+    ) {
+        let ptr = outer_inode.cast::<INodeWithData<T::INodeData>>();
+
+        // SAFETY: This is only used in `new`, so we know that we have a valid `INodeWithData`
+        // instance whose inode part can be initialised.
+        unsafe { bindings::inode_init_once(ptr::addr_of_mut!((*ptr).inode)) };
+    }
 }
 
 #[pinned_drop]
@@ -280,6 +306,15 @@ pub fn super_block(&self) -> &SuperBlock<T> {
         unsafe { &*(*self.0.get()).i_sb.cast() }
     }
 
+    /// Returns the data associated with the inode.
+    pub fn data(&self) -> &T::INodeData {
+        let outerp = container_of!(self.0.get(), INodeWithData<T::INodeData>, inode);
+        // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference
+        // (`&self`) to it. Additionally, we know `T::INodeData` is always initialised in an
+        // `INode`.
+        unsafe { &*(*outerp).data.as_ptr() }
+    }
+
     /// Returns the size of the inode contents.
     pub fn size(&self) -> i64 {
         // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference.
@@ -300,15 +335,29 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
     }
 }
 
+struct INodeWithData<T> {
+    data: MaybeUninit<T>,
+    inode: bindings::inode,
+}
+
 /// An inode that is locked and hasn't been initialised yet.
 #[repr(transparent)]
 pub struct NewINode<T: FileSystem + ?Sized>(ARef<INode<T>>);
 
 impl<T: FileSystem + ?Sized> NewINode<T> {
     /// Initialises the new inode with the given parameters.
-    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
-        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
-        let inode = unsafe { &mut *self.0 .0.get() };
+    pub fn init(self, params: INodeParams<T::INodeData>) -> Result<ARef<INode<T>>> {
+        let outerp = container_of!(self.0 .0.get(), INodeWithData<T::INodeData>, inode);
+
+        // SAFETY: This is a newly-created inode. No other references to it exist, so it is
+        // safe to mutably dereference it.
+        let outer = unsafe { &mut *outerp.cast_mut() };
+
+        // N.B. We must always write this to a newly allocated inode because the free callback
+        // expects the data to be initialised and drops it.
+        outer.data.write(params.value);
+
+        let inode = &mut outer.inode;
 
         let mode = match params.typ {
             INodeType::Dir => {
@@ -424,7 +473,7 @@ pub enum INodeType {
 /// Required inode parameters.
 ///
 /// This is used when creating new inodes.
-pub struct INodeParams {
+pub struct INodeParams<T> {
     /// The access mode. It's a mask that grants execute (1), write (2) and read (4) access to
     /// everyone, the owner group, and the owner.
     pub mode: u16,
@@ -459,6 +508,9 @@ pub struct INodeParams {
 
     /// Last access time.
     pub atime: Timespec,
+
+    /// Value to attach to this inode.
+    pub value: T,
 }
 
 /// A file system super block.
@@ -735,8 +787,12 @@ impl<T: FileSystem + ?Sized> Tables<T> {
     }
 
     const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
-        alloc_inode: None,
-        destroy_inode: None,
+        alloc_inode: if size_of::<T::INodeData>() != 0 {
+            Some(Self::alloc_inode_callback)
+        } else {
+            None
+        },
+        destroy_inode: Some(Self::destroy_inode_callback),
         free_inode: None,
         dirty_inode: None,
         write_inode: None,
@@ -766,6 +822,61 @@ impl<T: FileSystem + ?Sized> Tables<T> {
         shutdown: None,
     };
 
+    unsafe extern "C" fn alloc_inode_callback(
+        sb: *mut bindings::super_block,
+    ) -> *mut bindings::inode {
+        // SAFETY: The callback contract guarantees that `sb` is valid for read.
+        let super_type = unsafe { (*sb).s_type };
+
+        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
+        // embedded in a `Registration`, which is guaranteed to be valid because it has a
+        // superblock associated to it.
+        let reg = unsafe { &*container_of!(super_type, Registration, fs) };
+
+        // SAFETY: `sb` and `cache` are guaranteed to be valid by the callback contract and by
+        // the existence of a superblock respectively.
+        let ptr = unsafe {
+            bindings::alloc_inode_sb(sb, MemCache::ptr(&reg.inode_cache), bindings::GFP_KERNEL)
+        }
+        .cast::<INodeWithData<T::INodeData>>();
+        if ptr.is_null() {
+            return ptr::null_mut();
+        }
+        ptr::addr_of_mut!((*ptr).inode)
+    }
+
+    unsafe extern "C" fn destroy_inode_callback(inode: *mut bindings::inode) {
+        // SAFETY: By the C contract, `inode` is a valid pointer.
+        let is_bad = unsafe { bindings::is_bad_inode(inode) };
+
+        // SAFETY: The inode is guaranteed to be valid by the callback contract. Additionally, the
+        // superblock is also guaranteed to still be valid by the inode existence.
+        let super_type = unsafe { (*(*inode).i_sb).s_type };
+
+        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
+        // embedded in a `Registration`, which is guaranteed to be valid because it has a
+        // superblock associated to it.
+        let reg = unsafe { &*container_of!(super_type, Registration, fs) };
+        let ptr = container_of!(inode, INodeWithData<T::INodeData>, inode).cast_mut();
+
+        if !is_bad {
+            // SAFETY: The code either initialises the data or marks the inode as bad. Since the
+            // inode is not bad, the data is initialised, and thus safe to drop.
+            unsafe { ptr::drop_in_place((*ptr).data.as_mut_ptr()) };
+        }
+
+        if size_of::<T::INodeData>() == 0 {
+            // SAFETY: When the size of `INodeData` is zero, we don't use a separate mem_cache, so
+            // it is allocated from the regular mem_cache, which is what `free_inode_nonrcu` uses
+            // to free the inode.
+            unsafe { bindings::free_inode_nonrcu(inode) };
+        } else {
+            // SAFETY: The callback contract guarantees that the inode was previously allocated via the
+            // `alloc_inode_callback` callback, so it is safe to free it back to the cache.
+            unsafe { bindings::kmem_cache_free(MemCache::ptr(&reg.inode_cache), ptr.cast()) };
+        }
+    }
+
     unsafe extern "C" fn statfs_callback(
         dentry: *mut bindings::dentry,
         buf: *mut bindings::kstatfs,
@@ -1120,6 +1231,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
 /// struct MyFs;
 /// impl fs::FileSystem for MyFs {
 ///     type Data = ();
+///     type INodeData = ();
 ///     const NAME: &'static CStr = c_str!("myfs");
 ///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
 ///         todo!()
diff --git a/rust/kernel/mem_cache.rs b/rust/kernel/mem_cache.rs
index 05e5f2bc9781..bf6ce2d2d3e1 100644
--- a/rust/kernel/mem_cache.rs
+++ b/rust/kernel/mem_cache.rs
@@ -20,7 +20,6 @@ impl MemCache {
     /// Allocates a new `kmem_cache` for type `T`.
     ///
     /// `init` is called by the C code when entries are allocated.
-    #[allow(dead_code)]
     pub(crate) fn try_new<T>(
         name: &'static CStr,
         init: Option<unsafe extern "C" fn(*mut core::ffi::c_void)>,
@@ -43,7 +42,6 @@ pub(crate) fn try_new<T>(
     /// Returns the pointer to the `kmem_cache` instance, or null if it's `None`.
     ///
     /// This is a helper for functions like `alloc_inode_sb` where the cache is optional.
-    #[allow(dead_code)]
     pub(crate) fn ptr(c: &Option<Self>) -> *mut bindings::kmem_cache {
         match c {
             Some(m) => m.ptr.as_ptr(),
diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
index 093425650f26..dfe745439842 100644
--- a/samples/rust/rust_rofs.rs
+++ b/samples/rust/rust_rofs.rs
@@ -53,6 +53,7 @@ struct Entry {
 struct RoFs;
 impl fs::FileSystem for RoFs {
     type Data = ();
+    type INodeData = &'static Entry;
     const NAME: &'static CStr = c_str!("rust-fs");
 
     fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
@@ -79,6 +80,7 @@ fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
                 atime: UNIX_EPOCH,
                 ctime: UNIX_EPOCH,
                 mtime: UNIX_EPOCH,
+                value: &ENTRIES[0],
             }),
         }
     }
@@ -122,6 +124,7 @@ fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
                         atime: UNIX_EPOCH,
                         ctime: UNIX_EPOCH,
                         mtime: UNIX_EPOCH,
+                        value: e,
                     }),
                 };
             }
@@ -131,11 +134,7 @@ fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
     }
 
     fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
-        let data = match inode.ino() {
-            2 => ENTRIES[2].contents,
-            3 => ENTRIES[3].contents,
-            _ => return Err(EINVAL),
-        };
+        let data = inode.data().contents;
 
         let pos = usize::try_from(folio.pos()).unwrap_or(usize::MAX);
         let copied = if pos >= data.len() {
-- 
2.34.1



* [RFC PATCH 18/19] rust: fs: export file type from mode constants
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (16 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 17/19] rust: fs: allow per-inode data Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 12:25 ` [RFC PATCH 19/19] tarfs: introduce tar fs Wedson Almeida Filho
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file system modules to use the `S_IF*` file type constants if
needed.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
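Note (illustration only, not part of the patch): a minimal user-space
sketch of how a file system might use these constants to split a mode
value into file type and permission bits. The numeric values are the
usual Linux UAPI ones and are hardcoded here only because `bindings` is
not available outside the kernel; `file_type` and `permissions` are
hypothetical helpers.

```rust
// Usual Linux values (octal); in the kernel they come from `bindings`.
const S_IFMT: u32 = 0o170000;
const S_IFSOCK: u32 = 0o140000;
const S_IFLNK: u32 = 0o120000;
const S_IFREG: u32 = 0o100000;
const S_IFBLK: u32 = 0o060000;
const S_IFDIR: u32 = 0o040000;
const S_IFCHR: u32 = 0o020000;
const S_IFIFO: u32 = 0o010000;

/// Names the file type encoded in `mode`, or `None` if the type bits are unknown.
fn file_type(mode: u32) -> Option<&'static str> {
    match mode & S_IFMT {
        S_IFREG => Some("regular file"),
        S_IFDIR => Some("directory"),
        S_IFLNK => Some("symbolic link"),
        S_IFSOCK => Some("socket"),
        S_IFIFO => Some("pipe"),
        S_IFCHR => Some("char device"),
        S_IFBLK => Some("block device"),
        _ => None,
    }
}

/// Returns the rwx bits for owner, group and others.
fn permissions(mode: u32) -> u32 {
    mode & 0o777
}

fn main() {
    let mode = 0o100644;
    println!("{:?}, permissions {:o}", file_type(mode), permissions(mode));
}
```
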
 rust/kernel/fs.rs | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
index b07203758674..235a86ed1127 100644
--- a/rust/kernel/fs.rs
+++ b/rust/kernel/fs.rs
@@ -20,6 +20,33 @@
 #[cfg(CONFIG_BUFFER_HEAD)]
 pub mod buffer;
 
+/// Contains constants related to Linux file modes.
+pub mod mode {
+    /// A bitmask used to extract the file type from a mode value.
+    pub const S_IFMT: u32 = bindings::S_IFMT;
+
+    /// File type constant for block devices.
+    pub const S_IFBLK: u32 = bindings::S_IFBLK;
+
+    /// File type constant for char devices.
+    pub const S_IFCHR: u32 = bindings::S_IFCHR;
+
+    /// File type constant for directories.
+    pub const S_IFDIR: u32 = bindings::S_IFDIR;
+
+    /// File type constant for pipes.
+    pub const S_IFIFO: u32 = bindings::S_IFIFO;
+
+    /// File type constant for symbolic links.
+    pub const S_IFLNK: u32 = bindings::S_IFLNK;
+
+    /// File type constant for regular files.
+    pub const S_IFREG: u32 = bindings::S_IFREG;
+
+    /// File type constant for sockets.
+    pub const S_IFSOCK: u32 = bindings::S_IFSOCK;
+}
+
 /// Maximum size of an inode.
 pub const MAX_LFS_FILESIZE: i64 = bindings::MAX_LFS_FILESIZE;
 
-- 
2.34.1



* [RFC PATCH 19/19] tarfs: introduce tar fs
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (17 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 18/19] rust: fs: export file type from mode constants Wedson Almeida Filho
@ 2023-10-18 12:25 ` Wedson Almeida Filho
  2023-10-18 16:57   ` Matthew Wilcox
  2024-01-24  5:05   ` Matthew Wilcox
  2023-10-18 13:40 ` [RFC PATCH 00/19] Rust abstractions for VFS Ariel Miculas (amiculas)
  2023-10-29 20:31 ` Matthew Wilcox
  20 siblings, 2 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 12:25 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

From: Wedson Almeida Filho <walmeida@microsoft.com>

tarfs is a read-only file system based on tar files with an index
appended to them (the index makes it possible to find fs entries without
traversing the whole tar file).

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
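Note (illustration only, not part of the patch): the on-disk arithmetic
used by tarfs can be checked in user space. The sketch below is plain
Rust; the helper names are hypothetical, and `INODE_SIZE` (32 bytes) is
derived by hand from the field list of `defs::Inode` rather than taken
from the kernel build.

```rust
const SECTOR_SIZE: u64 = 512;
const TARFS_BSIZE: u64 = 4096;
const SECTORS_PER_BLOCK: u64 = TARFS_BSIZE / SECTOR_SIZE;
/// Assumed size of `defs::Inode`: 2 + 1 + 1 + 4 + 4 + 4 + 8 + 8 bytes.
const INODE_SIZE: u64 = 32;

/// First sector of the last full block, which is what `super_params` reads.
/// The header is then taken from the last sector of that block.
fn header_block_sector(scount: u64) -> Option<u64> {
    if scount < SECTORS_PER_BLOCK {
        return None; // Device too small to hold a single block.
    }
    Some((scount / SECTORS_PER_BLOCK - 1) * SECTORS_PER_BLOCK)
}

/// Block number and offset within the block of inode `ino` (1-based).
fn inode_location(inode_table_offset: u64, ino: u64) -> Option<(u64, u64)> {
    let rel = ino.checked_sub(1)?.checked_mul(INODE_SIZE)?;
    let offset = inode_table_offset.checked_add(rel)?;
    Some((offset / TARFS_BSIZE, offset & (TARFS_BSIZE - 1)))
}

/// Rebuilds the 36-bit mtime from `lmtime` and the low 4 bits of `hmtime`.
fn mtime_secs(lmtime: u32, hmtime: u8) -> u64 {
    u64::from(lmtime) | (u64::from(hmtime & 0xf) << 32)
}

/// Splits the `offset` field of a device inode into (major, minor).
fn dev_numbers(offset: u64) -> (u32, u32) {
    ((offset >> 32) as u32, offset as u32)
}

fn main() {
    println!("header block starts at sector {:?}", header_block_sector(1000));
    println!("inode 130 lives at {:?}", inode_location(8192, 130));
    println!("mtime = {}", mtime_secs(0xffff_ffff, 0x01));
    println!("dev = {:?}", dev_numbers(0x0000_0008_0000_0001));
}
```
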
 fs/Kconfig                        |   1 +
 fs/Makefile                       |   1 +
 fs/tarfs/Kconfig                  |  16 ++
 fs/tarfs/Makefile                 |   8 +
 fs/tarfs/defs.rs                  |  80 ++++++++
 fs/tarfs/tar.rs                   | 322 ++++++++++++++++++++++++++++++
 scripts/generate_rust_analyzer.py |   2 +-
 7 files changed, 429 insertions(+), 1 deletion(-)
 create mode 100644 fs/tarfs/Kconfig
 create mode 100644 fs/tarfs/Makefile
 create mode 100644 fs/tarfs/defs.rs
 create mode 100644 fs/tarfs/tar.rs

diff --git a/fs/Kconfig b/fs/Kconfig
index aa7e03cc1941..f4b8c33ea624 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -331,6 +331,7 @@ source "fs/sysv/Kconfig"
 source "fs/ufs/Kconfig"
 source "fs/erofs/Kconfig"
 source "fs/vboxsf/Kconfig"
+source "fs/tarfs/Kconfig"
 
 endif # MISC_FILESYSTEMS
 
diff --git a/fs/Makefile b/fs/Makefile
index f9541f40be4e..e3389f8b049d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
+obj-$(CONFIG_TARFS_FS)		+= tarfs/
diff --git a/fs/tarfs/Kconfig b/fs/tarfs/Kconfig
new file mode 100644
index 000000000000..d3e19eb2adbc
--- /dev/null
+++ b/fs/tarfs/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+config TARFS_FS
+	tristate "TAR file system support"
+	depends on RUST && BLOCK
+	select BUFFER_HEAD
+	help
+	  This is a simple read-only file system intended for mounting
+	  tar files that have had an index appended to them.
+
+	  To compile this file system support as a module, choose M here: the
+	  module will be called tarfs.
+
+	  If you don't know whether you need it, then you don't need it:
+	  answer N.
diff --git a/fs/tarfs/Makefile b/fs/tarfs/Makefile
new file mode 100644
index 000000000000..011c5d64fbe3
--- /dev/null
+++ b/fs/tarfs/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the linux tarfs filesystem routines.
+#
+
+obj-$(CONFIG_TARFS_FS) += tarfs.o
+
+tarfs-y := tar.o
diff --git a/fs/tarfs/defs.rs b/fs/tarfs/defs.rs
new file mode 100644
index 000000000000..7481b75aaab2
--- /dev/null
+++ b/fs/tarfs/defs.rs
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Definitions of tarfs structures.
+
+use kernel::types::LE;
+
+/// Flags used in [`Inode::flags`].
+pub mod inode_flags {
+    /// Indicates that the inode is opaque.
+    ///
+    /// When set, the inode will have the "trusted.overlay.opaque" xattr set to "y" at runtime.
+    pub const OPAQUE: u8 = 0x1;
+}
+
+kernel::derive_readable_from_bytes! {
+    /// An inode in the tarfs inode table.
+    #[repr(C)]
+    pub struct Inode {
+        /// The mode of the inode.
+        ///
+        /// The bottom 9 bits are the rwx bits for owner, group, all.
+        ///
+        /// The bits in the [`S_IFMT`] mask represent the file type.
+        pub mode: LE<u16>,
+
+        /// Tarfs flags for the inode.
+        ///
+        /// Values are drawn from the [`inode_flags`] module.
+        pub flags: u8,
+
+        /// The bottom 4 bits represent the top 4 bits of mtime.
+        pub hmtime: u8,
+
+        /// The owner of the inode.
+        pub owner: LE<u32>,
+
+        /// The group of the inode.
+        pub group: LE<u32>,
+
+        /// The bottom 32 bits of mtime.
+        pub lmtime: LE<u32>,
+
+        /// Size of the contents of the inode.
+        pub size: LE<u64>,
+
+        /// Either the offset to the data, or the major and minor numbers of a device.
+        ///
+        /// For the latter, the 32 LSB are the minor, and the 32 MSB are the major numbers.
+        pub offset: LE<u64>,
+    }
+
+    /// An entry in a tarfs directory entry table.
+    #[repr(C)]
+    pub struct DirEntry {
+        /// The inode number this entry refers to.
+        pub ino: LE<u64>,
+
+        /// The offset to the name of the entry.
+        pub name_offset: LE<u64>,
+
+        /// The length of the name of the entry.
+        pub name_len: LE<u64>,
+
+        /// The type of entry.
+        pub etype: u8,
+
+        /// Unused padding.
+        pub _padding: [u8; 7],
+    }
+
+    /// The super-block of a tarfs instance.
+    #[repr(C)]
+    pub struct Header {
+        /// The offset to the beginning of the inode-table.
+        pub inode_table_offset: LE<u64>,
+
+        /// The number of inodes in the file system.
+        pub inode_count: LE<u64>,
+    }
+}
diff --git a/fs/tarfs/tar.rs b/fs/tarfs/tar.rs
new file mode 100644
index 000000000000..1a71b1ccf8e7
--- /dev/null
+++ b/fs/tarfs/tar.rs
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! File system based on tar files and an index.
+
+use core::mem::size_of;
+use defs::*;
+use kernel::fs::{
+    DirEmitter, DirEntryType, INode, INodeParams, INodeType, NewSuperBlock, Stat, Super,
+    SuperBlock, SuperParams,
+};
+use kernel::types::{ARef, Either, FromBytes};
+use kernel::{c_str, folio::Folio, folio::LockedFolio, fs, prelude::*};
+
+pub mod defs;
+
+kernel::module_fs! {
+    type: TarFs,
+    name: "tarfs",
+    author: "Wedson Almeida Filho <walmeida@microsoft.com>",
+    description: "File system for indexed tar files",
+    license: "GPL",
+}
+
+const SECTOR_SIZE: u64 = 512;
+const TARFS_BSIZE: u64 = 1 << TARFS_BSIZE_BITS;
+const TARFS_BSIZE_BITS: u8 = 12;
+const SECTORS_PER_BLOCK: u64 = TARFS_BSIZE / SECTOR_SIZE;
+const TARFS_MAGIC: u32 = 0x54415246;
+
+static_assert!(SECTORS_PER_BLOCK > 0);
+
+struct INodeData {
+    offset: u64,
+    flags: u8,
+}
+
+struct TarFs {
+    data_size: u64,
+    inode_table_offset: u64,
+    inode_count: u64,
+}
+
+impl TarFs {
+    fn iget(sb: &SuperBlock<Self>, ino: u64) -> Result<ARef<INode<Self>>> {
+        // Check that the inode number is valid.
+        let h = sb.data();
+        if ino == 0 || ino > h.inode_count {
+            return Err(ENOENT);
+        }
+
+        // Create an inode or find an existing (cached) one.
+        let inode = match sb.get_or_create_inode(ino)? {
+            Either::Left(existing) => return Ok(existing),
+            Either::Right(new) => new,
+        };
+
+        static_assert!((TARFS_BSIZE as usize) % size_of::<Inode>() == 0);
+
+        // Load inode details from storage.
+        let offset = h.inode_table_offset + (ino - 1) * u64::try_from(size_of::<Inode>())?;
+
+        let bh = sb.bread(offset / TARFS_BSIZE)?;
+        let b = bh.data();
+        let idata = Inode::from_bytes(b, (offset & (TARFS_BSIZE - 1)) as usize).ok_or(EIO)?;
+
+        let mode = idata.mode.value();
+
+        // Ignore inodes that have unknown mode bits.
+        if (u32::from(mode) & !(fs::mode::S_IFMT | 0o777)) != 0 {
+            return Err(ENOENT);
+        }
+
+        let doffset = idata.offset.value();
+        let size = idata.size.value().try_into()?;
+        let secs = u64::from(idata.lmtime.value()) | (u64::from(idata.hmtime & 0xf) << 32);
+        let ts = kernel::time::Timespec::new(secs, 0)?;
+        let typ = match u32::from(mode) & fs::mode::S_IFMT {
+            fs::mode::S_IFREG => INodeType::Reg,
+            fs::mode::S_IFDIR => INodeType::Dir,
+            fs::mode::S_IFLNK => INodeType::Lnk,
+            fs::mode::S_IFSOCK => INodeType::Sock,
+            fs::mode::S_IFIFO => INodeType::Fifo,
+            fs::mode::S_IFCHR => INodeType::Chr((doffset >> 32) as u32, doffset as u32),
+            fs::mode::S_IFBLK => INodeType::Blk((doffset >> 32) as u32, doffset as u32),
+            _ => return Err(ENOENT),
+        };
+        inode.init(INodeParams {
+            typ,
+            mode: mode & 0o777,
+            size,
+            blocks: (idata.size.value() + TARFS_BSIZE - 1) / TARFS_BSIZE,
+            nlink: 1,
+            uid: idata.owner.value(),
+            gid: idata.group.value(),
+            ctime: ts,
+            mtime: ts,
+            atime: ts,
+            value: INodeData {
+                offset: doffset,
+                flags: idata.flags,
+            },
+        })
+    }
+
+    fn name_eq(sb: &SuperBlock<Self>, mut name: &[u8], offset: u64) -> Result<bool> {
+        for v in sb.read(offset, name.len().try_into()?)? {
+            let v = v?;
+            let b = v.data();
+            if b != &name[..b.len()] {
+                return Ok(false);
+            }
+            name = &name[b.len()..];
+        }
+        Ok(true)
+    }
+
+    fn read_name(sb: &SuperBlock<Self>, mut name: &mut [u8], offset: u64) -> Result<bool> {
+        for v in sb.read(offset, name.len().try_into()?)? {
+            let v = v?;
+            let b = v.data();
+            name[..b.len()].copy_from_slice(b);
+            name = &mut name[b.len()..];
+        }
+        Ok(true)
+    }
+}
+
+impl fs::FileSystem for TarFs {
+    type Data = Box<Self>;
+    type INodeData = INodeData;
+    const NAME: &'static CStr = c_str!("tar");
+    const SUPER_TYPE: Super = Super::BlockDev;
+
+    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
+        let scount = sb.sector_count()?;
+        if scount < SECTORS_PER_BLOCK {
+            pr_err!("Block device is too small: sector count={scount}\n");
+            return Err(ENXIO);
+        }
+
+        let tarfs = {
+            let mut folio = Folio::try_new(0)?;
+            sb.sread(
+                (scount / SECTORS_PER_BLOCK - 1) * SECTORS_PER_BLOCK,
+                SECTORS_PER_BLOCK as usize,
+                &mut folio,
+            )?;
+            let mapped = folio.map_page(0)?;
+            let hdr =
+                Header::from_bytes(&mapped, (TARFS_BSIZE - SECTOR_SIZE) as usize).ok_or(EIO)?;
+            Box::try_new(TarFs {
+                inode_table_offset: hdr.inode_table_offset.value(),
+                inode_count: hdr.inode_count.value(),
+                data_size: scount.checked_mul(SECTOR_SIZE).ok_or(ERANGE)?,
+            })?
+        };
+
+        // Check that the inode table starts within the device data and is aligned to the sector
+        // size.
+        if tarfs.inode_table_offset >= tarfs.data_size {
+            pr_err!(
+                "inode table offset beyond data size: {} >= {}\n",
+                tarfs.inode_table_offset,
+                tarfs.data_size
+            );
+            return Err(E2BIG);
+        }
+
+        if tarfs.inode_table_offset % SECTOR_SIZE != 0 {
+            pr_err!(
+                "inode table offset not aligned to sector size: {}\n",
+                tarfs.inode_table_offset,
+            );
+            return Err(EDOM);
+        }
+
+        // Check that the last inode is within bounds (and that there is no overflow when
+        // calculating its offset).
+        let offset = tarfs
+            .inode_count
+            .checked_mul(u64::try_from(size_of::<Inode>())?)
+            .ok_or(ERANGE)?
+            .checked_add(tarfs.inode_table_offset)
+            .ok_or(ERANGE)?;
+        if offset > tarfs.data_size {
+            pr_err!(
+                "inode table extends beyond the data size : {} > {}\n",
+                tarfs.inode_table_offset + (tarfs.inode_count * size_of::<Inode>() as u64),
+                tarfs.data_size,
+            );
+            return Err(E2BIG);
+        }
+
+        Ok(SuperParams {
+            magic: TARFS_MAGIC,
+            blocksize_bits: TARFS_BSIZE_BITS,
+            maxbytes: fs::MAX_LFS_FILESIZE,
+            time_gran: 1000000000,
+            data: tarfs,
+        })
+    }
+
+    fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
+        Self::iget(sb, 1)
+    }
+
+    fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result {
+        let sb = inode.super_block();
+        let mut name = Vec::<u8>::new();
+        let pos = emitter.pos();
+
+        if pos < 0 || pos % size_of::<DirEntry>() as i64 != 0 {
+            return Err(ENOENT);
+        }
+
+        if pos >= inode.size() {
+            return Ok(());
+        }
+
+        // Make sure the inode data doesn't overflow the data area.
+        let size = u64::try_from(inode.size())?;
+        if inode.data().offset.checked_add(size).ok_or(EIO)? > sb.data().data_size {
+            return Err(EIO);
+        }
+
+        for v in sb.read(inode.data().offset + pos as u64, size - pos as u64)? {
+            for e in DirEntry::from_bytes_to_slice(v?.data()).ok_or(EIO)? {
+                let name_len = usize::try_from(e.name_len.value())?;
+                if name_len > name.len() {
+                    name.try_resize(name_len, 0)?;
+                }
+
+                Self::read_name(sb, &mut name[..name_len], e.name_offset.value())?;
+
+                if !emitter.emit(
+                    size_of::<DirEntry>() as i64,
+                    &name[..name_len],
+                    e.ino.value(),
+                    DirEntryType::try_from(u32::from(e.etype))?,
+                ) {
+                    return Ok(());
+                }
+            }
+        }
+
+        Ok(())
+    }
+
+    fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
+        let name_len = u64::try_from(name.len())?;
+        let sb = parent.super_block();
+
+        for v in sb.read(parent.data().offset, parent.size().try_into()?)? {
+            for e in DirEntry::from_bytes_to_slice(v?.data()).ok_or(EIO)? {
+                if e.name_len.value() != name_len || e.name_len.value() > usize::MAX as u64 {
+                    continue;
+                }
+                if Self::name_eq(sb, name, e.name_offset.value())? {
+                    return Self::iget(sb, e.ino.value());
+                }
+            }
+        }
+
+        Err(ENOENT)
+    }
+
+    fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
+        let pos = u64::try_from(folio.pos()).unwrap_or(u64::MAX);
+        let size = u64::try_from(inode.size())?;
+        let sb = inode.super_block();
+
+        let copied = if pos >= size {
+            0
+        } else {
+            let offset = inode.data().offset.checked_add(pos).ok_or(ERANGE)?;
+            let len = core::cmp::min(size - pos, folio.size().try_into()?);
+            let mut foffset = 0;
+
+            if offset.checked_add(len).ok_or(ERANGE)? > sb.data().data_size {
+                return Err(EIO);
+            }
+
+            for v in sb.read(offset, len)? {
+                let v = v?;
+                folio.write(foffset, v.data())?;
+                foffset += v.data().len();
+            }
+            foffset
+        };
+
+        folio.zero_out(copied, folio.size() - copied)?;
+        folio.mark_uptodate();
+        folio.flush_dcache();
+
+        Ok(())
+    }
+
+    fn read_xattr(inode: &INode<Self>, name: &CStr, outbuf: &mut [u8]) -> Result<usize> {
+        if inode.data().flags & inode_flags::OPAQUE == 0
+            || name.as_bytes() != b"trusted.overlay.opaque"
+        {
+            return Err(ENODATA);
+        }
+
+        if !outbuf.is_empty() {
+            outbuf[0] = b'y';
+        }
+
+        Ok(1)
+    }
+
+    fn statfs(sb: &SuperBlock<Self>) -> Result<Stat> {
+        let data = sb.data();
+        Ok(Stat {
+            magic: TARFS_MAGIC,
+            namelen: i64::MAX,
+            bsize: TARFS_BSIZE as _,
+            blocks: data.inode_table_offset / TARFS_BSIZE,
+            files: data.inode_count,
+        })
+    }
+}
diff --git a/scripts/generate_rust_analyzer.py b/scripts/generate_rust_analyzer.py
index fc52bc41d3e7..8dc74991894e 100755
--- a/scripts/generate_rust_analyzer.py
+++ b/scripts/generate_rust_analyzer.py
@@ -116,7 +116,7 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs):
     # Then, the rest outside of `rust/`.
     #
     # We explicitly mention the top-level folders we want to cover.
-    extra_dirs = map(lambda dir: srctree / dir, ("samples", "drivers"))
+    extra_dirs = map(lambda dir: srctree / dir, ("samples", "drivers", "fs"))
     if external_src is not None:
         extra_dirs = [external_src]
     for folder in extra_dirs:
-- 
2.34.1



* Re: [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr`
  2023-10-18 12:25 ` [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
@ 2023-10-18 13:06   ` Ariel Miculas (amiculas)
  2023-10-19 13:35     ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Ariel Miculas (amiculas) @ 2023-10-18 13:06 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On 23/10/18 09:25AM, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to expose xattrs associated with inodes.
> `overlayfs` uses an xattr to indicate that a directory is opaque (i.e.,
> that lower layers should not be looked up). The planned file systems
> need to support opaque directories, so they must be able to implement
> this.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/bindings/bindings_helper.h |  1 +
>  rust/kernel/error.rs            |  2 ++
>  rust/kernel/fs.rs               | 43 +++++++++++++++++++++++++++++++++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 53a99ea512d1..fa754c5e85a2 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -15,6 +15,7 @@
>  #include <linux/refcount.h>
>  #include <linux/wait.h>
>  #include <linux/sched.h>
> +#include <linux/xattr.h>
>  
>  /* `bindgen` gets confused at certain things. */
>  const size_t BINDINGS_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
> diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
> index 484fa7c11de1..6c167583b275 100644
> --- a/rust/kernel/error.rs
> +++ b/rust/kernel/error.rs
> @@ -81,6 +81,8 @@ macro_rules! declare_err {
>      declare_err!(EIOCBQUEUED, "iocb queued, will get completion event.");
>      declare_err!(ERECALLCONFLICT, "Conflict with recalled state.");
>      declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
> +    declare_err!(ENODATA, "No data available.");
> +    declare_err!(EOPNOTSUPP, "Operation not supported on transport endpoint.");
>  }
>  
>  /// Generic integer kernel error.
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index ee3dce87032b..adf9cbee16d2 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -42,6 +42,14 @@ pub trait FileSystem {
>  
>      /// Reads the contents of the inode into the given folio.
>      fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
> +
> +    /// Reads an xattr.
> +    ///
> +    /// Returns the number of bytes written to `outbuf`. If it is too small, returns the number of
> +    /// bytes needs to hold the attribute.
> +    fn read_xattr(_inode: &INode<Self>, _name: &CStr, _outbuf: &mut [u8]) -> Result<usize> {
> +        Err(EOPNOTSUPP)
> +    }
>  }
>  
>  /// The types of directory entries reported by [`FileSystem::read_dir`].
> @@ -418,6 +426,7 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>  
>              sb.0.s_magic = params.magic as _;
>              sb.0.s_op = &Tables::<T>::SUPER_BLOCK;
> +            sb.0.s_xattr = &Tables::<T>::XATTR_HANDLERS[0];
>              sb.0.s_maxbytes = params.maxbytes;
>              sb.0.s_time_gran = params.time_gran;
>              sb.0.s_blocksize_bits = params.blocksize_bits;
> @@ -487,6 +496,40 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>          shutdown: None,
>      };
>  
> +    const XATTR_HANDLERS: [*const bindings::xattr_handler; 2] = [&Self::XATTR_HANDLER, ptr::null()];
> +
> +    const XATTR_HANDLER: bindings::xattr_handler = bindings::xattr_handler {
> +        name: ptr::null(),
> +        prefix: crate::c_str!("").as_char_ptr(),
> +        flags: 0,
> +        list: None,
> +        get: Some(Self::xattr_get_callback),
> +        set: None,
> +    };
> +
> +    unsafe extern "C" fn xattr_get_callback(
> +        _handler: *const bindings::xattr_handler,
> +        _dentry: *mut bindings::dentry,
> +        inode_ptr: *mut bindings::inode,
> +        name: *const core::ffi::c_char,
> +        buffer: *mut core::ffi::c_void,
> +        size: usize,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The C API guarantees that `inode_ptr` is a valid inode.
> +            let inode = unsafe { &*inode_ptr.cast::<INode<T>>() };
> +
> +            // SAFETY: The c API guarantees that `name` is a valid null-terminated string. It
> +            // also guarantees that it's valid for the duration of the callback.
> +            let name = unsafe { CStr::from_char_ptr(name) };
> +
> +            // SAFETY: The C API guarantees that `buffer` is at least `size` bytes in length.
> +            let buf = unsafe { core::slice::from_raw_parts_mut(buffer.cast(), size) };

I think this is not safe. from_raw_parts_mut's documentation says:
```
`data` must be non-null and aligned even for zero-length slices. One
reason for this is that enum layout optimizations may rely on references
(including slices of any length) being aligned and non-null to distinguish
them from other data. You can obtain a pointer that is usable as `data`
for zero-length slices using [`NonNull::dangling()`].
```

`vfs_getxattr_alloc` explicitly calls the `get` handler with `buffer` set
to NULL and `size` set to 0, in order to determine the required size for
the extended attributes:
```
error = handler->get(handler, dentry, inode, name, NULL, 0);
if (error < 0)
	return error;
```

So `buffer` is definitely NULL in the first call to the handler.

When `buffer` is NULL, the first argument to `from_raw_parts_mut` should
be `NonNull::dangling()`.
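
A minimal sketch of that fix, written as plain Rust rather than against
the kernel crate. The helper name `slice_from_c_buffer` is hypothetical
and not part of the series; it only illustrates the null check:

```rust
use core::ptr::NonNull;

/// Hypothetical helper: builds a mutable byte slice from a C-style
/// (pointer, size) pair, tolerating the NULL/0 "size query" call.
///
/// # Safety
///
/// If `buffer` is non-null, it must be valid for reads and writes of
/// `size` bytes for the lifetime `'a`, with no other live references.
unsafe fn slice_from_c_buffer<'a>(buffer: *mut u8, size: usize) -> &'a mut [u8] {
    if buffer.is_null() {
        // A null pointer is never valid for `from_raw_parts_mut`, not even
        // for length 0, so use an aligned dangling pointer and an empty slice.
        unsafe { core::slice::from_raw_parts_mut(NonNull::<u8>::dangling().as_ptr(), 0) }
    } else {
        // SAFETY: The caller guarantees `buffer` is valid for `size` bytes.
        unsafe { core::slice::from_raw_parts_mut(buffer, size) }
    }
}
```

With something like this, `read_xattr` would see an empty `outbuf` on the
size query and could still return the required length.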

> +            let len = T::read_xattr(inode, name, buf)?;
> +            Ok(len.try_into()?)
> +        })
> +    }
> +
>      const DIR_FILE_OPERATIONS: bindings::file_operations = bindings::file_operations {
>          owner: ptr::null_mut(),
>          llseek: Some(bindings::generic_file_llseek),
> -- 
> 2.34.1
> 


* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (18 preceding siblings ...)
  2023-10-18 12:25 ` [RFC PATCH 19/19] tarfs: introduce tar fs Wedson Almeida Filho
@ 2023-10-18 13:40 ` Ariel Miculas (amiculas)
  2023-10-18 17:12   ` Wedson Almeida Filho
  2023-10-29 20:31 ` Matthew Wilcox
  20 siblings, 1 reply; 125+ messages in thread
From: Ariel Miculas (amiculas) @ 2023-10-18 13:40 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On 23/10/18 09:24AM, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> This series introduces Rust abstractions that allow page-cache-backed read-only
> file systems to be written in Rust.
> 
> There are two file systems that are built on top of these abstractions: tarfs
> and puzzlefs. The former has zero unsafe blocks and is included as a patch in
> this series; the latter is described elsewhere [1]. We limit the functionality
> to the bare minimum needed to implement them.
> 
> Rust file system modules can be declared with the `module_fs` macro and are
> required to implement the following functions (which are part of the
> `FileSystem` trait):
> 
> impl FileSystem for MyFS {
>     fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
>     fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
>     fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
>     fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
>     fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
> }
> 
> They can optionally implement the following:
> 
> fn read_xattr(inode: &INode<Self>, name: &CStr, outbuf: &mut [u8]) -> Result<usize>;
> fn statfs(sb: &SuperBlock<Self>) -> Result<Stat>;
> 
> They may also choose the type of the data they can attach to superblocks and/or
> inodes.
> 
> There a couple of issues that are likely to lead to unsoundness that have to do
> with the unregistration of file systems. I will send separate emails about
> them.
> 
> A git tree is available here:
>     git://github.com/wedsonaf/linux.git vfs
> 
> Web:
>     https://github.com/wedsonaf/linux/commits/vfs

I've checked out your branch, but it doesn't compile:
```
$ make LLVM=1 -j4
  DESCEND objtool
  CALL    scripts/checksyscalls.sh
make[4]: 'install_headers' is up to date.
  RUSTC L rust/kernel.o
error[E0425]: cannot find function `folio_alloc` in crate `bindings`
     --> rust/kernel/folio.rs:43:54
      |
43    |           let f = ptr::NonNull::new(unsafe { bindings::folio_alloc(bindings::GFP_KERNEL, order) })
      |                                                        ^^^^^^^^^^^ help: a function with a similar name exists: `__folio_alloc`
      |
     ::: /home/amiculas/work/linux/rust/bindings/bindings_generated.rs:17311:5
      |
17311 | /     pub fn __folio_alloc(
17312 | |         gfp: gfp_t,
17313 | |         order: core::ffi::c_uint,
17314 | |         preferred_nid: core::ffi::c_int,
17315 | |         nodemask: *mut nodemask_t,
17316 | |     ) -> *mut folio;
      | |___________________- similarly named function `__folio_alloc` defined here

error: aborting due to previous error

For more information about this error, try `rustc --explain E0425`.
make[2]: *** [rust/Makefile:460: rust/kernel.o] Error 1
make[1]: *** [/home/amiculas/work/linux/Makefile:1208: prepare] Error 2
make: *** [Makefile:234: __sub-make] Error 2
```

I'm missing `CONFIG_NUMA`, which seems to guard `folio_alloc`
(include/linux/gfp.h):
```
#ifdef CONFIG_NUMA
struct page *alloc_pages(gfp_t gfp, unsigned int order);
struct folio *folio_alloc(gfp_t gfp, unsigned order);
struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
		unsigned long addr, bool hugepage);
#else
```


> 
> [1]: The PuzzleFS container filesystem: https://lwn.net/Articles/945320/
> 
> Wedson Almeida Filho (19):
>   rust: fs: add registration/unregistration of file systems
>   rust: fs: introduce the `module_fs` macro
>   samples: rust: add initial ro file system sample
>   rust: fs: introduce `FileSystem::super_params`
>   rust: fs: introduce `INode<T>`
>   rust: fs: introduce `FileSystem::init_root`
>   rust: fs: introduce `FileSystem::read_dir`
>   rust: fs: introduce `FileSystem::lookup`
>   rust: folio: introduce basic support for folios
>   rust: fs: introduce `FileSystem::read_folio`
>   rust: fs: introduce `FileSystem::read_xattr`
>   rust: fs: introduce `FileSystem::statfs`
>   rust: fs: introduce more inode types
>   rust: fs: add per-superblock data
>   rust: fs: add basic support for fs buffer heads
>   rust: fs: allow file systems backed by a block device
>   rust: fs: allow per-inode data
>   rust: fs: export file type from mode constants
>   tarfs: introduce tar fs
> 
>  fs/Kconfig                        |    1 +
>  fs/Makefile                       |    1 +
>  fs/tarfs/Kconfig                  |   16 +
>  fs/tarfs/Makefile                 |    8 +
>  fs/tarfs/defs.rs                  |   80 ++
>  fs/tarfs/tar.rs                   |  322 +++++++
>  rust/bindings/bindings_helper.h   |   13 +
>  rust/bindings/lib.rs              |    6 +
>  rust/helpers.c                    |  142 ++++
>  rust/kernel/error.rs              |    6 +-
>  rust/kernel/folio.rs              |  214 +++++
>  rust/kernel/fs.rs                 | 1290 +++++++++++++++++++++++++++++
>  rust/kernel/fs/buffer.rs          |   60 ++
>  rust/kernel/lib.rs                |    2 +
>  rust/kernel/mem_cache.rs          |    2 -
>  samples/rust/Kconfig              |   10 +
>  samples/rust/Makefile             |    1 +
>  samples/rust/rust_rofs.rs         |  154 ++++
>  scripts/generate_rust_analyzer.py |    2 +-
>  19 files changed, 2324 insertions(+), 6 deletions(-)
>  create mode 100644 fs/tarfs/Kconfig
>  create mode 100644 fs/tarfs/Makefile
>  create mode 100644 fs/tarfs/defs.rs
>  create mode 100644 fs/tarfs/tar.rs
>  create mode 100644 rust/kernel/folio.rs
>  create mode 100644 rust/kernel/fs.rs
>  create mode 100644 rust/kernel/fs/buffer.rs
>  create mode 100644 samples/rust/rust_rofs.rs
> 
> 
> base-commit: b0bc357ef7a98904600826dea3de79c0c67eb0a7
> -- 
> 2.34.1
> 


* Re: [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems
  2023-10-18 12:25 ` [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
@ 2023-10-18 15:38   ` Benno Lossin
  2024-01-10 18:32     ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Benno Lossin @ 2023-10-18 15:38 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 18.10.23 14:25, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow basic registration and unregistration of Rust file system types.
> Unregistration happens automatically when a registration variable is
> dropped (e.g., when it goes out of scope).
> 
> File systems registered this way are visible in `/proc/filesystems` but
> cannot be mounted yet because `init_fs_context` fails.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>   rust/bindings/bindings_helper.h |  1 +
>   rust/kernel/error.rs            |  2 -
>   rust/kernel/fs.rs               | 80 +++++++++++++++++++++++++++++++++
>   rust/kernel/lib.rs              |  1 +
>   4 files changed, 82 insertions(+), 2 deletions(-)
>   create mode 100644 rust/kernel/fs.rs
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 3b620ae07021..9c23037b33d0 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -8,6 +8,7 @@
> 
>   #include <kunit/test.h>
>   #include <linux/errname.h>
> +#include <linux/fs.h>
>   #include <linux/slab.h>
>   #include <linux/refcount.h>
>   #include <linux/wait.h>
> diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
> index 05fcab6abfe6..e6d7ce46be55 100644
> --- a/rust/kernel/error.rs
> +++ b/rust/kernel/error.rs
> @@ -320,8 +320,6 @@ pub(crate) fn from_err_ptr<T>(ptr: *mut T) -> Result<*mut T> {
>   ///     })
>   /// }
>   /// ```
> -// TODO: Remove `dead_code` marker once an in-kernel client is available.
> -#[allow(dead_code)]
>   pub(crate) fn from_result<T, F>(f: F) -> T
>   where
>       T: From<i16>,
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> new file mode 100644
> index 000000000000..f3fb09db41ba
> --- /dev/null
> +++ b/rust/kernel/fs.rs
> @@ -0,0 +1,80 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Kernel file systems.
> +//!
> +//! This module allows Rust code to register new kernel file systems.
> +//!
> +//! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
> +
> +use crate::error::{code::*, from_result, to_result, Error};
> +use crate::types::Opaque;
> +use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
> +use core::{marker::PhantomPinned, pin::Pin};
> +use macros::{pin_data, pinned_drop};
> +
> +/// A file system type.
> +pub trait FileSystem {
> +    /// The name of the file system type.
> +    const NAME: &'static CStr;
> +}
> +
> +/// A registration of a file system.
> +#[pin_data(PinnedDrop)]
> +pub struct Registration {
> +    #[pin]
> +    fs: Opaque<bindings::file_system_type>,
> +    #[pin]
> +    _pin: PhantomPinned,

Note that since commit 0b4e3b6f6b79 ("rust: types: make `Opaque` be
`!Unpin`") you do not need an extra pinned `PhantomPinned` in your struct
(if you already have a pinned `Opaque`), since `Opaque` already is
`!Unpin`.

> +}
> +
> +// SAFETY: `Registration` doesn't provide any `&self` methods, so it is safe to pass references
> +// to it around.
> +unsafe impl Sync for Registration {}
> +
> +// SAFETY: Both registration and unregistration are implemented in C and safe to be performed
> +// from any thread, so `Registration` is `Send`.
> +unsafe impl Send for Registration {}
> +
> +impl Registration {
> +    /// Creates the initialiser of a new file system registration.
> +    pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<Self, Error> {

I am a bit curious why you specify `?Sized` here, is it common
for types that implement `FileSystem` to not be `Sized`?

Or do you want to use `dyn FileSystem`?

> +        try_pin_init!(Self {
> +            _pin: PhantomPinned,
> +            fs <- Opaque::try_ffi_init(|fs_ptr: *mut bindings::file_system_type| {
> +                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write.
> +                unsafe { fs_ptr.write(bindings::file_system_type::default()) };
> +
> +                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write, and it has
> +                // just been initialised above, so it's also valid for read.
> +                let fs = unsafe { &mut *fs_ptr };
> +                fs.owner = module.0;
> +                fs.name = T::NAME.as_char_ptr();
> +                fs.init_fs_context = Some(Self::init_fs_context_callback);
> +                fs.kill_sb = Some(Self::kill_sb_callback);
> +                fs.fs_flags = 0;
> +
> +                // SAFETY: Pointers stored in `fs` are static so will live for as long as the
> +                // registration is active (it is undone in `drop`).
> +                to_result(unsafe { bindings::register_filesystem(fs_ptr) })
> +            }),
> +        })
> +    }
> +
> +    unsafe extern "C" fn init_fs_context_callback(
> +        _fc_ptr: *mut bindings::fs_context,
> +    ) -> core::ffi::c_int {
> +        from_result(|| Err(ENOTSUPP))
> +    }
> +
> +    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for Registration {
> +    fn drop(self: Pin<&mut Self>) {
> +        // SAFETY: If an instance of `Self` has been successfully created, a call to
> +        // `register_filesystem` has necessarily succeeded. So it's ok to call
> +        // `unregister_filesystem` on the previously registered fs.

I would simply add an invariant on `Registration` that `self.fs` is
registered, then you do not need such a lengthy explanation here.

-- 
Cheers,
Benno

> +        unsafe { bindings::unregister_filesystem(self.fs.get()) };
> +    }
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 187d58f906a5..00059b80c240 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -34,6 +34,7 @@
>   mod allocator;
>   mod build_assert;
>   pub mod error;
> +pub mod fs;
>   pub mod init;
>   pub mod ioctl;
>   #[cfg(CONFIG_KUNIT)]
> --
> 2.34.1
> 
>


* Re: [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params`
  2023-10-18 12:25 ` [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params` Wedson Almeida Filho
@ 2023-10-18 16:34   ` Benno Lossin
  2023-10-28 16:39     ` Alice Ryhl
  2023-10-20 15:04   ` Ariel Miculas (amiculas)
  2024-01-03 12:25   ` Andreas Hindborg (Samsung)
  2 siblings, 1 reply; 125+ messages in thread
From: Benno Lossin @ 2023-10-18 16:34 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 18.10.23 14:25, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to initialise superblocks, which allows them
> to be mounted (though they are still empty).
> 
> Some scaffolding code is added to create an empty directory as the root.
> It is replaced by proper inode creation in a subsequent patch in this
> series.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>   rust/bindings/bindings_helper.h |   5 +
>   rust/bindings/lib.rs            |   4 +
>   rust/kernel/fs.rs               | 176 ++++++++++++++++++++++++++++++--
>   samples/rust/rust_rofs.rs       |  10 ++
>   4 files changed, 189 insertions(+), 6 deletions(-)
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 9c23037b33d0..ca1898ce9527 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -9,6 +9,7 @@
>   #include <kunit/test.h>
>   #include <linux/errname.h>
>   #include <linux/fs.h>
> +#include <linux/fs_context.h>
>   #include <linux/slab.h>
>   #include <linux/refcount.h>
>   #include <linux/wait.h>
> @@ -22,3 +23,7 @@ const gfp_t BINDINGS___GFP_ZERO = __GFP_ZERO;
>   const slab_flags_t BINDINGS_SLAB_RECLAIM_ACCOUNT = SLAB_RECLAIM_ACCOUNT;
>   const slab_flags_t BINDINGS_SLAB_MEM_SPREAD = SLAB_MEM_SPREAD;
>   const slab_flags_t BINDINGS_SLAB_ACCOUNT = SLAB_ACCOUNT;
> +
> +const unsigned long BINDINGS_SB_RDONLY = SB_RDONLY;
> +
> +const loff_t BINDINGS_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
> diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
> index 6a8c6cd17e45..426915d3fb57 100644
> --- a/rust/bindings/lib.rs
> +++ b/rust/bindings/lib.rs
> @@ -55,3 +55,7 @@ mod bindings_helper {
>   pub const SLAB_RECLAIM_ACCOUNT: slab_flags_t = BINDINGS_SLAB_RECLAIM_ACCOUNT;
>   pub const SLAB_MEM_SPREAD: slab_flags_t = BINDINGS_SLAB_MEM_SPREAD;
>   pub const SLAB_ACCOUNT: slab_flags_t = BINDINGS_SLAB_ACCOUNT;
> +
> +pub const SB_RDONLY: core::ffi::c_ulong = BINDINGS_SB_RDONLY;
> +
> +pub const MAX_LFS_FILESIZE: loff_t = BINDINGS_MAX_LFS_FILESIZE;
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 1df54c234101..31cf643aaded 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -6,16 +6,22 @@
>   //!
>   //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
> 
> -use crate::error::{code::*, from_result, to_result, Error};
> +use crate::error::{code::*, from_result, to_result, Error, Result};
>   use crate::types::Opaque;
>   use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
>   use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin};
>   use macros::{pin_data, pinned_drop};
> 
> +/// Maximum size of an inode.
> +pub const MAX_LFS_FILESIZE: i64 = bindings::MAX_LFS_FILESIZE;
> +
>   /// A file system type.
>   pub trait FileSystem {
>       /// The name of the file system type.
>       const NAME: &'static CStr;
> +
> +    /// Returns the parameters to initialise a super block.
> +    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
>   }
> 
>   /// A registration of a file system.
> @@ -49,7 +55,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>                   let fs = unsafe { &mut *fs_ptr };
>                   fs.owner = module.0;
>                   fs.name = T::NAME.as_char_ptr();
> -                fs.init_fs_context = Some(Self::init_fs_context_callback);
> +                fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
>                   fs.kill_sb = Some(Self::kill_sb_callback);
>                   fs.fs_flags = 0;
> 
> @@ -60,13 +66,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>           })
>       }
> 
> -    unsafe extern "C" fn init_fs_context_callback(
> -        _fc_ptr: *mut bindings::fs_context,
> +    unsafe extern "C" fn init_fs_context_callback<T: FileSystem + ?Sized>(
> +        fc_ptr: *mut bindings::fs_context,
>       ) -> core::ffi::c_int {
> -        from_result(|| Err(ENOTSUPP))
> +        from_result(|| {
> +            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
> +            let fc = unsafe { &mut *fc_ptr };

This safety comment is not enough: the pointer needs to be unique and
point to a valid value for this to be ok. I would recommend doing this
instead:

    unsafe { addr_of_mut!((*fc_ptr).ops).write(&Tables::<T>::CONTEXT) };
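
To illustrate the difference, here is a small stand-alone sketch. The
`FsContext` struct and `set_ops` function are hypothetical stand-ins for
`bindings::fs_context` and the callback body; the point is only that a
single field is written through the raw pointer, so no `&mut` to the
whole (possibly not fully initialised) struct is ever created:

```rust
use core::ptr::addr_of_mut;

/// Hypothetical stand-in for `bindings::fs_context`.
#[repr(C)]
struct FsContext {
    ops: *const u32,
    other: u64,
}

/// Writes only the `ops` field through the raw pointer.
///
/// # Safety
///
/// `fc_ptr` must be non-null, aligned and valid for writes.
unsafe fn set_ops(fc_ptr: *mut FsContext, ops: *const u32) {
    // SAFETY: The caller guarantees `fc_ptr` is valid for writes. No
    // reference to `*fc_ptr` is formed, so the other fields may remain
    // uninitialised or be concurrently observed by C code.
    unsafe { addr_of_mut!((*fc_ptr).ops).write(ops) };
}
```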

> +            fc.ops = &Tables::<T>::CONTEXT;
> +            Ok(0)
> +        })
>       }
> 
> -    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
> +    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
> +        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
> +        // the appropriate function to call for cleanup.
> +        unsafe { bindings::kill_anon_super(sb_ptr) };
> +    }
>   }
> 
>   #[pinned_drop]
> @@ -79,6 +94,151 @@ fn drop(self: Pin<&mut Self>) {
>       }
>   }
> 
> +/// A file system super block.
> +///
> +/// Wraps the kernel's `struct super_block`.
> +#[repr(transparent)]
> +pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
> +
> +/// Required superblock parameters.
> +///
> +/// This is returned by implementations of [`FileSystem::super_params`].
> +pub struct SuperParams {
> +    /// The magic number of the superblock.
> +    pub magic: u32,
> +
> +    /// The size of a block in powers of 2 (i.e., for a value of `n`, the size is `2^n`).
> +    pub blocksize_bits: u8,
> +
> +    /// Maximum size of a file.
> +    ///
> +    /// The maximum allowed value is [`MAX_LFS_FILESIZE`].
> +    pub maxbytes: i64,
> +
> +    /// Granularity of c/m/atime in ns (cannot be worse than a second).
> +    pub time_gran: u32,
> +}
> +
> +/// A superblock that is still being initialised.
> +///
> +/// # Invariants
> +///
> +/// The superblock is a newly-created one and this is the only active pointer to it.

This struct is not wrapping a pointer?

> +#[repr(transparent)]
> +pub struct NewSuperBlock<T: FileSystem + ?Sized>(bindings::super_block, PhantomData<T>);

No `Opaque`?

> +
> +struct Tables<T: FileSystem + ?Sized>(T);

Please add a newline here.

Also the field `self.0` is never actually used, should it be
`PhantomData<T>` instead?
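
Roughly what I have in mind, as a stand-alone sketch. The `FileSystem`
trait below is a cut-down stand-in and `MyFs` and `NAME_LEN` are
hypothetical names used only for illustration:

```rust
use core::marker::PhantomData;

// Stand-in for the series' `FileSystem` trait; only what the sketch needs.
trait FileSystem {
    const NAME: &'static str;
}

// Marker-only type: it carries `T` at the type level but never stores a `T`.
struct Tables<T: FileSystem + ?Sized>(PhantomData<T>);

impl<T: FileSystem + ?Sized> Tables<T> {
    // Associated constants can still be derived from `T`.
    const NAME_LEN: usize = T::NAME.len();
}

// Hypothetical file system type used only to instantiate `Tables`.
struct MyFs;

impl FileSystem for MyFs {
    const NAME: &'static str = "myfs";
}
```

This keeps `Tables<T>` zero-sized regardless of `T`, and it makes it
obvious that no value of `T` is ever held.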

> +impl<T: FileSystem + ?Sized> Tables<T> {
> +    const CONTEXT: bindings::fs_context_operations = bindings::fs_context_operations {
> +        free: None,
> +        parse_param: None,
> +        get_tree: Some(Self::get_tree_callback),
> +        reconfigure: None,
> +        parse_monolithic: None,
> +        dup: None,
> +    };
> +
> +    unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> core::ffi::c_int {
> +        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
> +        // the right type and is a valid callback.
> +        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
> +    }
> +
> +    unsafe extern "C" fn fill_super_callback(
> +        sb_ptr: *mut bindings::super_block,
> +        _fc: *mut bindings::fs_context,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> +            // newly-created superblock.
> +            let sb = unsafe { &mut *sb_ptr.cast() };

It would be helpful if you spelled out the `NewSuperBlock` type here
somewhere (e.g. on the `cast::<NewSuperBlock>`).

Is it really ok to create a mutable reference to a `bindings::super_block`?
Since it is not wrapped in `Opaque`, I would rather have you avoid this.

> +            let params = T::super_params(sb)?;
> +
> +            sb.0.s_magic = params.magic as _;
> +            sb.0.s_op = &Tables::<T>::SUPER_BLOCK;
> +            sb.0.s_maxbytes = params.maxbytes;
> +            sb.0.s_time_gran = params.time_gran;
> +            sb.0.s_blocksize_bits = params.blocksize_bits;
> +            sb.0.s_blocksize = 1;
> +            if sb.0.s_blocksize.leading_zeros() < params.blocksize_bits.into() {
> +                return Err(EINVAL);
> +            }

I think you could add a comment that explains what this `if` does.
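
For reference, here is a minimal userspace sketch of what the check appears to guard against: a `blocksize_bits` value so large that `1 << blocksize_bits` would not fit in the block-size field. The function name `blocksize_from_bits` and the use of `u64` are assumptions for illustration only (the real field is a C `unsigned long`, which is narrower on 32-bit targets).

```rust
/// Userspace model of the check (hypothetical helper, not part of the patch).
/// Returns the block size for `bits`, or `None` if `1 << bits` would not fit
/// in the integer type that holds the block size (`u64` is assumed here).
fn blocksize_from_bits(bits: u8) -> Option<u64> {
    let one: u64 = 1;
    // For the value 1, `leading_zeros()` is the largest shift that still
    // keeps the single set bit inside the type (63 for a `u64`).
    if one.leading_zeros() < u32::from(bits) {
        return None; // the patch returns `EINVAL` at this point
    }
    Some(one << bits)
}
```

A one-line comment along the lines of "reject shifts that would overflow `s_blocksize`" would probably be enough in the patch itself.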

> +            sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
> +            sb.0.s_flags |= bindings::SB_RDONLY;
> +
> +            // The following is scaffolding code that will be removed in a subsequent patch. It is
> +            // needed to build a root dentry, otherwise core code will BUG().
> +            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
> +            let inode = unsafe { bindings::new_inode(&mut sb.0) };
> +            if inode.is_null() {
> +                return Err(ENOMEM);
> +            }
> +
> +            // SAFETY: `inode` is valid for write.
> +            unsafe { bindings::set_nlink(inode, 2) };
> +
> +            {
> +                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> +                // safe to mutably dereference it.
> +                let inode = unsafe { &mut *inode };

The inode also needs to be initialized and have valid values as its fields.
Not sure if this is kept and it would probably be better to keep using raw
pointers here.

--
Cheers,
Benno

> +                inode.i_ino = 1;
> +                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
> +
> +                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> +                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> +
> +                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> +                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> +            }
> +
> +            // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
> +            // case for this call.
> +            //
> +            // It takes over the inode, even on failure, so we don't need to clean it up.
> +            let dentry = unsafe { bindings::d_make_root(inode) };
> +            if dentry.is_null() {
> +                return Err(ENOMEM);
> +            }
> +
> +            sb.0.s_root = dentry;
> +
> +            Ok(0)
> +        })
> +    }
> +
> +    const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
> +        alloc_inode: None,
> +        destroy_inode: None,
> +        free_inode: None,
> +        dirty_inode: None,
> +        write_inode: None,
> +        drop_inode: None,
> +        evict_inode: None,
> +        put_super: None,
> +        sync_fs: None,
> +        freeze_super: None,
> +        freeze_fs: None,
> +        thaw_super: None,
> +        unfreeze_fs: None,
> +        statfs: None,
> +        remount_fs: None,
> +        umount_begin: None,
> +        show_options: None,
> +        show_devname: None,
> +        show_path: None,
> +        show_stats: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        quota_read: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        quota_write: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        get_dquots: None,
> +        nr_cached_objects: None,
> +        free_cached_objects: None,
> +        shutdown: None,
> +    };
> +}
> +
>   /// Kernel module that exposes a single file system implemented by `T`.
>   #[pin_data]
>   pub struct Module<T: FileSystem + ?Sized> {
> @@ -105,6 +265,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>   ///
>   /// ```
>   /// # mod module_fs_sample {
> +/// use kernel::fs::{NewSuperBlock, SuperParams};
>   /// use kernel::prelude::*;
>   /// use kernel::{c_str, fs};
>   ///
> @@ -119,6 +280,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>   /// struct MyFs;
>   /// impl fs::FileSystem for MyFs {
>   ///     const NAME: &'static CStr = c_str!("myfs");
> +///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +///         todo!()
> +///     }
>   /// }
>   /// # }
>   /// ```
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index 1c00b1da8b94..9878bf88b991 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -2,6 +2,7 @@
> 
>   //! Rust read-only file system sample.
> 
> +use kernel::fs::{NewSuperBlock, SuperParams};
>   use kernel::prelude::*;
>   use kernel::{c_str, fs};
> 
> @@ -16,4 +17,13 @@
>   struct RoFs;
>   impl fs::FileSystem for RoFs {
>       const NAME: &'static CStr = c_str!("rust-fs");
> +
> +    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +        Ok(SuperParams {
> +            magic: 0x52555354,
> +            blocksize_bits: 12,
> +            maxbytes: fs::MAX_LFS_FILESIZE,
> +            time_gran: 1,
> +        })
> +    }
>   }
> --
> 2.34.1
> 
>


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2023-10-18 12:25 ` [RFC PATCH 19/19] tarfs: introduce tar fs Wedson Almeida Filho
@ 2023-10-18 16:57   ` Matthew Wilcox
  2023-10-18 17:05     ` Wedson Almeida Filho
  2024-01-24  5:05   ` Matthew Wilcox
  1 sibling, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-18 16:57 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> +    fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
> +        let pos = u64::try_from(folio.pos()).unwrap_or(u64::MAX);
> +        let size = u64::try_from(inode.size())?;
> +        let sb = inode.super_block();
> +
> +        let copied = if pos >= size {
> +            0
> +        } else {
> +            let offset = inode.data().offset.checked_add(pos).ok_or(ERANGE)?;
> +            let len = core::cmp::min(size - pos, folio.size().try_into()?);
> +            let mut foffset = 0;
> +
> +            if offset.checked_add(len).ok_or(ERANGE)? > sb.data().data_size {
> +                return Err(EIO);
> +            }
> +
> +            for v in sb.read(offset, len)? {
> +                let v = v?;
> +                folio.write(foffset, v.data())?;
> +                foffset += v.data().len();
> +            }
> +            foffset
> +        };
> +
> +        folio.zero_out(copied, folio.size() - copied)?;
> +        folio.mark_uptodate();
> +        folio.flush_dcache();
> +
> +        Ok(())
> +    }

Who unlocks the folio here?


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2023-10-18 16:57   ` Matthew Wilcox
@ 2023-10-18 17:05     ` Wedson Almeida Filho
  2023-10-18 17:20       ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 17:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 18 Oct 2023 at 13:57, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> > +    fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
> > +        let pos = u64::try_from(folio.pos()).unwrap_or(u64::MAX);
> > +        let size = u64::try_from(inode.size())?;
> > +        let sb = inode.super_block();
> > +
> > +        let copied = if pos >= size {
> > +            0
> > +        } else {
> > +            let offset = inode.data().offset.checked_add(pos).ok_or(ERANGE)?;
> > +            let len = core::cmp::min(size - pos, folio.size().try_into()?);
> > +            let mut foffset = 0;
> > +
> > +            if offset.checked_add(len).ok_or(ERANGE)? > sb.data().data_size {
> > +                return Err(EIO);
> > +            }
> > +
> > +            for v in sb.read(offset, len)? {
> > +                let v = v?;
> > +                folio.write(foffset, v.data())?;
> > +                foffset += v.data().len();
> > +            }
> > +            foffset
> > +        };
> > +
> > +        folio.zero_out(copied, folio.size() - copied)?;
> > +        folio.mark_uptodate();
> > +        folio.flush_dcache();
> > +
> > +        Ok(())
> > +    }
>
> Who unlocks the folio here?

The `Drop` implementation of `LockedFolio`.

Note that `read_folio` is given ownership of `folio` (the last
argument), so when it goes out of scope (or when it's explicitly
dropped) its `drop` function is called automatically. You'll find its
implementation (and the call to `folio_unlock`) in patch 9.
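
To illustrate the pattern outside the kernel, here is a minimal userspace sketch. The `Folio` and `LockedFolio` types below are simplified stand-ins (assumptions, not the types from patch 9); a `Cell<bool>` models the folio lock bit, and `drop` models the call to `folio_unlock`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a folio; `locked` models the folio lock bit.
struct Folio {
    locked: Cell<bool>,
}

/// Guard that represents ownership of the lock (stand-in for `LockedFolio`).
struct LockedFolio<'a>(&'a Folio);

impl<'a> LockedFolio<'a> {
    /// Marks the folio as locked and returns the guard.
    fn lock(folio: &'a Folio) -> Self {
        folio.locked.set(true);
        LockedFolio(folio)
    }
}

impl Drop for LockedFolio<'_> {
    fn drop(&mut self) {
        // Models the call to `folio_unlock`.
        self.0.locked.set(false);
    }
}

/// Takes ownership of the guard, like `read_folio` does. The guard is
/// dropped on every return path, including the early error return.
fn read_folio(_folio: LockedFolio<'_>, fail: bool) -> Result<(), ()> {
    if fail {
        return Err(());
    }
    Ok(())
}
```

The point of the design is that no return path, including those taken by `?`, can forget the unlock.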


* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2023-10-18 13:40 ` [RFC PATCH 00/19] Rust abstractions for VFS Ariel Miculas (amiculas)
@ 2023-10-18 17:12   ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 17:12 UTC (permalink / raw)
  To: Ariel Miculas (amiculas)
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On Wed, 18 Oct 2023 at 10:40, Ariel Miculas (amiculas)
<amiculas@cisco.com> wrote:

> I'm missing `CONFIG_NUMA`, which seems to guard `folio_alloc`
> (include/linux/gfp.h):
> ```
> #ifdef CONFIG_NUMA
> struct page *alloc_pages(gfp_t gfp, unsigned int order);
> struct folio *folio_alloc(gfp_t gfp, unsigned order);
> struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
>                 unsigned long addr, bool hugepage);
> #else
> ```

Hey Ariel, thanks for finding this.

When CONFIG_NUMA is not defined, `folio_alloc` is a static inline
function defined in the header file, so bindgen doesn't generate a
binding for it. I'll fix this by adding a helper in rust/helpers.c.


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-18 12:25 ` [RFC PATCH 09/19] rust: folio: introduce basic support for folios Wedson Almeida Filho
@ 2023-10-18 17:17   ` Matthew Wilcox
  2023-10-18 18:32     ` Wedson Almeida Filho
  2023-10-21  9:21   ` Benno Lossin
  1 sibling, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-18 17:17 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:25:08AM -0300, Wedson Almeida Filho wrote:
> +void *rust_helper_kmap(struct page *page)
> +{
> +	return kmap(page);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kmap);
> +
> +void rust_helper_kunmap(struct page *page)
> +{
> +	kunmap(page);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kunmap);

I'm not thrilled by exposing kmap()/kunmap() to Rust code.  The vast
majority of code really only needs kmap_local_*() / kunmap_local().
Can you elaborate on why you need the old kmap() in new Rust code?

> +void rust_helper_folio_set_error(struct folio *folio)
> +{
> +	folio_set_error(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_set_error);

I'm trying to get rid of the error flag.  Can you share the situations
in which you've needed the error flag?  Or is it just copying existing
practices?

> +    /// Returns the byte position of this folio in its file.
> +    pub fn pos(&self) -> i64 {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_pos(self.0.get()) }
> +    }

I think it's a mistake to make file positions an i64.  I estimate 64
bits will not be enough by 2035-2040.  We should probably have a numeric
type which is i64 on 32-bit and isize on other CPUs (I also project
64-bit pointers will be insufficient by 2035-2040 and so we will have
128-bit pointers around the same time, so we're not going to need i128
file offsets with i64 pointers).

> +/// A [`Folio`] that has a single reference to it.
> +pub struct UniqueFolio(pub(crate) ARef<Folio>);

How do we know it only has a single reference?  Do you mean "has at
least one reference"?  Or am I confusing Rust's notion of a reference
with Linux's notion of a reference?

> +impl UniqueFolio {
> +    /// Maps the contents of a folio page into a slice.
> +    pub fn map_page(&self, page_index: usize) -> Result<MapGuard<'_>> {
> +        if page_index >= self.0.size() / bindings::PAGE_SIZE {
> +            return Err(EDOM);
> +        }
> +
> +        // SAFETY: We just checked that the index is within bounds of the folio.
> +        let page = unsafe { bindings::folio_page(self.0 .0.get(), page_index) };
> +
> +        // SAFETY: `page` is valid because it was returned by `folio_page` above.
> +        let ptr = unsafe { bindings::kmap(page) };

Surely this can be:

	   let ptr = unsafe { bindings::kmap_local_folio(folio, page_index * PAGE_SIZE) };

> +        // SAFETY: We just mapped `ptr`, so it's valid for read.
> +        let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(), bindings::PAGE_SIZE) };

Can we hide away the "if this isn't a HIGHMEM system, this maps to the
end of the folio, but if it is, it only maps to the end of the page"
problem here?



* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2023-10-18 17:05     ` Wedson Almeida Filho
@ 2023-10-18 17:20       ` Matthew Wilcox
  2023-10-18 18:07         ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-18 17:20 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Oct 18, 2023 at 02:05:51PM -0300, Wedson Almeida Filho wrote:
> On Wed, 18 Oct 2023 at 13:57, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> > > +    fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
> > > +        let pos = u64::try_from(folio.pos()).unwrap_or(u64::MAX);
> > > +        let size = u64::try_from(inode.size())?;
> > > +        let sb = inode.super_block();
> > > +
> > > +        let copied = if pos >= size {
> > > +            0
> > > +        } else {
> > > +            let offset = inode.data().offset.checked_add(pos).ok_or(ERANGE)?;
> > > +            let len = core::cmp::min(size - pos, folio.size().try_into()?);
> > > +            let mut foffset = 0;
> > > +
> > > +            if offset.checked_add(len).ok_or(ERANGE)? > sb.data().data_size {
> > > +                return Err(EIO);
> > > +            }
> > > +
> > > +            for v in sb.read(offset, len)? {
> > > +                let v = v?;
> > > +                folio.write(foffset, v.data())?;
> > > +                foffset += v.data().len();
> > > +            }
> > > +            foffset
> > > +        };
> > > +
> > > +        folio.zero_out(copied, folio.size() - copied)?;
> > > +        folio.mark_uptodate();
> > > +        folio.flush_dcache();
> > > +
> > > +        Ok(())
> > > +    }
> >
> > Who unlocks the folio here?
> 
> The `Drop` implementation of `LockedFolio`.
> 
> Note that `read_folio` is given ownership of `folio` (the last
> argument), so when it goes out of scope (or when it's explicitly
> dropped) its `drop` function is called automatically. You'll find its
> implementation (and the call to `folio_unlock`) in patch 9.

That works for synchronous implementations of read_folio(), but for
an asynchronous implementation, we need to unlock the folio once the
read completes, typically in the bio completion handler.  What's the
plan for that?  Hand ownership of the folio to the bio submission path,
which hands it to the bio completion path, which drops the folio?


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2023-10-18 17:20       ` Matthew Wilcox
@ 2023-10-18 18:07         ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 18:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 18 Oct 2023 at 14:20, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Oct 18, 2023 at 02:05:51PM -0300, Wedson Almeida Filho wrote:
> > On Wed, 18 Oct 2023 at 13:57, Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> > > > +    fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
> > > > +        let pos = u64::try_from(folio.pos()).unwrap_or(u64::MAX);
> > > > +        let size = u64::try_from(inode.size())?;
> > > > +        let sb = inode.super_block();
> > > > +
> > > > +        let copied = if pos >= size {
> > > > +            0
> > > > +        } else {
> > > > +            let offset = inode.data().offset.checked_add(pos).ok_or(ERANGE)?;
> > > > +            let len = core::cmp::min(size - pos, folio.size().try_into()?);
> > > > +            let mut foffset = 0;
> > > > +
> > > > +            if offset.checked_add(len).ok_or(ERANGE)? > sb.data().data_size {
> > > > +                return Err(EIO);
> > > > +            }
> > > > +
> > > > +            for v in sb.read(offset, len)? {
> > > > +                let v = v?;
> > > > +                folio.write(foffset, v.data())?;
> > > > +                foffset += v.data().len();
> > > > +            }
> > > > +            foffset
> > > > +        };
> > > > +
> > > > +        folio.zero_out(copied, folio.size() - copied)?;
> > > > +        folio.mark_uptodate();
> > > > +        folio.flush_dcache();
> > > > +
> > > > +        Ok(())
> > > > +    }
> > >
> > > Who unlocks the folio here?
> >
> > The `Drop` implementation of `LockedFolio`.
> >
> > Note that `read_folio` is given ownership of `folio` (the last
> > argument), so when it goes out of scope (or when it's explicitly
> > dropped) its `drop` function is called automatically. You'll find its
> > implementation (and the call to `folio_unlock`) in patch 9.
>
> That works for synchronous implementations of read_folio(), but for
> an asynchronous implementation, we need to unlock the folio once the
> read completes, typically in the bio completion handler.  What's the
> plan for that?  Hand ownership of the folio to the bio submission path,
> which hands it to the bio completion path, which drops the folio?

Yes, exactly. (I mentioned this in a github comment a few days back:
https://github.com/Rust-for-Linux/linux/pull/1037#discussion_r1359706872.)

The code as is doesn't allow a `LockedFolio` to outlive this call but
we can add support as you described.

Part of the reason support for this isn't included in this series is
that there are no Rust users of the async code path. And we've been told
repeatedly by Greg and others that we must not add code that isn't
used yet.
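
For the sake of discussion, here is a rough userspace sketch of what the ownership handoff could look like. Everything in it is hypothetical: `PendingRead`, `submit_read` and `complete` are made-up names, the `Folio`/`LockedFolio` types are simplified stand-ins, and how the lifetime parameter of the real `LockedFolio<'_>` would need to change is not addressed.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a folio; `locked` models the folio lock bit.
struct Folio {
    locked: Cell<bool>,
}

/// Stand-in for `LockedFolio`: dropping it unlocks the folio.
struct LockedFolio<'a>(&'a Folio);

impl Drop for LockedFolio<'_> {
    fn drop(&mut self) {
        // Models the call to `folio_unlock`.
        self.0.locked.set(false);
    }
}

/// Hypothetical in-flight read. It owns the guard, so the folio stays
/// locked for as long as the request exists.
struct PendingRead<'a> {
    folio: LockedFolio<'a>,
}

/// Models the submission path: ownership of the guard moves into the request.
fn submit_read(folio: LockedFolio<'_>) -> PendingRead<'_> {
    PendingRead { folio }
}

impl PendingRead<'_> {
    /// Models the completion handler: consuming the request drops the
    /// guard, which unlocks the folio.
    fn complete(self) {
        drop(self.folio);
    }
}
```

With this shape, `read_folio` would return without unlocking, and the unlock would happen only when the completion path consumes the request.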


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-18 17:17   ` Matthew Wilcox
@ 2023-10-18 18:32     ` Wedson Almeida Filho
  2023-10-18 19:21       ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-18 18:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 18 Oct 2023 at 14:17, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Oct 18, 2023 at 09:25:08AM -0300, Wedson Almeida Filho wrote:
> > +void *rust_helper_kmap(struct page *page)
> > +{
> > +     return kmap(page);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_kmap);
> > +
> > +void rust_helper_kunmap(struct page *page)
> > +{
> > +     kunmap(page);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_kunmap);
>
> I'm not thrilled by exposing kmap()/kunmap() to Rust code.  The vast
> majority of code really only needs kmap_local_*() / kunmap_local().
> Can you elaborate on why you need the old kmap() in new Rust code?

The difficulty we have with kmap_local_*() has to do with the
requirement that maps and unmaps need to be nested neatly. For
example:

let a = folio1.map_local(...);
let b = folio2.map_local(...);
// Do something with `a` and `b`.
drop(a);
drop(b);

The code obviously violates the requirements.

One way to enforce the rule in Rust is to use closures, so the code
above would be:

folio1.map_local(..., |a| {
    folio2.map_local(..., |b| {
        // Do something with `a` and `b`.
    })
})

It isn't as ergonomic as the first option, but allows us to satisfy the
nesting requirement.

Any chance we can relax that requirement?

(If not, and we really want to get rid of the non-local function, we
can fall back to the closure-based implementation. In fact, you'll
find that in this patch I already do this for a private function that
is used when writing into the folio, we could just make a version of it
public.)
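
To make the closure-based option concrete, here is a minimal userspace sketch. It is a model only: `MapStack` is a hypothetical stand-in for the per-task kmap_local stack, and this `Folio` just wraps a `Vec<u8>`. The signature of `map_local` is an assumption, not the one in the patch.

```rust
use std::cell::RefCell;

/// Hypothetical model of the per-task kmap_local stack: mappings must be
/// released in the reverse order of their creation.
struct MapStack {
    slots: RefCell<Vec<u32>>,
}

/// Simplified stand-in for a folio backed by plain memory.
struct Folio<'s> {
    id: u32,
    data: Vec<u8>,
    stack: &'s MapStack,
}

impl Folio<'_> {
    /// Maps the folio, runs `f` on its contents, then unmaps it. Because the
    /// unmap happens when `f` returns, nested calls always unwind in LIFO
    /// order and the caller cannot reorder them.
    fn map_local<R>(&self, f: impl FnOnce(&[u8]) -> R) -> R {
        self.stack.slots.borrow_mut().push(self.id);
        let ret = f(&self.data);
        let top = self.stack.slots.borrow_mut().pop();
        // Checks the nesting invariant in this model.
        assert_eq!(top, Some(self.id));
        ret
    }
}
```

Usage would look like `folio1.map_local(|a| folio2.map_local(|b| a.len() + b.len()))`. Note that this sketch does not unmap if the closure panics.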

> > +void rust_helper_folio_set_error(struct folio *folio)
> > +{
> > +     folio_set_error(folio);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_folio_set_error);
>
> I'm trying to get rid of the error flag.  Can you share the situations
> in which you've needed the error flag?  Or is it just copying existing
> practices?

I'm just mimicking C code. Happy to remove it.

> > +    /// Returns the byte position of this folio in its file.
> > +    pub fn pos(&self) -> i64 {
> > +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> > +        unsafe { bindings::folio_pos(self.0.get()) }
> > +    }
>
> I think it's a mistake to make file positions an i64.  I estimate 64
> bits will not be enough by 2035-2040.  We should probably have a numeric
> type which is i64 on 32-bit and isize on other CPUs (I also project
> 64-bit pointers will be insufficient by 2035-2040 and so we will have
> 128-bit pointers around the same time, so we're not going to need i128
> file offsets with i64 pointers).

I'm also just mimicking C here -- we just don't have a type that has
the properties you describe. I'm happy to switch once we have it, in
fact, Miguel has plans that I believe align well with what you want.
I'm not sure if he has already contacted you about it yet though.

> > +/// A [`Folio`] that has a single reference to it.
> > +pub struct UniqueFolio(pub(crate) ARef<Folio>);
>
> How do we know it only has a single reference?  Do you mean "has at
> least one reference"?  Or am I confusing Rust's notion of a reference
> with Linux's notion of a reference?

Instances of `UniqueFolio` are only produced by calls to
`folio_alloc`. They encode the fact that it's safe for us to map the
folio and know that there aren't any concurrent threads/CPUs doing the
same to the same folio.

Naturally, if you were to increment the refcount on this folio and share it
with other threads/CPUs, it's no longer unique. So we don't allow it.

This is only used when using a synchronous bio to read blocks from a
block device while setting up a new superblock, in particular, to read
the superblock itself.

>
> > +impl UniqueFolio {
> > +    /// Maps the contents of a folio page into a slice.
> > +    pub fn map_page(&self, page_index: usize) -> Result<MapGuard<'_>> {
> > +        if page_index >= self.0.size() / bindings::PAGE_SIZE {
> > +            return Err(EDOM);
> > +        }
> > +
> > +        // SAFETY: We just checked that the index is within bounds of the folio.
> > +        let page = unsafe { bindings::folio_page(self.0 .0.get(), page_index) };
> > +
> > +        // SAFETY: `page` is valid because it was returned by `folio_page` above.
> > +        let ptr = unsafe { bindings::kmap(page) };
>
> Surely this can be:
>
>            let ptr = unsafe { bindings::kmap_local_folio(folio, page_index * PAGE_SIZE) };

The problem is the unmap path that can happen at arbitrary order in
Rust, see my comment above.

>
> > +        // SAFETY: We just mapped `ptr`, so it's valid for read.
> > +        let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(), bindings::PAGE_SIZE) };
>
> Can we hide away the "if this isn't a HIGHMEM system, this maps to the
> end of the folio, but if it is, it only maps to the end of the page"
> problem here?

Do you have ideas on what this might look like? (Don't worry about
Rust, just express it in some pseudo-C and we'll see if we can
express it in Rust.)

My approach here was to be conservative, since the common denominator
was "maps to the end of the page", that's what I have.

One possible way to do it with Rust would be an "Iterator" -- in
HIGHMEM it would return one item per page, otherwise it would return a
single item. (We have something similar for reading arbitrary ranges
of a block device, it's broken up into several chunks, so we return an
iterator.)
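
A rough userspace sketch of the iterator idea follows. The function name `chunks`, the `(offset, length)` item type, and the fixed `PAGE_SIZE` of 4096 are all assumptions for illustration. It only computes the ranges that would be mapped, it does not map anything.

```rust
/// Assumed page size for this sketch.
const PAGE_SIZE: usize = 4096;

/// Yields `(offset, length)` ranges covering a folio of `folio_size` bytes:
/// one range per page when `highmem` is true, otherwise a single range for
/// the whole folio. `folio_size` is assumed to be a multiple of `PAGE_SIZE`.
fn chunks(folio_size: usize, highmem: bool) -> impl Iterator<Item = (usize, usize)> {
    // `max(1)` keeps `step_by` from panicking on an empty folio.
    let step = if highmem { PAGE_SIZE } else { folio_size.max(1) };
    (0..folio_size)
        .step_by(step)
        .map(move |off| (off, step.min(folio_size - off)))
}
```

Callers would then write a single loop over the returned ranges and behave the same on both kinds of system.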


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-18 18:32     ` Wedson Almeida Filho
@ 2023-10-18 19:21       ` Matthew Wilcox
  2023-10-19 13:25         ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-18 19:21 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Oct 18, 2023 at 03:32:36PM -0300, Wedson Almeida Filho wrote:
> On Wed, 18 Oct 2023 at 14:17, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Oct 18, 2023 at 09:25:08AM -0300, Wedson Almeida Filho wrote:
> > > +void *rust_helper_kmap(struct page *page)
> > > +{
> > > +     return kmap(page);
> > > +}
> > > +EXPORT_SYMBOL_GPL(rust_helper_kmap);
> > > +
> > > +void rust_helper_kunmap(struct page *page)
> > > +{
> > > +     kunmap(page);
> > > +}
> > > +EXPORT_SYMBOL_GPL(rust_helper_kunmap);
> >
> > I'm not thrilled by exposing kmap()/kunmap() to Rust code.  The vast
> > majority of code really only needs kmap_local_*() / kunmap_local().
> > Can you elaborate on why you need the old kmap() in new Rust code?
> 
> The difficulty we have with kmap_local_*() has to do with the
> requirement that maps and unmaps need to be nested neatly. For
> example:
> 
> let a = folio1.map_local(...);
> let b = folio2.map_local(...);
> // Do something with `a` and `b`.
> drop(a);
> drop(b);
> 
> The code obviously violates the requirements.

Is that the only problem, or are there situations where we might try
to do something like:

a = folio1.map_local()
b = folio2.map_local()
drop(a)
a = folio3.map_local()
drop(b)
b = folio4.map_local()
drop(a)
a = folio5.map_local()
...

> One way to enforce the rule in Rust is to use closures, so the code
> above would be:
> 
> folio1.map_local(..., |a| {
>     folio2.map_local(..., |b| {
>         // Do something with `a` and `b`.
>     })
> })
> 
> It isn't as ergonomic as the first option, but allows us to satisfy the
> nesting requirement.
> 
> Any chance we can relax that requirement?

It's possible.  Here's an untested patch that _only_ supports
"map a, map b, unmap a, unmap b".  If we need more, well, I guess
we can scan the entire array, both at map & unmap in order to
unmap pages.

diff --git a/mm/highmem.c b/mm/highmem.c
index e19269093a93..778a22ca1796 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -586,7 +586,7 @@ void kunmap_local_indexed(const void *vaddr)
 {
 	unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
 	pte_t *kmap_pte;
-	int idx;
+	int idx, local_idx;
 
 	if (addr < __fix_to_virt(FIX_KMAP_END) ||
 	    addr > __fix_to_virt(FIX_KMAP_BEGIN)) {
@@ -607,15 +607,25 @@ void kunmap_local_indexed(const void *vaddr)
 	}
 
 	preempt_disable();
-	idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
+	local_idx = kmap_local_idx();
+	idx = arch_kmap_local_unmap_idx(local_idx, addr);
+	if (addr != __fix_to_virt(FIX_KMAP_BEGIN + idx) && local_idx > 0) {
+		idx--;
+		local_idx--;
+	}
 	WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
 
 	kmap_pte = kmap_get_pte(addr, idx);
 	arch_kmap_local_pre_unmap(addr);
 	pte_clear(&init_mm, addr, kmap_pte);
 	arch_kmap_local_post_unmap(addr);
-	current->kmap_ctrl.pteval[kmap_local_idx()] = __pte(0);
-	kmap_local_idx_pop();
+	current->kmap_ctrl.pteval[local_idx] = __pte(0);
+	if (local_idx == kmap_local_idx()) {
+		kmap_local_idx_pop();
+		if (local_idx > 0 &&
+		    pte_none(current->kmap_ctrl.pteval[local_idx - 1]))
+			kmap_local_idx_pop();
+	}
 	preempt_enable();
 	migrate_enable();
 }
@@ -648,7 +658,7 @@ void __kmap_local_sched_out(void)
 			WARN_ON_ONCE(pte_val(pteval) != 0);
 			continue;
 		}
-		if (WARN_ON_ONCE(pte_none(pteval)))
+		if (pte_none(pteval))
 			continue;
 
 		/*
@@ -685,7 +695,7 @@ void __kmap_local_sched_in(void)
 			WARN_ON_ONCE(pte_val(pteval) != 0);
 			continue;
 		}
-		if (WARN_ON_ONCE(pte_none(pteval)))
+		if (pte_none(pteval))
 			continue;
 
 		/* See comment in __kmap_local_sched_out() */

> > > +void rust_helper_folio_set_error(struct folio *folio)
> > > +{
> > > +     folio_set_error(folio);
> > > +}
> > > +EXPORT_SYMBOL_GPL(rust_helper_folio_set_error);
> >
> > I'm trying to get rid of the error flag.  Can you share the situations
> > in which you've needed the error flag?  Or is it just copying existing
> > practices?
> 
> I'm just mimicking C code. Happy to remove it.

Great, thanks!

> > > +    /// Returns the byte position of this folio in its file.
> > > +    pub fn pos(&self) -> i64 {
> > > +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> > > +        unsafe { bindings::folio_pos(self.0.get()) }
> > > +    }
> >
> > I think it's a mistake to make file positions an i64.  I estimate 64
> > bits will not be enough by 2035-2040.  We should probably have a numeric
> > type which is i64 on 32-bit and isize on other CPUs (I also project
> > 64-bit pointers will be insufficient by 2035-2040 and so we will have
> > 128-bit pointers around the same time, so we're not going to need i128
> > file offsets with i64 pointers).
> 
> I'm also just mimicking C here -- we just don't have a type that has
> the properties you describe. I'm happy to switch once we have it, in
> fact, Miguel has plans that I believe align well with what you want.
> I'm not sure if he has already contacted you about it yet though.

No, I haven't heard about plans for an off_t equivalent.  Perhaps you
could just do what the crates.io libc does?

https://docs.rs/libc/0.2.149/libc/type.off_t.html
pub type off_t = i64;

and then there's only one place to change to be i128 when the time comes.
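
In Rust terms that could be as small as the following sketch. The alias name `Offset` and the toy `Folio` type are assumptions for illustration; the patch does not define either in this form.

```rust
/// Hypothetical alias for file positions, analogous to `libc::off_t`.
/// Widening file offsets later would mean changing only this line.
pub type Offset = i64;

/// Toy stand-in for a folio that remembers its byte position in the file.
pub struct Folio {
    pos: Offset,
}

impl Folio {
    /// Returns the byte position of this folio in its file.
    pub fn pos(&self) -> Offset {
        self.pos
    }
}
```

Note that a plain alias gives no type safety: callers can still mix `Offset` and `i64` freely, so any code that assumes 64 bits would need auditing when the alias changes.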

> > > +/// A [`Folio`] that has a single reference to it.
> > > +pub struct UniqueFolio(pub(crate) ARef<Folio>);
> >
> > How do we know it only has a single reference?  Do you mean "has at
> > least one reference"?  Or am I confusing Rust's notion of a reference
> > with Linux's notion of a reference?
> 
> Instances of `UniqueFolio` are only produced by calls to
> `folio_alloc`. They encode the fact that it's safe for us to map the
> folio and know that there aren't any concurrent threads/CPUs doing the
> same to the same folio.

Mmm ... it's always safe to map a folio, even if other people have a
reference to it.  And Linux can make temporary spurious references to
folios appear, although those should be noticed by the other party and
released again before they access the contents of the folio.  So from
the point of view of being memory-safe, you can ignore them, but you
might see the refcount of the folio as >1, even if you just got the
folio back from the allocator.

> > > +impl UniqueFolio {
> > > +    /// Maps the contents of a folio page into a slice.
> > > +    pub fn map_page(&self, page_index: usize) -> Result<MapGuard<'_>> {
> > > +        if page_index >= self.0.size() / bindings::PAGE_SIZE {
> > > +            return Err(EDOM);
> > > +        }
> > > +
> > > +        // SAFETY: We just checked that the index is within bounds of the folio.
> > > +        let page = unsafe { bindings::folio_page(self.0 .0.get(), page_index) };
> > > +
> > > +        // SAFETY: `page` is valid because it was returned by `folio_page` above.
> > > +        let ptr = unsafe { bindings::kmap(page) };
> >
> > Surely this can be:
> >
> >            let ptr = unsafe { bindings::kmap_local_folio(folio, page_index * PAGE_SIZE) };
> 
> The problem is the unmap path that can happen at arbitrary order in
> Rust, see my comment above.
> 
> >
> > > +        // SAFETY: We just mapped `ptr`, so it's valid for read.
> > > +        let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(), bindings::PAGE_SIZE) };
> >
> > Can we hide away the "if this isn't a HIGHMEM system, this maps to the
> > end of the folio, but if it is, it only maps to the end of the page"
> > problem here?
> 
> Do you have ideas on how this might look like? (Don't worry about
> Rust, just express it in some pseudo-C and we'll see if you can
> express it in Rust.)

On systems without HIGHMEM, kmap() is a no-op.  So we could do something
like this:

	let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(),
		if (folio_test_highmem(folio))
			bindings::PAGE_SIZE
		else
			folio_size(folio) - page_idx * PAGE_SIZE) }

... modulo whatever the correct syntax is in Rust.

Something I forgot to mention was that I found it more useful to express
"map this chunk of a folio" in bytes rather than pages.  You might find
the same, in which case it's just folio.map(offset: usize) instead of
folio.map_page(page_index: usize)


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-18 19:21       ` Matthew Wilcox
@ 2023-10-19 13:25         ` Wedson Almeida Filho
  2023-10-20  4:11           ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-19 13:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 18 Oct 2023 at 16:21, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Oct 18, 2023 at 03:32:36PM -0300, Wedson Almeida Filho wrote:
> > On Wed, 18 Oct 2023 at 14:17, Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Oct 18, 2023 at 09:25:08AM -0300, Wedson Almeida Filho wrote:
> > > > +void *rust_helper_kmap(struct page *page)
> > > > +{
> > > > +     return kmap(page);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(rust_helper_kmap);
> > > > +
> > > > +void rust_helper_kunmap(struct page *page)
> > > > +{
> > > > +     kunmap(page);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(rust_helper_kunmap);
> > >
> > > I'm not thrilled by exposing kmap()/kunmap() to Rust code.  The vast
> > > majority of code really only needs kmap_local_*() / kunmap_local().
> > > Can you elaborate on why you need the old kmap() in new Rust code?
> >
> > The difficulty we have with kmap_local_*() has to do with the
> > requirement that maps and unmaps need to be nested neatly. For
> > example:
> >
> > let a = folio1.map_local(...);
> > let b = folio2.map_local(...);
> > // Do something with `a` and `b`.
> > drop(a);
> > drop(b);
> >
> > The code obviously violates the requirements.
>
> Is that the only problem, or are there situations where we might try
> to do something like:
>
> a = folio1.map.local()
> b = folio2.map.local()
> drop(a)
> a = folio3.map.local()
> drop(b)
> b = folio4.map.local()
> drop (a)
> a = folio5.map.local()
> ...

This is also a problem. We don't control the order in which users are
going to unmap.

> > One way to enforce the rule in Rust is to use closures, so the code
> > above would be:
> >
> > folio1.map_local(..., |a| {
> >     folio2.map_local(..., |b| {
> >         // Do something with `a` and `b`.
> >     })
> > })
> >
> > It isn't as ergonomic as the first option, but allows us to satisfy the
> > nesting requirement.
> >
> > Any chance we can relax that requirement?
>
> It's possible.  Here's an untested patch that _only_ supports
> "map a, map b, unmap a, unmap b".  If we need more, well, I guess
> we can scan the entire array, both at map & unmap in order to
> unmap pages.

We need more.

If you don't want to scan the whole array, we could have a solution
where we add an indirection between the available indices and the
stack of allocations; this way C could continue to work as is and Rust
would have a slightly different API that returns both the mapped
address and an index (which would be used to unmap).

It's simple to remember the index in Rust and it wouldn't have to be
exposed to end users, they'd still just do:

let a = folio1.map_local(...);

And when `a` is dropped, it would call unmap and pass the index back.
(It's also safe in the sense that users would not be able to
accidentally pass the wrong index.)

But if scanning the whole array is acceptable performance-wise, it's
definitely a simpler solution.
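
To make the index idea concrete, here is a minimal userspace sketch. It
is not the kmap_local implementation: `SlotTable`, `LocalMap` and the
"page" numbers are hypothetical stand-ins, and a `RefCell` replaces the
per-task state. It only shows that a free-index stack lets guards be
dropped in any order while users never see the index.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the per-task mapping state.
struct Inner {
    /// Stack of slot indices that are currently available.
    free: Vec<usize>,
    /// For each slot, the "page" mapped there (or `None` if the slot is free).
    mapped: Vec<Option<usize>>,
}

pub struct SlotTable {
    inner: RefCell<Inner>,
}

/// Guard returned by `map_local`; it remembers its slot index privately.
pub struct LocalMap<'a> {
    table: &'a SlotTable,
    index: usize,
}

impl SlotTable {
    pub fn new(slots: usize) -> Self {
        let inner = Inner {
            // Reversed so that slot 0 is handed out first.
            free: (0..slots).rev().collect(),
            mapped: vec![None; slots],
        };
        Self { inner: RefCell::new(inner) }
    }

    /// Takes a free index off the stack; returns `None` when all slots are in use.
    pub fn map_local(&self, page: usize) -> Option<LocalMap<'_>> {
        let mut inner = self.inner.borrow_mut();
        let index = inner.free.pop()?;
        inner.mapped[index] = Some(page);
        Some(LocalMap { table: self, index })
    }

    pub fn in_use(&self) -> usize {
        self.inner.borrow().mapped.iter().filter(|s| s.is_some()).count()
    }
}

impl LocalMap<'_> {
    pub fn index(&self) -> usize {
        self.index
    }
}

impl Drop for LocalMap<'_> {
    fn drop(&mut self) {
        // Unmap by index: no scan, and the order of drops does not matter.
        let mut inner = self.table.inner.borrow_mut();
        inner.mapped[self.index] = None;
        inner.free.push(self.index);
    }
}

/// Maps and unmaps in a non-nested order; returns the number of live mappings
/// observed while only the last guard is still alive.
pub fn demo_out_of_order_unmap() -> usize {
    let table = SlotTable::new(4);
    let a = table.map_local(1).unwrap();
    let b = table.map_local(2).unwrap();
    drop(a); // unmapped before `b`, which a strict stack would reject
    let c = table.map_local(3).unwrap(); // reuses the slot `a` gave back
    drop(b);
    let live = table.in_use();
    drop(c);
    live
}
```

Since the index lives inside the guard and is only consumed by `Drop`,
there is no way for a caller to hand back the wrong one.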

> diff --git a/mm/highmem.c b/mm/highmem.c
> index e19269093a93..778a22ca1796 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -586,7 +586,7 @@ void kunmap_local_indexed(const void *vaddr)
>  {
>         unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
>         pte_t *kmap_pte;
> -       int idx;
> +       int idx, local_idx;
>
>         if (addr < __fix_to_virt(FIX_KMAP_END) ||
>             addr > __fix_to_virt(FIX_KMAP_BEGIN)) {
> @@ -607,15 +607,25 @@ void kunmap_local_indexed(const void *vaddr)
>         }
>
>         preempt_disable();
> -       idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
> +       local_idx = kmap_local_idx();
> +       idx = arch_kmap_local_unmap_idx(local_idx, addr);
> +       if (addr != __fix_to_virt(FIX_KMAP_BEGIN + idx) && local_idx > 0) {
> +               idx--;
> +               local_idx--;
> +       }
>         WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
>
>         kmap_pte = kmap_get_pte(addr, idx);
>         arch_kmap_local_pre_unmap(addr);
>         pte_clear(&init_mm, addr, kmap_pte);
>         arch_kmap_local_post_unmap(addr);
> -       current->kmap_ctrl.pteval[kmap_local_idx()] = __pte(0);
> -       kmap_local_idx_pop();
> +       current->kmap_ctrl.pteval[local_idx] = __pte(0);
> +       if (local_idx == kmap_local_idx()) {
> +               kmap_local_idx_pop();
> +               if (local_idx > 0 &&
> +                   pte_none(current->kmap_ctrl.pteval[local_idx - 1]))
> +                       kmap_local_idx_pop();
> +       }
>         preempt_enable();
>         migrate_enable();
>  }
> @@ -648,7 +658,7 @@ void __kmap_local_sched_out(void)
>                         WARN_ON_ONCE(pte_val(pteval) != 0);
>                         continue;
>                 }
> -               if (WARN_ON_ONCE(pte_none(pteval)))
> +               if (pte_none(pteval))
>                         continue;
>
>                 /*
> @@ -685,7 +695,7 @@ void __kmap_local_sched_in(void)
>                         WARN_ON_ONCE(pte_val(pteval) != 0);
>                         continue;
>                 }
> -               if (WARN_ON_ONCE(pte_none(pteval)))
> +               if (pte_none(pteval))
>                         continue;
>
>                 /* See comment in __kmap_local_sched_out() */
>
> > > > +void rust_helper_folio_set_error(struct folio *folio)
> > > > +{
> > > > +     folio_set_error(folio);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(rust_helper_folio_set_error);
> > >
> > > I'm trying to get rid of the error flag.  Can you share the situations
> > > in which you've needed the error flag?  Or is it just copying existing
> > > practices?
> >
> > I'm just mimicking C code. Happy to remove it.
>
> Great, thanks!
>
> > > > +    /// Returns the byte position of this folio in its file.
> > > > +    pub fn pos(&self) -> i64 {
> > > > +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> > > > +        unsafe { bindings::folio_pos(self.0.get()) }
> > > > +    }
> > >
> > > I think it's a mistake to make file positions an i64.  I estimate 64
> > > bits will not be enough by 2035-2040.  We should probably have a numeric
> > > type which is i64 on 32-bit and isize on other CPUs (I also project
> > > 64-bit pointers will be insufficient by 2035-2040 and so we will have
> > > 128-bit pointers around the same time, so we're not going to need i128
> > > file offsets with i64 pointers).
> >
> > I'm also just mimicking C here -- we just don't have a type that has
> > the properties you describe. I'm happy to switch once we have it, in
> > fact, Miguel has plans that I believe align well with what you want.
> > I'm not sure if he has already contacted you about it yet though.
>
> No, I haven't heard about plans for an off_t equivalent.

He tells me he'll send a patch for that soon.

> Perhaps you
> could just do what the crates.io libc does?
>
> https://docs.rs/libc/0.2.149/libc/type.off_t.html
> pub type off_t = i64;
>
> and then there's only one place to change to be i128 when the time comes.

Yes, I'll do that for v2.

> > > > +/// A [`Folio`] that has a single reference to it.
> > > > +pub struct UniqueFolio(pub(crate) ARef<Folio>);
> > >
> > > How do we know it only has a single reference?  Do you mean "has at
> > > least one reference"?  Or am I confusing Rust's notion of a reference
> > > with Linux's notion of a reference?
> >
> > Instances of `UniqueFolio` are only produced by calls to
> > `folio_alloc`. They encode the fact that it's safe for us to map the
> > folio and know that there aren't any concurrent threads/CPUs doing the
> > same to the same folio.
>
> Mmm ... it's always safe to map a folio, even if other people have a
> reference to it.  And Linux can make temporary spurious references to
> folios appear, although those should be noticed by the other party and
> released again before they access the contents of the folio.  So from
> the point of view of being memory-safe, you can ignore them, but you
> might see the refcount of the folio as >1, even if you just got the
> folio back from the allocator.

Sure, it's safe to map a folio in general, but Rust has stricter rules
about aliasing and mutability that are part of how memory safety is
achieved. In particular, it requires that we never have mutable and
immutable pointers to the same memory at once (modulo interior
mutability).

So we need to avoid something like:

let a = folio.map(); // `a` is a shared pointer to the contents of the folio.

// While we have a shared (and therefore immutable) pointer, we're
// changing the contents of the folio.
sb.sread(sector_number, sector_count, folio);

This violates Rust rules. `UniqueFolio` helps us address this for our
use case; if we try the above with a UniqueFolio, the compiler will
error out saying that `a` has a shared reference to the folio, so we
can't call `sread` on it (because sread requires a mutable, and
therefore not shareable, reference to the folio).

(It's ok for the reference count to go up and down; it's unfortunate
that we use "reference" with two slightly different meanings, we
invariably get confused.)
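
As a rough userspace illustration of the borrow rule (not the kernel
types): the `UniqueFolio` and `MapGuard` below are simplified stand-ins
backed by a `Vec<u8>`, and `fill` is a hypothetical placeholder for an
operation like `sread` that overwrites the contents.

```rust
/// Simplified stand-in: owns its bytes, so `&mut self` really is exclusive.
pub struct UniqueFolio {
    data: Vec<u8>,
}

/// Shared view of the folio contents; borrows the folio for as long as it lives.
pub struct MapGuard<'a> {
    bytes: &'a [u8],
}

impl UniqueFolio {
    pub fn new(size: usize) -> Self {
        Self { data: vec![0; size] }
    }

    /// Takes `&self`: any number of guards may coexist, but none may outlive
    /// a later mutable borrow.
    pub fn map(&self) -> MapGuard<'_> {
        MapGuard { bytes: &self.data }
    }

    /// Placeholder for something like `sread`: it changes the contents, so it
    /// requires `&mut self`.
    pub fn fill(&mut self, byte: u8) {
        self.data.fill(byte);
    }
}

impl MapGuard<'_> {
    pub fn first(&self) -> u8 {
        self.bytes[0]
    }
}

/// Returns the first byte as seen before and after the folio is overwritten.
pub fn demo_exclusive_fill() -> (u8, u8) {
    let mut folio = UniqueFolio::new(4096);
    let before = {
        let a = folio.map();
        // Uncommenting the next line is rejected with E0502 (cannot borrow
        // `folio` as mutable because it is also borrowed as immutable):
        // folio.fill(0xff);
        a.first()
    };
    // `a` is gone, so the exclusive borrow is allowed here.
    folio.fill(0xff);
    let after = folio.map().first();
    (before, after)
}
```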

> > > > +impl UniqueFolio {
> > > > +    /// Maps the contents of a folio page into a slice.
> > > > +    pub fn map_page(&self, page_index: usize) -> Result<MapGuard<'_>> {
> > > > +        if page_index >= self.0.size() / bindings::PAGE_SIZE {
> > > > +            return Err(EDOM);
> > > > +        }
> > > > +
> > > > +        // SAFETY: We just checked that the index is within bounds of the folio.
> > > > +        let page = unsafe { bindings::folio_page(self.0 .0.get(), page_index) };
> > > > +
> > > > +        // SAFETY: `page` is valid because it was returned by `folio_page` above.
> > > > +        let ptr = unsafe { bindings::kmap(page) };
> > >
> > > Surely this can be:
> > >
> > >            let ptr = unsafe { bindings::kmap_local_folio(folio, page_index * PAGE_SIZE) };
> >
> > The problem is the unmap path that can happen at arbitrary order in
> > Rust, see my comment above.
> >
> > >
> > > > +        // SAFETY: We just mapped `ptr`, so it's valid for read.
> > > > +        let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(), bindings::PAGE_SIZE) };
> > >
> > > Can we hide away the "if this isn't a HIGHMEM system, this maps to the
> > > end of the folio, but if it is, it only maps to the end of the page"
> > > problem here?
> >
> > Do you have ideas on how this might look like? (Don't worry about
> > Rust, just express it in some pseudo-C and we'll see if you can
> > express it in Rust.)
>
> On systems without HIGHMEM, kmap() is a no-op.  So we could do something
> like this:
>
>         let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(),
>                 if (folio_test_highmem(folio))
>                         bindings::PAGE_SIZE
>                 else
>                         folio_size(folio) - page_idx * PAGE_SIZE) }
>
> ... modulo whatever the correct syntax is in Rust.

We can certainly do that. But since there's the possibility that the
array will be capped at PAGE_SIZE in the HIGHMEM case, callers would
still need a loop to traverse the whole folio, right?

let mut offset = 0;
while offset < folio.size() {
    let a = folio.map(offset);
    // Do something with a.
    offset += a.len();
}

I guess the advantage is that we'd have a single iteration in systems
without HIGHMEM.
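
A small userspace model of that loop, under stated assumptions: `Folio`
here is a hypothetical stand-in backed by a `Vec<u8>`, the `highmem` flag
simulates `folio_test_highmem()`, and `PAGE_SIZE` is fixed at 4096 for
the example.

```rust
pub const PAGE_SIZE: usize = 4096;

/// Hypothetical stand-in for a folio; `highmem` simulates folio_test_highmem().
pub struct Folio {
    data: Vec<u8>,
    highmem: bool,
}

impl Folio {
    pub fn new(pages: usize, highmem: bool) -> Self {
        Self { data: vec![0; pages * PAGE_SIZE], highmem }
    }

    pub fn size(&self) -> usize {
        self.data.len()
    }

    /// Maps from byte `offset` (which must be below `size()`): up to the end of
    /// the containing page in the HIGHMEM case, else up to the end of the folio.
    pub fn map(&self, offset: usize) -> &[u8] {
        let end = if self.highmem {
            (offset / PAGE_SIZE + 1) * PAGE_SIZE
        } else {
            self.size()
        };
        &self.data[offset..end.min(self.size())]
    }
}

/// Walks the whole folio with the loop from the mail and counts iterations.
pub fn count_iterations(folio: &Folio) -> usize {
    let mut offset = 0;
    let mut iterations = 0;
    while offset < folio.size() {
        let a = folio.map(offset);
        // Do something with `a`.
        offset += a.len();
        iterations += 1;
    }
    iterations
}
```

With this model a four-page folio takes four iterations when `highmem`
is set and one otherwise, and callers are written the same way in both
cases.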

> Something I forgot to mention was that I found it more useful to express
> "map this chunk of a folio" in bytes rather than pages.  You might find
> the same, in which case it's just folio.map(offset: usize) instead of
> folio.map_page(page_index: usize)

Oh, thanks for the feedback. I'll switch to bytes then for v2.
(Already did in the example above.)

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr`
  2023-10-18 13:06   ` Ariel Miculas (amiculas)
@ 2023-10-19 13:35     ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-19 13:35 UTC (permalink / raw)
  To: Ariel Miculas (amiculas)
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On Wed, 18 Oct 2023 at 10:06, Ariel Miculas (amiculas)
<amiculas@cisco.com> wrote:

> I think this is not safe. from_raw_parts_mut's documentation says:
> ```
> `data` must be non-null and aligned even for zero-length slices. One
> reason for this is that enum layout optimizations may rely on references
> (including slices of any length) being aligned and non-null to distinguish
> them from other data. You can obtain a pointer that is usable as `data`
> for zero-length slices using [`NonNull::dangling()`].
> ```
>
> `vfs_getxattr_alloc` explicitly calls the `get` handler with `buffer` set
> to NULL and `size` set to 0, in order to determine the required size for
> the extended attributes:
> ```
> error = handler->get(handler, dentry, inode, name, NULL, 0);
> if (error < 0)
>         return error;
> ```
>
> So `buffer` is definitely NULL in the first call to the handler.
>
> When `buffer` is NULL, the first argument to `from_raw_parts_mut` should
> be `NonNull::dangling()`.

Good catch, thanks!

I'll fix this for v2.
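
For reference, a minimal sketch of one way to handle the NULL case. The
helper name `out_buffer` is hypothetical and this is not necessarily what
v2 will look like; it only relies on `NonNull::new`, `NonNull::dangling`
and `slice::from_raw_parts_mut` from `core`.

```rust
use core::ptr::NonNull;
use core::slice;

/// Builds the output slice handed to a `read_xattr`-style callback.
///
/// A null `buffer` (the "how big is it?" query) yields an empty slice built
/// from a dangling but well-aligned pointer instead of from NULL.
///
/// # Safety
///
/// If `buffer` is non-null, it must be valid for reads and writes of `size`
/// bytes for the lifetime `'a`, and nothing else may access it meanwhile.
pub unsafe fn out_buffer<'a>(buffer: *mut u8, size: usize) -> &'a mut [u8] {
    match NonNull::new(buffer) {
        // SAFETY: The caller guarantees a non-null `buffer` is valid for `size` bytes.
        Some(ptr) => unsafe { slice::from_raw_parts_mut(ptr.as_ptr(), size) },
        // SAFETY: A dangling pointer is non-null and aligned, which is all that
        // is required for a zero-length slice.
        None => unsafe { slice::from_raw_parts_mut(NonNull::<u8>::dangling().as_ptr(), 0) },
    }
}
```

The null branch forces the length to zero regardless of `size`, so a
caller passing NULL with a non-zero size cannot produce an out-of-bounds
slice.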

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-18 12:25 ` [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
@ 2023-10-19 14:30   ` Benno Lossin
  2023-10-20  0:52     ` Boqun Feng
  2023-10-20  0:30   ` Boqun Feng
  2024-01-03 13:29   ` Andreas Hindborg (Samsung)
  2 siblings, 1 reply; 125+ messages in thread
From: Benno Lossin @ 2023-10-19 14:30 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 18.10.23 14:25, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to specify their root directory. Also allow them
> to create (and do cache lookups of) directory inodes. (More types of
> inodes are added in subsequent patches in the series.)
> 
> The `NewINode` type ensures that a new inode is properly initialised
> before it is marked so. It also facilitates error paths by automatically
> marking inodes as failed if they're not properly initialised.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>   rust/helpers.c            |  12 +++
>   rust/kernel/fs.rs         | 178 +++++++++++++++++++++++++++++++-------
>   samples/rust/rust_rofs.rs |  22 ++++-
>   3 files changed, 181 insertions(+), 31 deletions(-)
> 
> diff --git a/rust/helpers.c b/rust/helpers.c
> index fe45f8ddb31f..c5a2bec6467d 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -145,6 +145,18 @@ struct kunit *rust_helper_kunit_get_current_test(void)
>   }
>   EXPORT_SYMBOL_GPL(rust_helper_kunit_get_current_test);
> 
> +void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
> +{
> +	i_uid_write(inode, uid);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_i_uid_write);
> +
> +void rust_helper_i_gid_write(struct inode *inode, gid_t gid)
> +{
> +	i_gid_write(inode, gid);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_i_gid_write);
> +
>   off_t rust_helper_i_size_read(const struct inode *inode)
>   {
>   	return i_size_read(inode);
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 30fa1f312f33..f3a41cf57502 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -7,9 +7,9 @@
>   //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
> 
>   use crate::error::{code::*, from_result, to_result, Error, Result};
> -use crate::types::{AlwaysRefCounted, Opaque};
> -use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
> -use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin, ptr};
> +use crate::types::{ARef, AlwaysRefCounted, Either, Opaque};
> +use crate::{bindings, init::PinInit, str::CStr, time::Timespec, try_pin_init, ThisModule};
> +use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
>   use macros::{pin_data, pinned_drop};
> 
>   /// Maximum size of an inode.
> @@ -22,6 +22,12 @@ pub trait FileSystem {
> 
>       /// Returns the parameters to initialise a super block.
>       fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
> +
> +    /// Initialises and returns the root inode of the given superblock.
> +    ///
> +    /// This is called during initialisation of a superblock after [`FileSystem::super_params`] has
> +    /// completed successfully.
> +    fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
>   }
> 
>   /// A registration of a file system.
> @@ -143,12 +149,136 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
>       }
>   }
> 
> +/// An inode that is locked and hasn't been initialised yet.
> +#[repr(transparent)]
> +pub struct NewINode<T: FileSystem + ?Sized>(ARef<INode<T>>);
> +
> +impl<T: FileSystem + ?Sized> NewINode<T> {
> +    /// Initialises the new inode with the given parameters.
> +    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
> +        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.

How do you know that this is a new inode? Maybe add a type invariant?

> +        let inode = unsafe { &mut *self.0 .0.get() };
> +
> +        let mode = match params.typ {
> +            INodeType::Dir => {
> +                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> +                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> +
> +                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> +                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> +                bindings::S_IFDIR
> +            }
> +        };
> +
> +        inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
> +        inode.i_size = params.size;
> +        inode.i_blocks = params.blocks;
> +
> +        inode.__i_ctime = params.ctime.into();
> +        inode.i_mtime = params.mtime.into();
> +        inode.i_atime = params.atime.into();
> +
> +        // SAFETY: inode is a new inode, so it is valid for write.
> +        unsafe {
> +            bindings::set_nlink(inode, params.nlink);
> +            bindings::i_uid_write(inode, params.uid);
> +            bindings::i_gid_write(inode, params.gid);
> +            bindings::unlock_new_inode(inode);
> +        }
> +
> +        // SAFETY: We are manually destructuring `self` and preventing `drop` from being called.
> +        Ok(unsafe { (&ManuallyDrop::new(self).0 as *const ARef<INode<T>>).read() })

Add a comment that explains why you need to do this instead of `self.0`.

> +    }
> +}
> +
> +impl<T: FileSystem + ?Sized> Drop for NewINode<T> {
> +    fn drop(&mut self) {
> +        // SAFETY: The new inode failed to be turned into an initialised inode, so it's safe (and
> +        // in fact required) to call `iget_failed` on it.
> +        unsafe { bindings::iget_failed(self.0 .0.get()) };
> +    }
> +}
> +
> +/// The type of the inode.
> +#[derive(Copy, Clone)]
> +pub enum INodeType {
> +    /// Directory type.
> +    Dir,
> +}
> +
> +/// Required inode parameters.
> +///
> +/// This is used when creating new inodes.
> +pub struct INodeParams {
> +    /// The access mode. It's a mask that grants execute (1), write (2) and read (4) access to
> +    /// everyone, the owner group, and the owner.
> +    pub mode: u16,
> +
> +    /// Type of inode.
> +    ///
> +    /// Also carries additional per-type data.
> +    pub typ: INodeType,
> +
> +    /// Size of the contents of the inode.
> +    ///
> +    /// Its maximum value is [`MAX_LFS_FILESIZE`].
> +    pub size: i64,
> +
> +    /// Number of blocks.
> +    pub blocks: u64,
> +
> +    /// Number of links to the inode.
> +    pub nlink: u32,
> +
> +    /// User id.
> +    pub uid: u32,
> +
> +    /// Group id.
> +    pub gid: u32,
> +
> +    /// Creation time.
> +    pub ctime: Timespec,
> +
> +    /// Last modification time.
> +    pub mtime: Timespec,
> +
> +    /// Last access time.
> +    pub atime: Timespec,
> +}
> +
>   /// A file system super block.
>   ///
>   /// Wraps the kernel's `struct super_block`.
>   #[repr(transparent)]
>   pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
> 
> +impl<T: FileSystem + ?Sized> SuperBlock<T> {
> +    /// Tries to get an existing inode or create a new one if it doesn't exist yet.
> +    pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, NewINode<T>>> {
> +        // SAFETY: The only initialisation missing from the superblock is the root, and this
> +        // function is needed to create the root, so it's safe to call it.

This is a weird safety comment. Why is the superblock not fully
initialized? Why is safe to call the function? This comment doesn't
really explain anything.

> +        let inode =
> +            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;
> +
> +        // SAFETY: `inode` is valid for read, but there could be concurrent writers (e.g., if it's
> +        // an already-initialised inode), so we use `read_volatile` to read its current state.
> +        let state = unsafe { ptr::read_volatile(ptr::addr_of!((*inode.as_ptr()).i_state)) };

Are you sure that `read_volatile` is sufficient for this use case? The
documentation [1] clearly states that concurrent write operations are still
UB:

    Just like in C, whether an operation is volatile has no bearing
    whatsoever on questions involving concurrent access from multiple
    threads. Volatile accesses behave exactly like non-atomic accesses in
    that regard. In particular, a race between a read_volatile and any
    write operation to the same location is undefined behavior.

[1]: https://doc.rust-lang.org/core/ptr/fn.read_volatile.html
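
To illustrate the distinction in plain userspace Rust: a read that races
with a write is only well-defined if both sides are atomic. The sketch
below is an assumption-laden model, not a proposal for the bindings:
`Inode` is a hypothetical struct, the `I_NEW` value is illustrative, and
in the kernel `i_state` is a plain C field, so this does not settle how
the real access should be done.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Illustrative flag value for this model only.
const I_NEW: u64 = 1 << 3;

/// Hypothetical model of an inode whose state word may be updated concurrently.
pub struct Inode {
    state: AtomicU64,
}

impl Inode {
    pub fn new_locked() -> Self {
        Self { state: AtomicU64::new(I_NEW) }
    }

    /// Atomic load: well-defined even if another thread is clearing the flag.
    pub fn is_new(&self) -> bool {
        self.state.load(Ordering::Acquire) & I_NEW != 0
    }

    /// Model of the "initialisation finished" transition.
    pub fn unlock_new(&self) {
        self.state.fetch_and(!I_NEW, Ordering::Release);
    }
}

/// Reads the state while another thread clears `I_NEW`, then reports the
/// final value of the flag once the writer has finished.
pub fn demo_racing_reader() -> bool {
    let inode = Arc::new(Inode::new_locked());
    let writer = {
        let inode = Arc::clone(&inode);
        thread::spawn(move || inode.unlock_new())
    };
    // May observe either value; with atomics that is a race, not UB.
    let _racy = inode.is_new();
    writer.join().unwrap();
    inode.is_new()
}
```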

-- 
Cheers,
Benno

> +        if state & u64::from(bindings::I_NEW) == 0 {
> +            // The inode is cached. Just return it.
> +            //
> +            // SAFETY: `inode` had its refcount incremented by `iget_locked`; this increment is now
> +            // owned by `ARef`.
> +            Ok(Either::Left(unsafe { ARef::from_raw(inode.cast()) }))
> +        } else {
> +            // SAFETY: The new inode is valid but not fully initialised yet, so it's ok to create a
> +            // `NewINode`.
> +            Ok(Either::Right(NewINode(unsafe {
> +                ARef::from_raw(inode.cast())
> +            })))
> +        }
> +    }
> +}
> +
>   /// Required superblock parameters.
>   ///
>   /// This is returned by implementations of [`FileSystem::super_params`].
> @@ -215,41 +345,28 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>               sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
>               sb.0.s_flags |= bindings::SB_RDONLY;
> 
> -            // The following is scaffolding code that will be removed in a subsequent patch. It is
> -            // needed to build a root dentry, otherwise core code will BUG().
> -            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
> -            let inode = unsafe { bindings::new_inode(&mut sb.0) };
> -            if inode.is_null() {
> -                return Err(ENOMEM);
> -            }
> -
> -            // SAFETY: `inode` is valid for write.
> -            unsafe { bindings::set_nlink(inode, 2) };
> -
> -            {
> -                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> -                // safe to mutably dereference it.
> -                let inode = unsafe { &mut *inode };
> -                inode.i_ino = 1;
> -                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
> -
> -                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> -                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> +            // newly-created (and initialised above) superblock.
> +            let sb = unsafe { &mut *sb_ptr.cast() };
> +            let root = T::init_root(sb)?;
> 
> -                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> -                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> +            // Reject root inode if it belongs to a different superblock.
> +            if !ptr::eq(root.super_block(), sb) {
> +                return Err(EINVAL);
>               }
> 
>               // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
>               // case for this call.
>               //
>               // It takes over the inode, even on failure, so we don't need to clean it up.
> -            let dentry = unsafe { bindings::d_make_root(inode) };
> +            let dentry = unsafe { bindings::d_make_root(ManuallyDrop::new(root).0.get()) };
>               if dentry.is_null() {
>                   return Err(ENOMEM);
>               }
> 
> -            sb.0.s_root = dentry;
> +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> +            // newly-created (and initialised above) superblock.
> +            unsafe { (*sb_ptr).s_root = dentry };
> 
>               Ok(0)
>           })
> @@ -314,9 +431,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>   ///
>   /// ```
>   /// # mod module_fs_sample {
> -/// use kernel::fs::{NewSuperBlock, SuperParams};
> +/// use kernel::fs::{INode, NewSuperBlock, SuperBlock, SuperParams};
>   /// use kernel::prelude::*;
> -/// use kernel::{c_str, fs};
> +/// use kernel::{c_str, fs, types::ARef};
>   ///
>   /// kernel::module_fs! {
>   ///     type: MyFs,
> @@ -332,6 +449,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>   ///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
>   ///         todo!()
>   ///     }
> +///     fn init_root(_sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
> +///         todo!()
> +///     }
>   /// }
>   /// # }
>   /// ```
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index 9878bf88b991..9e5f4c7d1c06 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -2,9 +2,9 @@
> 
>   //! Rust read-only file system sample.
> 
> -use kernel::fs::{NewSuperBlock, SuperParams};
> +use kernel::fs::{INode, INodeParams, INodeType, NewSuperBlock, SuperBlock, SuperParams};
>   use kernel::prelude::*;
> -use kernel::{c_str, fs};
> +use kernel::{c_str, fs, time::UNIX_EPOCH, types::ARef, types::Either};
> 
>   kernel::module_fs! {
>       type: RoFs,
> @@ -26,4 +26,22 @@ fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
>               time_gran: 1,
>           })
>       }
> +
> +    fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
> +        match sb.get_or_create_inode(1)? {
> +            Either::Left(existing) => Ok(existing),
> +            Either::Right(new) => new.init(INodeParams {
> +                typ: INodeType::Dir,
> +                mode: 0o555,
> +                size: 1,
> +                blocks: 1,
> +                nlink: 2,
> +                uid: 0,
> +                gid: 0,
> +                atime: UNIX_EPOCH,
> +                ctime: UNIX_EPOCH,
> +                mtime: UNIX_EPOCH,
> +            }),
> +        }
> +    }
>   }
> --
> 2.34.1
> 
>

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-18 12:25 ` [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
  2023-10-19 14:30   ` Benno Lossin
@ 2023-10-20  0:30   ` Boqun Feng
  2023-10-23 12:36     ` Wedson Almeida Filho
  2024-01-03 13:29   ` Andreas Hindborg (Samsung)
  2 siblings, 1 reply; 125+ messages in thread
From: Boqun Feng @ 2023-10-20  0:30 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:25:05AM -0300, Wedson Almeida Filho wrote:
[...]
> +/// An inode that is locked and hasn't been initialised yet.
> +#[repr(transparent)]
> +pub struct NewINode<T: FileSystem + ?Sized>(ARef<INode<T>>);
> +
> +impl<T: FileSystem + ?Sized> NewINode<T> {
> +    /// Initialises the new inode with the given parameters.
> +    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
> +        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
> +        let inode = unsafe { &mut *self.0 .0.get() };
> +
> +        let mode = match params.typ {
> +            INodeType::Dir => {
> +                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> +                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> +
> +                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> +                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> +                bindings::S_IFDIR
> +            }
> +        };
> +
> +        inode.i_mode = (params.mode & 0o777) | u16::try_from(mode)?;
> +        inode.i_size = params.size;
> +        inode.i_blocks = params.blocks;
> +
> +        inode.__i_ctime = params.ctime.into();
> +        inode.i_mtime = params.mtime.into();
> +        inode.i_atime = params.atime.into();
> +
> +        // SAFETY: inode is a new inode, so it is valid for write.
> +        unsafe {
> +            bindings::set_nlink(inode, params.nlink);
> +            bindings::i_uid_write(inode, params.uid);
> +            bindings::i_gid_write(inode, params.gid);
> +            bindings::unlock_new_inode(inode);
> +        }
> +
> +        // SAFETY: We are manually destructuring `self` and preventing `drop` from being called.
> +        Ok(unsafe { (&ManuallyDrop::new(self).0 as *const ARef<INode<T>>).read() })

How do we feel about using transmute here? ;-) I.e.

	// SAFETY: `NewINode` is transparent to `ARef<INode<_>>`, and
	// the inode has been initialised, so it's safe to change the
	// object type.
	Ok(unsafe { core::mem::transmute(self) })

What we actually want here is to change the type of the object (i.e. a
bitwise move from one type to another), and it seems to me that
transmute is the best fit for that.
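
To make the comparison concrete, here is a minimal user-space sketch (not
kernel code). `Handle` and `Pending` are hypothetical stand-ins for
`ARef<INode<T>>` and `NewINode<T>`, and the counter bump in `drop` stands in
for the `iget_failed` call. It only illustrates that both conversions skip
the destructor; it says nothing about which one the series should use.

```rust
use std::cell::Cell;
use std::mem::{self, ManuallyDrop};
use std::ptr;
use std::rc::Rc;

/// Hypothetical stand-in for `ARef<INode<T>>`. The shared counter records how
/// many times the failure path (the analogue of `iget_failed`) has run.
pub struct Handle(Rc<Cell<u32>>);

impl Handle {
    pub fn failures(&self) -> u32 {
        self.0.get()
    }
}

/// Hypothetical stand-in for `NewINode<T>`.
#[repr(transparent)]
pub struct Pending(Handle);

impl Drop for Pending {
    fn drop(&mut self) {
        // Only reached when initialisation was abandoned.
        let counter = &(self.0).0;
        counter.set(counter.get() + 1);
    }
}

impl Pending {
    pub fn new() -> Self {
        Pending(Handle(Rc::new(Cell::new(0))))
    }

    /// Returns a second handle to the failure counter, for inspection only.
    pub fn counter(&self) -> Rc<Cell<u32>> {
        Rc::clone(&(self.0).0)
    }

    /// The approach in the patch: suppress `drop`, then move the field out.
    pub fn finish_with_read(self) -> Handle {
        let this = ManuallyDrop::new(self);
        // SAFETY: `this` is never dropped, so the field is moved out exactly once.
        unsafe { ptr::read(&this.0) }
    }

    /// The suggested alternative: reinterpret the wrapper as its only field.
    pub fn finish_with_transmute(self) -> Handle {
        // SAFETY: `Pending` is `#[repr(transparent)]` over `Handle`, so the
        // layouts match; `transmute` consumes `self` without running `drop`.
        unsafe { mem::transmute::<Pending, Handle>(self) }
    }
}
```

In both cases the failure counter stays at zero after a successful
conversion, and it is bumped only when a `Pending` value is dropped directly.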

Thoughts?

Regards,
Boqun


> +    }
> +}
> +
> +impl<T: FileSystem + ?Sized> Drop for NewINode<T> {
> +    fn drop(&mut self) {
> +        // SAFETY: The new inode failed to be turned into an initialised inode, so it's safe (and
> +        // in fact required) to call `iget_failed` on it.
> +        unsafe { bindings::iget_failed(self.0 .0.get()) };
> +    }
> +}
> +
[...]

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-19 14:30   ` Benno Lossin
@ 2023-10-20  0:52     ` Boqun Feng
  2023-10-21 13:48       ` Benno Lossin
  0 siblings, 1 reply; 125+ messages in thread
From: Boqun Feng @ 2023-10-20  0:52 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Matthew Wilcox, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho

On Thu, Oct 19, 2023 at 02:30:56PM +0000, Benno Lossin wrote:
[...]
> > +        let inode =
> > +            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;
> > +
> > +        // SAFETY: `inode` is valid for read, but there could be concurrent writers (e.g., if it's
> > +        // an already-initialised inode), so we use `read_volatile` to read its current state.
> > +        let state = unsafe { ptr::read_volatile(ptr::addr_of!((*inode.as_ptr()).i_state)) };
> 
> Are you sure that `read_volatile` is sufficient for this use case? The
> documentation [1] clearly states that concurrent write operations are still
> UB:
> 
>     Just like in C, whether an operation is volatile has no bearing
>     whatsoever on questions involving concurrent access from multiple
>     threads. Volatile accesses behave exactly like non-atomic accesses in
>     that regard. In particular, a race between a read_volatile and any
>     write operation to the same location is undefined behavior.
> 

Right, `read_volatile` can still have a data race. I think what we can do
here is:

	// SAFETY: `i_state` in `inode` is `unsigned long`, therefore
	// it's safe to treat it as `AtomicUsize` and do a relaxed read.
	let state = unsafe { (*ptr::addr_of!((*inode.as_ptr()).i_state).cast::<AtomicUsize>()).load(Relaxed) };
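
For reference, a self-contained user-space sketch of the same idea follows.
`FakeInode` and `read_state` are hypothetical names used only for
illustration, and the sketch assumes that every concurrent writer also uses
atomic accesses. How that assumption maps onto C code writing `i_state` is a
separate question that this sketch does not answer.

```rust
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the C `struct inode`; only `i_state` matters here.
#[repr(C)]
pub struct FakeInode {
    pub i_ino: usize,
    pub i_state: usize,
}

/// Reads `i_state` with a relaxed atomic load instead of `read_volatile`.
///
/// # Safety
///
/// `inode` must point to a live, properly aligned `FakeInode`, and any
/// concurrent writer to `i_state` must also use atomic accesses.
pub unsafe fn read_state(inode: *mut FakeInode) -> usize {
    // SAFETY: `AtomicUsize` has the same size, alignment and bit validity as
    // `usize`, and the caller guarantees that `inode` is valid.
    let state = unsafe { &*ptr::addr_of!((*inode).i_state).cast::<AtomicUsize>() };
    state.load(Ordering::Relaxed)
}
```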

Regards,
Boqun

> [1]: https://doc.rust-lang.org/core/ptr/fn.read_volatile.html
> 
> -- 
> Cheers,
> Benno
> 

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-19 13:25         ` Wedson Almeida Filho
@ 2023-10-20  4:11           ` Matthew Wilcox
  2023-10-20 15:17             ` Matthew Wilcox
                               ` (2 more replies)
  0 siblings, 3 replies; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-20  4:11 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Thu, Oct 19, 2023 at 10:25:39AM -0300, Wedson Almeida Filho wrote:
> On Wed, 18 Oct 2023 at 16:21, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Oct 18, 2023 at 03:32:36PM -0300, Wedson Almeida Filho wrote:
> > > On Wed, 18 Oct 2023 at 14:17, Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Oct 18, 2023 at 09:25:08AM -0300, Wedson Almeida Filho wrote:
> > > > > +void *rust_helper_kmap(struct page *page)
> > > > > +{
> > > > > +     return kmap(page);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(rust_helper_kmap);
> > > > > +
> > > > > +void rust_helper_kunmap(struct page *page)
> > > > > +{
> > > > > +     kunmap(page);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(rust_helper_kunmap);
> > > >
> > > > I'm not thrilled by exposing kmap()/kunmap() to Rust code.  The vast
> > > > majority of code really only needs kmap_local_*() / kunmap_local().
> > > > Can you elaborate on why you need the old kmap() in new Rust code?
> > >
> > > The difficulty we have with kmap_local_*() has to do with the
> > > requirement that maps and unmaps need to be nested neatly. For
> > > example:
> > >
> > > let a = folio1.map_local(...);
> > > let b = folio2.map_local(...);
> > > // Do something with `a` and `b`.
> > > drop(a);
> > > drop(b);
> > >
> > > The code obviously violates the requirements.
> >
> > Is that the only problem, or are there situations where we might try
> > to do something like:
> >
> > a = folio1.map.local()
> > b = folio2.map.local()
> > drop(a)
> > a = folio3.map.local()
> > drop(b)
> > b = folio4.map.local()
> > drop (a)
> > a = folio5.map.local()
> > ...
> 
> This is also a problem. We don't control the order in which users are
> going to unmap.

OK.  I have something in the works, but it's not quite ready yet.

> If you don't want to scan the whole array, we could have a solution
> where we add an indirection between the available indices and the
> stack of allocations; this way C could continue to work as is and Rust
> would have a slightly different API that returns both the mapped
> address and an index (which would be used to unmap).
> 
> It's simple to remember the index in Rust and it wouldn't have to be
> exposed to end users, they'd still just do:
> 
> let a = folio1.map_local(...);
> 
> And when `a` is dropped, it would call unmap and pass the index back.
> (It's also safe in the sense that users would not be able to
> accidentally pass the wrong index.)
> 
> But if scanning the whole array is acceptable performance-wise, it's
> definitely a simpler solution.

Interesting idea.  There are some other possibilities too ... let's see.
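
As a rough illustration of the index-based proposal quoted above, here is a
user-space model. `SlotTable` and `Mapping` are hypothetical names, the table
is a plain vector of flags rather than the real kmap_local stack, and nothing
here reflects how the C side would actually be changed. The point is only
that a guard which remembers its own index can be dropped in any order.

```rust
use std::cell::RefCell;

/// Hypothetical per-CPU table of mapping slots.
pub struct SlotTable {
    in_use: RefCell<Vec<bool>>,
}

/// Guard for one mapping; it remembers which slot it occupies.
pub struct Mapping<'a> {
    table: &'a SlotTable,
    index: usize,
}

impl SlotTable {
    pub fn new(slots: usize) -> Self {
        Self { in_use: RefCell::new(vec![false; slots]) }
    }

    /// Claims the first free slot, or returns `None` if all are taken.
    pub fn map(&self) -> Option<Mapping<'_>> {
        let mut in_use = self.in_use.borrow_mut();
        let index = in_use.iter().position(|used| !*used)?;
        in_use[index] = true;
        Some(Mapping { table: self, index })
    }

    /// Number of slots currently claimed.
    pub fn used(&self) -> usize {
        self.in_use.borrow().iter().filter(|used| **used).count()
    }
}

impl Mapping<'_> {
    pub fn index(&self) -> usize {
        self.index
    }
}

impl Drop for Mapping<'_> {
    fn drop(&mut self) {
        // The guard passes its own index back, so unmap order does not matter
        // and callers cannot hand in the wrong index.
        self.table.in_use.borrow_mut()[self.index] = false;
    }
}
```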

> > > > > +/// A [`Folio`] that has a single reference to it.
> > > > > +pub struct UniqueFolio(pub(crate) ARef<Folio>);
> > > >
> > > > How do we know it only has a single reference?  Do you mean "has at
> > > > least one reference"?  Or am I confusing Rust's notion of a reference
> > > > with Linux's notion of a reference?
> > >
> > > Instances of `UniqueFolio` are only produced by calls to
> > > `folio_alloc`. They encode the fact that it's safe for us to map the
> > > folio and know that there aren't any concurrent threads/CPUs doing the
> > > same to the same folio.
> >
> > Mmm ... it's always safe to map a folio, even if other people have a
> > reference to it.  And Linux can make temporary spurious references to
> > folios appear, although those should be noticed by the other party and
> > released again before they access the contents of the folio.  So from
> > the point of view of being memory-safe, you can ignore them, but you
> > might see the refcount of the folio as >1, even if you just got the
> > folio back from the allocator.
> 
> Sure, it's safe to map a folio in general, but Rust has stricter rules
> about aliasing and mutability that are part of how memory safety is
> achieved. In particular, it requires that we never have mutable and
> immutable pointers to the same memory at once (modulo interior
> mutability).
> 
> So we need to avoid something like:
> 
> let a = folio.map(); // `a` is a shared pointer to the contents of the folio.
> 
> // While we have a shared (and therefore immutable) pointer, we're
> changing the contents of the folio.
> sb.sread(sector_number, sector_count, folio);
> 
> This violates Rust rules. `UniqueFolio` helps us address this for our
> use case; if we try the above with a UniqueFolio, the compiler will
> error out saying that  `a` has a shared reference to the folio, so we
> can't call `sread` on it (because sread requires a mutable, and
> therefore not shareable, reference to the folio).

This is going to be quite the impedance mismatch.  Still, I imagine
you're used to dealing with those by now and have a toolbox of ideas.

We don't have that rule for the pagecache as it is.  We do have rules that
prevent data corruption!  For example, if the folio is !uptodate then you
must have the lock to write to the folio in order to bring it uptodate
(so we have a single writer rule in that regard).  But once the folio is
uptodate, all bets are off in terms of who can be writing to it / reading
it at the same time.  And that's going to have to continue to be true;
multiple processes can have the same page mmaped writable and write to
it at the same time.  There's no possible synchronisation between them.

But I think your concern is really more limited.  You're concerned
with filesystem metadata obeying Rust's rules.  And for a read-write
filesystem, you're going to have to have ... something ... which gets a
folio from the page cache, and establishes that this is the only thread
which can modify that folio (maybe it's an interior node of a Btree,
maybe it's a free space bitmap, ...).  We could probably use the folio
lock bit for that purpose.  For the read-only filesystems, you only need
be concerned about freshly-allocated folios, but you need something more
when it comes to doing an ext2 clone.

There's general concern about the overuse of the folio lock bit, but
this is a reasonable use -- preventing two threads from modifying the
same folio at the same time.

(I have simplified all this; both iomap and buffer heads support folios
which are partially uptodate, but conceptually this is accurate)

> > On systems without HIGHMEM, kmap() is a no-op.  So we could do something
> > like this:
> >
> >         let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(),
> >                 if (folio_test_highmem(folio))
> >                         bindings::PAGE_SIZE
> >                 else
> >                         folio_size(folio) - page_idx * PAGE_SIZE) }
> >
> > ... modulo whatever the correct syntax is in Rust.
> 
> We can certainly do that. But since there's the possibility that the
> array will be capped at PAGE_SIZE in the HIGHMEM case, callers would
> still need a loop to traverse the whole folio, right?
> 
> let mut offset = 0;
> while offset < folio.size() {
>     let a = folio.map(offset);
>     // Do something with a.
>     offset += a.len();
> }
> 
> I guess the advantage is that we'd have a single iteration in systems
> without HIGHMEM.

Right.  You can see something similar to that in memcpy_from_folio() in
highmem.h.
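
A user-space model of that loop, for illustration only: `MockFolio` is a
hypothetical type backed by a `Vec<u8>`, `PAGE_SIZE` is assumed to be 4096,
and `map` merely mimics the rule discussed above (one page at a time with
HIGHMEM, the rest of the folio otherwise).

```rust
/// Page size assumed for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// Hypothetical model of a folio: contiguous bytes plus a highmem flag.
pub struct MockFolio {
    data: Vec<u8>,
    highmem: bool,
}

impl MockFolio {
    pub fn new(data: Vec<u8>, highmem: bool) -> Self {
        Self { data, highmem }
    }

    pub fn size(&self) -> usize {
        self.data.len()
    }

    /// Maps the folio starting at byte `offset`. With highmem the slice ends
    /// at the next page boundary; otherwise it covers the rest of the folio.
    pub fn map(&self, offset: usize) -> &[u8] {
        let end = if self.highmem {
            (offset / PAGE_SIZE + 1) * PAGE_SIZE
        } else {
            self.size()
        };
        &self.data[offset..end.min(self.size())]
    }
}

/// Copies the whole folio and reports how many `map` calls were needed.
pub fn copy_from_folio(folio: &MockFolio) -> (Vec<u8>, usize) {
    let mut out = Vec::with_capacity(folio.size());
    let mut maps = 0;
    let mut offset = 0;
    while offset < folio.size() {
        let chunk = folio.map(offset);
        out.extend_from_slice(chunk);
        offset += chunk.len();
        maps += 1;
    }
    (out, maps)
}
```

With a three-page folio this takes a single iteration in the non-highmem
case and three iterations in the highmem case, while the caller's loop stays
the same.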

> > Something I forgot to mention was that I found it more useful to express
> > "map this chunk of a folio" in bytes rather than pages.  You might find
> > the same, in which case it's just folio.map(offset: usize) instead of
> > folio.map_page(page_index: usize)
> 
> Oh, thanks for the feedback. I'll switch to bytes then for v2.
> (Already did in the example above.)

Great!  Something else I think would be a good idea is open-coding some
of the trivial accessors.  eg instead of doing:

+size_t rust_helper_folio_size(struct folio *folio)
+{
+	return folio_size(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_size);
[...]
+    pub fn size(&self) -> usize {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
+        unsafe { bindings::folio_size(self.0.get()) }
+    }

add:

impl Folio {
...
    pub fn order(&self) -> u8 {
        if self.flags & (1 << PG_head) != 0 {
            (self._flags_1 & 0xff) as u8
        } else {
            0
        }
    }

    pub fn size(&self) -> usize {
	bindings::PAGE_SIZE << self.order()
    }
}

... or have I misunderstood what is possible here?  My hope is that the
compiler gets to "see through" the abstraction, which surely can't be
done when there's a function call.
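
A standalone sketch of what such open-coded accessors could look like, under
loud assumptions: `MockFolio` is a hypothetical plain struct rather than the
`Folio` wrapper from the patch, `PG_HEAD` is an arbitrary bit index chosen
for the example (the real value comes from `enum pageflags`), and
`PAGE_SIZE` is assumed to be 4096.

```rust
/// Page size assumed for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// Hypothetical bit index of the "head" flag; not the kernel's actual value.
pub const PG_HEAD: u32 = 6;

/// Hypothetical mock exposing only the two words the accessors look at.
pub struct MockFolio {
    pub flags: usize,
    pub flags_1: usize,
}

impl MockFolio {
    /// Order of the folio: the low byte of `flags_1` for a large folio, else 0.
    #[inline]
    pub fn order(&self) -> u8 {
        if self.flags & (1usize << PG_HEAD) != 0 {
            (self.flags_1 & 0xff) as u8
        } else {
            0
        }
    }

    /// Size in bytes, computed without calling into a C helper.
    #[inline]
    pub fn size(&self) -> usize {
        PAGE_SIZE << self.order()
    }
}
```

Because both methods are small and marked `#[inline]`, the compiler is free
to fold them into the caller, which is the effect hoped for above.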

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params`
  2023-10-18 12:25 ` [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params` Wedson Almeida Filho
  2023-10-18 16:34   ` Benno Lossin
@ 2023-10-20 15:04   ` Ariel Miculas (amiculas)
  2024-01-03 12:25   ` Andreas Hindborg (Samsung)
  2 siblings, 0 replies; 125+ messages in thread
From: Ariel Miculas (amiculas) @ 2023-10-20 15:04 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On 23/10/18 09:25AM, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to initialise superblocks, which allows them
> to be mounted (though they are still empty).
> 
> Some scaffolding code is added to create an empty directory as the root.
> It is replaced by proper inode creation in a subsequent patch in this
> series.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/bindings/bindings_helper.h |   5 +
>  rust/bindings/lib.rs            |   4 +
>  rust/kernel/fs.rs               | 176 ++++++++++++++++++++++++++++++--
>  samples/rust/rust_rofs.rs       |  10 ++
>  4 files changed, 189 insertions(+), 6 deletions(-)
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 9c23037b33d0..ca1898ce9527 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -9,6 +9,7 @@
>  #include <kunit/test.h>
>  #include <linux/errname.h>
>  #include <linux/fs.h>
> +#include <linux/fs_context.h>
>  #include <linux/slab.h>
>  #include <linux/refcount.h>
>  #include <linux/wait.h>
> @@ -22,3 +23,7 @@ const gfp_t BINDINGS___GFP_ZERO = __GFP_ZERO;
>  const slab_flags_t BINDINGS_SLAB_RECLAIM_ACCOUNT = SLAB_RECLAIM_ACCOUNT;
>  const slab_flags_t BINDINGS_SLAB_MEM_SPREAD = SLAB_MEM_SPREAD;
>  const slab_flags_t BINDINGS_SLAB_ACCOUNT = SLAB_ACCOUNT;
> +
> +const unsigned long BINDINGS_SB_RDONLY = SB_RDONLY;
> +
> +const loff_t BINDINGS_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
> diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
> index 6a8c6cd17e45..426915d3fb57 100644
> --- a/rust/bindings/lib.rs
> +++ b/rust/bindings/lib.rs
> @@ -55,3 +55,7 @@ mod bindings_helper {
>  pub const SLAB_RECLAIM_ACCOUNT: slab_flags_t = BINDINGS_SLAB_RECLAIM_ACCOUNT;
>  pub const SLAB_MEM_SPREAD: slab_flags_t = BINDINGS_SLAB_MEM_SPREAD;
>  pub const SLAB_ACCOUNT: slab_flags_t = BINDINGS_SLAB_ACCOUNT;
> +
> +pub const SB_RDONLY: core::ffi::c_ulong = BINDINGS_SB_RDONLY;
> +
> +pub const MAX_LFS_FILESIZE: loff_t = BINDINGS_MAX_LFS_FILESIZE;
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 1df54c234101..31cf643aaded 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -6,16 +6,22 @@
>  //!
>  //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
>  
> -use crate::error::{code::*, from_result, to_result, Error};
> +use crate::error::{code::*, from_result, to_result, Error, Result};
>  use crate::types::Opaque;
>  use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
>  use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin};
>  use macros::{pin_data, pinned_drop};
>  
> +/// Maximum size of an inode.
> +pub const MAX_LFS_FILESIZE: i64 = bindings::MAX_LFS_FILESIZE;
> +
>  /// A file system type.
>  pub trait FileSystem {
>      /// The name of the file system type.
>      const NAME: &'static CStr;
> +
> +    /// Returns the parameters to initialise a super block.
> +    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
>  }
>  
>  /// A registration of a file system.
> @@ -49,7 +55,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>                  let fs = unsafe { &mut *fs_ptr };
>                  fs.owner = module.0;
>                  fs.name = T::NAME.as_char_ptr();
> -                fs.init_fs_context = Some(Self::init_fs_context_callback);
> +                fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
>                  fs.kill_sb = Some(Self::kill_sb_callback);
>                  fs.fs_flags = 0;
>  
> @@ -60,13 +66,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>          })
>      }
>  
> -    unsafe extern "C" fn init_fs_context_callback(
> -        _fc_ptr: *mut bindings::fs_context,
> +    unsafe extern "C" fn init_fs_context_callback<T: FileSystem + ?Sized>(
> +        fc_ptr: *mut bindings::fs_context,
>      ) -> core::ffi::c_int {
> -        from_result(|| Err(ENOTSUPP))
> +        from_result(|| {
> +            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
> +            let fc = unsafe { &mut *fc_ptr };
> +            fc.ops = &Tables::<T>::CONTEXT;
> +            Ok(0)
> +        })
>      }
>  
> -    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
> +    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
> +        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
> +        // the appropriate function to call for cleanup.
> +        unsafe { bindings::kill_anon_super(sb_ptr) };
> +    }
>  }
>  
>  #[pinned_drop]
> @@ -79,6 +94,151 @@ fn drop(self: Pin<&mut Self>) {
>      }
>  }
>  
> +/// A file system super block.
> +///
> +/// Wraps the kernel's `struct super_block`.
> +#[repr(transparent)]
> +pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
> +
> +/// Required superblock parameters.
> +///
> +/// This is returned by implementations of [`FileSystem::super_params`].
> +pub struct SuperParams {
> +    /// The magic number of the superblock.
> +    pub magic: u32,
> +
> +    /// The size of a block in powers of 2 (i.e., for a value of `n`, the size is `2^n`).
> +    pub blocksize_bits: u8,
> +
> +    /// Maximum size of a file.
> +    ///
> +    /// The maximum allowed value is [`MAX_LFS_FILESIZE`].
> +    pub maxbytes: i64,
> +
> +    /// Granularity of c/m/atime in ns (cannot be worse than a second).
> +    pub time_gran: u32,
> +}
> +
> +/// A superblock that is still being initialised.
> +///
> +/// # Invariants
> +///
> +/// The superblock is a newly-created one and this is the only active pointer to it.
> +#[repr(transparent)]
> +pub struct NewSuperBlock<T: FileSystem + ?Sized>(bindings::super_block, PhantomData<T>);

How about using the state type parameter [1] instead of using a separate
struct for each state? I think Andreas Hindborg mentioned this during
Kangrejos [2].

The gist of it is that you define a trait and implement it for the two
states of the superblock: NewSuperBlockState and
InitializedSuperBlockState:
```
pub trait SuperBlockState {}
/// A superblock that is still being initialised.
pub enum NewSuperBlockState {}

/// An initialized superblock
pub enum InitializedSuperBlockState {}

impl SuperBlockState for NewSuperBlockState {}
impl SuperBlockState for InitializedSuperBlockState {}
```

Then add another generic parameter (the state) to the SuperBlock:
```
#[repr(transparent)]
pub struct SuperBlock<T: FileSystem + ?Sized, S: SuperBlockState>(Opaque<bindings::super_block>, PhantomData<T>, PhantomData<S>);
```

Now you implement the functions separately on each variant of the
generic instead of implementing them on separate structs:
```
impl<T: FileSystem + ?Sized> SuperBlock<T, NewSuperBlockState> {
    ...
}
impl<T: FileSystem + ?Sized> SuperBlock<T, InitializedSuperBlockState> {
    ...
}
```

I think this pattern makes it clearer that there's only one SuperBlock
object which can be in different states, and it more clearly conveys
that the Typestate pattern is being used (we could find shorter names
for the states).

See [3] for the complete example.
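
For readers who want something that compiles on its own, here is a reduced
user-space sketch of the same pattern. It drops the `FileSystem` parameter
and the kernel bindings; the state names `New` and `Initialized`, the `magic`
field and the `init` method are hypothetical and chosen only to keep the
example short.

```rust
use std::marker::PhantomData;

/// Marker trait for the states a superblock can be in.
pub trait SuperBlockState {}

/// A superblock that is still being initialised.
pub enum New {}

/// An initialised superblock.
pub enum Initialized {}

impl SuperBlockState for New {}
impl SuperBlockState for Initialized {}

/// One superblock type; the state lives only in the type parameter.
pub struct SuperBlock<S: SuperBlockState> {
    magic: u32,
    _state: PhantomData<S>,
}

impl SuperBlock<New> {
    pub fn new() -> Self {
        Self { magic: 0, _state: PhantomData }
    }

    /// Consumes the new superblock and returns it in the initialised state.
    pub fn init(self, magic: u32) -> SuperBlock<Initialized> {
        SuperBlock { magic, _state: PhantomData }
    }
}

impl SuperBlock<Initialized> {
    /// Only available once initialisation has completed.
    pub fn magic(&self) -> u32 {
        self.magic
    }
}
```

Calling `magic()` on a `SuperBlock<New>` does not compile, which is the
property the separate `NewSuperBlock` struct provides today.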

Cheers,
Ariel

[1] https://cliffle.com/blog/rust-typestate/#variation-state-type-parameter
[2] https://kangrejos.com/
[3] https://github.com/ariel-miculas/linux/commit/655607228ff4ac9e56295ddd74fff8910dfbef14#diff-9b893393ed2a537222d79f6e2fceffb7e9d8967791c2016962be3171c446210f
> +
> +struct Tables<T: FileSystem + ?Sized>(T);
> +impl<T: FileSystem + ?Sized> Tables<T> {
> +    const CONTEXT: bindings::fs_context_operations = bindings::fs_context_operations {
> +        free: None,
> +        parse_param: None,
> +        get_tree: Some(Self::get_tree_callback),
> +        reconfigure: None,
> +        parse_monolithic: None,
> +        dup: None,
> +    };
> +
> +    unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> core::ffi::c_int {
> +        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
> +        // the right type and is a valid callback.
> +        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
> +    }
> +
> +    unsafe extern "C" fn fill_super_callback(
> +        sb_ptr: *mut bindings::super_block,
> +        _fc: *mut bindings::fs_context,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> +            // newly-created superblock.
> +            let sb = unsafe { &mut *sb_ptr.cast() };
> +            let params = T::super_params(sb)?;
> +
> +            sb.0.s_magic = params.magic as _;
> +            sb.0.s_op = &Tables::<T>::SUPER_BLOCK;
> +            sb.0.s_maxbytes = params.maxbytes;
> +            sb.0.s_time_gran = params.time_gran;
> +            sb.0.s_blocksize_bits = params.blocksize_bits;
> +            sb.0.s_blocksize = 1;
> +            if sb.0.s_blocksize.leading_zeros() < params.blocksize_bits.into() {
> +                return Err(EINVAL);
> +            }
> +            sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
> +            sb.0.s_flags |= bindings::SB_RDONLY;
> +
> +            // The following is scaffolding code that will be removed in a subsequent patch. It is
> +            // needed to build a root dentry, otherwise core code will BUG().
> +            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
> +            let inode = unsafe { bindings::new_inode(&mut sb.0) };
> +            if inode.is_null() {
> +                return Err(ENOMEM);
> +            }
> +
> +            // SAFETY: `inode` is valid for write.
> +            unsafe { bindings::set_nlink(inode, 2) };
> +
> +            {
> +                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> +                // safe to mutably dereference it.
> +                let inode = unsafe { &mut *inode };
> +                inode.i_ino = 1;
> +                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
> +
> +                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> +                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> +
> +                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> +                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> +            }
> +
> +            // SAFETY: `d_make_root` requires that `inode` be valid and referenced, which is the
> +            // case for this call.
> +            //
> +            // It takes over the inode, even on failure, so we don't need to clean it up.
> +            let dentry = unsafe { bindings::d_make_root(inode) };
> +            if dentry.is_null() {
> +                return Err(ENOMEM);
> +            }
> +
> +            sb.0.s_root = dentry;
> +
> +            Ok(0)
> +        })
> +    }
> +
> +    const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
> +        alloc_inode: None,
> +        destroy_inode: None,
> +        free_inode: None,
> +        dirty_inode: None,
> +        write_inode: None,
> +        drop_inode: None,
> +        evict_inode: None,
> +        put_super: None,
> +        sync_fs: None,
> +        freeze_super: None,
> +        freeze_fs: None,
> +        thaw_super: None,
> +        unfreeze_fs: None,
> +        statfs: None,
> +        remount_fs: None,
> +        umount_begin: None,
> +        show_options: None,
> +        show_devname: None,
> +        show_path: None,
> +        show_stats: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        quota_read: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        quota_write: None,
> +        #[cfg(CONFIG_QUOTA)]
> +        get_dquots: None,
> +        nr_cached_objects: None,
> +        free_cached_objects: None,
> +        shutdown: None,
> +    };
> +}
> +
>  /// Kernel module that exposes a single file system implemented by `T`.
>  #[pin_data]
>  pub struct Module<T: FileSystem + ?Sized> {
> @@ -105,6 +265,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>  ///
>  /// ```
>  /// # mod module_fs_sample {
> +/// use kernel::fs::{NewSuperBlock, SuperParams};
>  /// use kernel::prelude::*;
>  /// use kernel::{c_str, fs};
>  ///
> @@ -119,6 +280,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>  /// struct MyFs;
>  /// impl fs::FileSystem for MyFs {
>  ///     const NAME: &'static CStr = c_str!("myfs");
> +///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +///         todo!()
> +///     }
>  /// }
>  /// # }
>  /// ```
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index 1c00b1da8b94..9878bf88b991 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -2,6 +2,7 @@
>  
>  //! Rust read-only file system sample.
>  
> +use kernel::fs::{NewSuperBlock, SuperParams};
>  use kernel::prelude::*;
>  use kernel::{c_str, fs};
>  
> @@ -16,4 +17,13 @@
>  struct RoFs;
>  impl fs::FileSystem for RoFs {
>      const NAME: &'static CStr = c_str!("rust-fs");
> +
> +    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +        Ok(SuperParams {
> +            magic: 0x52555354,
> +            blocksize_bits: 12,
> +            maxbytes: fs::MAX_LFS_FILESIZE,
> +            time_gran: 1,
> +        })
> +    }
>  }
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-20  4:11           ` Matthew Wilcox
@ 2023-10-20 15:17             ` Matthew Wilcox
  2023-10-23 12:32               ` Wedson Almeida Filho
  2023-10-23 10:48             ` Andreas Hindborg (Samsung)
  2023-10-23 12:29             ` Wedson Almeida Filho
  2 siblings, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-20 15:17 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Fri, Oct 20, 2023 at 05:11:38AM +0100, Matthew Wilcox wrote:
> > Sure, it's safe to map a folio in general, but Rust has stricter rules
> > about aliasing and mutability that are part of how memory safety is
> > achieved. In particular, it requires that we never have mutable and
> > immutable pointers to the same memory at once (modulo interior
> > mutability).
> > 
> > So we need to avoid something like:
> > 
> > let a = folio.map(); // `a` is a shared pointer to the contents of the folio.
> > 
> > // While we have a shared (and therefore immutable) pointer, we're
> > changing the contents of the folio.
> > sb.sread(sector_number, sector_count, folio);
> > 
> > This violates Rust rules. `UniqueFolio` helps us address this for our
> > use case; if we try the above with a UniqueFolio, the compiler will
> > error out saying that  `a` has a shared reference to the folio, so we
> > can't call `sread` on it (because sread requires a mutable, and
> > therefore not shareable, reference to the folio).
> 
> This is going to be quite the impedance mismatch.  Still, I imagine
> you're used to dealing with those by now and have a toolbox of ideas.
> 
> We don't have that rule for the pagecache as it is.  We do have rules that
> prevent data corruption!  For example, if the folio is !uptodate then you
> must have the lock to write to the folio in order to bring it uptodate
> (so we have a single writer rule in that regard).  But once the folio is
> uptodate, all bets are off in terms of who can be writing to it / reading
> it at the same time.  And that's going to have to continue to be true;
> multiple processes can have the same page mmaped writable and write to
> it at the same time.  There's no possible synchronisation between them.
> 
> But I think your concern is really more limited.  You're concerned
> with filesystem metadata obeying Rust's rules.  And for a read-write
> filesystem, you're going to have to have ... something ... which gets a
> folio from the page cache, and establishes that this is the only thread
> which can modify that folio (maybe it's an interior node of a Btree,
> maybe it's a free space bitmap, ...).  We could probably use the folio
> lock bit for that purpose,  For the read-only filesystems, you only need
> be concerned about freshly-allocated folios, but you need something more
> when it comes to doing an ext2 clone.
> 
> There's general concern about the overuse of the folio lock bit, but
> this is a reasonable use -- preventing two threads from modifying the
> same folio at the same time.

Sorry, I didn't quite finish this thought; that'll teach me to write
complicated emails last thing at night.

The page cache has no single-writer vs multiple-reader exclusion on folios
found in the page cache.  We expect filesystems to implement whatever
exclusion they need at a higher level.  For example, ext2 has no higher
level lock on its block allocator.  Once the buffer is uptodate (ie has
been read from storage), it uses atomic bit operations in order to track
which blocks are freed.  It does use a spinlock to control access to
"how many blocks are currently free".

I'm not suggesting ext2 is an optimal strategy.  I know XFS and btrfs
use rwsems, although I'm not familiar enough with either to describe
exactly how it works.
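
To illustrate the split described above (atomic bit operations for
individual blocks, a lock only for the free count), here is a user-space
model. It is not ext2's code: `BlockBitmap` is a hypothetical type, a
`Mutex` stands in for the spinlock, and the bitmap lives in a `Vec` rather
than in a buffer read from storage.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical block bitmap: one bit per block, set when allocated.
pub struct BlockBitmap {
    words: Vec<AtomicU64>,
    /// Count of free blocks, guarded by a lock like the spinlock mentioned above.
    free: Mutex<usize>,
}

impl BlockBitmap {
    pub fn new(nblocks: usize) -> Self {
        let nwords = (nblocks + 63) / 64;
        Self {
            words: (0..nwords).map(|_| AtomicU64::new(0)).collect(),
            free: Mutex::new(nblocks),
        }
    }

    /// Atomically claims `block`; returns false if it was already allocated.
    pub fn try_alloc(&self, block: usize) -> bool {
        let mask = 1u64 << (block % 64);
        let old = self.words[block / 64].fetch_or(mask, Ordering::AcqRel);
        if old & mask != 0 {
            return false;
        }
        *self.free.lock().unwrap() -= 1;
        true
    }

    /// Atomically frees `block`; returns false if it was not allocated.
    pub fn release(&self, block: usize) -> bool {
        let mask = 1u64 << (block % 64);
        let old = self.words[block / 64].fetch_and(!mask, Ordering::AcqRel);
        if old & mask == 0 {
            return false;
        }
        *self.free.lock().unwrap() += 1;
        true
    }

    pub fn free_count(&self) -> usize {
        *self.free.lock().unwrap()
    }
}
```

Note that all methods take `&self`: the exclusion comes from the atomics and
the lock, not from exclusive access to the bitmap as a whole.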

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir`
  2023-10-18 12:25 ` [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir` Wedson Almeida Filho
@ 2023-10-21  8:33   ` Benno Lossin
  2024-01-03 14:09   ` Andreas Hindborg (Samsung)
  2024-01-21 21:00   ` Askar Safin
  2 siblings, 0 replies; 125+ messages in thread
From: Benno Lossin @ 2023-10-21  8:33 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 18.10.23 14:25, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to report the contents of their directory
> inodes. The reported entries cannot be opened yet.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>   rust/kernel/fs.rs         | 193 +++++++++++++++++++++++++++++++++++++-
>   samples/rust/rust_rofs.rs |  49 +++++++++-
>   2 files changed, 236 insertions(+), 6 deletions(-)
> 
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index f3a41cf57502..89611c44e4c5 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -28,6 +28,70 @@ pub trait FileSystem {
>       /// This is called during initialisation of a superblock after [`FileSystem::super_params`] has
>       /// completed successfully.
>       fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
> +
> +    /// Reads directory entries from directory inodes.
> +    ///
> +    /// [`DirEmitter::pos`] holds the current position of the directory reader.
> +    fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
> +}
> +
> +/// The types of directory entries reported by [`FileSystem::read_dir`].
> +#[repr(u32)]
> +#[derive(Copy, Clone)]
> +pub enum DirEntryType {
> +    /// Unknown type.
> +    Unknown = bindings::DT_UNKNOWN,
> +
> +    /// Named pipe (first-in, first-out) type.
> +    Fifo = bindings::DT_FIFO,
> +
> +    /// Character device type.
> +    Chr = bindings::DT_CHR,
> +
> +    /// Directory type.
> +    Dir = bindings::DT_DIR,
> +
> +    /// Block device type.
> +    Blk = bindings::DT_BLK,
> +
> +    /// Regular file type.
> +    Reg = bindings::DT_REG,
> +
> +    /// Symbolic link type.
> +    Lnk = bindings::DT_LNK,
> +
> +    /// Named unix-domain socket type.
> +    Sock = bindings::DT_SOCK,
> +
> +    /// White-out type.
> +    Wht = bindings::DT_WHT,
> +}
> +
> +impl From<INodeType> for DirEntryType {
> +    fn from(value: INodeType) -> Self {
> +        match value {
> +            INodeType::Dir => DirEntryType::Dir,
> +        }
> +    }
> +}
> +
> +impl core::convert::TryFrom<u32> for DirEntryType {
> +    type Error = crate::error::Error;
> +
> +    fn try_from(v: u32) -> Result<Self> {
> +        match v {
> +            v if v == Self::Unknown as u32 => Ok(Self::Unknown),
> +            v if v == Self::Fifo as u32 => Ok(Self::Fifo),
> +            v if v == Self::Chr as u32 => Ok(Self::Chr),
> +            v if v == Self::Dir as u32 => Ok(Self::Dir),
> +            v if v == Self::Blk as u32 => Ok(Self::Blk),
> +            v if v == Self::Reg as u32 => Ok(Self::Reg),
> +            v if v == Self::Lnk as u32 => Ok(Self::Lnk),
> +            v if v == Self::Sock as u32 => Ok(Self::Sock),
> +            v if v == Self::Wht as u32 => Ok(Self::Wht),
> +            _ => Err(EDOM),
> +        }
> +    }
>   }
> 
>   /// A registration of a file system.
> @@ -161,9 +225,7 @@ pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
> 
>           let mode = match params.typ {
>               INodeType::Dir => {
> -                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> -                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> -
> +                inode.__bindgen_anon_3.i_fop = &Tables::<T>::DIR_FILE_OPERATIONS;
>                   // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
>                   inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
>                   bindings::S_IFDIR
> @@ -403,6 +465,126 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>           free_cached_objects: None,
>           shutdown: None,
>       };
> +
> +    const DIR_FILE_OPERATIONS: bindings::file_operations = bindings::file_operations {
> +        owner: ptr::null_mut(),
> +        llseek: Some(bindings::generic_file_llseek),
> +        read: Some(bindings::generic_read_dir),
> +        write: None,
> +        read_iter: None,
> +        write_iter: None,
> +        iopoll: None,
> +        iterate_shared: Some(Self::read_dir_callback),
> +        poll: None,
> +        unlocked_ioctl: None,
> +        compat_ioctl: None,
> +        mmap: None,
> +        mmap_supported_flags: 0,
> +        open: None,
> +        flush: None,
> +        release: None,
> +        fsync: None,
> +        fasync: None,
> +        lock: None,
> +        get_unmapped_area: None,
> +        check_flags: None,
> +        flock: None,
> +        splice_write: None,
> +        splice_read: None,
> +        splice_eof: None,
> +        setlease: None,
> +        fallocate: None,
> +        show_fdinfo: None,
> +        copy_file_range: None,
> +        remap_file_range: None,
> +        fadvise: None,
> +        uring_cmd: None,
> +        uring_cmd_iopoll: None,
> +    };
> +
> +    unsafe extern "C" fn read_dir_callback(
> +        file: *mut bindings::file,
> +        ctx_ptr: *mut bindings::dir_context,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The C API guarantees that `file` is valid for read. And since `f_inode` is
> +            // immutable, we can read it directly.
> +            let inode = unsafe { &*(*file).f_inode.cast::<INode<T>>() };
> +
> +            // SAFETY: The C API guarantees that this is the only reference to the `dir_context`
> +            // instance.
> +            let emitter = unsafe { &mut *ctx_ptr.cast::<DirEmitter>() };
> +            let orig_pos = emitter.pos();
> +
> +            // Call the module implementation. We ignore errors if directory entries have been
> +            // successfully emitted: this is because we want users to see them before the error.
> +            match T::read_dir(inode, emitter) {
> +                Ok(_) => Ok(0),
> +                Err(e) => {
> +                    if emitter.pos() == orig_pos {
> +                        Err(e)
> +                    } else {
> +                        Ok(0)
> +                    }
> +                }
> +            }
> +        })
> +    }
> +}
> +
> +/// Directory entry emitter.
> +///
> +/// This is used in [`FileSystem::read_dir`] implementations to report the directory entry.
> +#[repr(transparent)]
> +pub struct DirEmitter(bindings::dir_context);

No `Opaque`?
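
Roughly what I have in mind, as a standalone sketch rather than kernel
code: `Opaque` below is a simplified stand-in for `kernel::types::Opaque`,
and `DirContext` is a hypothetical one-field stand-in for
`bindings::dir_context`.  The point is that the C-owned struct is only
ever reached through a raw pointer.

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;

/// Simplified stand-in for `kernel::types::Opaque` (an assumption, not the real type).
#[repr(transparent)]
pub struct Opaque<T>(UnsafeCell<MaybeUninit<T>>);

impl<T> Opaque<T> {
    /// Wraps an initialised value.
    pub const fn new(value: T) -> Self {
        Self(UnsafeCell::new(MaybeUninit::new(value)))
    }

    /// Returns a raw pointer to the wrapped value; no Rust reference to `T` is created.
    pub fn get(&self) -> *mut T {
        self.0.get().cast::<T>()
    }
}

/// Hypothetical stand-in for `bindings::dir_context`, reduced to one field.
#[repr(C)]
pub struct DirContext {
    pub pos: i64,
}

/// Emitter that keeps the C struct behind `Opaque`.
#[repr(transparent)]
pub struct DirEmitter(Opaque<DirContext>);

impl DirEmitter {
    /// Test-only constructor for this sketch; the kernel would cast a C pointer instead.
    pub fn new(ctx: DirContext) -> Self {
        Self(Opaque::new(ctx))
    }

    /// Returns the current position of the emitter.
    pub fn pos(&self) -> i64 {
        // SAFETY: The wrapped value was initialised in `new`, and nothing writes to it
        // concurrently in this sketch.
        unsafe { (*self.0.get()).pos }
    }
}
```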

> +
> +impl DirEmitter {
> +    /// Returns the current position of the emitter.
> +    pub fn pos(&self) -> i64 {
> +        self.0.pos
> +    }
> +
> +    /// Emits a directory entry.
> +    ///
> +    /// `pos_inc` is the number with which to increment the current position on success.
> +    ///
> +    /// `name` is the name of the entry.
> +    ///
> +    /// `ino` is the inode number of the entry.
> +    ///
> +    /// `etype` is the type of the entry.

It might make sense to create a struct for all these parameters.
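
Something along these lines, as a standalone sketch: `DirEntry` is a
hypothetical name, `DirEntryType` is cut down to three variants, and the
emitter here just records names in a `Vec` with a made-up `capacity`
instead of calling the C `actor`.

```rust
/// Cut-down stand-in for the patch's `DirEntryType`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum DirEntryType {
    Unknown,
    Dir,
    Reg,
}

/// Hypothetical bundle of the arguments currently passed to `emit` one by one.
pub struct DirEntry<'a> {
    /// Amount by which to advance the position on success.
    pub pos_inc: i64,
    /// Name of the entry.
    pub name: &'a [u8],
    /// Inode number of the entry.
    pub ino: u64,
    /// Type of the entry.
    pub etype: DirEntryType,
}

/// Simplified emitter: stores names instead of calling into C.
pub struct DirEmitter {
    pos: i64,
    capacity: usize,
    names: Vec<Vec<u8>>,
}

impl DirEmitter {
    /// Creates an emitter that accepts at most `capacity` entries.
    pub fn new(capacity: usize) -> Self {
        Self { pos: 0, capacity, names: Vec::new() }
    }

    /// Returns the current position of the emitter.
    pub fn pos(&self) -> i64 {
        self.pos
    }

    /// Emits one entry; returns `false` if it could not be emitted.
    pub fn emit(&mut self, entry: &DirEntry<'_>) -> bool {
        let Some(new_pos) = self.pos.checked_add(entry.pos_inc) else {
            return false;
        };
        if self.names.len() >= self.capacity {
            return false;
        }
        self.names.push(entry.name.to_vec());
        self.pos = new_pos;
        true
    }
}
```

Call sites then name every field, which makes it harder to mix up
`pos_inc` and `ino`.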

> +    ///
> +    /// Returns `false` when the entry could not be emitted, possibly because the user-provided
> +    /// buffer is full.
> +    pub fn emit(&mut self, pos_inc: i64, name: &[u8], ino: Ino, etype: DirEntryType) -> bool {
> +        let Ok(name_len) = i32::try_from(name.len()) else {
> +            return false;
> +        };
> +
> +        let Some(actor) = self.0.actor else {
> +            return false;
> +        };
> +
> +        let Some(new_pos) = self.0.pos.checked_add(pos_inc) else {
> +            return false;
> +        };
> +
> +        // SAFETY: `name` is valid at least for the duration of the `actor` call.

What about `&mut self.0`?  Since this is a function pointer, can we
really be sure about the safety requirements?

-- 
Cheers,
Benno

> +        let ret = unsafe {
> +            actor(
> +                &mut self.0,
> +                name.as_ptr().cast(),
> +                name_len,
> +                self.0.pos,
> +                ino,
> +                etype as _,
> +            )
> +        };
> +        if ret {
> +            self.0.pos = new_pos;
> +        }
> +        ret
> +    }
>   }
> 
>   /// Kernel module that exposes a single file system implemented by `T`.
> @@ -431,7 +613,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>   ///
>   /// ```
>   /// # mod module_fs_sample {
> -/// use kernel::fs::{INode, NewSuperBlock, SuperBlock, SuperParams};
> +/// use kernel::fs::{DirEmitter, INode, NewSuperBlock, SuperBlock, SuperParams};
>   /// use kernel::prelude::*;
>   /// use kernel::{c_str, fs, types::ARef};
>   ///
> @@ -452,6 +634,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>   ///     fn init_root(_sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
>   ///         todo!()
>   ///     }
> +///     fn read_dir(_: &INode<Self>, _: &mut DirEmitter) -> Result {
> +///         todo!()
> +///     }
>   /// }
>   /// # }
>   /// ```
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index 9e5f4c7d1c06..4e61a94afa70 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -2,7 +2,9 @@
> 
>   //! Rust read-only file system sample.
> 
> -use kernel::fs::{INode, INodeParams, INodeType, NewSuperBlock, SuperBlock, SuperParams};
> +use kernel::fs::{
> +    DirEmitter, INode, INodeParams, INodeType, NewSuperBlock, SuperBlock, SuperParams,
> +};
>   use kernel::prelude::*;
>   use kernel::{c_str, fs, time::UNIX_EPOCH, types::ARef, types::Either};
> 
> @@ -14,6 +16,30 @@
>       license: "GPL",
>   }
> 
> +struct Entry {
> +    name: &'static [u8],
> +    ino: u64,
> +    etype: INodeType,
> +}
> +
> +const ENTRIES: [Entry; 3] = [
> +    Entry {
> +        name: b".",
> +        ino: 1,
> +        etype: INodeType::Dir,
> +    },
> +    Entry {
> +        name: b"..",
> +        ino: 1,
> +        etype: INodeType::Dir,
> +    },
> +    Entry {
> +        name: b"subdir",
> +        ino: 2,
> +        etype: INodeType::Dir,
> +    },
> +];
> +
>   struct RoFs;
>   impl fs::FileSystem for RoFs {
>       const NAME: &'static CStr = c_str!("rust-fs");
> @@ -33,7 +59,7 @@ fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
>               Either::Right(new) => new.init(INodeParams {
>                   typ: INodeType::Dir,
>                   mode: 0o555,
> -                size: 1,
> +                size: ENTRIES.len().try_into()?,
>                   blocks: 1,
>                   nlink: 2,
>                   uid: 0,
> @@ -44,4 +70,23 @@ fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
>               }),
>           }
>       }
> +
> +    fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result {
> +        if inode.ino() != 1 {
> +            return Ok(());
> +        }
> +
> +        let pos = emitter.pos();
> +        if pos >= ENTRIES.len().try_into()? {
> +            return Ok(());
> +        }
> +
> +        for e in ENTRIES.iter().skip(pos.try_into()?) {
> +            if !emitter.emit(1, e.name, e.ino, e.etype.into()) {
> +                break;
> +            }
> +        }
> +
> +        Ok(())
> +    }
>   }
> --
> 2.34.1
> 
>


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-18 12:25 ` [RFC PATCH 09/19] rust: folio: introduce basic support for folios Wedson Almeida Filho
  2023-10-18 17:17   ` Matthew Wilcox
@ 2023-10-21  9:21   ` Benno Lossin
  1 sibling, 0 replies; 125+ messages in thread
From: Benno Lossin @ 2023-10-21  9:21 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 18.10.23 14:25, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to handle ref-counted folios.
> 
> Provide the minimum needed to implement `read_folio` (part of `struct
> address_space_operations`) in read-only file systems and to read
> uncached blocks.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>   rust/bindings/bindings_helper.h |   3 +
>   rust/bindings/lib.rs            |   2 +
>   rust/helpers.c                  |  81 ++++++++++++
>   rust/kernel/folio.rs            | 215 ++++++++++++++++++++++++++++++++
>   rust/kernel/lib.rs              |   1 +
>   5 files changed, 302 insertions(+)
>   create mode 100644 rust/kernel/folio.rs
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index ca1898ce9527..53a99ea512d1 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -11,6 +11,7 @@
>   #include <linux/fs.h>
>   #include <linux/fs_context.h>
>   #include <linux/slab.h>
> +#include <linux/pagemap.h>
>   #include <linux/refcount.h>
>   #include <linux/wait.h>
>   #include <linux/sched.h>
> @@ -27,3 +28,5 @@ const slab_flags_t BINDINGS_SLAB_ACCOUNT = SLAB_ACCOUNT;
>   const unsigned long BINDINGS_SB_RDONLY = SB_RDONLY;
> 
>   const loff_t BINDINGS_MAX_LFS_FILESIZE = MAX_LFS_FILESIZE;
> +
> +const size_t BINDINGS_PAGE_SIZE = PAGE_SIZE;
> diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
> index 426915d3fb57..a96b7f08e57d 100644
> --- a/rust/bindings/lib.rs
> +++ b/rust/bindings/lib.rs
> @@ -59,3 +59,5 @@ mod bindings_helper {
>   pub const SB_RDONLY: core::ffi::c_ulong = BINDINGS_SB_RDONLY;
> 
>   pub const MAX_LFS_FILESIZE: loff_t = BINDINGS_MAX_LFS_FILESIZE;
> +
> +pub const PAGE_SIZE: usize = BINDINGS_PAGE_SIZE;
> diff --git a/rust/helpers.c b/rust/helpers.c
> index c5a2bec6467d..f2ce3e7b688c 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -23,10 +23,14 @@
>   #include <kunit/test-bug.h>
>   #include <linux/bug.h>
>   #include <linux/build_bug.h>
> +#include <linux/cacheflush.h>
>   #include <linux/err.h>
>   #include <linux/errname.h>
>   #include <linux/fs.h>
> +#include <linux/highmem.h>
> +#include <linux/mm.h>
>   #include <linux/mutex.h>
> +#include <linux/pagemap.h>
>   #include <linux/refcount.h>
>   #include <linux/sched/signal.h>
>   #include <linux/spinlock.h>
> @@ -145,6 +149,77 @@ struct kunit *rust_helper_kunit_get_current_test(void)
>   }
>   EXPORT_SYMBOL_GPL(rust_helper_kunit_get_current_test);
> 
> +void *rust_helper_kmap(struct page *page)
> +{
> +	return kmap(page);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kmap);
> +
> +void rust_helper_kunmap(struct page *page)
> +{
> +	kunmap(page);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kunmap);
> +
> +void rust_helper_folio_get(struct folio *folio)
> +{
> +	folio_get(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_get);
> +
> +void rust_helper_folio_put(struct folio *folio)
> +{
> +	folio_put(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_put);
> +
> +struct page *rust_helper_folio_page(struct folio *folio, size_t n)
> +{
> +	return folio_page(folio, n);
> +}
> +
> +loff_t rust_helper_folio_pos(struct folio *folio)
> +{
> +	return folio_pos(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_pos);
> +
> +size_t rust_helper_folio_size(struct folio *folio)
> +{
> +	return folio_size(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_size);
> +
> +void rust_helper_folio_mark_uptodate(struct folio *folio)
> +{
> +	folio_mark_uptodate(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_mark_uptodate);
> +
> +void rust_helper_folio_set_error(struct folio *folio)
> +{
> +	folio_set_error(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_set_error);
> +
> +void rust_helper_flush_dcache_folio(struct folio *folio)
> +{
> +	flush_dcache_folio(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_flush_dcache_folio);
> +
> +void *rust_helper_kmap_local_folio(struct folio *folio, size_t offset)
> +{
> +	return kmap_local_folio(folio, offset);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kmap_local_folio);
> +
> +void rust_helper_kunmap_local(const void *vaddr)
> +{
> +	kunmap_local(vaddr);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
> +
>   void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
>   {
>   	i_uid_write(inode, uid);
> @@ -163,6 +238,12 @@ off_t rust_helper_i_size_read(const struct inode *inode)
>   }
>   EXPORT_SYMBOL_GPL(rust_helper_i_size_read);
> 
> +void rust_helper_mapping_set_large_folios(struct address_space *mapping)
> +{
> +	mapping_set_large_folios(mapping);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_mapping_set_large_folios);
> +
>   /*
>    * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
>    * use it in contexts where Rust expects a `usize` like slice (array) indices.
> diff --git a/rust/kernel/folio.rs b/rust/kernel/folio.rs
> new file mode 100644
> index 000000000000..ef8a08b97962
> --- /dev/null
> +++ b/rust/kernel/folio.rs
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Groups of contiguous pages, folios.
> +//!
> +//! C headers: [`include/linux/mm.h`](../../include/linux/mm.h)
> +
> +use crate::error::{code::*, Result};
> +use crate::types::{ARef, AlwaysRefCounted, Opaque, ScopeGuard};
> +use core::{cmp::min, ptr};
> +
> +/// Wraps the kernel's `struct folio`.
> +///
> +/// # Invariants
> +///
> +/// Instances of this type are always ref-counted, that is, a call to `folio_get` ensures that the
> +/// allocation remains valid at least until the matching call to `folio_put`.
> +#[repr(transparent)]
> +pub struct Folio(pub(crate) Opaque<bindings::folio>);
> +
> +// SAFETY: The type invariants guarantee that `Folio` is always ref-counted.
> +unsafe impl AlwaysRefCounted for Folio {
> +    fn inc_ref(&self) {
> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> +        unsafe { bindings::folio_get(self.0.get()) };
> +    }
> +
> +    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> +        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
> +        unsafe { bindings::folio_put(obj.cast().as_ptr()) }
> +    }
> +}
> +
> +impl Folio {
> +    /// Tries to allocate a new folio.
> +    ///
> +    /// On success, returns a folio made up of 2^order pages.
> +    pub fn try_new(order: u32) -> Result<UniqueFolio> {
> +        if order > bindings::MAX_ORDER {
> +            return Err(EDOM);
> +        }
> +
> +        // SAFETY: We checked that `order` is within the max allowed value.
> +        let f = ptr::NonNull::new(unsafe { bindings::folio_alloc(bindings::GFP_KERNEL, order) })
> +            .ok_or(ENOMEM)?;
> +
> +        // SAFETY: The folio returned by `folio_alloc` is referenced. The ownership of the
> +        // reference is transferred to the `ARef` instance.
> +        Ok(UniqueFolio(unsafe { ARef::from_raw(f.cast()) }))
> +    }
> +
> +    /// Returns the byte position of this folio in its file.
> +    pub fn pos(&self) -> i64 {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_pos(self.0.get()) }
> +    }
> +
> +    /// Returns the byte size of this folio.
> +    pub fn size(&self) -> usize {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_size(self.0.get()) }
> +    }
> +
> +    /// Flushes the data cache for the pages that make up the folio.
> +    pub fn flush_dcache(&self) {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::flush_dcache_folio(self.0.get()) }
> +    }
> +}
> +
> +/// A [`Folio`] that has a single reference to it.

This should be an invariant.
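
For illustration, a standalone analogue using `std::rc::Rc` instead of
`ARef<Folio>`; the `Unique` type is hypothetical.  The property moves
from the summary line into an `# Invariants` section, and the place that
establishes it gets an `INVARIANT` comment.

```rust
use std::rc::Rc;

/// A reference-counted value with exclusive ownership.
///
/// # Invariants
///
/// The inner `Rc` has exactly one strong reference and no weak references.
pub struct Unique<T>(Rc<T>);

impl<T> Unique<T> {
    /// Creates a new uniquely owned value.
    pub fn new(value: T) -> Self {
        // INVARIANT: A freshly created `Rc` has one strong reference and no weak ones,
        // and the inner `Rc` is never cloned or exposed by this type.
        Self(Rc::new(value))
    }

    /// Returns a mutable reference to the value.
    pub fn get_mut(&mut self) -> &mut T {
        // The type invariant guarantees exclusive ownership, so this cannot fail.
        Rc::get_mut(&mut self.0).expect("type invariant: single reference")
    }
}
```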

> +pub struct UniqueFolio(pub(crate) ARef<Folio>);
> +
> +impl UniqueFolio {
> +    /// Maps the contents of a folio page into a slice.
> +    pub fn map_page(&self, page_index: usize) -> Result<MapGuard<'_>> {
> +        if page_index >= self.0.size() / bindings::PAGE_SIZE {
> +            return Err(EDOM);
> +        }
> +
> +        // SAFETY: We just checked that the index is within bounds of the folio.
> +        let page = unsafe { bindings::folio_page(self.0 .0.get(), page_index) };
> +
> +        // SAFETY: `page` is valid because it was returned by `folio_page` above.
> +        let ptr = unsafe { bindings::kmap(page) };
> +
> +        // SAFETY: We just mapped `ptr`, so it's valid for read.
> +        let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(), bindings::PAGE_SIZE) };
> +
> +        Ok(MapGuard { data, page })
> +    }
> +}
> +
> +/// A mapped [`UniqueFolio`].
> +pub struct MapGuard<'a> {
> +    data: &'a [u8],
> +    page: *mut bindings::page,
> +}
> +
> +impl core::ops::Deref for MapGuard<'_> {
> +    type Target = [u8];
> +
> +    fn deref(&self) -> &Self::Target {
> +        self.data
> +    }
> +}
> +
> +impl Drop for MapGuard<'_> {
> +    fn drop(&mut self) {
> +        // SAFETY: A `MapGuard` instance is only created when `kmap` succeeds, so it's ok to unmap
> +        // it when the guard is dropped.
> +        unsafe { bindings::kunmap(self.page) };
> +    }
> +}
> +
> +/// A locked [`Folio`].

This should be an invariant.

-- 
Cheers,
Benno

> +pub struct LockedFolio<'a>(&'a Folio);
> +
> +impl LockedFolio<'_> {
> +    /// Creates a new locked folio from a raw pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that the folio is valid and locked. Additionally, that the
> +    /// responsibility of unlocking is transferred to the new instance of [`LockedFolio`]. Lastly,
> +    /// that the returned [`LockedFolio`] doesn't outlive the refcount that keeps it alive.
> +    #[allow(dead_code)]
> +    pub(crate) unsafe fn from_raw(folio: *const bindings::folio) -> Self {
> +        let ptr = folio.cast();
> +        // SAFETY: The safety requirements ensure that `folio` (from which `ptr` is derived) is
> +        // valid and will remain valid while the `LockedFolio` instance lives.
> +        Self(unsafe { &*ptr })
> +    }
> +
> +    /// Marks the folio as being up to date.
> +    pub fn mark_uptodate(&mut self) {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_mark_uptodate(self.0 .0.get()) }
> +    }
> +
> +    /// Sets the error flag on the folio.
> +    pub fn set_error(&mut self) {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_set_error(self.0 .0.get()) }
> +    }
> +
> +    fn for_each_page(
> +        &mut self,
> +        offset: usize,
> +        len: usize,
> +        mut cb: impl FnMut(&mut [u8]) -> Result,
> +    ) -> Result {
> +        let mut remaining = len;
> +        let mut next_offset = offset;
> +
> +        // Check that we don't overflow the folio.
> +        let end = offset.checked_add(len).ok_or(EDOM)?;
> +        if end > self.size() {
> +            return Err(EINVAL);
> +        }
> +
> +        while remaining > 0 {
> +            let page_offset = next_offset & (bindings::PAGE_SIZE - 1);
> +            let usable = min(remaining, bindings::PAGE_SIZE - page_offset);
> +            // SAFETY: The folio is valid because the shared reference implies a non-zero refcount;
> +            // `next_offset` is also guaranteed to be less than the folio size.
> +            let ptr = unsafe { bindings::kmap_local_folio(self.0 .0.get(), next_offset) };
> +
> +            // SAFETY: `ptr` was just returned by the `kmap_local_folio` above.
> +            let _guard = ScopeGuard::new(|| unsafe { bindings::kunmap_local(ptr) });
> +
> +            // SAFETY: `kmap_local_folio` maps whole page so we know it's mapped for at least
> +            // `usable` bytes.
> +            let s = unsafe { core::slice::from_raw_parts_mut(ptr.cast::<u8>(), usable) };
> +            cb(s)?;
> +
> +            next_offset += usable;
> +            remaining -= usable;
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Writes the given slice into the folio.
> +    pub fn write(&mut self, offset: usize, data: &[u8]) -> Result {
> +        let mut remaining = data;
> +
> +        self.for_each_page(offset, data.len(), |s| {
> +            s.copy_from_slice(&remaining[..s.len()]);
> +            remaining = &remaining[s.len()..];
> +            Ok(())
> +        })
> +    }
> +
> +    /// Writes zeroes into the folio.
> +    pub fn zero_out(&mut self, offset: usize, len: usize) -> Result {
> +        self.for_each_page(offset, len, |s| {
> +            s.fill(0);
> +            Ok(())
> +        })
> +    }
> +}
> +
> +impl core::ops::Deref for LockedFolio<'_> {
> +    type Target = Folio;
> +    fn deref(&self) -> &Self::Target {
> +        self.0
> +    }
> +}
> +
> +impl Drop for LockedFolio<'_> {
> +    fn drop(&mut self) {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_unlock(self.0 .0.get()) }
> +    }
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 00059b80c240..0e85b380da64 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -34,6 +34,7 @@
>   mod allocator;
>   mod build_assert;
>   pub mod error;
> +pub mod folio;
>   pub mod fs;
>   pub mod init;
>   pub mod ioctl;
> --
> 2.34.1
> 
>


* Re: [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device
  2023-10-18 12:25 ` [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
@ 2023-10-21 13:39   ` Benno Lossin
  2024-01-24  4:14     ` Wedson Almeida Filho
  2024-01-03 14:38   ` Andreas Hindborg (Samsung)
  1 sibling, 1 reply; 125+ messages in thread
From: Benno Lossin @ 2023-10-21 13:39 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 18.10.23 14:25, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems that are backed by block devices (in addition to
> in-memory ones).
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>   rust/bindings/bindings_helper.h |   1 +
>   rust/helpers.c                  |  14 +++
>   rust/kernel/fs.rs               | 177 +++++++++++++++++++++++++++++---
>   rust/kernel/fs/buffer.rs        |   1 -
>   4 files changed, 180 insertions(+), 13 deletions(-)
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index d328375f7cb7..8403f13d4d48 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -7,6 +7,7 @@
>    */
> 
>   #include <kunit/test.h>
> +#include <linux/bio.h>
>   #include <linux/buffer_head.h>
>   #include <linux/errname.h>
>   #include <linux/fs.h>
> diff --git a/rust/helpers.c b/rust/helpers.c
> index a5393c6b93f2..bc19f3b7b93e 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -21,6 +21,7 @@
>    */
> 
>   #include <kunit/test-bug.h>
> +#include <linux/blkdev.h>
>   #include <linux/buffer_head.h>
>   #include <linux/bug.h>
>   #include <linux/build_bug.h>
> @@ -252,6 +253,13 @@ unsigned int rust_helper_MKDEV(unsigned int major, unsigned int minor)
>   EXPORT_SYMBOL_GPL(rust_helper_MKDEV);
> 
>   #ifdef CONFIG_BUFFER_HEAD
> +struct buffer_head *rust_helper_sb_bread(struct super_block *sb,
> +		sector_t block)
> +{
> +	return sb_bread(sb, block);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_sb_bread);
> +
>   void rust_helper_get_bh(struct buffer_head *bh)
>   {
>   	get_bh(bh);
> @@ -265,6 +273,12 @@ void rust_helper_put_bh(struct buffer_head *bh)
>   EXPORT_SYMBOL_GPL(rust_helper_put_bh);
>   #endif
> 
> +sector_t rust_helper_bdev_nr_sectors(struct block_device *bdev)
> +{
> +	return bdev_nr_sectors(bdev);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_bdev_nr_sectors);
> +
>   /*
>    * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
>    * use it in contexts where Rust expects a `usize` like slice (array) indices.
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 4f04cb1d3c6f..b1ad5c110dbb 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -7,11 +7,9 @@
>   //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
> 
>   use crate::error::{code::*, from_result, to_result, Error, Result};
> -use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque};
> -use crate::{
> -    bindings, folio::LockedFolio, init::PinInit, str::CStr, time::Timespec, try_pin_init,
> -    ThisModule,
> -};
> +use crate::folio::{LockedFolio, UniqueFolio};
> +use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque, ScopeGuard};
> +use crate::{bindings, init::PinInit, str::CStr, time::Timespec, try_pin_init, ThisModule};
>   use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
>   use macros::{pin_data, pinned_drop};
> 
> @@ -21,6 +19,17 @@
>   /// Maximum size of an inode.
>   pub const MAX_LFS_FILESIZE: i64 = bindings::MAX_LFS_FILESIZE;
> 
> +/// Type of superblock keying.
> +///
> +/// It determines how C's `fs_context_operations::get_tree` is implemented.
> +pub enum Super {
> +    /// Multiple independent superblocks may exist.
> +    Independent,
> +
> +    /// Uses a block device.
> +    BlockDev,
> +}
> +
>   /// A file system type.
>   pub trait FileSystem {
>       /// Data associated with each file system instance (super-block).
> @@ -29,6 +38,9 @@ pub trait FileSystem {
>       /// The name of the file system type.
>       const NAME: &'static CStr;
> 
> +    /// Determines how superblocks for this file system type are keyed.
> +    const SUPER_TYPE: Super = Super::Independent;
> +
>       /// Returns the parameters to initialise a super block.
>       fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
> 
> @@ -181,7 +193,9 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>                   fs.name = T::NAME.as_char_ptr();
>                   fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
>                   fs.kill_sb = Some(Self::kill_sb_callback::<T>);
> -                fs.fs_flags = 0;
> +                fs.fs_flags = if let Super::BlockDev = T::SUPER_TYPE {
> +                    bindings::FS_REQUIRES_DEV as i32
> +                } else { 0 };
> 
>                   // SAFETY: Pointers stored in `fs` are static so will live for as long as the
>                   // registration is active (it is undone in `drop`).
> @@ -204,9 +218,16 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>       unsafe extern "C" fn kill_sb_callback<T: FileSystem + ?Sized>(
>           sb_ptr: *mut bindings::super_block,
>       ) {
> -        // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
> -        // the appropriate function to call for cleanup.
> -        unsafe { bindings::kill_anon_super(sb_ptr) };
> +        match T::SUPER_TYPE {
> +            // SAFETY: In `get_tree_callback` we always call `get_tree_bdev` for
> +            // `Super::BlockDev`, so `kill_block_super` is the appropriate function to call
> +            // for cleanup.
> +            Super::BlockDev => unsafe { bindings::kill_block_super(sb_ptr) },
> +            // SAFETY: In `get_tree_callback` we always call `get_tree_nodev` for
> +            // `Super::Independent`, so `kill_anon_super` is the appropriate function to call
> +            // for cleanup.
> +            Super::Independent => unsafe { bindings::kill_anon_super(sb_ptr) },
> +        }
> 
>           // SAFETY: The C API contract guarantees that `sb_ptr` is valid for read.
>           let ptr = unsafe { (*sb_ptr).s_fs_info };
> @@ -479,6 +500,65 @@ pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, New
>               })))
>           }
>       }
> +
> +    /// Reads a block from the block device.
> +    #[cfg(CONFIG_BUFFER_HEAD)]
> +    pub fn bread(&self, block: u64) -> Result<ARef<buffer::Head>> {
> +        // Fail requests for non-blockdev file systems. This is a compile-time check.
> +        match T::SUPER_TYPE {
> +            Super::BlockDev => {}
> +            _ => return Err(EIO),
> +        }

Would it make sense to use `build_error` instead of returning an error
here?

Also, do you think that separating this into a trait, `BlockDevFS` would
make sense?
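
A standalone sketch of the trait idea, with simplified stand-ins for the
types in this series (`SuperBlock` here just holds a `Vec` of blocks, and
`MemFs`/`DiskFs` are made-up file systems).  `bread` only exists for
`T: BlockDevFS`, so calling it on any other file system fails to compile
and no runtime `EIO` path is needed.

```rust
use std::marker::PhantomData;

/// Simplified stand-in for the series' `FileSystem` trait.
pub trait FileSystem {
    const NAME: &'static str;
}

/// Hypothetical marker trait for file systems backed by a block device.
pub trait BlockDevFS: FileSystem {}

/// Simplified superblock: the "device" is an in-memory list of blocks.
pub struct SuperBlock<T: FileSystem> {
    blocks: Vec<Vec<u8>>,
    _fs: PhantomData<T>,
}

impl<T: FileSystem> SuperBlock<T> {
    /// Creates a superblock over the given blocks.
    pub fn new(blocks: Vec<Vec<u8>>) -> Self {
        Self { blocks, _fs: PhantomData }
    }

    /// Available for every file system.
    pub fn fs_name(&self) -> &'static str {
        T::NAME
    }
}

impl<T: BlockDevFS> SuperBlock<T> {
    /// Reads a block; only callable when `T` is backed by a block device.
    pub fn bread(&self, block: u64) -> Option<&[u8]> {
        let index = usize::try_from(block).ok()?;
        self.blocks.get(index).map(Vec::as_slice)
    }
}

/// Made-up in-memory file system: no `BlockDevFS` impl, so no `bread`.
pub struct MemFs;
impl FileSystem for MemFs {
    const NAME: &'static str = "memfs";
}

/// Made-up block-device file system.
pub struct DiskFs;
impl FileSystem for DiskFs {
    const NAME: &'static str = "diskfs";
}
impl BlockDevFS for DiskFs {}
```

With this, `SuperBlock::<MemFs>::new(..).bread(0)` is rejected by the
compiler because `MemFs` does not implement `BlockDevFS`.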

> +
> +        // SAFETY: This function is only valid after the `NeedsInit` typestate, so the block size
> +        // is known and the superblock can be used to read blocks.

Stale SAFETY comment? There are no typestates in this patch.

> +        let ptr =
> +            ptr::NonNull::new(unsafe { bindings::sb_bread(self.0.get(), block) }).ok_or(EIO)?;
> +        // SAFETY: `sb_bread` returns a referenced buffer head. Ownership of the increment is
> +        // passed to the `ARef` instance.
> +        Ok(unsafe { ARef::from_raw(ptr.cast()) })
> +    }
> +
> +    /// Reads `size` bytes starting from `offset` bytes.
> +    ///
> +    /// Returns an iterator that returns slices based on blocks.
> +    #[cfg(CONFIG_BUFFER_HEAD)]
> +    pub fn read(
> +        &self,
> +        offset: u64,
> +        size: u64,
> +    ) -> Result<impl Iterator<Item = Result<buffer::View>> + '_> {
> +        struct BlockIter<'a, T: FileSystem + ?Sized> {
> +            sb: &'a SuperBlock<T>,
> +            next_offset: u64,
> +            end: u64,
> +        }
> +        impl<'a, T: FileSystem + ?Sized> Iterator for BlockIter<'a, T> {
> +            type Item = Result<buffer::View>;
> +
> +            fn next(&mut self) -> Option<Self::Item> {
> +                if self.next_offset >= self.end {
> +                    return None;
> +                }
> +
> +                // SAFETY: The superblock is valid and has had its block size initialised.
> +                let block_size = unsafe { (*self.sb.0.get()).s_blocksize };
> +                let bh = match self.sb.bread(self.next_offset / block_size) {
> +                    Ok(bh) => bh,
> +                    Err(e) => return Some(Err(e)),
> +                };
> +                let boffset = self.next_offset & (block_size - 1);
> +                let bsize = core::cmp::min(self.end - self.next_offset, block_size - boffset);
> +                self.next_offset += bsize;
> +                Some(Ok(buffer::View::new(bh, boffset as usize, bsize as usize)))
> +            }
> +        }
> +        Ok(BlockIter {
> +            sb: self,
> +            next_offset: offset,
> +            end: offset.checked_add(size).ok_or(ERANGE)?,
> +        })
> +    }
>   }
> 
>   /// Required superblock parameters.
> @@ -511,6 +591,70 @@ pub struct SuperParams<T: ForeignOwnable + Send + Sync> {
>   #[repr(transparent)]
>   pub struct NewSuperBlock<T: FileSystem + ?Sized>(bindings::super_block, PhantomData<T>);
> 
> +impl<T: FileSystem + ?Sized> NewSuperBlock<T> {
> +    /// Reads sectors.
> +    ///
> +    /// `count` must be such that the total size doesn't exceed a page.
> +    pub fn sread(&self, sector: u64, count: usize, folio: &mut UniqueFolio) -> Result {
> +        // Fail requests for non-blockdev file systems. This is a compile-time check.
> +        match T::SUPER_TYPE {
> +            // The superblock is valid and given that it's a blockdev superblock it must have a
> +            // valid `s_bdev`.
> +            Super::BlockDev => {}
> +            _ => return Err(EIO),
> +        }
> +
> +        crate::build_assert!(count * (bindings::SECTOR_SIZE as usize) <= bindings::PAGE_SIZE);

Maybe add an error message that explains why this is not ok?
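
As a rough illustration of the idea in plain Rust (outside the kernel
crate), a compile-time assertion can carry a message that names the
violated requirement. The constants, the `FitsInPage` helper and
`sectors_to_bytes` below are assumptions for this sketch only; the 4 KiB
page size in particular is just an example value:

```rust
const SECTOR_SIZE: usize = 512;
const PAGE_SIZE: usize = 4096; // assumption: 4 KiB pages

/// Hypothetical helper that carries the compile-time check for a given `COUNT`.
struct FitsInPage<const COUNT: usize>;

impl<const COUNT: usize> FitsInPage<COUNT> {
    // Evaluated at compile time for each `COUNT` that is actually used.
    const CHECK: () = assert!(
        COUNT * SECTOR_SIZE <= PAGE_SIZE,
        "sread: `count` sectors must fit in a single page"
    );
}

/// Returns the byte length of `COUNT` sectors; the build fails if it exceeds a page.
pub fn sectors_to_bytes<const COUNT: usize>() -> usize {
    // Referencing the constant forces the assertion to be evaluated.
    let () = FitsInPage::<COUNT>::CHECK;
    COUNT * SECTOR_SIZE
}
```

Here `sectors_to_bytes::<8>()` builds, while `sectors_to_bytes::<9>()`
stops the build with the message above instead of a bare assertion
failure.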

> +
> +        // Read the sectors.
> +        let mut bio = bindings::bio::default();
> +        let bvec = Opaque::<bindings::bio_vec>::uninit();
> +
> +        // SAFETY: `bio` and `bvec` are allocated on the stack, they're both valid.
> +        unsafe {
> +            bindings::bio_init(
> +                &mut bio,
> +                self.0.s_bdev,
> +                bvec.get(),
> +                1,
> +                bindings::req_op_REQ_OP_READ,
> +            )
> +        };
> +
> +        // SAFETY: `bio` was just initialised with `bio_init` above, so it's safe to call
> +        // `bio_uninit` on the way out.
> +        let mut bio =
> +            ScopeGuard::new_with_data(bio, |mut b| unsafe { bindings::bio_uninit(&mut b) });
> +
> +        // SAFETY: We have one free `bvec` (initialised above). We also know that size won't exceed
> +        // a page size (build_assert above).

I think you should move the `build_assert` above this line.

-- 
Cheers,
Benno

> +        unsafe {
> +            bindings::bio_add_folio_nofail(
> +                &mut *bio,
> +                folio.0 .0.get(),
> +                count * (bindings::SECTOR_SIZE as usize),
> +                0,
> +            )
> +        };
> +        bio.bi_iter.bi_sector = sector;
> +
> +        // SAFETY: The bio was fully initialised above.
> +        to_result(unsafe { bindings::submit_bio_wait(&mut *bio) })?;
> +        Ok(())
> +    }
> +
> +    /// Returns the number of sectors in the underlying block device.
> +    pub fn sector_count(&self) -> Result<u64> {
> +        // Fail requests for non-blockdev file systems. This is a compile-time check.
> +        match T::SUPER_TYPE {
> +            // The superblock is valid and given that it's a blockdev superblock it must have a
> +            // valid `s_bdev`.
> +            Super::BlockDev => Ok(unsafe { bindings::bdev_nr_sectors(self.0.s_bdev) }),
> +            _ => Err(EIO),
> +        }
> +    }
> +}
> +
>   struct Tables<T: FileSystem + ?Sized>(T);
>   impl<T: FileSystem + ?Sized> Tables<T> {
>       const CONTEXT: bindings::fs_context_operations = bindings::fs_context_operations {
> @@ -523,9 +667,18 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>       };
> 
>       unsafe extern "C" fn get_tree_callback(fc: *mut bindings::fs_context) -> core::ffi::c_int {
> -        // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
> -        // the right type and is a valid callback.
> -        unsafe { bindings::get_tree_nodev(fc, Some(Self::fill_super_callback)) }
> +        match T::SUPER_TYPE {
> +            // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
> +            // the right type and is a valid callback.
> +            Super::BlockDev => unsafe {
> +                bindings::get_tree_bdev(fc, Some(Self::fill_super_callback))
> +            },
> +            // SAFETY: `fc` is valid per the callback contract. `fill_super_callback` also has
> +            // the right type and is a valid callback.
> +            Super::Independent => unsafe {
> +                bindings::get_tree_nodev(fc, Some(Self::fill_super_callback))
> +            },
> +        }
>       }
> 
>       unsafe extern "C" fn fill_super_callback(
> diff --git a/rust/kernel/fs/buffer.rs b/rust/kernel/fs/buffer.rs
> index 6052af8822b3..de23d0fee66c 100644
> --- a/rust/kernel/fs/buffer.rs
> +++ b/rust/kernel/fs/buffer.rs
> @@ -49,7 +49,6 @@ pub struct View {
>   }
> 
>   impl View {
> -    #[allow(dead_code)]
>       pub(crate) fn new(head: ARef<Head>, offset: usize, size: usize) -> Self {
>           Self { head, size, offset }
>       }
> --
> 2.34.1
> 
>

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-20  0:52     ` Boqun Feng
@ 2023-10-21 13:48       ` Benno Lossin
  2023-10-21 15:57         ` Boqun Feng
  0 siblings, 1 reply; 125+ messages in thread
From: Benno Lossin @ 2023-10-21 13:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Matthew Wilcox, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho

On 20.10.23 02:52, Boqun Feng wrote:
> On Thu, Oct 19, 2023 at 02:30:56PM +0000, Benno Lossin wrote:
> [...]
>>> +        let inode =
>>> +            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;
>>> +
>>> +        // SAFETY: `inode` is valid for read, but there could be concurrent writers (e.g., if it's
>>> +        // an already-initialised inode), so we use `read_volatile` to read its current state.
>>> +        let state = unsafe { ptr::read_volatile(ptr::addr_of!((*inode.as_ptr()).i_state)) };
>>
>> Are you sure that `read_volatile` is sufficient for this use case? The
>> documentation [1] clearly states that concurrent write operations are still
>> UB:
>>
>>      Just like in C, whether an operation is volatile has no bearing
>>      whatsoever on questions involving concurrent access from multiple
>>      threads. Volatile accesses behave exactly like non-atomic accesses in
>>      that regard. In particular, a race between a read_volatile and any
>>      write operation to the same location is undefined behavior.
>>
> 
> Right, `read_volatile` can have data race. I think what we can do here
> is:
> 
> 	// SAFETY: `i_state` in `inode` is `unsigned long`, therefore
> 	// it's safe to treat it as `AtomicUsize` and do a relaxed read.
> 	let state = unsafe { (*ptr::addr_of!((*inode.as_ptr()).i_state).cast::<AtomicUsize>()).load(Relaxed) };

I am not sure if that is enough. What kind of writes happen
concurrently on the C side? If they are atomic, then this should
be fine, if they are not synchronized at all, then it could be
problematic, as miri says that it is still UB:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aa75fb6805c8d67ade8837531a2096d0
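
For contrast, here is a small user-space sketch of the case that is fine
under the Rust memory model: the racing read is a relaxed atomic load and
the concurrent write is atomic as well. `I_NEW` here is a made-up flag
value and `set_flag_concurrently` is a hypothetical helper, neither is
taken from the kernel:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

const I_NEW: usize = 1 << 3; // hypothetical flag value, for illustration only

/// Sets a flag from another thread while this thread reads the same word.
pub fn set_flag_concurrently() -> usize {
    let state = Arc::new(AtomicUsize::new(0));

    let writer = {
        let state = Arc::clone(&state);
        // The write is an atomic read-modify-write, not a plain `|=`.
        thread::spawn(move || {
            state.fetch_or(I_NEW, Ordering::Relaxed);
        })
    };

    // This load may race with the write above. That is allowed because both
    // sides are atomic; the value observed here may be either 0 or `I_NEW`.
    let _snapshot = state.load(Ordering::Relaxed);

    // Joining the writer makes its store visible to the final load.
    writer.join().unwrap();
    state.load(Ordering::Relaxed)
}
```

If the writer instead used a plain non-atomic store to the same location,
the relaxed load would not save us, which is the case I am worried about.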

-- 
Cheers,
Benno

* Re: [RFC PATCH 17/19] rust: fs: allow per-inode data
  2023-10-18 12:25 ` [RFC PATCH 17/19] rust: fs: allow per-inode data Wedson Almeida Filho
@ 2023-10-21 13:57   ` Benno Lossin
  2024-01-03 14:39   ` Andreas Hindborg (Samsung)
  1 sibling, 0 replies; 125+ messages in thread
From: Benno Lossin @ 2023-10-21 13:57 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 18.10.23 14:25, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to attach extra [typed] data to each inode. If
> no data is needed, use the regular inode kmem_cache, otherwise we create
> a new one.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>   rust/helpers.c            |   7 +++
>   rust/kernel/fs.rs         | 128 +++++++++++++++++++++++++++++++++++---
>   rust/kernel/mem_cache.rs  |   2 -
>   samples/rust/rust_rofs.rs |   9 ++-
>   4 files changed, 131 insertions(+), 15 deletions(-)
> 
> diff --git a/rust/helpers.c b/rust/helpers.c
> index bc19f3b7b93e..7b12a6d4cf5c 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -222,6 +222,13 @@ void rust_helper_kunmap_local(const void *vaddr)
>   }
>   EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
> 
> +void *rust_helper_alloc_inode_sb(struct super_block *sb,
> +				 struct kmem_cache *cache, gfp_t gfp)
> +{
> +	return alloc_inode_sb(sb, cache, gfp);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_alloc_inode_sb);
> +
>   void rust_helper_i_uid_write(struct inode *inode, uid_t uid)
>   {
>   	i_uid_write(inode, uid);
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index b1ad5c110dbb..b07203758674 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -9,8 +9,12 @@
>   use crate::error::{code::*, from_result, to_result, Error, Result};
>   use crate::folio::{LockedFolio, UniqueFolio};
>   use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque, ScopeGuard};
> -use crate::{bindings, init::PinInit, str::CStr, time::Timespec, try_pin_init, ThisModule};
> -use core::{marker::PhantomData, marker::PhantomPinned, mem::ManuallyDrop, pin::Pin, ptr};
> +use crate::{
> +    bindings, container_of, init::PinInit, mem_cache::MemCache, str::CStr, time::Timespec,
> +    try_pin_init, ThisModule,
> +};
> +use core::mem::{size_of, ManuallyDrop, MaybeUninit};
> +use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin, ptr};
>   use macros::{pin_data, pinned_drop};
> 
>   #[cfg(CONFIG_BUFFER_HEAD)]
> @@ -35,6 +39,9 @@ pub trait FileSystem {
>       /// Data associated with each file system instance (super-block).
>       type Data: ForeignOwnable + Send + Sync;
> 
> +    /// Type of data associated with each inode.
> +    type INodeData: Send + Sync;
> +
>       /// The name of the file system type.
>       const NAME: &'static CStr;
> 
> @@ -165,6 +172,7 @@ fn try_from(v: u32) -> Result<Self> {
>   pub struct Registration {
>       #[pin]
>       fs: Opaque<bindings::file_system_type>,
> +    inode_cache: Option<MemCache>,
>       #[pin]
>       _pin: PhantomPinned,
>   }
> @@ -182,6 +190,14 @@ impl Registration {
>       pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>           try_pin_init!(Self {
>               _pin: PhantomPinned,
> +            inode_cache: if size_of::<T::INodeData>() == 0 {
> +                None
> +            } else {
> +                Some(MemCache::try_new::<INodeWithData<T::INodeData>>(
> +                    T::NAME,
> +                    Some(Self::inode_init_once_callback::<T>),
> +                )?)
> +            },
>               fs <- Opaque::try_ffi_init(|fs_ptr: *mut bindings::file_system_type| {
>                   // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write.
>                   unsafe { fs_ptr.write(bindings::file_system_type::default()) };
> @@ -239,6 +255,16 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>               unsafe { T::Data::from_foreign(ptr) };
>           }
>       }
> +
> +    unsafe extern "C" fn inode_init_once_callback<T: FileSystem + ?Sized>(
> +        outer_inode: *mut core::ffi::c_void,
> +    ) {
> +        let ptr = outer_inode.cast::<INodeWithData<T::INodeData>>();
> +
> +        // SAFETY: This is only used in `new`, so we know that we have a valid `INodeWithData`
> +        // instance whose inode part can be initialised.
> +        unsafe { bindings::inode_init_once(ptr::addr_of_mut!((*ptr).inode)) };
> +    }
>   }
> 
>   #[pinned_drop]
> @@ -280,6 +306,15 @@ pub fn super_block(&self) -> &SuperBlock<T> {
>           unsafe { &*(*self.0.get()).i_sb.cast() }
>       }
> 
> +    /// Returns the data associated with the inode.
> +    pub fn data(&self) -> &T::INodeData {
> +        let outerp = container_of!(self.0.get(), INodeWithData<T::INodeData>, inode);
> +        // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference
> +        // (`&self`) to it. Additionally, we know `T::INodeData` is always initialised in an
> +        // `INode`.
> +        unsafe { &*(*outerp).data.as_ptr() }
> +    }
> +
>       /// Returns the size of the inode contents.
>       pub fn size(&self) -> i64 {
>           // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference.
> @@ -300,15 +335,29 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
>       }
>   }
> 
> +struct INodeWithData<T> {
> +    data: MaybeUninit<T>,
> +    inode: bindings::inode,

No `Opaque`?

> +}
> +
>   /// An inode that is locked and hasn't been initialised yet.
>   #[repr(transparent)]
>   pub struct NewINode<T: FileSystem + ?Sized>(ARef<INode<T>>);
> 
>   impl<T: FileSystem + ?Sized> NewINode<T> {
>       /// Initialises the new inode with the given parameters.
> -    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
> -        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
> -        let inode = unsafe { &mut *self.0 .0.get() };
> +    pub fn init(self, params: INodeParams<T::INodeData>) -> Result<ARef<INode<T>>> {
> +        let outerp = container_of!(self.0 .0.get(), INodeWithData<T::INodeData>, inode);
> +
> +        // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> +        // safe to mutably dereference it.
> +        let outer = unsafe { &mut *outerp.cast_mut() };
> +
> +        // N.B. We must always write this to a newly allocated inode because the free callback
> +        // expects the data to be initialised and drops it.

This should be an invariant.

> +        outer.data.write(params.value);
> +
> +        let inode = &mut outer.inode;
> 
>           let mode = match params.typ {
>               INodeType::Dir => {
> @@ -424,7 +473,7 @@ pub enum INodeType {
>   /// Required inode parameters.
>   ///
>   /// This is used when creating new inodes.
> -pub struct INodeParams {
> +pub struct INodeParams<T> {
>       /// The access mode. It's a mask that grants execute (1), write (2) and read (4) access to
>       /// everyone, the owner group, and the owner.
>       pub mode: u16,
> @@ -459,6 +508,9 @@ pub struct INodeParams {
> 
>       /// Last access time.
>       pub atime: Timespec,
> +
> +    /// Value to attach to this node.
> +    pub value: T,
>   }
> 
>   /// A file system super block.
> @@ -735,8 +787,12 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>       }
> 
>       const SUPER_BLOCK: bindings::super_operations = bindings::super_operations {
> -        alloc_inode: None,
> -        destroy_inode: None,
> +        alloc_inode: if size_of::<T::INodeData>() != 0 {
> +            Some(Self::alloc_inode_callback)
> +        } else {
> +            None
> +        },
> +        destroy_inode: Some(Self::destroy_inode_callback),
>           free_inode: None,
>           dirty_inode: None,
>           write_inode: None,
> @@ -766,6 +822,61 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>           shutdown: None,
>       };
> 
> +    unsafe extern "C" fn alloc_inode_callback(
> +        sb: *mut bindings::super_block,
> +    ) -> *mut bindings::inode {
> +        // SAFETY: The callback contract guarantees that `sb` is valid for read.
> +        let super_type = unsafe { (*sb).s_type };
> +
> +        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
> +        // embedded in a `Registration`, which is guaranteed to be valid because it has a
> +        // superblock associated to it.
> +        let reg = unsafe { &*container_of!(super_type, Registration, fs) };
> +
> +        // SAFETY: `sb` and `cache` are guaranteed to be valid by the callback contract and by
> +        // the existence of a superblock respectively.
> +        let ptr = unsafe {
> +            bindings::alloc_inode_sb(sb, MemCache::ptr(&reg.inode_cache), bindings::GFP_KERNEL)
> +        }
> +        .cast::<INodeWithData<T::INodeData>>();
> +        if ptr.is_null() {
> +            return ptr::null_mut();
> +        }
> +        ptr::addr_of_mut!((*ptr).inode)
> +    }
> +
> +    unsafe extern "C" fn destroy_inode_callback(inode: *mut bindings::inode) {
> +        // SAFETY: By the C contract, `inode` is a valid pointer.
> +        let is_bad = unsafe { bindings::is_bad_inode(inode) };
> +
> +        // SAFETY: The inode is guaranteed to be valid by the callback contract. Additionally, the
> +        // superblock is also guaranteed to still be valid by the inode existence.
> +        let super_type = unsafe { (*(*inode).i_sb).s_type };
> +
> +        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
> +        // embedded in a `Registration`, which is guaranteed to be valid because it has a
> +        // superblock associated to it.
> +        let reg = unsafe { &*container_of!(super_type, Registration, fs) };
> +        let ptr = container_of!(inode, INodeWithData<T::INodeData>, inode).cast_mut();
> +
> +        if !is_bad {
> +            // SAFETY: The code either initialises the data or marks the inode as bad. Since the

Where exactly is it marked as "bad"?

-- 
Cheers,
Benno

> +            // inode is not bad, the data is initialised, and thus safe to drop.
> +            unsafe { ptr::drop_in_place((*ptr).data.as_mut_ptr()) };
> +        }
> +
> +        if size_of::<T::INodeData>() == 0 {
> +            // SAFETY: When the size of `INodeData` is zero, we don't use a separate mem_cache, so
> +            // it is allocated from the regular mem_cache, which is what `free_inode_nonrcu` uses
> +            // to free the inode.
> +            unsafe { bindings::free_inode_nonrcu(inode) };
> +        } else {
> +            // The callback contract guarantees that the inode was previously allocated via the
> +            // `alloc_inode_callback` callback, so it is safe to free it back to the cache.
> +            unsafe { bindings::kmem_cache_free(MemCache::ptr(&reg.inode_cache), ptr.cast()) };
> +        }
> +    }
> +
>       unsafe extern "C" fn statfs_callback(
>           dentry: *mut bindings::dentry,
>           buf: *mut bindings::kstatfs,
> @@ -1120,6 +1231,7 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>   /// struct MyFs;
>   /// impl fs::FileSystem for MyFs {
>   ///     type Data = ();
> ///     type INodeData = ();
>   ///     const NAME: &'static CStr = c_str!("myfs");
>   ///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
>   ///         todo!()
> diff --git a/rust/kernel/mem_cache.rs b/rust/kernel/mem_cache.rs
> index 05e5f2bc9781..bf6ce2d2d3e1 100644
> --- a/rust/kernel/mem_cache.rs
> +++ b/rust/kernel/mem_cache.rs
> @@ -20,7 +20,6 @@ impl MemCache {
>       /// Allocates a new `kmem_cache` for type `T`.
>       ///
>       /// `init` is called by the C code when entries are allocated.
> -    #[allow(dead_code)]
>       pub(crate) fn try_new<T>(
>           name: &'static CStr,
>           init: Option<unsafe extern "C" fn(*mut core::ffi::c_void)>,
> @@ -43,7 +42,6 @@ pub(crate) fn try_new<T>(
>       /// Returns the pointer to the `kmem_cache` instance, or null if it's `None`.
>       ///
>       /// This is a helper for functions like `alloc_inode_sb` where the cache is optional.
> -    #[allow(dead_code)]
>       pub(crate) fn ptr(c: &Option<Self>) -> *mut bindings::kmem_cache {
>           match c {
>               Some(m) => m.ptr.as_ptr(),
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index 093425650f26..dfe745439842 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -53,6 +53,7 @@ struct Entry {
>   struct RoFs;
>   impl fs::FileSystem for RoFs {
>       type Data = ();
> +    type INodeData = &'static Entry;
>       const NAME: &'static CStr = c_str!("rust-fs");
> 
>       fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
> @@ -79,6 +80,7 @@ fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
>                   atime: UNIX_EPOCH,
>                   ctime: UNIX_EPOCH,
>                   mtime: UNIX_EPOCH,
> +                value: &ENTRIES[0],
>               }),
>           }
>       }
> @@ -122,6 +124,7 @@ fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
>                           atime: UNIX_EPOCH,
>                           ctime: UNIX_EPOCH,
>                           mtime: UNIX_EPOCH,
> +                        value: e,
>                       }),
>                   };
>               }
> @@ -131,11 +134,7 @@ fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>> {
>       }
> 
>       fn read_folio(inode: &INode<Self>, mut folio: LockedFolio<'_>) -> Result {
> -        let data = match inode.ino() {
> -            2 => ENTRIES[2].contents,
> -            3 => ENTRIES[3].contents,
> -            _ => return Err(EINVAL),
> -        };
> +        let data = inode.data().contents;
> 
>           let pos = usize::try_from(folio.pos()).unwrap_or(usize::MAX);
>           let copied = if pos >= data.len() {
> --
> 2.34.1
> 
>

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-21 13:48       ` Benno Lossin
@ 2023-10-21 15:57         ` Boqun Feng
  2023-10-21 17:01           ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Boqun Feng @ 2023-10-21 15:57 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Matthew Wilcox, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Sat, Oct 21, 2023 at 01:48:28PM +0000, Benno Lossin wrote:
> On 20.10.23 02:52, Boqun Feng wrote:
> > On Thu, Oct 19, 2023 at 02:30:56PM +0000, Benno Lossin wrote:
> > [...]
> >>> +        let inode =
> >>> +            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;
> >>> +
> >>> +        // SAFETY: `inode` is valid for read, but there could be concurrent writers (e.g., if it's
> >>> +        // an already-initialised inode), so we use `read_volatile` to read its current state.
> >>> +        let state = unsafe { ptr::read_volatile(ptr::addr_of!((*inode.as_ptr()).i_state)) };
> >>
> >> Are you sure that `read_volatile` is sufficient for this use case? The
> >> documentation [1] clearly states that concurrent write operations are still
> >> UB:
> >>
> >>      Just like in C, whether an operation is volatile has no bearing
> >>      whatsoever on questions involving concurrent access from multiple
> >>      threads. Volatile accesses behave exactly like non-atomic accesses in
> >>      that regard. In particular, a race between a read_volatile and any
> >>      write operation to the same location is undefined behavior.
> >>
> > 
> > Right, `read_volatile` can have data race. I think what we can do here
> > is:
> > 
> > 	// SAFETY: `i_state` in `inode` is `unsigned long`, therefore
> > 	// it's safe to treat it as `AtomicUsize` and do a relaxed read.
> > 	let state = unsafe { (*ptr::addr_of!((*inode.as_ptr()).i_state).cast::<AtomicUsize>()).load(Relaxed) };
> 
> I am not sure if that is enough. What kind of writes happen
> concurrently on the C side? If they are atomic, then this should
> be fine, if they are not synchronized at all, then it could be
> problematic, as miri says that it is still UB:
> https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aa75fb6805c8d67ade8837531a2096d0
> 

You're not wrong; my suggestion here assumed that the write side of
->i_state is atomic (I hadn't looked into that). A quick look now shows
that it isn't; for example, in fs/f2fs/namei.c there is:

	inode->i_state |= I_LINKABLE;

so I think we need to take the inode->i_lock here for a data-race free
solution. Or if we have something like:

	https://github.com/rust-lang/unsafe-code-guidelines/issues/321

in Rust.

Benno, note that my reasoning about whether a write is atomic is less
strict: on the C side, under the kernel's current rules, plain writes to
machine words can be treated as atomic. In case you're interested,
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC is the pointer ;-)

While we are at it, adding Marco: could KCSAN work for Rust code? If I
understand correctly, as long as the Rust compiler can generate the
__tsan_* instrumentation calls, it should work, right?

Regards,
Boqun

> -- 
> Cheers,
> Benno

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-21 15:57         ` Boqun Feng
@ 2023-10-21 17:01           ` Matthew Wilcox
  2023-10-21 19:33             ` Boqun Feng
  0 siblings, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-21 17:01 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Benno Lossin, Wedson Almeida Filho, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Sat, Oct 21, 2023 at 08:57:30AM -0700, Boqun Feng wrote:
> You're not wrong; my suggestion here assumed that the write side of
> ->i_state is atomic (I hadn't looked into that). A quick look now shows
> that it isn't; for example, in fs/f2fs/namei.c there is:
> 
> 	inode->i_state |= I_LINKABLE;

But it doesn't matter what f2fs does to _its_ inodes.  tarfs will never
see an f2fs inode.  I don't know what the rules are around inode->i_state;
I'm only an expert on the page cache, not the rest of the VFS.  So
what are the rules around modifying i_state for the VFS?


* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-21 17:01           ` Matthew Wilcox
@ 2023-10-21 19:33             ` Boqun Feng
  2023-10-23  5:29               ` Dave Chinner
  0 siblings, 1 reply; 125+ messages in thread
From: Boqun Feng @ 2023-10-21 19:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Benno Lossin, Wedson Almeida Filho, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Sat, Oct 21, 2023 at 06:01:02PM +0100, Matthew Wilcox wrote:
> On Sat, Oct 21, 2023 at 08:57:30AM -0700, Boqun Feng wrote:
> > You're not wrong; my suggestion here assumed that the write side of
> > ->i_state is atomic (I hadn't looked into that). A quick look now shows
> > that it isn't; for example, in fs/f2fs/namei.c there is:
> > 
> > 	inode->i_state |= I_LINKABLE;
> 
> But it doesn't matter what f2fs does to _its_ inodes.  tarfs will never
> see an f2fs inode.  I don't know what the rules are around inode->i_state;

Well, maybe I chose a bad example ;-) I agree that tarfs will never see
an f2fs inode, and since tarfs is the only user right now, whether there
is a data race really depends on tarfs for now. But this is a general
filesystem Rust API, so it should in theory work with everything. Plus,
fs/dcache.c has something similar:

	inode->i_state &= ~I_NEW & ~I_CREATING;

> I'm only an expert on the page cache, not the rest of the VFS.  So
> what are the rules around modifying i_state for the VFS?
> 

Agreed, same question here.

Regards,
Boqun

> 

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-21 19:33             ` Boqun Feng
@ 2023-10-23  5:29               ` Dave Chinner
  2023-10-23 12:55                 ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Dave Chinner @ 2023-10-23  5:29 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Matthew Wilcox, Benno Lossin, Wedson Almeida Filho,
	Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho, Marco Elver

On Sat, Oct 21, 2023 at 12:33:57PM -0700, Boqun Feng wrote:
> On Sat, Oct 21, 2023 at 06:01:02PM +0100, Matthew Wilcox wrote:
> > I'm only an expert on the page cache, not the rest of the VFS.  So
> > what are the rules around modifying i_state for the VFS?
> 
> Agreed, same question here.

inode->i_state should only be modified under inode->i_lock.

And in most situations, you have to hold the inode->i_lock to read
state flags as well so that reads are serialised against
modifications which are typically non-atomic RMW operations.

There is, I think, one main exception to read side locking and this
is find_inode_rcu() which does an unlocked check for I_WILL_FREE |
I_FREEING. In this case, the inode->i_state updates in iput_final()
use WRITE_ONCE under the inode->i_lock to provide the necessary
semantics for the unlocked READ_ONCE() done under rcu_read_lock().

IOWs, if you follow the general rule that any inode->i_state access
(read or write) needs to hold inode->i_lock, you probably won't
screw up. 
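
As a user-space analogy only (this is not kernel code), the general rule
looks like the sketch below: the state word is owned by the lock, so
every read and every read-modify-write has to take it. The `Inode` type,
its methods and the flag values are hypothetical:

```rust
use std::sync::Mutex;

const I_NEW: u64 = 1 << 0; // hypothetical flag values, for illustration only
const I_CREATING: u64 = 1 << 1;

/// Toy inode whose state word can only be reached through its lock.
pub struct Inode {
    state: Mutex<u64>, // plays the role of `i_state` guarded by `i_lock`
}

impl Inode {
    pub fn new(initial: u64) -> Self {
        Self { state: Mutex::new(initial) }
    }

    /// Reads the state flags while holding the lock.
    pub fn state(&self) -> u64 {
        *self.state.lock().unwrap()
    }

    /// Non-atomic read-modify-write, serialised against readers by the lock.
    pub fn clear_new(&self) {
        let mut state = self.state.lock().unwrap();
        *state &= !(I_NEW | I_CREATING);
    }
}
```

The unlocked check in find_inode_rcu() has no counterpart in this sketch;
it is the exception and relies on the WRITE_ONCE/READ_ONCE pairing
described above.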

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-20  4:11           ` Matthew Wilcox
  2023-10-20 15:17             ` Matthew Wilcox
@ 2023-10-23 10:48             ` Andreas Hindborg (Samsung)
  2023-10-23 14:28               ` Matthew Wilcox
  2023-10-23 12:29             ` Wedson Almeida Filho
  2 siblings, 1 reply; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2023-10-23 10:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Matthew Wilcox <willy@infradead.org> writes:

(snip)

>> > Something I forgot to mention was that I found it more useful to express
>> > "map this chunk of a folio" in bytes rather than pages.  You might find
>> > the same, in which case it's just folio.map(offset: usize) instead of
>> > folio.map_page(page_index: usize)
>> 
>> Oh, thanks for the feedback. I'll switch to bytes then for v2.
>> (Already did in the example above.)
>
> Great!  Something else I think would be a good idea is open-coding some
> of the trivial accessors.  eg instead of doing:
>
> +size_t rust_helper_folio_size(struct folio *folio)
> +{
> +	return folio_size(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_size);
> [...]
> +    pub fn size(&self) -> usize {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_size(self.0.get()) }
> +    }
>
> add:
>
> impl Folio {
> ...
>     pub fn order(&self) -> u8 {
> 	if (self.flags & (1 << PG_head))
> 	    self._flags_1 & 0xff
> 	else
> 	    0
>     }
>
>     pub fn size(&self) -> usize {
> 	bindings::PAGE_SIZE << self.order()
>     }
> }
>
> ... or have I misunderstood what is possible here?  My hope is that the
> compiler gets to "see through" the abstraction, which surely can't be
> done when there's a function call.

The build system and Rust compiler can inline and optimize across
function calls and languages when LTO is enabled. Some patches are
needed to make it work though.

BR Andreas


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-20  4:11           ` Matthew Wilcox
  2023-10-20 15:17             ` Matthew Wilcox
  2023-10-23 10:48             ` Andreas Hindborg (Samsung)
@ 2023-10-23 12:29             ` Wedson Almeida Filho
  2 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-23 12:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Fri, 20 Oct 2023 at 01:11, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Oct 19, 2023 at 10:25:39AM -0300, Wedson Almeida Filho wrote:
> > On Wed, 18 Oct 2023 at 16:21, Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Oct 18, 2023 at 03:32:36PM -0300, Wedson Almeida Filho wrote:
> > > > On Wed, 18 Oct 2023 at 14:17, Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Wed, Oct 18, 2023 at 09:25:08AM -0300, Wedson Almeida Filho wrote:
> > > > > > +void *rust_helper_kmap(struct page *page)
> > > > > > +{
> > > > > > +     return kmap(page);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(rust_helper_kmap);
> > > > > > +
> > > > > > +void rust_helper_kunmap(struct page *page)
> > > > > > +{
> > > > > > +     kunmap(page);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(rust_helper_kunmap);
> > > > >
> > > > > I'm not thrilled by exposing kmap()/kunmap() to Rust code.  The vast
> > > > > majority of code really only needs kmap_local_*() / kunmap_local().
> > > > > Can you elaborate on why you need the old kmap() in new Rust code?
> > > >
> > > > The difficulty we have with kmap_local_*() has to do with the
> > > > requirement that maps and unmaps need to be nested neatly. For
> > > > example:
> > > >
> > > > let a = folio1.map_local(...);
> > > > let b = folio2.map_local(...);
> > > > // Do something with `a` and `b`.
> > > > drop(a);
> > > > drop(b);
> > > >
> > > > The code obviously violates the requirements.
> > >
> > > Is that the only problem, or are there situations where we might try
> > > to do something like:
> > >
> > > a = folio1.map_local()
> > > b = folio2.map_local()
> > > drop(a)
> > > a = folio3.map_local()
> > > drop(b)
> > > b = folio4.map_local()
> > > drop(a)
> > > a = folio5.map_local()
> > > ...
> >
> > This is also a problem. We don't control the order in which users are
> > going to unmap.
>
> OK.  I have something in the works, but it's not quite ready yet.

Please share once you're happy with it!

Or before. :)

> > This violates Rust rules. `UniqueFolio` helps us address this for our
> > use case; if we try the above with a UniqueFolio, the compiler will
> > error out saying that `a` has a shared reference to the folio, so we
> > can't call `sread` on it (because sread requires a mutable, and
> > therefore not shareable, reference to the folio).
>
> This is going to be quite the impedance mismatch.  Still, I imagine
> you're used to dealing with those by now and have a toolbox of ideas.
>
> We don't have that rule for the pagecache as it is.  We do have rules that
> prevent data corruption!  For example, if the folio is !uptodate then you
> must have the lock to write to the folio in order to bring it uptodate
> (so we have a single writer rule in that regard).  But once the folio is
> uptodate, all bets are off in terms of who can be writing to it / reading
> it at the same time.  And that's going to have to continue to be true;
> multiple processes can have the same page mmaped writable and write to
> it at the same time.  There's no possible synchronisation between them.
>
> But I think your concern is really more limited.  You're concerned
> with filesystem metadata obeying Rust's rules.  And for a read-write
> filesystem, you're going to have to have ... something ... which gets a
> folio from the page cache, and establishes that this is the only thread
> which can modify that folio (maybe it's an interior node of a Btree,
> maybe it's a free space bitmap, ...).  We could probably use the folio
> lock bit for that purpose.  For the read-only filesystems, you only need
> be concerned about freshly-allocated folios, but you need something more
> when it comes to doing an ext2 clone.
>
> There's general concern about the overuse of the folio lock bit, but
> this is a reasonable use -- preventing two threads from modifying the
> same folio at the same time.
>
> (I have simplified all this; both iomap and buffer heads support folios
> which are partially uptodate, but conceptually this is accurate)

Yes, that's precisely the case. Rust doesn't mind if data is mapped
multiple times for a given folio as in most cases it won't inspect the
contents anyway. But for metadata, it will need some synchronisation.

In this read-only scenario we're supporting now, we conveniently
already get locked folios in `read_folio` calls, so we're using the
fact that it's locked to have a single writer to it -- note that the
`write` and `zero_out` functions are only available in `LockedFolio`,
not in `Folio`.
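
As a rough illustration of that split, here is a userspace sketch (not
the code from the series). The byte-buffer `Folio`, the `lock()` method
and the error strings are assumptions; the folio lock is modelled by an
exclusive borrow. The point is only that the mutating methods exist on
the locked type and nowhere else.

```rust
/// Userspace stand-in for a folio: just a byte buffer.
pub struct Folio {
    data: Vec<u8>,
}

/// Hypothetical wrapper meaning "the folio lock is held". Only this type
/// exposes mutation, so every write goes through the lock by construction.
pub struct LockedFolio<'a> {
    folio: &'a mut Folio,
}

impl Folio {
    /// Creates a folio filled with 0xff so that zeroing is observable.
    pub fn new(size: usize) -> Self {
        Folio { data: vec![0xff; size] }
    }

    pub fn size(&self) -> usize {
        self.data.len()
    }

    /// Read-only access is available without the lock.
    pub fn read(&self, offset: usize) -> Option<u8> {
        self.data.get(offset).copied()
    }

    /// Models taking the folio lock with an exclusive borrow.
    pub fn lock(&mut self) -> LockedFolio<'_> {
        LockedFolio { folio: self }
    }
}

impl LockedFolio<'_> {
    /// Copies `src` into the folio at `offset`, failing if it does not fit.
    pub fn write(&mut self, offset: usize, src: &[u8]) -> Result<(), &'static str> {
        let end = offset.checked_add(src.len()).ok_or("overflow")?;
        let dst = self.folio.data.get_mut(offset..end).ok_or("out of bounds")?;
        dst.copy_from_slice(src);
        Ok(())
    }

    /// Zeroes `len` bytes starting at `offset`, failing if out of range.
    pub fn zero_out(&mut self, offset: usize, len: usize) -> Result<(), &'static str> {
        let end = offset.checked_add(len).ok_or("overflow")?;
        let dst = self.folio.data.get_mut(offset..end).ok_or("out of bounds")?;
        dst.fill(0);
        Ok(())
    }
}
```

Calling `write` on a plain `Folio` is a compile error in this model,
since the method simply does not exist there.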

`UniqueFolio` is for when a module needs to read sectors without the
cache, in particular the super block. (Before it has decided on the
block size and has initialised the superblock with the block size.) So
it's essentially a sequence of allocate a folio, read from a block
device, map it, use the contents, unmap, free. The folio isn't really
shared with anyone else. Eventually we may want to merge this
concept with that of a locked folio so that we only have one writable
folio.
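
The sequence above can be sketched in userspace Rust as follows. This is
an illustration under assumptions, not the API from the series: the
"block device" is a byte slice, `read_from` stands in for `sread`, and
`read_magic` is a made-up caller. What it shows is that reading needs
`&mut self` while mapping hands out a shared borrow, so the two cannot
overlap.

```rust
/// Userspace stand-in for a folio that is not shared with anyone else.
pub struct UniqueFolio {
    data: Vec<u8>,
}

impl UniqueFolio {
    /// Allocates a zeroed folio of `size` bytes.
    pub fn alloc(size: usize) -> Self {
        UniqueFolio { data: vec![0; size] }
    }

    /// Stand-in for `sread`: fills the whole folio from `device` starting
    /// at byte `offset`. Needs `&mut self`, i.e. exclusive access.
    pub fn read_from(&mut self, device: &[u8], offset: usize) -> Result<(), &'static str> {
        let end = offset.checked_add(self.data.len()).ok_or("overflow")?;
        let src = device.get(offset..end).ok_or("short read")?;
        self.data.copy_from_slice(src);
        Ok(())
    }

    /// Maps the contents; the returned slice is a shared borrow of `self`.
    pub fn map(&self) -> &[u8] {
        &self.data
    }
}

/// Hypothetical caller: allocate, read, map, use the contents, free.
pub fn read_magic(device: &[u8]) -> Result<u32, &'static str> {
    let mut folio = UniqueFolio::alloc(512);
    folio.read_from(device, 0)?;
    let mapped = folio.map();
    // Uncommenting the next line is rejected by the compiler, because
    // `mapped` (a shared borrow) is still used afterwards:
    // folio.read_from(device, 512)?;
    let magic = u32::from_le_bytes([mapped[0], mapped[1], mapped[2], mapped[3]]);
    Ok(magic)
    // `folio` is dropped (freed) here.
}
```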

> > > On systems without HIGHMEM, kmap() is a no-op.  So we could do something
> > > like this:
> > >
> > >         let data = unsafe { core::slice::from_raw_parts(ptr.cast::<u8>(),
> > >                 if (folio_test_highmem(folio))
> > >                         bindings::PAGE_SIZE
> > >                 else
> > >                         folio_size(folio) - page_idx * PAGE_SIZE) }
> > >
> > > ... modulo whatever the correct syntax is in Rust.
> >
> > We can certainly do that. But since there's the possibility that the
> > array will be capped at PAGE_SIZE in the HIGHMEM case, callers would
> > still need a loop to traverse the whole folio, right?
> >
> > let mut offset = 0;
> > while offset < folio.size() {
> >     let a = folio.map(offset);
> >     // Do something with a.
> >     offset += a.len();
> > }
> >
> > I guess the advantage is that we'd have a single iteration in systems
> > without HIGHMEM.
>
> Right.  You can see something similar to that in memcpy_from_folio() in
> highmem.h.

I will have this in v2.
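
For reference, a userspace sketch of the byte-offset loop discussed
above. It is not the v2 code: the `Folio` type, its `highmem` field and
the `checksum` helper are assumptions for illustration. `map` returns
the rest of the current page when the folio is "highmem" and the rest of
the whole folio otherwise, so the same loop runs once without HIGHMEM
and once per page with it.

```rust
pub const PAGE_SIZE: usize = 4096;

/// Userspace stand-in for a folio; `highmem` selects the mapping policy.
pub struct Folio {
    data: Vec<u8>,
    highmem: bool,
}

impl Folio {
    /// Creates a folio of `pages` pages filled with a repeating pattern.
    pub fn new(pages: usize, highmem: bool) -> Self {
        let data = (0..pages * PAGE_SIZE).map(|i| (i % 251) as u8).collect();
        Folio { data, highmem }
    }

    pub fn size(&self) -> usize {
        self.data.len()
    }

    /// Maps the folio from byte `offset`. With highmem the slice is capped
    /// at the end of the current page; otherwise it runs to the end of the
    /// folio. Panics if `offset` is greater than the folio size.
    pub fn map(&self, offset: usize) -> &[u8] {
        let end = if self.highmem {
            (offset / PAGE_SIZE + 1) * PAGE_SIZE
        } else {
            self.size()
        };
        &self.data[offset..end.min(self.size())]
    }
}

/// Walks the whole folio with the loop from the discussion and returns
/// the byte sum together with the number of iterations it took.
pub fn checksum(folio: &Folio) -> (u64, usize) {
    let mut sum = 0u64;
    let mut iterations = 0;
    let mut offset = 0;
    while offset < folio.size() {
        let a = folio.map(offset);
        sum += a.iter().map(|&b| u64::from(b)).sum::<u64>();
        offset += a.len();
        iterations += 1;
    }
    (sum, iterations)
}
```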

> > > Something I forgot to mention was that I found it more useful to express
> > > "map this chunk of a folio" in bytes rather than pages.  You might find
> > > the same, in which case it's just folio.map(offset: usize) instead of
> > > folio.map_page(page_index: usize)
> >
> > Oh, thanks for the feedback. I'll switch to bytes then for v2.
> > (Already did in the example above.)
>
> Great!  Something else I think would be a good idea is open-coding some
> of the trivial accessors.  eg instead of doing:
>
> +size_t rust_helper_folio_size(struct folio *folio)
> +{
> +       return folio_size(folio);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_folio_size);
> [...]
> +    pub fn size(&self) -> usize {
> +        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.
> +        unsafe { bindings::folio_size(self.0.get()) }
> +    }
>
> add:
>
> impl Folio {
> ...
>     pub fn order(&self) -> u8 {
>         if (self.flags & (1 << PG_head))
>             self._flags_1 & 0xff
>         else
>             0
>     }
>
>     pub fn size(&self) -> usize {
>         bindings::PAGE_SIZE << self.order()
>     }
> }
>
> ... or have I misunderstood what is possible here?  My hope is that the
> compiler gets to "see through" the abstraction, which surely can't be
> done when there's a function call.

As Andreas pointed out, with LTO we can get the linker to see through
these functions, even across different languages, and optimise when it
makes sense.

Having said that, while it's possible to do what you suggested above,
we try to avoid it so that maintainers can continue to have a single
place they need to change if they ever decide to change things. A
simple example from above is order(): if you decide to implement it
differently (say you change the flag, or decide to have an explicit
field, whatever), then you'd have to change the C _and_
the Rust versions. Worse yet, there's a chance that forgetting to
update the Rust version wouldn't break the build, which would make it
harder to catch mismatched versions.
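
To show what is at stake, here is roughly what the open-coded accessors
would look like as valid Rust. This is a userspace sketch: the struct
layout, the `flags_1` field name and the `PG_HEAD` bit position are
assumptions for illustration and are not taken from the kernel headers.
Note that nothing here would stop compiling if the C definition of
folio_order() changed.

```rust
pub const PAGE_SIZE: usize = 4096;

// Hypothetical bit position, for illustration only.
pub const PG_HEAD: u32 = 6;

/// Userspace stand-in carrying just the two words the accessors look at.
pub struct Folio {
    flags: u64,
    flags_1: u64,
}

impl Folio {
    pub fn new(flags: u64, flags_1: u64) -> Self {
        Folio { flags, flags_1 }
    }

    /// Open-coded order: the low byte of `flags_1` if this is a head
    /// page, otherwise 0. Duplicates knowledge that also lives in C.
    #[inline]
    pub fn order(&self) -> u8 {
        if self.flags & (1u64 << PG_HEAD) != 0 {
            (self.flags_1 & 0xff) as u8
        } else {
            0
        }
    }

    /// Open-coded size derived from the order.
    #[inline]
    pub fn size(&self) -> usize {
        PAGE_SIZE << self.order()
    }
}
```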


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-20 15:17             ` Matthew Wilcox
@ 2023-10-23 12:32               ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-23 12:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Fri, 20 Oct 2023 at 12:17, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 20, 2023 at 05:11:38AM +0100, Matthew Wilcox wrote:
> > > Sure, it's safe to map a folio in general, but Rust has stricter rules
> > > about aliasing and mutability that are part of how memory safety is
> > > achieved. In particular, it requires that we never have mutable and
> > > immutable pointers to the same memory at once (modulo interior
> > > mutability).
> > >
> > > So we need to avoid something like:
> > >
> > > let a = folio.map(); // `a` is a shared pointer to the contents of the folio.
> > >
> > > // While we have a shared (and therefore immutable) pointer, we're
> > > changing the contents of the folio.
> > > sb.sread(sector_number, sector_count, folio);
> > >
> > > This violates Rust rules. `UniqueFolio` helps us address this for our
> > > use case; if we try the above with a UniqueFolio, the compiler will
> > > > error out saying that `a` has a shared reference to the folio, so we
> > > can't call `sread` on it (because sread requires a mutable, and
> > > therefore not shareable, reference to the folio).
> >
> > This is going to be quite the impedance mismatch.  Still, I imagine
> > you're used to dealing with those by now and have a toolbox of ideas.
> >
> > We don't have that rule for the pagecache as it is.  We do have rules that
> > prevent data corruption!  For example, if the folio is !uptodate then you
> > must have the lock to write to the folio in order to bring it uptodate
> > (so we have a single writer rule in that regard).  But once the folio is
> > uptodate, all bets are off in terms of who can be writing to it / reading
> > it at the same time.  And that's going to have to continue to be true;
> > multiple processes can have the same page mmaped writable and write to
> > it at the same time.  There's no possible synchronisation between them.
> >
> > But I think your concern is really more limited.  You're concerned
> > with filesystem metadata obeying Rust's rules.  And for a read-write
> > filesystem, you're going to have to have ... something ... which gets a
> > folio from the page cache, and establishes that this is the only thread
> > which can modify that folio (maybe it's an interior node of a Btree,
> > maybe it's a free space bitmap, ...).  We could probably use the folio
> > lock bit for that purpose.  For the read-only filesystems, you only need
> > be concerned about freshly-allocated folios, but you need something more
> > when it comes to doing an ext2 clone.
> >
> > There's general concern about the overuse of the folio lock bit, but
> > this is a reasonable use -- preventing two threads from modifying the
> > same folio at the same time.
>
> Sorry, I didn't quite finish this thought; that'll teach me to write
> complicated emails last thing at night.
>
> The page cache has no single-writer vs multiple-reader exclusion on folios
> found in the page cache.  We expect filesystems to implement whatever
> exclusion they need at a higher level.  For example, ext2 has no higher
> level lock on its block allocator.  Once the buffer is uptodate (ie has
> been read from storage), it uses atomic bit operations in order to track
> which blocks are freed.  It does use a spinlock to control access to
> "how many blocks are currently free".
>
> I'm not suggesting ext2 is an optimal strategy.  I know XFS and btrfs
> use rwsems, although I'm not familiar enough with either to describe
> exactly how it works.

Thanks for this explanation; it's good to see this kind of high-level
decision/direction spelled out. When we need writable file systems,
we'll encode this in the type system.


* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-20  0:30   ` Boqun Feng
@ 2023-10-23 12:36     ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-23 12:36 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Thu, 19 Oct 2023 at 21:31, Boqun Feng <boqun.feng@gmail.com> wrote:
> On Wed, Oct 18, 2023 at 09:25:05AM -0300, Wedson Almeida Filho wrote:
> > +        // SAFETY: We are manually destructuring `self` and preventing `drop` from being called.
> > +        Ok(unsafe { (&ManuallyDrop::new(self).0 as *const ARef<INode<T>>).read() })
>
> How do we feel about using transmute here? ;-) I.e.
>
>         // SAFETY: `NewINode` is transparent to `ARef<INode<_>>`, and
>         // the inode has been initialised, so it's safe to change the
>         // object type.
>         Ok(unsafe { core::mem::transmute(self) })
>
> What we actually want here is changing the type of the object (i.e.
> bitwise move from one type to another), seems to me that transmute is
> the best fit here.
>
> Thoughts?

That's much nicer. I'll do this in v2.
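
For readers comparing the two forms, here is a userspace sketch of both.
The types are stand-ins chosen for illustration: `Rc<Cell<u32>>` plays
the role of `ARef<INode<T>>`, and `NewInode` is a made-up transparent
wrapper whose `Drop` marks the inode as failed. Both conversions move
the inner handle out without running that `Drop`.

```rust
use std::cell::Cell;
use std::mem::{self, ManuallyDrop};
use std::rc::Rc;

/// Stand-in for `ARef<INode<T>>`: a reference-counted handle.
pub type InodeRef = Rc<Cell<u32>>;

/// Stand-in for `NewINode`: wraps the handle and flags failure on drop.
#[repr(transparent)]
pub struct NewInode(InodeRef);

impl Drop for NewInode {
    fn drop(&mut self) {
        // Marks the inode as failed if it was never initialised.
        self.0.set(u32::MAX);
    }
}

impl NewInode {
    pub fn new(inode: InodeRef) -> Self {
        NewInode(inode)
    }

    /// Variant 1: manual destructuring through `ManuallyDrop` and a read.
    pub fn init_by_read(self) -> InodeRef {
        let this = ManuallyDrop::new(self);
        // SAFETY: `this` is never dropped, so the field is read exactly
        // once and its ownership moves to the caller.
        unsafe { std::ptr::read(&this.0) }
    }

    /// Variant 2: a bitwise move from one type to the other.
    pub fn init_by_transmute(self) -> InodeRef {
        // SAFETY: `NewInode` is `#[repr(transparent)]` over `InodeRef`, so
        // both have the same layout; `transmute` consumes `self` without
        // running its destructor.
        unsafe { mem::transmute::<NewInode, InodeRef>(self) }
    }
}
```

The transmute form relies on the `#[repr(transparent)]` attribute; without
it the layout equivalence is not guaranteed.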


* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-23  5:29               ` Dave Chinner
@ 2023-10-23 12:55                 ` Wedson Almeida Filho
  2023-10-30  2:29                   ` Dave Chinner
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-23 12:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Boqun Feng, Matthew Wilcox, Benno Lossin, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Oct 21, 2023 at 12:33:57PM -0700, Boqun Feng wrote:
> > On Sat, Oct 21, 2023 at 06:01:02PM +0100, Matthew Wilcox wrote:
> > > I'm only an expert on the page cache, not the rest of the VFS.  So
> > > what are the rules around modifying i_state for the VFS?
> >
> > Agreed, same question here.
>
> inode->i_state should only be modified under inode->i_lock.
>
> And in most situations, you have to hold the inode->i_lock to read
> state flags as well so that reads are serialised against
> modifications which are typically non-atomic RMW operations.
>
> There is, I think, one main exception to read side locking and this
> is find_inode_rcu() which does an unlocked check for I_WILL_FREE |
> I_FREEING. In this case, the inode->i_state updates in iput_final()
> use WRITE_ONCE under the inode->i_lock to provide the necessary
> semantics for the unlocked READ_ONCE() done under rcu_read_lock().
>
> IOWs, if you follow the general rule that any inode->i_state access
> (read or write) needs to hold inode->i_lock, you probably won't
> screw up.

I don't see filesystems doing this though. In particular, see
iget_locked() -- if a new inode is returned, then it is locked, but if
a cached one is found, it's not locked.

So we're in this situation where a returned inode may or may not be
locked. And the way to determine if it's locked or not is to read
i_state.

Here are examples of kernfs, ext2, ext4 and squashfs doing it:
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/kernfs/inode.c#L252
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext2/inode.c#L1392
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext4/inode.c#L4707
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/squashfs/inode.c#L82

They all call iget_locked(), and if I_NEW is set, they initialise the
inode and unlock it with unlock_new_inode(); otherwise they just
return the unlocked inode.
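
One way to express that pattern without the caller reading i_state is to
return a different type for each case, which is the idea behind
`get_or_create_inode` in this series. Below is a userspace sketch of the
idea only; the `SuperBlock` cache, the `Lookup` enum and the `I_NEW`
value are assumptions for illustration.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical flag value for illustration.
pub const I_NEW: u32 = 1;

pub struct Inode {
    pub ino: u64,
    pub state: u32,
    pub size: u64,
}

/// Userspace stand-in for a superblock with an inode cache.
pub struct SuperBlock {
    cache: HashMap<u64, Inode>,
}

/// A freshly created inode that still has to be initialised.
pub struct NewInode<'a>(&'a mut Inode);

/// Result of a lookup: the type says whether initialisation is needed.
pub enum Lookup<'a> {
    Cached(&'a Inode),
    New(NewInode<'a>),
}

impl<'a> NewInode<'a> {
    /// Fills in the inode and clears `I_NEW`, the analogue of calling
    /// unlock_new_inode() once initialisation is done.
    pub fn init(self, size: u64) -> &'a Inode {
        self.0.size = size;
        self.0.state &= !I_NEW;
        self.0
    }
}

impl SuperBlock {
    pub fn new() -> Self {
        SuperBlock { cache: HashMap::new() }
    }

    /// Analogue of iget_locked(): returns the cached inode, or a new one
    /// that the caller must initialise. (This sketch does not model
    /// failure: dropping a `NewInode` leaves it cached with `I_NEW` set.)
    pub fn iget(&mut self, ino: u64) -> Lookup<'_> {
        match self.cache.entry(ino) {
            Entry::Occupied(e) => Lookup::Cached(e.into_mut()),
            Entry::Vacant(e) => {
                Lookup::New(NewInode(e.insert(Inode { ino, state: I_NEW, size: 0 })))
            }
        }
    }
}
```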


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-23 10:48             ` Andreas Hindborg (Samsung)
@ 2023-10-23 14:28               ` Matthew Wilcox
  2023-10-24 15:04                 ` Ariel Miculas (amiculas)
  0 siblings, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-23 14:28 UTC (permalink / raw)
  To: Andreas Hindborg (Samsung)
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Mon, Oct 23, 2023 at 12:48:33PM +0200, Andreas Hindborg (Samsung) wrote:
> The build system and Rust compiler can inline and optimize across
> function calls and languages when LTO is enabled. Some patches are
> needed to make it work though.

That's fine, but something like folio_put() is performance-critical.

Relying on the linker to figure out that it _should_ inline through

+void rust_helper_folio_put(struct folio *folio)
+{
+	folio_put(folio);
+}
+EXPORT_SYMBOL_GPL(rust_helper_folio_put);

seems like a very bad idea to me.  For reference, folio_put() is
defined as:

static inline void folio_put(struct folio *folio)
{
        if (folio_put_testzero(folio))
                __folio_put(folio);
}

which turns into (once you work your way through all the gunk that hasn't
been cleaned up yet)

	if (atomic_dec_and_test(&folio->_refcount))
		__folio_put(folio);

ie it's a single dec-and-test insn followed by a conditional function
call.  Yes, there's some expensive debug you can turn on in there, but
it's an incredibly frequent call, and we shouldn't be relying on linker
magic to optimise it all away.

Of course, I don't want to lose the ability to turn on the debug code,
so folio.put() can't be as simple as the call to atomic_dec_and_test(),
but I hope you see my point.
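
To illustrate the shape of that fast path, here is a userspace sketch in
Rust. It is not a proposal for the binding: the `Folio` type, the
`freed` flag and the method names are assumptions, and the debug checks
mentioned above are left out. The hot path is one atomic decrement and a
compare, and the rare free path is kept out of line.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};

/// Userspace stand-in for a reference-counted folio.
pub struct Folio {
    refcount: AtomicUsize,
    freed: AtomicBool,
}

impl Folio {
    /// Creates a folio holding one reference.
    pub fn new() -> Self {
        Folio { refcount: AtomicUsize::new(1), freed: AtomicBool::new(false) }
    }

    /// Takes an extra reference.
    #[inline]
    pub fn get(&self) {
        self.refcount.fetch_add(1, Ordering::Relaxed);
    }

    /// Drops a reference. Returns true if this was the last one, in which
    /// case the out-of-line slow path has run.
    #[inline]
    pub fn put(&self) -> bool {
        // `fetch_sub` returns the previous value, so 1 means we hit zero.
        if self.refcount.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            self.put_slow();
            true
        } else {
            false
        }
    }

    /// Stand-in for `__folio_put()`: rarely taken, so never inlined.
    #[cold]
    #[inline(never)]
    fn put_slow(&self) {
        self.freed.store(true, Ordering::Relaxed);
    }

    pub fn is_freed(&self) -> bool {
        self.freed.load(Ordering::Relaxed)
    }
}
```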

Wedson wrote in a later email,
> Having said that, while it's possible to do what you suggested above,
> we try to avoid it so that maintainers can continue to have a single
> place they need to change if they ever decide to change things. A
> simple example from above is order(), if you decide to implement it
> differently (I don't know, if you change the flag, you decide to have
> an explicit field, whatever), then you'd have to change the C _and_
> the Rust versions. Worse yet, there's a chance that forgetting to
> update the Rust version wouldn't break the build, which would make it
> harder to catch mismatched versions.

I understand that concern!  Indeed, I did change the implementation
of folio_order() recently.  I'm happy to commit to keeping the Rust
implementation updated as I modify the C implementation of folios,
but I appreciate that other maintainers may not be willing to make such
a commitment.

I'm all the way up to Chapter 5: References in the Blandy book now!
I expect to understand the patches you're sending any week now ;-)


* Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios
  2023-10-23 14:28               ` Matthew Wilcox
@ 2023-10-24 15:04                 ` Ariel Miculas (amiculas)
  0 siblings, 0 replies; 125+ messages in thread
From: Ariel Miculas (amiculas) @ 2023-10-24 15:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andreas Hindborg (Samsung), Wedson Almeida Filho, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On 23/10/23 03:28PM, Matthew Wilcox wrote:
(snip)
> I'm all the way up to Chapter 5: References in the Blandy book now!
> I expect to understand the patches you're sending any week now ;-)

That book (Programming Rust) has some great explanations; I went through
the first 7 chapters after reading "The Rust Programming Language" [1].
If you also want to dive into unsafe Rust, I recommend "Learn Rust With
Entirely Too Many Linked Lists" [2]; it covers quite a few complex
concepts and it's very well written.

[1] https://doc.rust-lang.org/book/
[2] https://rust-unofficial.github.io/too-many-lists/

Cheers,
Ariel


* Re: [RFC PATCH 14/19] rust: fs: add per-superblock data
  2023-10-18 12:25 ` [RFC PATCH 14/19] rust: fs: add per-superblock data Wedson Almeida Filho
@ 2023-10-25 15:51   ` Ariel Miculas (amiculas)
  2023-10-26 13:46   ` Ariel Miculas (amiculas)
  2024-01-03 14:16   ` Andreas Hindborg (Samsung)
  2 siblings, 0 replies; 125+ messages in thread
From: Ariel Miculas (amiculas) @ 2023-10-25 15:51 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On 23/10/18 09:25AM, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to associate [typed] data to super blocks when
> they're created. Since we only have a pointer-sized field in which to
> store the state, it must implement the `ForeignOwnable` trait.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/kernel/fs.rs         | 42 +++++++++++++++++++++++++++++++++------
>  samples/rust/rust_rofs.rs |  4 +++-
>  2 files changed, 39 insertions(+), 7 deletions(-)
> 
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 5b7eaa16d254..e9a9362d2897 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -7,7 +7,7 @@
>  //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
>  
>  use crate::error::{code::*, from_result, to_result, Error, Result};
> -use crate::types::{ARef, AlwaysRefCounted, Either, Opaque};
> +use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque};
>  use crate::{
>      bindings, folio::LockedFolio, init::PinInit, str::CStr, time::Timespec, try_pin_init,
>      ThisModule,
> @@ -20,11 +20,14 @@
>  
>  /// A file system type.
>  pub trait FileSystem {
> +    /// Data associated with each file system instance (super-block).
> +    type Data: ForeignOwnable + Send + Sync;
> +
>      /// The name of the file system type.
>      const NAME: &'static CStr;
>  
>      /// Returns the parameters to initialise a super block.
> -    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
> +    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
>  
>      /// Initialises and returns the root inode of the given superblock.
>      ///
> @@ -174,7 +177,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>                  fs.owner = module.0;
>                  fs.name = T::NAME.as_char_ptr();
>                  fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
> -                fs.kill_sb = Some(Self::kill_sb_callback);
> +                fs.kill_sb = Some(Self::kill_sb_callback::<T>);
>                  fs.fs_flags = 0;
>  
>                  // SAFETY: Pointers stored in `fs` are static so will live for as long as the
> @@ -195,10 +198,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>          })
>      }
>  
> -    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
> +    unsafe extern "C" fn kill_sb_callback<T: FileSystem + ?Sized>(
> +        sb_ptr: *mut bindings::super_block,
> +    ) {
>          // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
>          // the appropriate function to call for cleanup.
>          unsafe { bindings::kill_anon_super(sb_ptr) };
> +
> +        // SAFETY: The C API contract guarantees that `sb_ptr` is valid for read.
> +        let ptr = unsafe { (*sb_ptr).s_fs_info };
> +        if !ptr.is_null() {
> +            // SAFETY: The only place where `s_fs_info` is assigned is `NewSuperBlock::init`, where
> +            // it's initialised with the result of an `into_foreign` call. We checked above that
> +            // `ptr` is non-null because it would be null if we never reached the point where we
> +            // init the field.
> +            unsafe { T::Data::from_foreign(ptr) };
> +        }
>      }
>  }
>  
> @@ -429,6 +444,14 @@ pub struct INodeParams {
>  pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
>  
>  impl<T: FileSystem + ?Sized> SuperBlock<T> {
> +    /// Returns the data associated with the superblock.
> +    pub fn data(&self) -> <T::Data as ForeignOwnable>::Borrowed<'_> {
> +        // SAFETY: This method is only available after the `NeedsData` typestate, so `s_fs_info`

`NeedsData` typestate no longer exists in your latest patch version.

Cheers,
Ariel
> +        // has been initialised with the result of a call to `T::into_foreign`.
> +        let ptr = unsafe { (*self.0.get()).s_fs_info };
> +        unsafe { T::Data::borrow(ptr) }
> +    }
> +
>      /// Tries to get an existing inode or create a new one if it doesn't exist yet.
>      pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, NewINode<T>>> {
>          // SAFETY: The only initialisation missing from the superblock is the root, and this
> @@ -458,7 +481,7 @@ pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, New
>  /// Required superblock parameters.
>  ///
>  /// This is returned by implementations of [`FileSystem::super_params`].
> -pub struct SuperParams {
> +pub struct SuperParams<T: ForeignOwnable + Send + Sync> {
>      /// The magic number of the superblock.
>      pub magic: u32,
>  
> @@ -472,6 +495,9 @@ pub struct SuperParams {
>  
>      /// Granularity of c/m/atime in ns (cannot be worse than a second).
>      pub time_gran: u32,
> +
> +    /// Data to be associated with the superblock.
> +    pub data: T,
>  }
>  
>  /// A superblock that is still being initialised.
> @@ -522,6 +548,9 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>              sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
>              sb.0.s_flags |= bindings::SB_RDONLY;
>  
> +            // N.B.: Even on failure, `kill_sb` is called and frees the data.
> +            sb.0.s_fs_info = params.data.into_foreign().cast_mut();
> +
>              // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
>              // newly-created (and initialised above) superblock.
>              let sb = unsafe { &mut *sb_ptr.cast() };
> @@ -934,8 +963,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>  ///
>  /// struct MyFs;
>  /// impl fs::FileSystem for MyFs {
> +///     type Data = ();
>  ///     const NAME: &'static CStr = c_str!("myfs");
> -///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
>  ///         todo!()
>  ///     }
>  ///     fn init_root(_sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index 95ce28efa1c3..093425650f26 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -52,14 +52,16 @@ struct Entry {
>  
>  struct RoFs;
>  impl fs::FileSystem for RoFs {
> +    type Data = ();
>      const NAME: &'static CStr = c_str!("rust-fs");
>  
> -    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
>          Ok(SuperParams {
>              magic: 0x52555354,
>              blocksize_bits: 12,
>              maxbytes: fs::MAX_LFS_FILESIZE,
>              time_gran: 1,
> +            data: (),
>          })
>      }
>  
> -- 
> 2.34.1
> 


* Re: [RFC PATCH 14/19] rust: fs: add per-superblock data
  2023-10-18 12:25 ` [RFC PATCH 14/19] rust: fs: add per-superblock data Wedson Almeida Filho
  2023-10-25 15:51   ` Ariel Miculas (amiculas)
@ 2023-10-26 13:46   ` Ariel Miculas (amiculas)
  2024-01-03 14:16   ` Andreas Hindborg (Samsung)
  2 siblings, 0 replies; 125+ messages in thread
From: Ariel Miculas (amiculas) @ 2023-10-26 13:46 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho

On 23/10/18 09:25AM, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to associate [typed] data to super blocks when
> they're created. Since we only have a pointer-sized field in which to
> store the state, it must implement the `ForeignOwnable` trait.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/kernel/fs.rs         | 42 +++++++++++++++++++++++++++++++++------
>  samples/rust/rust_rofs.rs |  4 +++-
>  2 files changed, 39 insertions(+), 7 deletions(-)
> 
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 5b7eaa16d254..e9a9362d2897 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -7,7 +7,7 @@
>  //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
>  
>  use crate::error::{code::*, from_result, to_result, Error, Result};
> -use crate::types::{ARef, AlwaysRefCounted, Either, Opaque};
> +use crate::types::{ARef, AlwaysRefCounted, Either, ForeignOwnable, Opaque};
>  use crate::{
>      bindings, folio::LockedFolio, init::PinInit, str::CStr, time::Timespec, try_pin_init,
>      ThisModule,
> @@ -20,11 +20,14 @@
>  
>  /// A file system type.
>  pub trait FileSystem {
> +    /// Data associated with each file system instance (super-block).
> +    type Data: ForeignOwnable + Send + Sync;
> +
>      /// The name of the file system type.
>      const NAME: &'static CStr;
>  
>      /// Returns the parameters to initialise a super block.
> -    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams>;
> +    fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
>  
>      /// Initialises and returns the root inode of the given superblock.
>      ///
> @@ -174,7 +177,7 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>                  fs.owner = module.0;
>                  fs.name = T::NAME.as_char_ptr();
>                  fs.init_fs_context = Some(Self::init_fs_context_callback::<T>);
> -                fs.kill_sb = Some(Self::kill_sb_callback);
> +                fs.kill_sb = Some(Self::kill_sb_callback::<T>);
>                  fs.fs_flags = 0;
>  
>                  // SAFETY: Pointers stored in `fs` are static so will live for as long as the
> @@ -195,10 +198,22 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>          })
>      }
>  
> -    unsafe extern "C" fn kill_sb_callback(sb_ptr: *mut bindings::super_block) {
> +    unsafe extern "C" fn kill_sb_callback<T: FileSystem + ?Sized>(
> +        sb_ptr: *mut bindings::super_block,
> +    ) {
>          // SAFETY: In `get_tree_callback` we always call `get_tree_nodev`, so `kill_anon_super` is
>          // the appropriate function to call for cleanup.
>          unsafe { bindings::kill_anon_super(sb_ptr) };
> +
> +        // SAFETY: The C API contract guarantees that `sb_ptr` is valid for read.
> +        let ptr = unsafe { (*sb_ptr).s_fs_info };
> +        if !ptr.is_null() {
> +            // SAFETY: The only place where `s_fs_info` is assigned is `NewSuperBlock::init`, where
> +            // it's initialised with the result of an `into_foreign` call. We checked above that
> +            // `ptr` is non-null because it would be null if we never reached the point where we
> +            // init the field.
> +            unsafe { T::Data::from_foreign(ptr) };
> +        }

I would also make `s_fs_info` NULL, as a lot of filesystems seem to do
(e.g. erofs). This would avoid any potential double frees and it's a
useful pattern in general (setting pointers to NULL after freeing
memory).
Maybe you could also mention that memory is actually freed at this point
because the newly converted Rust object is immediately dropped.
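
As a rough userspace sketch of the take-and-null pattern suggested
above (not the patch's code): `SuperBlockRaw`, `attach` and `kill_sb`
are hypothetical stand-ins, and `Box::into_raw`/`Box::from_raw` play the
role that `into_foreign`/`from_foreign` play for a boxed
`ForeignOwnable`.

```rust
use std::ffi::c_void;
use std::ptr;

/// Hypothetical stand-in for the C `struct super_block`; only the field
/// discussed here is modelled.
struct SuperBlockRaw {
    s_fs_info: *mut c_void,
}

/// Stores `data` in `s_fs_info`, mirroring what `into_foreign` does for a box.
fn attach(sb: &mut SuperBlockRaw, data: Box<String>) {
    sb.s_fs_info = Box::into_raw(data).cast();
}

/// Takes the data back out, leaves NULL behind, and drops the data.
/// Returns `true` if something was freed.
fn kill_sb(sb: &mut SuperBlockRaw) -> bool {
    // Swap in NULL first so a second call cannot observe the stale pointer.
    let raw = std::mem::replace(&mut sb.s_fs_info, ptr::null_mut());
    if raw.is_null() {
        return false;
    }
    // SAFETY: a non-null `s_fs_info` is only ever set by `attach`, from
    // `Box::into_raw`, and it was just replaced by NULL, so it is consumed once.
    let data: Box<String> = unsafe { Box::from_raw(raw.cast()) };
    // The memory is freed here, when the re-created box is dropped.
    drop(data);
    true
}
```

With this shape a second `kill_sb` call finds NULL and returns without
touching freed memory.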

Cheers,
Ariel

>      }
>  }
>  
> @@ -429,6 +444,14 @@ pub struct INodeParams {
>  pub struct SuperBlock<T: FileSystem + ?Sized>(Opaque<bindings::super_block>, PhantomData<T>);
>  
>  impl<T: FileSystem + ?Sized> SuperBlock<T> {
> +    /// Returns the data associated with the superblock.
> +    pub fn data(&self) -> <T::Data as ForeignOwnable>::Borrowed<'_> {
> +        // SAFETY: This method is only available after the `NeedsData` typestate, so `s_fs_info`
> +        // has been initialised with the result of a call to `T::into_foreign`.
> +        let ptr = unsafe { (*self.0.get()).s_fs_info };
> +        unsafe { T::Data::borrow(ptr) }
> +    }
> +
>      /// Tries to get an existing inode or create a new one if it doesn't exist yet.
>      pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, NewINode<T>>> {
>          // SAFETY: The only initialisation missing from the superblock is the root, and this
> @@ -458,7 +481,7 @@ pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, New
>  /// Required superblock parameters.
>  ///
>  /// This is returned by implementations of [`FileSystem::super_params`].
> -pub struct SuperParams {
> +pub struct SuperParams<T: ForeignOwnable + Send + Sync> {
>      /// The magic number of the superblock.
>      pub magic: u32,
>  
> @@ -472,6 +495,9 @@ pub struct SuperParams {
>  
>      /// Granularity of c/m/atime in ns (cannot be worse than a second).
>      pub time_gran: u32,
> +
> +    /// Data to be associated with the superblock.
> +    pub data: T,
>  }
>  
>  /// A superblock that is still being initialised.
> @@ -522,6 +548,9 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>              sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
>              sb.0.s_flags |= bindings::SB_RDONLY;
>  
> +            // N.B.: Even on failure, `kill_sb` is called and frees the data.
> +            sb.0.s_fs_info = params.data.into_foreign().cast_mut();
> +
>              // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
>              // newly-created (and initialised above) superblock.
>              let sb = unsafe { &mut *sb_ptr.cast() };
> @@ -934,8 +963,9 @@ fn init(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>  ///
>  /// struct MyFs;
>  /// impl fs::FileSystem for MyFs {
> +///     type Data = ();
>  ///     const NAME: &'static CStr = c_str!("myfs");
> -///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +///     fn super_params(_: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
>  ///         todo!()
>  ///     }
>  ///     fn init_root(_sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>> {
> diff --git a/samples/rust/rust_rofs.rs b/samples/rust/rust_rofs.rs
> index 95ce28efa1c3..093425650f26 100644
> --- a/samples/rust/rust_rofs.rs
> +++ b/samples/rust/rust_rofs.rs
> @@ -52,14 +52,16 @@ struct Entry {
>  
>  struct RoFs;
>  impl fs::FileSystem for RoFs {
> +    type Data = ();
>      const NAME: &'static CStr = c_str!("rust-fs");
>  
> -    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams> {
> +    fn super_params(_sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>> {
>          Ok(SuperParams {
>              magic: 0x52555354,
>              blocksize_bits: 12,
>              maxbytes: fs::MAX_LFS_FILESIZE,
>              time_gran: 1,
> +            data: (),
>          })
>      }
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 03/19] samples: rust: add initial ro file system sample
  2023-10-18 12:25 ` [RFC PATCH 03/19] samples: rust: add initial ro file system sample Wedson Almeida Filho
@ 2023-10-28 16:18   ` Alice Ryhl
  2024-01-10 18:25     ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Alice Ryhl @ 2023-10-28 16:18 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho, Alexander Viro,
	Christian Brauner, Matthew Wilcox

On 10/18/23 14:25, Wedson Almeida Filho wrote:
> +kernel::module_fs! {
> +    type: RoFs,
> +    name: "rust_rofs",
> +    author: "Rust for Linux Contributors",
> +    description: "Rust read-only file system sample",
> +    license: "GPL",
> +}
> +
> +struct RoFs;
> +impl fs::FileSystem for RoFs {
> +    const NAME: &'static CStr = c_str!("rust-fs");
> +}

Why use two different names here?

Alice

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params`
  2023-10-18 16:34   ` Benno Lossin
@ 2023-10-28 16:39     ` Alice Ryhl
  2023-10-30  8:21       ` Benno Lossin
  0 siblings, 1 reply; 125+ messages in thread
From: Alice Ryhl @ 2023-10-28 16:39 UTC (permalink / raw)
  To: Benno Lossin, Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 10/18/23 18:34, Benno Lossin wrote:
>> +        from_result(|| {
>> +            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
>> +            let fc = unsafe { &mut *fc_ptr };
> 
> This safety comment is not enough, the pointer needs to be unique and
> pointing to a valid value for this to be ok. I would recommend to do
> this instead:
> 
>      unsafe { addr_of_mut!((*fc_ptr).ops).write(&Tables::<T>::CONTEXT) };

It doesn't really need to be unique. Or at least, that wording gives the 
wrong intuition even if it's technically correct when you use the right 
definition of "unique".

To clarify what I mean: Using `ptr::write` on a raw pointer is valid if 
and only if creating a mutable reference and using that to write is 
valid. (Assuming the type has no destructor.)

Of course, in this case you *also* have the difference of whether you
create a mutable reference to the entire struct or just to the field.
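
A minimal userspace sketch of the field-only write, assuming a
hypothetical `Context` struct in place of the real `fs_context`: each
field is written through a raw pointer, so no `&mut Context` exists
while the struct is still uninitialised.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hypothetical stand-in for a C struct handed to a callback.
struct Context {
    ops: u32,
    flags: u32,
}

/// Initialises both fields through raw pointers; no reference to the whole
/// struct is created before every field has been written.
fn init_context() -> Context {
    let mut slot = MaybeUninit::<Context>::uninit();
    let p: *mut Context = slot.as_mut_ptr();
    // SAFETY: `p` points to writable, properly aligned storage for a `Context`,
    // and `write` never reads or drops the previous (uninitialised) contents.
    unsafe {
        addr_of_mut!((*p).ops).write(7);
        addr_of_mut!((*p).flags).write(1);
    }
    // SAFETY: every field was written above.
    unsafe { slot.assume_init() }
}
```
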
>> +                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
>> +                // safe to mutably dereference it.
>> +                let inode = unsafe { &mut *inode };
> 
> The inode also needs to be initialized and have valid values as its fields.
> Not sure if this is kept and it would probably be better to keep using raw
> pointers here.

My understanding is that this is just a safety invariant, and not a 
validity invariant, so as long as the uninitialized memory is not read, 
it's fine.

See e.g.:
https://github.com/rust-lang/unsafe-code-guidelines/issues/346

Alice

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2023-10-18 12:25 ` [RFC PATCH 05/19] rust: fs: introduce `INode<T>` Wedson Almeida Filho
@ 2023-10-28 18:00   ` Alice Ryhl
  2024-01-03 12:45     ` Andreas Hindborg (Samsung)
  2024-01-03 12:54   ` Andreas Hindborg (Samsung)
  2024-01-04  5:14   ` Darrick J. Wong
  2 siblings, 1 reply; 125+ messages in thread
From: Alice Ryhl @ 2023-10-28 18:00 UTC (permalink / raw)
  To: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Matthew Wilcox
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 10/18/23 14:25, Wedson Almeida Filho wrote:
> +    /// Returns the super-block that owns the inode.
> +    pub fn super_block(&self) -> &SuperBlock<T> {
> +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
> +        // shared reference (&self) to it.
> +        unsafe { &*(*self.0.get()).i_sb.cast() }
> +    }

This makes me a bit nervous. I had to look up whether this field was a 
pointer to a superblock, or just a superblock embedded directly in 
`struct inode`. It does look like it's correct as-is, but I'd feel more 
confident about it if it didn't use a cast to completely ignore the 
type going in to the pointer cast.

Could you define a `from_raw` on `SuperBlock` and change this to:

     unsafe { &*SuperBlock::from_raw((*self.0.get()).i_sb) }

or perhaps add a type annotation like this:

     let i_sb: *mut super_block = unsafe { (*self.0.get()).i_sb };
     i_sb.cast()
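
To make the first option concrete, here is a userspace sketch under
stated assumptions: `RawSuperBlock` and `RawInode` are hypothetical
stand-ins for the bindings types, and this `from_raw` is only one
possible shape, not an existing kernel API.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `bindings::super_block`.
#[repr(C)]
struct RawSuperBlock {
    s_magic: u64,
}

/// Hypothetical stand-in for `bindings::inode`; `i_sb` is a pointer, not an
/// embedded struct.
#[repr(C)]
struct RawInode {
    i_sb: *mut RawSuperBlock,
}

/// Wrapper with the same layout as the raw type.
#[repr(transparent)]
struct SuperBlock<T>(RawSuperBlock, PhantomData<T>);

impl<T> SuperBlock<T> {
    /// The parameter type pins down what may be converted: passing a
    /// `*mut RawInode` by mistake is a compile error, unlike a bare `.cast()`.
    ///
    /// # Safety
    /// `ptr` must be valid for `'a` and point to a superblock of file system `T`.
    unsafe fn from_raw<'a>(ptr: *mut RawSuperBlock) -> &'a Self {
        // SAFETY: `Self` is `repr(transparent)` over `RawSuperBlock`; validity
        // is guaranteed by the caller.
        unsafe { &*ptr.cast::<Self>() }
    }

    fn magic(&self) -> u64 {
        self.0.s_magic
    }
}

/// Hypothetical file system type used only for this example.
struct MyFs;

fn super_block_of(inode: &RawInode) -> &SuperBlock<MyFs> {
    // SAFETY: in this sketch `i_sb` always points to a live superblock.
    unsafe { SuperBlock::from_raw(inode.i_sb) }
}
```

If `i_sb` were ever changed to a different pointer type, the call in
`super_block_of` would stop compiling instead of silently casting.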

Alice

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
                   ` (19 preceding siblings ...)
  2023-10-18 13:40 ` [RFC PATCH 00/19] Rust abstractions for VFS Ariel Miculas (amiculas)
@ 2023-10-29 20:31 ` Matthew Wilcox
  2023-10-31 20:14   ` Wedson Almeida Filho
  20 siblings, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-10-29 20:31 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:24:59AM -0300, Wedson Almeida Filho wrote:
> Rust file system modules can be declared with the `module_fs` macro and are
> required to implement the following functions (which are part of the
> `FileSystem` trait):
> 
> impl FileSystem for MyFS {
>     fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
>     fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
>     fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
>     fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
>     fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
> }

Does it make sense to describe filesystem methods like this?  As I
understand (eg) how inodes are laid out, we typically malloc a

foofs_inode {
	x; y; z;
	struct inode vfs_inode;
};

and then the first line of many functions that take an inode is:

	struct ext2_inode_info *ei = EXT2_I(dir);

That feels like unnecessary boilerplate, and might lead to questions like
"What if I'm passed an inode that isn't an ext2 inode".  Do we want to
always pass in the foofs_inode instead of the inode?

Also, I see you're passing an inode to read_dir.  Why did you decide to
do that?  There's information in the struct file that's either necessary
or useful to have in the filesystem.  Maybe not in toy filesystems, but eg
network filesystems need user credentials to do readdir, which are stored
in struct file.  Block filesystems store readahead data in struct file.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-23 12:55                 ` Wedson Almeida Filho
@ 2023-10-30  2:29                   ` Dave Chinner
  2023-10-31 20:49                     ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Dave Chinner @ 2023-10-30  2:29 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Boqun Feng, Matthew Wilcox, Benno Lossin, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Mon, Oct 23, 2023 at 09:55:08AM -0300, Wedson Almeida Filho wrote:
> On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Oct 21, 2023 at 12:33:57PM -0700, Boqun Feng wrote:
> > > On Sat, Oct 21, 2023 at 06:01:02PM +0100, Matthew Wilcox wrote:
> > > > I'm only an expert on the page cache, not the rest of the VFS.  So
> > > > what are the rules around modifying i_state for the VFS?
> > >
> > > Agreed, same question here.
> >
> > inode->i_state should only be modified under inode->i_lock.
> >
> > And in most situations, you have to hold the inode->i_lock to read
> > state flags as well so that reads are serialised against
> > modifications which are typically non-atomic RMW operations.
> >
> > There is, I think, one main exception to read side locking and this
> > is find_inode_rcu() which does an unlocked check for I_WILL_FREE |
> > I_FREEING. In this case, the inode->i_state updates in iput_final()
> > use WRITE_ONCE under the inode->i_lock to provide the necessary
> > semantics for the unlocked READ_ONCE() done under rcu_read_lock().
> >
> > IOWs, if you follow the general rule that any inode->i_state access
> > (read or write) needs to hold inode->i_lock, you probably won't
> > screw up.
> 
> I don't see filesystems doing this though. In particular, see
> iget_locked() -- if a new inode is returned, then it is locked, but if
> a cached one is found, it's not locked.

I did say "if you follow the general rule".

And where there is a "general rule" there is the implication that
there are special cases where the "general rule" doesn't get
applied, yes? :)

I_NEW is the exception to the general rule, and very few people
writing filesystems actually know about it let alone care about
it...

> So we're in this situation where a returned inode may or may not be
> locked. And the way to determine if it's locked or not is to read
> i_state.
> 
> Here are examples of kernfs, ext2, ext4 and squashfs doing it:
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/kernfs/inode.c#L252
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext2/inode.c#L1392
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext4/inode.c#L4707
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/squashfs/inode.c#L82
> 
> They all call iget_locked(), and if I_NEW is set, they initialise the
> inode and unlock it with unlock_new_inode(); otherwise they just
> return the unlocked inode.

All of them are perfectly fine.

I_NEW is the bit we use to synchronise inode initialisation - we
have to ensure there is only a single initialisation running while
there are concurrent lookups that can find the inode whilst it is
being initialised. We cannot hold a spin lock over inode
initialisation (it may have to do IO!), so we set the I_NEW flag
under the i_lock and the inode_hash_lock during hash insertion so
that they are set atomically from the hash lookup POV. If the inode
is then found in cache, wait_on_inode() does the serialisation
against the running initialisation indicated by the __I_NEW bit in
the i_state word.

Hence if the caller of iget_locked() ever sees I_NEW, it is
guaranteed to have exclusive access to the inode and -must- first
initialise the inode and then call unlock_new_inode() when it has
completed. It doesn't need to hold inode->i_lock in this case
because there's nothing it needs to serialise against as
iget_locked() has already done all that work.

If the inode is found in cache by iget_locked, then the
wait_on_inode() call is guaranteed to ensure that I_NEW is not set
when it returns. The atomic bit operations on __I_NEW and the memory
barriers in unlock_new_inode() play an important part in this
dance, and they guarantee that I_NEW has been cleared before
iget_locked() returns. No need for inode->i_lock to be held in this
case, either, because iget_locked() did all the serialisation for
us.

This special dance is an optimisation that avoids the need to take
inode->i_lock in the inode lookup fast path just to check I_NEW. It
is an exception to the general rule but internally it uses
inode->i_lock in the places it is needed to ensure anything using
the general rule about accessing i_state still behaves correctly.
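
The protocol described above can be modelled in userspace roughly as
follows. This is a simplified sketch, not the kernel implementation: a
`Mutex` plus `Condvar` stand in for i_lock and the wait queue behind
wait_on_inode(), the `size` field is a made-up example of data filled in
during initialisation, and the lock-free fast path is not modelled.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

const I_NEW: u32 = 1 << 0; // illustrative value only

struct Inode {
    state: Mutex<u32>,    // models i_state protected by i_lock
    new_cleared: Condvar, // models the wait queue used by wait_on_inode()
    size: Mutex<u64>,     // hypothetical data filled in by the initialiser
}

#[derive(Default)]
struct InodeCache {
    hash: Mutex<HashMap<u64, Arc<Inode>>>, // models inode_hash_lock + table
}

impl InodeCache {
    /// Returns the inode and whether the caller must initialise it.
    fn iget_locked(&self, ino: u64) -> (Arc<Inode>, bool) {
        let mut hash = self.hash.lock().unwrap();
        let existing = hash.get(&ino).cloned();
        if let Some(inode) = existing {
            drop(hash);
            // Found in cache: wait until any running initialisation is done.
            let mut state = inode.state.lock().unwrap();
            while *state & I_NEW != 0 {
                state = inode.new_cleared.wait(state).unwrap();
            }
            drop(state);
            return (inode, false);
        }
        // Not found: insert with I_NEW already set, so concurrent lookups
        // can never see the inode without the flag.
        let inode = Arc::new(Inode {
            state: Mutex::new(I_NEW),
            new_cleared: Condvar::new(),
            size: Mutex::new(0),
        });
        hash.insert(ino, inode.clone());
        (inode, true)
    }
}

/// Clears I_NEW and wakes up every lookup waiting on this inode.
fn unlock_new_inode(inode: &Inode) {
    let mut state = inode.state.lock().unwrap();
    *state &= !I_NEW;
    inode.new_cleared.notify_all();
}

/// One thread creates and initialises inode 42 while another looks it up.
fn demo() -> (bool, bool, u64) {
    let cache = Arc::new(InodeCache::default());
    let (inode, is_new) = cache.iget_locked(42);
    let c2 = cache.clone();
    let lookup = std::thread::spawn(move || {
        // Blocks until `unlock_new_inode` below has run.
        let (found, second_is_new) = c2.iget_locked(42);
        let size = *found.size.lock().unwrap();
        (second_is_new, size)
    });
    // Exclusive initialisation: nobody else gets the inode while I_NEW is set.
    *inode.size.lock().unwrap() = 4096;
    unlock_new_inode(&inode);
    let (second_is_new, seen) = lookup.join().unwrap();
    (is_new, second_is_new, seen)
}
```

In this model the second lookup can only return after initialisation has
finished, so it always observes the initialised size.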

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params`
  2023-10-28 16:39     ` Alice Ryhl
@ 2023-10-30  8:21       ` Benno Lossin
  2023-10-30 21:36         ` Alice Ryhl
  0 siblings, 1 reply; 125+ messages in thread
From: Benno Lossin @ 2023-10-30  8:21 UTC (permalink / raw)
  To: Alice Ryhl, Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 28.10.23 18:39, Alice Ryhl wrote:
> On 10/18/23 18:34, Benno Lossin wrote:
>>> +        from_result(|| {
>>> +            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
>>> +            let fc = unsafe { &mut *fc_ptr };
>>
>> This safety comment is not enough, the pointer needs to be unique and
>> pointing to a valid value for this to be ok. I would recommend to do
>> this instead:
>>
>>       unsafe { addr_of_mut!((*fc_ptr).ops).write(&Tables::<T>::CONTEXT) };
> 
> It doesn't really need to be unique. Or at least, that wording gives the
> wrong intuition even if it's technically correct when you use the right
> definition of "unique".
> 
> To clarify what I mean: Using `ptr::write` on a raw pointer is valid if
> and only if creating a mutable reference and using that to write is
> valid. (Assuming the type has no destructor.)

I tried looking in the nomicon and UCG, but was not able to find this
statement, where is it from?

> Of course, in this case you *also* have the difference of whether you
> create a mutable to the entire struct or just the field.
>>> +                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
>>> +                // safe to mutably dereference it.
>>> +                let inode = unsafe { &mut *inode };
>>
>> The inode also needs to be initialized and have valid values as its fields.
>> Not sure if this is kept and it would probably be better to keep using raw
>> pointers here.
> 
> My understanding is that this is just a safety invariant, and not a
> validity invariant, so as long as the uninitialized memory is not read,
> it's fine.
> 
> See e.g.:
> https://github.com/rust-lang/unsafe-code-guidelines/issues/346

I'm not so sure that that discussion is finished and agreed upon. The
nomicon still writes "It is illegal to construct a reference to
uninitialized data" [1].

Using this pattern (&mut uninit to initialize data) is also dangerous
if the underlying type has drop impls, since then by doing
`foo.bar = baz;` you drop the old uninitialized value. Sure in
our bindings there are no types that implement drop (AFAIK) so
it is less of an issue.

If we decide to do this, we should have a comment that explains that
this reference might point to uninitialized memory. Since otherwise
it might be easy to give the reference to another safe function that
then e.g. reads a bool.
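
The drop hazard can be seen in a small userspace sketch. `Noisy` and
`Holder` are hypothetical types for illustration, and the old value is
deliberately initialised here so the example stays well-defined: plain
assignment runs the destructor of the old value, `ptr::write` does not.

```rust
use std::cell::Cell;
use std::ptr;

/// Hypothetical type whose destructor bumps a counter.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

struct Holder<'a> {
    bar: Noisy<'a>,
}

/// Returns the number of drops seen after a plain assignment and then after
/// a `ptr::write` to the same field.
fn count_drops() -> (u32, u32) {
    let drops = Cell::new(0);
    let mut foo = Holder { bar: Noisy(&drops) };

    // Plain assignment: the old value of `foo.bar` is dropped first. With an
    // uninitialised old value this would run a destructor on garbage.
    foo.bar = Noisy(&drops);
    let after_assign = drops.get();

    // `ptr::write` overwrites without reading or dropping the old value
    // (which is simply leaked here).
    // SAFETY: the pointer comes from a live, unique reference, so it is
    // valid and aligned for a write.
    unsafe { ptr::write(&mut foo.bar, Noisy(&drops)) };
    let after_write = drops.get();

    (after_assign, after_write)
}
```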

[1]: https://doc.rust-lang.org/nomicon/unchecked-uninit.html

-- 
Cheers,
Benno



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params`
  2023-10-30  8:21       ` Benno Lossin
@ 2023-10-30 21:36         ` Alice Ryhl
  0 siblings, 0 replies; 125+ messages in thread
From: Alice Ryhl @ 2023-10-30 21:36 UTC (permalink / raw)
  To: Benno Lossin, Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 10/30/23 09:21, Benno Lossin wrote:
> On 28.10.23 18:39, Alice Ryhl wrote:
>> On 10/18/23 18:34, Benno Lossin wrote:
>>>> +        from_result(|| {
>>>> +            // SAFETY: The C callback API guarantees that `fc_ptr` is valid.
>>>> +            let fc = unsafe { &mut *fc_ptr };
>>>
>>> This safety comment is not enough, the pointer needs to be unique and
>>> pointing to a valid value for this to be ok. I would recommend to do
>>> this instead:
>>>
>>>        unsafe { addr_of_mut!((*fc_ptr).ops).write(&Tables::<T>::CONTEXT) };
>>
>> It doesn't really need to be unique. Or at least, that wording gives the
>> wrong intuition even if it's technically correct when you use the right
>> definition of "unique".
>>
>> To clarify what I mean: Using `ptr::write` on a raw pointer is valid if
>> and only if creating a mutable reference and using that to write is
>> valid. (Assuming the type has no destructor.)
> 
> I tried looking in the nomicon and UCG, but was not able to find this
> statement, where is it from?

Not sure where I got it from originally, but it follows from the tree 
borrows reference:

First, if the type is !Unpin, then the mutable reference gets the same 
tag as the original pointer, so there's trivially no difference.

The more interesting case is for Unpin types. Here, the creation of the 
mutable reference corresponds to a read, and then there's the write of 
the mutable reference itself. The write of the mutable reference itself 
is equivalent to the `ptr::write` operation, since exactly the same tags 
are considered to be affected by child writes and foreign writes. Next, 
it must be shown that [read, write] is equivalent to just a write, which 
can be shown by analyzing the tree borrows rules case-by-case.

You can find a nice summary of tree borrows at the last page of:
https://github.com/Vanille-N/tree-borrows/blob/master/full/main.pdf

I'm pretty sure the same analysis works with stacked borrows.

>> Of course, in this case you *also* have the difference of whether you
>> create a mutable to the entire struct or just the field.
>>>> +                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
>>>> +                // safe to mutably dereference it.
>>>> +                let inode = unsafe { &mut *inode };
>>>
>>> The inode also needs to be initialized and have valid values as its fields.
>>> Not sure if this is kept and it would probably be better to keep using raw
>>> pointers here.
>>
>> My understanding is that this is just a safety invariant, and not a
>> validity invariant, so as long as the uninitialized memory is not read,
>> it's fine.
>>
>> See e.g.:
>> https://github.com/rust-lang/unsafe-code-guidelines/issues/346
> 
> I'm not so sure that that discussion is finished and agreed upon. The
> nomicon still writes "It is illegal to construct a reference to
> uninitialized data" [1].
> 
> Using this pattern (&mut uninit to initialize data) is also dangerous
> if the underlying type has drop impls, since then by doing
> `foo.bar = baz;` you drop the old uninitialized value. Sure in
> our bindings there are no types that implement drop (AFAIK) so
> it is less of an issue.
> 
> If we decide to do this, we should have a comment that explains that
> this reference might point to uninitialized memory. Since otherwise
> it might be easy to give the reference to another safe function that
> then e.g. reads a bool.
> 
> [1]: https://doc.rust-lang.org/nomicon/unchecked-uninit.html

That's fair. I agree that we should explicitly decide whether or not to 
allow this kind of thing.

Alice


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2023-10-29 20:31 ` Matthew Wilcox
@ 2023-10-31 20:14   ` Wedson Almeida Filho
  2024-01-03 18:02     ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-31 20:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Sun, 29 Oct 2023 at 17:32, Matthew Wilcox <willy@infradead.org> wrote:
> > impl FileSystem for MyFS {
> >     fn super_params(sb: &NewSuperBlock<Self>) -> Result<SuperParams<Self::Data>>;
> >     fn init_root(sb: &SuperBlock<Self>) -> Result<ARef<INode<Self>>>;
> >     fn read_dir(inode: &INode<Self>, emitter: &mut DirEmitter) -> Result;
> >     fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
> >     fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
> > }
>
> Does it make sense to describe filesystem methods like this?  As I
> understand (eg) how inodes are laid out, we typically malloc a
>
> foofs_inode {
>         x; y; z;
>         struct inode vfs_inode;
> };
>
> and then the first line of many functions that take an inode is:
>
>         struct ext2_inode_info *ei = EXT2_I(dir);
>
> That feels like unnecessary boilerplate, and might lead to questions like
> "What if I'm passed an inode that isn't an ext2 inode".  Do we want to
> always pass in the foofs_inode instead of the inode?

We're well aligned here. :)

Note that the type is `&INode<Self>` -- `Self` is an alias for the
type implementing this filesystem. For example, in tarfs, the type is
really `&INode<TarFs>`, so it is what you're asking for: the TarFs
filesystem only sees TarFs inodes and superblocks (through the
FileSystem trait, maybe they have to deal with other inodes for other
reasons).

In fact, when you have an inode of type `INode<TarFs>`, and you have a call like:

let d = inode.data();

What you get back has the type declared in `TarFs::INodeData`.
Similarly, if you do:

let d = inode.super_block().data();

What you get back has the type declared in `TarFs::Data`.

So all `container_of` calls are hidden away, and we store super-block
data in `s_fs_info` and inode data by having a new struct that
contains the data the fs wants plus a struct inode (this is done with
generics, it's called `INodeWithData`). This is required for type
safety: you always get the right type. If someone changes the type in
one place but forgets to change it in another place, they'll get a
compilation error.
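
For readers who want to see the shape of this, a simplified userspace
sketch follows. It is not the code from the series: `RawInode`,
`TarInodeData` and `inode_data` are made up for illustration, and
`offset_of!` needs a reasonably recent toolchain.

```rust
use std::mem::offset_of;
use std::ptr::addr_of;

/// Hypothetical stand-in for the C `struct inode`.
#[repr(C)]
struct RawInode {
    i_ino: u64,
}

trait FileSystem {
    /// Data stored alongside each inode of this file system.
    type INodeData;
}

/// The file system's data plus the VFS inode, like `foofs_inode` in C.
#[repr(C)]
struct INodeWithData<T: FileSystem> {
    data: T::INodeData,
    inode: RawInode,
}

/// The `container_of` step, written once and typed by the file system.
///
/// # Safety
/// `inode` must point at the `inode` field of a live `INodeWithData<T>` and
/// must have been derived from a pointer to that whole container.
unsafe fn inode_data<'a, T: FileSystem>(inode: *const RawInode) -> &'a T::INodeData {
    // SAFETY: per the contract, stepping back by the field offset stays
    // inside the same allocation and lands on the start of the container.
    let outer = unsafe { inode.cast::<u8>().sub(offset_of!(INodeWithData<T>, inode)) }
        .cast::<INodeWithData<T>>();
    // SAFETY: `outer` points to a live `INodeWithData<T>`.
    unsafe { &(*outer).data }
}

struct TarFs;
struct TarInodeData {
    offset: u64,
}
impl FileSystem for TarFs {
    type INodeData = TarInodeData;
}

fn demo() -> u64 {
    let node = INodeWithData::<TarFs> {
        data: TarInodeData { offset: 512 },
        inode: RawInode { i_ino: 1 },
    };
    let outer: *const INodeWithData<TarFs> = &node;
    // SAFETY: `outer` is valid; this only computes the field's address.
    let inode = unsafe { addr_of!((*outer).inode) };
    // The result type follows from `TarFs`; asking for another file system's
    // data type here would not compile.
    let data: &TarInodeData = unsafe { inode_data::<TarFs>(inode) };
    data.offset
}
```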

> Also, I see you're passing an inode to read_dir.  Why did you decide to
> do that?  There's information in the struct file that's either necessary
> or useful to have in the filesystem.  Maybe not in toy filesystems, but eg
> network filesystems need user credentials to do readdir, which are stored
> in struct file.  Block filesystems store readahead data in struct file.

Because the two file systems we have don't use anything from `struct
file` beyond the inode.

Passing a `file` to `read_dir` would require us to introduce an
unnecessary abstraction that no one uses, which we've been told not to
do.

There is no technical reason that makes it impractical though. We can
add it when the need arises.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-30  2:29                   ` Dave Chinner
@ 2023-10-31 20:49                     ` Wedson Almeida Filho
  2023-11-08  4:54                       ` Dave Chinner
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-10-31 20:49 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Boqun Feng, Matthew Wilcox, Benno Lossin, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Sun, 29 Oct 2023 at 23:29, Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Oct 23, 2023 at 09:55:08AM -0300, Wedson Almeida Filho wrote:
> > On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
> > > IOWs, if you follow the general rule that any inode->i_state access
> > > (read or write) needs to hold inode->i_lock, you probably won't
> > > screw up.
> >
> > I don't see filesystems doing this though. In particular, see
> > iget_locked() -- if a new inode is returned, then it is locked, but if
> > a cached one is found, it's not locked.
>
> I did say "if you follow the general rule".
>
> And where there is a "general rule" there is the implication that
> there are special cases where the "general rule" doesn't get
> applied, yes? :)

Sure. But when you say "if _you_ do X", it gives me the impression
that I have a choice. But if I want to use `iget_locked`, I don't have
the option to follow the "general rule" you state.

I guess I have the option to ignore `iget_locked`. :)

> I_NEW is the exception to the general rule, and very few people
> writing filesystems actually know about it let alone care about
> it...
<snip>
> All of them are perfectly fine.

I'm not sure I agree with this. They may be fine, but I wouldn't say
perfectly. :)

> I_NEW is the bit we use to synchronise inode initialisation - we
> have to ensure there is only a single initialisation running while
> there are concurrent lookups that can find the inode whilst it is
> being initialised. We cannot hold a spin lock over inode
> initialisation (it may have to do IO!), so we set the I_NEW flag
> under the i_lock and the inode_hash_lock during hash insertion so
> that they are set atomically from the hash lookup POV. If the inode
> is then found in cache, wait_on_inode() does the serialisation
> against the running initialisation indicated by the __I_NEW bit in
> the i_state word.
>
> Hence if the caller of iget_locked() ever sees I_NEW, it is
> guaranteed to have exclusive access to the inode and -must- first
> initialise the inode and then call unlock_new_inode() when it has
> completed. It doesn't need to hold inode->i_lock in this case
> because there's nothing it needs to serialise against as
> iget_locked() has already done all that work.
>
> If the inode is found in cache by iget_locked, then the
> wait_on_inode() call is guaranteed to ensure that I_NEW is not set
> when it returns. The atomic bit operations on __I_NEW and the memory
> barriers in unlock_new_inode() plays an important part in this
> dance, and they guarantee that I_NEW has been cleared before
> iget_locked() returns. No need for inode->i_lock to be held in this
> case, either, because iget_locked() did all the serialisation for
> us.

Thanks for the explanation!

Let's consider the case when I call `inode_get`, and it finds an inode
that _has_ been fully initialised before, so I_NEW is not set in
inode->i_state and the inode is _not_ locked.

But the only means of checking that is by inspecting the i_state
field, so I do something like:

if (!(inode->i_state & I_NEW))
    return inode;

But now suppose that while I'm doing a naked load on inode->i_state,
another cpu is running concurrently and happens to be holding the
inode->i_lock, so it is within its right to write to inode->i_state,
for example through a call to __inode_add_lru, which has the
following:

inode->i_state |= I_REFERENCED;

So we have a thread doing a naked read and another thread doing a
naked write, no ordering between them.

Would you agree that this is a data race? (Note that I'm not asking if
"it will be ok" or "the compilers today generate the right code", I'm
asking merely if you agree this is a data race.)

If you do, then you'd have to agree that we are in undefined-behaviour
territory. I can quote the spec if you'd like.

Anyway, the discussion here is that this is also undefined behaviour
in Rust. And we're trying really hard to avoid that. Of course, in
cases like this there's not much we can do on the Rust side alone so
the conclusion now appears to be that we'll introduce helper functions
for this now and live with it. If one day we have a better solution,
we'll update just one place.

But we want to be very deliberate about these. We don't want to
accidentally introduce data races (and therefore potential undefined
behaviour).

Cheers,
-Wedson


* Re: [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio`
  2023-10-18 12:25 ` [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio` Wedson Almeida Filho
@ 2023-11-07 22:18   ` Matthew Wilcox
  2023-11-07 22:22     ` Al Viro
  0 siblings, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2023-11-07 22:18 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:25:09AM -0300, Wedson Almeida Filho wrote:
> @@ -36,6 +39,9 @@ pub trait FileSystem {
>  
>      /// Returns the inode corresponding to the directory entry with the given name.
>      fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
> +
> +    /// Reads the contents of the inode into the given folio.
> +    fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
>  }
>  

This really shouldn't be a per-filesystem operation.  We have operations
split up into mapping_ops, inode_ops and file_ops for a reason.  In this
case, read_folio() can have a very different implementation for, eg,
symlinks, directories and files.  So we want to have different aops
for each of symlinks, directories and files.  We should maintain that
separation for filesystems written in Rust too.  Unless there's a good
reason to change it, and then we should change it in C too.


* Re: [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio`
  2023-11-07 22:18   ` Matthew Wilcox
@ 2023-11-07 22:22     ` Al Viro
  2023-11-08  0:35       ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Al Viro @ 2023-11-07 22:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wedson Almeida Filho, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, Nov 07, 2023 at 10:18:05PM +0000, Matthew Wilcox wrote:
> On Wed, Oct 18, 2023 at 09:25:09AM -0300, Wedson Almeida Filho wrote:
> > @@ -36,6 +39,9 @@ pub trait FileSystem {
> >  
> >      /// Returns the inode corresponding to the directory entry with the given name.
> >      fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
> > +
> > +    /// Reads the contents of the inode into the given folio.
> > +    fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
> >  }
> >  
> 
> This really shouldn't be a per-filesystem operation.  We have operations
> split up into mapping_ops, inode_ops and file_ops for a reason.  In this
> case, read_folio() can have a very different implementation for, eg,
> symlinks, directories and files.  So we want to have different aops
> for each of symlinks, directories and files.  We should maintain that
> separation for filesystems written in Rust too.  Unless there's a good
> reason to change it, and then we should change it in C too.

While we are at it, lookup is also very much not a per-filesystem operation.
Take a look at e.g. procfs, for an obvious example...

Wait a minute... what in name of everything unholy is that thing doing tied
to inodes in the first place?


* Re: [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio`
  2023-11-07 22:22     ` Al Viro
@ 2023-11-08  0:35       ` Wedson Almeida Filho
  2023-11-08  0:56         ` Al Viro
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-11-08  0:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, 7 Nov 2023 at 19:22, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Nov 07, 2023 at 10:18:05PM +0000, Matthew Wilcox wrote:
> > On Wed, Oct 18, 2023 at 09:25:09AM -0300, Wedson Almeida Filho wrote:
> > > @@ -36,6 +39,9 @@ pub trait FileSystem {
> > >
> > >      /// Returns the inode corresponding to the directory entry with the given name.
> > >      fn lookup(parent: &INode<Self>, name: &[u8]) -> Result<ARef<INode<Self>>>;
> > > +
> > > +    /// Reads the contents of the inode into the given folio.
> > > +    fn read_folio(inode: &INode<Self>, folio: LockedFolio<'_>) -> Result;
> > >  }
> > >
> >
> > This really shouldn't be a per-filesystem operation.  We have operations
> > split up into mapping_ops, inode_ops and file_ops for a reason.  In this
> > case, read_folio() can have a very different implementation for, eg,
> > symlinks, directories and files.  So we want to have different aops
> > for each of symlinks, directories and files.  We should maintain that
> > separation for filesystems written in Rust too.  Unless there's a good
> > reason to change it, and then we should change it in C too.

read_folio() is only called for regular files and symlinks. All other
modes (directories, pipes, sockets, char devices, block devices) have
their own read callbacks that don't involve read_folio().

For the filesystems that we have in Rust today, reading the contents
of a symlink is the same as reading a file (i.e., the name of the link
target is stored the same way as data in a file). For cases when this
is different, read_folio() can of course just check the mode of the
inode and take the appropriate path.
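
A minimal sketch of that last point, in plain userspace Rust rather than
against the kernel crate: one read entry point that branches on the kind of
inode. The `Inode` enum and `read_contents` are hypothetical stand-ins and
are not names from the series.

```rust
/// Hypothetical inode, reduced to the part that matters for reading.
enum Inode {
    /// Regular file: contents are stored as plain bytes.
    Regular { data: Vec<u8> },
    /// Symlink: here the target is stored differently from file data.
    Symlink { target: String },
    /// Directories are not read through this path at all.
    Directory,
}

/// Single read callback for the whole file system. Copies up to `buf.len()`
/// bytes starting at `offset` and returns how many bytes were copied.
fn read_contents(inode: &Inode, offset: usize, buf: &mut [u8]) -> Result<usize, &'static str> {
    // Pick the backing bytes according to the kind of inode.
    let src: &[u8] = match inode {
        Inode::Regular { data } => data.as_slice(),
        Inode::Symlink { target } => target.as_bytes(),
        Inode::Directory => return Err("directories use their own read callback"),
    };
    // Reading at or past the end yields zero bytes rather than an error.
    let tail = src.get(offset..).unwrap_or(&[]);
    let n = tail.len().min(buf.len());
    buf[..n].copy_from_slice(&tail[..n]);
    Ok(n)
}
```

The cost of this shape is a branch per call; the per-type aops split that
Matthew describes moves the same decision to inode setup time.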

This is also what a bunch of C file systems do. But you folks are the
ones with the most experience in file systems; if you think this isn't a
good idea, we could use read_folio() only for regular files and
introduce a function for reading symlinks, say read_symlink().

> While we are at it, lookup is also very much not a per-filesystem operation.
> Take a look at e.g. procfs, for an obvious example...

The C api offers the greatest freedom: one could write a file system
where each file has its own set of mapping_ops, inode_ops and
file_ops. We could choose to replicate this freedom in Rust, but we
haven't.

Mostly because we don't need it, and we've been repeatedly told (by
Greg KH and others) not to introduce abstractions/bindings for
anything for which there isn't a user. Besides being a longstanding
rule in the kernel, they also say that they can't reasonably decide if
the interfaces are good if they can't see the users.

The existing Rust users (tarfs and puzzlefs) only need a single
lookup. And a quick grep (git grep \\\.lookup\\\> -- fs/) appears to
show that the vast majority of C filesystems only have a single lookup
as well. So we choose simplicity, knowing well that we may have to
revisit it in the future if the needs change.

> Wait a minute... what in name of everything unholy is that thing doing tied
> to inodes in the first place?

For the same reason as above, we don't need it in our current
filesystems. A bunch of C ones (e.g., xfs, ext2, romfs, erofs) only
use the dentry to get the name and later call d_splice_alias(), so we
hide the name extraction and call to d_splice_alias() in the
"trampoline" function.

BTW, thank you Matthew and Al, I very much appreciate that you take
the time to look into and raise concerns.

Cheers,
-Wedson


* Re: [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio`
  2023-11-08  0:35       ` Wedson Almeida Filho
@ 2023-11-08  0:56         ` Al Viro
  2023-11-08  2:39           ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Al Viro @ 2023-11-08  0:56 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Matthew Wilcox, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, Nov 07, 2023 at 09:35:44PM -0300, Wedson Almeida Filho wrote:

> > While we are at it, lookup is also very much not a per-filesystem operation.
> > Take a look at e.g. procfs, for an obvious example...
> 
> The C api offers the greatest freedom: one could write a file system
> where each file has its own set of mapping_ops, inode_ops and
> file_ops; and while we could choose to replicate this freedom in Rust
> but we haven't.

Too bad.
 
> Mostly because we don't need it, and we've been repeatedly told (by
> Greg KH and others) not to introduce abstractions/bindings for
> anything for which there isn't a user. Besides being a longstanding
> rule in the kernel, they also say that they can't reasonably decide if
> the interfaces are good if they can't see the users.

The interfaces are *already* there.  If it's going to be a separate
set of operations for Rust and for the rest of the filesystems, we
have a major problem right there.

> The existing Rust users (tarfs and puzzlefs) only need a single
> lookup. And a quick grep (git grep \\\.lookup\\\> -- fs/) appears to
> show that the vast majority of C filesystems only have a single lookup
> as well. So we choose simplicity, knowing well that we may have to
> revisit it in the future if the needs change.
> 
> > Wait a minute... what in name of everything unholy is that thing doing tied
> > to inodes in the first place?
> 
> For the same reason as above, we don't need it in our current
> filesystems. A bunch of C ones (e.g., xfs, ext2, romfs, erofs) only
> use the dentry to get the name and later call d_splice_alias(), so we
> hide the name extraction and call to d_splice_alias() in the
> "trampoline" function.

What controls the lifecycle of that stuff from the Rust point of view?


* Re: [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio`
  2023-11-08  0:56         ` Al Viro
@ 2023-11-08  2:39           ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-11-08  2:39 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, 7 Nov 2023 at 21:56, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Nov 07, 2023 at 09:35:44PM -0300, Wedson Almeida Filho wrote:
>
> > > While we are at it, lookup is also very much not a per-filesystem operation.
> > > Take a look at e.g. procfs, for an obvious example...
> >
> > The C api offers the greatest freedom: one could write a file system
> > where each file has its own set of mapping_ops, inode_ops and
> > file_ops; and while we could choose to replicate this freedom in Rust
> > but we haven't.
>
> Too bad.
>
> > Mostly because we don't need it, and we've been repeatedly told (by
> > Greg KH and others) not to introduce abstractions/bindings for
> > anything for which there isn't a user. Besides being a longstanding
> > rule in the kernel, they also say that they can't reasonably decide if
> > the interfaces are good if they can't see the users.
>
> The interfaces are *already* there.  If it's going to be a separate
> set of operations for Rust and for the rest of the filesystems, we
> have a major problem right there.

The interfaces will be different but compatible -- they boil down to
calls to/from C and follow all rules and requirements imposed by C.

We just use Rust's type system to encode more of the rules into the
interfaces so that the compiler will catch more bugs for us at compile
time (and avoid memory safety issues if developers stay away from
unsafe blocks). For example, if you want to attach non-zero-sized data
to inodes of a given filesystem, in Rust we have a generic type:

struct INodeWithData<T> {
    data: MaybeUninit<T>,
    inode: bindings::inode,
}

And we automatically implement alloc_inode() and destroy_inode() in
super_operations. And all inodes in callbacks are typed so that
developers never need to call container_of directly themselves. The
compiler will catch, at compile time, any type mismatches without
runtime cost.

Another example: instead of implementing functions then declaring
structs containing pointers to these functions (and potentially other
fields), in Rust we expose "traits" that developers need to implement.
Then we can control which functions are required/optional and allow
developers to logically group them, as well as declare constants and
related types (e.g., the additional struct, if any, to be allocated
along with an inode in INodeWithData above). But in the end, these get
translated (at compile time for const ops) into
file_operations/address_space_operations/inode_operations.
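
To make the trait-to-table translation concrete, here is a userspace sketch
under stated assumptions: `Operations`, `Tables`, the `HAS_XATTR` constant
and both example file systems are hypothetical, and the callback signatures
are simplified. It only illustrates the idea of building a C-style table of
function pointers at compile time from a trait implementation.

```rust
use std::marker::PhantomData;

/// Hypothetical C-style operations table: one required and one optional slot.
struct Operations {
    lookup: fn(&str) -> Option<u64>,
    read_xattr: Option<fn(&str) -> usize>,
}

/// What a file system author implements.
trait FileSystem {
    /// Implementers that override `read_xattr` set this to `true`.
    const HAS_XATTR: bool = false;

    /// Required: map a name to an inode number.
    fn lookup(name: &str) -> Option<u64>;

    /// Optional: size of the named extended attribute.
    fn read_xattr(_name: &str) -> usize {
        0
    }
}

/// Carrier type for the table generated for a given `T`.
struct Tables<T: FileSystem>(PhantomData<T>);

impl<T: FileSystem> Tables<T> {
    /// Built at compile time; the optional slot stays `None` unless `T` opts in.
    const OPS: Operations = Operations {
        lookup: T::lookup,
        read_xattr: if T::HAS_XATTR {
            Some(T::read_xattr as fn(&str) -> usize)
        } else {
            None
        },
    };
}

/// Example file system that implements only the required function.
struct PlainFs;
impl FileSystem for PlainFs {
    fn lookup(name: &str) -> Option<u64> {
        if name == "root" { Some(1) } else { None }
    }
}

/// Example file system that also opts in to the optional function.
struct XattrFs;
impl FileSystem for XattrFs {
    const HAS_XATTR: bool = true;
    fn lookup(_name: &str) -> Option<u64> {
        None
    }
    fn read_xattr(name: &str) -> usize {
        name.len()
    }
}
```

Forgetting `lookup` is a compile error, while leaving out `read_xattr`
leaves a `None` slot in the table, much like a NULL function pointer in a C
ops struct.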

> > The existing Rust users (tarfs and puzzlefs) only need a single
> > lookup. And a quick grep (git grep \\\.lookup\\\> -- fs/) appears to
> > show that the vast majority of C filesystems only have a single lookup
> > as well. So we choose simplicity, knowing well that we may have to
> > revisit it in the future if the needs change.
> >
> > > Wait a minute... what in name of everything unholy is that thing doing tied
> > > to inodes in the first place?
> >
> > For the same reason as above, we don't need it in our current
> > filesystems. A bunch of C ones (e.g., xfs, ext2, romfs, erofs) only
> > use the dentry to get the name and later call d_splice_alias(), so we
> > hide the name extraction and call to d_splice_alias() in the
> > "trampoline" function.
>
> What controls the lifecycle of that stuff from the Rust point of view?

The same rules as C. inodes, for example, are ref-counted so while a
callback that has an inode as argument is inflight, we know it (the
inode) is referenced and we can just use it. If/when Rust code wants
to hold on to a pointer to it beyond a callback, it needs to increment
the refcount and release it when it's done. Here the type system also
helps us: it guarantees that pointers to ref-counted objects are never
dangling. If we ever try to hold on to a pointer without incrementing
the refcount, we get a compile-time error (no additional runtime
cost).

We also have a common interface for _all_ C ref-counted objects, so
instead of having to memorise that I should call ihold/iput for
inodes, folio_get/folio_put for folios,
get_task_struct/put_task_struct for tasks, etc., in Rust they're
simply ARef<INode>, ARef<Folio>, ARef<Task> with automatic increment
via clone() and decrement on destruction.
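
A simplified userspace sketch of that pattern follows. It is not the kernel's
`ARef` (which wraps a raw pointer); here the wrapper borrows the object so
the example stays in safe Rust. `RefCounted`, the `'a` lifetime and the toy
`Inode` with a `Cell` counter are all assumptions made for illustration.

```rust
use std::cell::Cell;
use std::ops::Deref;

/// Hypothetical common interface over "get"/"put" style counters.
trait RefCounted {
    fn inc_ref(&self);
    fn dec_ref(&self);
}

/// A counted handle: creating or cloning it takes a reference, dropping it
/// releases one.
struct ARef<'a, T: RefCounted>(&'a T);

impl<'a, T: RefCounted> ARef<'a, T> {
    fn new(obj: &'a T) -> Self {
        obj.inc_ref();
        ARef(obj)
    }
}

impl<'a, T: RefCounted> Clone for ARef<'a, T> {
    fn clone(&self) -> Self {
        ARef::new(self.0)
    }
}

impl<'a, T: RefCounted> Drop for ARef<'a, T> {
    fn drop(&mut self) {
        self.0.dec_ref();
    }
}

impl<'a, T: RefCounted> Deref for ARef<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.0
    }
}

/// Toy object whose count starts at 1, standing in for the owner's reference.
struct Inode {
    refs: Cell<usize>,
}

impl Inode {
    fn new() -> Self {
        Inode { refs: Cell::new(1) }
    }
    fn refs(&self) -> usize {
        self.refs.get()
    }
}

impl RefCounted for Inode {
    fn inc_ref(&self) {
        self.refs.set(self.refs.get() + 1);
    }
    fn dec_ref(&self) {
        self.refs.set(self.refs.get() - 1);
    }
}
```

The caller never writes the decrement: it happens when the handle goes out
of scope, on every exit path.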

There are a couple of issues that I alluded to in the cover letter but
never actually wrote down, so I will describe them here to get your
thoughts:

First issue:
VFS conflates filesystem unregistration with module unload. The
description of unregister_filesystem() states:

 * Once this function has returned the &struct file_system_type structure
 * may be freed or reused.

When a filesystem is mounted, VFS calls get_filesystem() to presumably
prevent the filesystem from unregistering, and it calls
put_filesystem() when deactivating a super-block.

But get/put_filesystem() are implemented as module_get/put(). So this
works well if unregister_filesystem is only ever called when modules
are unloaded. It doesn't seem to help if it's called anywhere else
(e.g., on failure paths of module load).

Here's an example: init_f2fs_fs() calls init_inodecache() to allocate
an inode cache, then eventually calls register_filesystem(). Let's
suppose at this point, another CPU actually mounts an instance of an
f2fs fs. After register_filesystem() in init_f2fs_fs(), there is a
bunch of extra failure paths; let's suppose
f2fs_init_compress_mempool() fails. The exit path will call
unregister_filesystem() (which prevents new superblocks from being
created, but the existing superblocks continue to exist), then
eventually destroy_inodecache(), which frees the cache from which all
inodes of the existing superblock have been allocated and we have a
bunch of potential use-after-frees.

Granted that the module will not be unloaded immediately (it will wait
for all references, including the ones by get_filesystem() to go
away), so we won't have an issue with the callbacks being called to
unloaded memory. But if we recycle f2fs_fs_type, which
unregister_filesystem claims to be safe, we'll also have
use-after-frees there.

(Note that the example above doesn't require unload at all.)

I think we could fix this by having a different implementation of
get/put_filesystem() that keeps track of a count for filesystem usage
(in addition to avoiding module unload), and only completing
unregister_filesystem when it goes to zero. Would you be interested in
a patch for this?
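
As a rough userspace model of that idea (not a proposed kernel patch): a
registration object counts its users, refuses new users once unregistration
has begun, and lets the unregistering side wait for the count to reach zero.
`Registration` and its methods are hypothetical names.

```rust
use std::sync::{Condvar, Mutex};

struct State {
    registered: bool,
    users: usize,
}

/// Hypothetical stand-in for a registered file system type.
pub struct Registration {
    state: Mutex<State>,
    idle: Condvar,
}

impl Registration {
    pub fn new() -> Self {
        Registration {
            state: Mutex::new(State { registered: true, users: 0 }),
            idle: Condvar::new(),
        }
    }

    /// Analogue of get_filesystem(): fails once unregistration has begun.
    pub fn get(&self) -> bool {
        let mut s = self.state.lock().unwrap();
        if !s.registered {
            return false;
        }
        s.users += 1;
        true
    }

    /// Analogue of put_filesystem(): wakes any waiter when the last user goes.
    pub fn put(&self) {
        let mut s = self.state.lock().unwrap();
        s.users -= 1;
        if s.users == 0 {
            self.idle.notify_all();
        }
    }

    /// First half of unregistration: stop handing out new references.
    pub fn begin_unregister(&self) {
        self.state.lock().unwrap().registered = false;
    }

    /// Second half: block until every existing user has called `put`.
    pub fn wait_idle(&self) {
        let mut s = self.state.lock().unwrap();
        while s.users > 0 {
            s = self.idle.wait(s).unwrap();
        }
    }

    pub fn users(&self) -> usize {
        self.state.lock().unwrap().users
    }
}
```

In this model it would only be safe to free per-filesystem caches after
`wait_idle()` returns, which is the ordering the f2fs failure path above
does not have.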

Second issue:

Leaked inodes: if a filesystem leaks inodes, then after unregistration
most implementations will just free the kmemcache from which they
came, so future attempts to use these leaked inodes (it's possible
they've been stored in some list somewhere) will lead to
use-after-frees. Is there anything we can do to improve this? Should we
prevent unregister_filesystem() from completing in such cases?

Thanks,
-Wedson


* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-31 20:49                     ` Wedson Almeida Filho
@ 2023-11-08  4:54                       ` Dave Chinner
  2023-11-08  6:15                         ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Dave Chinner @ 2023-11-08  4:54 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Boqun Feng, Matthew Wilcox, Benno Lossin, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Tue, Oct 31, 2023 at 05:49:19PM -0300, Wedson Almeida Filho wrote:
> On Sun, 29 Oct 2023 at 23:29, Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Oct 23, 2023 at 09:55:08AM -0300, Wedson Almeida Filho wrote:
> > > On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
> > > > IOWs, if you follow the general rule that any inode->i_state access
> > > > (read or write) needs to hold inode->i_lock, you probably won't
> > > > screw up.
> > >
> > > I don't see filesystems doing this though. In particular, see
> > > iget_locked() -- if a new inode is returned, then it is locked, but if
> > > a cached one is found, it's not locked.
> >
> > I did say "if you follow the general rule".
> >
> > And where there is a "general rule" there is the implication that
> > there are special cases where the "general rule" doesn't get
> > applied, yes? :)
> 
> Sure. But when you say "if _you_ do X", it gives me the impression
> that I have a choice. But if want to use `iget_locked`, I don't have
> the option to follow the "general rule" you state.
> 
> I guess I have the option to ignore `iget_locked`. :)
> 
> > I_NEW is the exception to the general rule, and very few people
> > writing filesystems actually know about it let alone care about
> > it...
> <snip>
> > All of them are perfectly fine.
> 
> I'm not sure I agree with this. They may be fine, but I wouldn't say
> perfectly. :)
> 
> > I_NEW is the bit we use to synchronise inode initialisation - we
> > have to ensure there is only a single initialisation running while
> > there are concurrent lookups that can find the inode whilst it is
> > being initialised. We cannot hold a spin lock over inode
> > initialisation (it may have to do IO!), so we set the I_NEW flag
> > under the i_lock and the inode_hash_lock during hash insertion so
> > that they are set atomically from the hash lookup POV. If the inode
> > is then found in cache, wait_on_inode() does the serialisation
> > against the running initialisation indicated by the __I_NEW bit in
> > the i_state word.
> >
> > Hence if the caller of iget_locked() ever sees I_NEW, it is
> > guaranteed to have exclusive access to the inode and -must- first
> > initialise the inode and then call unlock_new_inode() when it has
> > completed. It doesn't need to hold inode->i_lock in this case
> > because there's nothing it needs to serialise against as
> > iget_locked() has already done all that work.
> >
> > If the inode is found in cache by iget_locked, then the
> > wait_on_inode() call is guaranteed to ensure that I_NEW is not set
> > when it returns. The atomic bit operations on __I_NEW and the memory
> > barriers in unlock_new_inode() plays an important part in this
> > dance, and they guarantee that I_NEW has been cleared before
> > iget_locked() returns. No need for inode->i_lock to be held in this
> > case, either, because iget_locked() did all the serialisation for
> > us.
> 
> Thanks for explanation!
> 
> Let's consider the case when I call `inode_get`, and it finds an inode
> that _has_ been fully initialised before, so I_NEW is not set in
> inode->i_state and the inode is _not_ locked.
> 
> But the only means of checking that is by inspecting the i_state
> field, so I do something like:
> 
> if (!(inode->i_state & I_NEW))
>     return inode;
> 
> But now suppose that while I'm doing a naked load on inode->i_state,
> another cpu is running concurrently and happens to be holding the
> inode->i_lock, so it is within its right to write to inode->i_state,
> for example through a call to __inode_add_lru, which has the
> following:
> 
> inode->i_state |= I_REFERENCED;
> 
> So we have a thread doing a naked read and another thread doing a
> naked write, no ordering between them.
> 
> Would you agree that this is a data race? (Note that I'm not asking if
> "it will be ok" or "the compilers today generate the right code", I'm
> asking merely if you agree this is a data race.)

I'll agree that technically it is a data race on the entire i_state
word. Practically, however, it is not a data race on the I_NEW bit
within that word. The I_NEW bit remains unchanged across the entire
operation.

i.e. it does not matter where the read of i_state intersects with
the RMW of I_REFERENCED bit, the I_NEW bit remains unchanged in
memory across the operation. If the above operation results in the
I_NEW bit changing state in memory - even transiently - then the
compiler implementation is simply broken...

> If you do, then you'd have to agree that we are in undefined-behaviour
> territory. I can quote the spec if you'd like.

/me shrugs

I can point you at lots of code that will break if bit operations
are allowed to randomly change other bits in the word transiently.

> Anyway, the discussion here is that this is also undefined behaviour
> in Rust. And we're trying really hard to avoid that. Of course, in
> cases like this there's not much we can do on the Rust side alone so
> the conclusion now appears to be that we'll introduce helper functions
> for this now and live with it. If one day we have a better solution,
> we'll update just one place.

All the rust code that calls iget_locked() needs to do to "be safe"
is the rust equivalent of:

	spin_lock(&inode->i_lock);
	if (!(inode->i_state & I_NEW)) {
		spin_unlock(&inode->i_lock);
		return inode;
	}
	spin_unlock(&inode->i_lock);

IOWs, we solve the "safety" concern by ensuring that Rust filesystem
implementations follow the general rule of "always hold the i_lock
when accessing inode->i_state" I originally outlined, yes?
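
For illustration only, a userspace Rust analogue of that rule: if the state
word lives inside the lock that protects it, a read without the lock cannot
be written at all. The `Inode` type, its methods and the flag bit positions
below are made up for the example and do not match the kernel's definitions.

```rust
use std::sync::Mutex;

// Illustrative bit positions, not the kernel's values.
const I_NEW: u32 = 1 << 0;
const I_REFERENCED: u32 = 1 << 1;

/// Hypothetical inode: the lock owns the state word it protects.
struct Inode {
    i_state: Mutex<u32>,
}

impl Inode {
    /// A freshly created inode starts with I_NEW set.
    fn new_locked() -> Self {
        Inode { i_state: Mutex::new(I_NEW) }
    }

    /// Reads the flag with the lock held; the guard is released on return.
    fn is_new(&self) -> bool {
        let state = self.i_state.lock().unwrap();
        (*state & I_NEW) != 0
    }

    /// A concurrent writer of another bit takes the same lock.
    fn mark_referenced(&self) {
        *self.i_state.lock().unwrap() |= I_REFERENCED;
    }

    /// Clears I_NEW once initialisation is complete.
    fn unlock_new(&self) {
        *self.i_state.lock().unwrap() &= !I_NEW;
    }
}
```

Every access now pays for a lock acquisition, which is the optimisation
being given up in exchange for the rule being enforced by the type.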

> But we want the be very deliberate about these. We don't want to
> accidentally introduce data races (and therefore potential undefined
> behaviour).

Then stop looking at the C code and all the exceptions we make for
special case optimisations and just code to the generic rules for
safe access to given fields. Yes, rust will then have to give up the
optimisations we make in the C code, but there's always a price for
safety...

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-11-08  4:54                       ` Dave Chinner
@ 2023-11-08  6:15                         ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2023-11-08  6:15 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Boqun Feng, Matthew Wilcox, Benno Lossin, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho, Marco Elver

On Wed, 8 Nov 2023 at 01:54, Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, Oct 31, 2023 at 05:49:19PM -0300, Wedson Almeida Filho wrote:
> > On Sun, 29 Oct 2023 at 23:29, Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Mon, Oct 23, 2023 at 09:55:08AM -0300, Wedson Almeida Filho wrote:
> > > > On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
>
> > If you do, then you'd have to agree that we are in undefined-behaviour
> > territory. I can quote the spec if you'd like.
>
> /me shrugs
>
> I can point you at lots of code that it will break if bit operations
> are allowed to randomly change other bits in the word transiently.

Sure, in C you have chosen to rely on behaviour that the language spec
says is undefined.

In Rust, we're trying to avoid it. When it's unavoidable, we're trying to
clearly mark it so that we can try to fix it later.

> All the rust code that calls iget_locked() needs to do to "be safe"
> is the rust equivalent of:
>
>         spin_lock(&inode->i_lock);
>         if (!(inode->i_state & I_NEW)) {
>                 spin_unlock(&inode->i_lock);
>                 return inode;
>         }
>         spin_unlock(&inode->i_lock);
>
> IOWs, we solve the "safety" concern by ensuring that Rust filesystem
> implementations follow the general rule of "always hold the i_lock
> when accessing inode->i_state" I originally outlined, yes?

Ah, the name of the functions iget_locked() and unlock_new_inode()
threw me off, I thought I wouldn't be able to lock inode->i_lock.

Ok, I will do this for now, I think it's better than relying on
undefined behaviour. Thanks!

Actually, looking at the implementation of iget_locked(), there's a
single place where it returns a new inode. Wouldn't it be better to
just return this piece of information (whether the inode is new or
not) to the caller? Then we would eliminate the data races in C and
the need to lock in Rust, and we would also eliminate a memory load
from inode->i_state in all callers.
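
A single-threaded userspace sketch of what such an interface could look
like, with the "new or cached" answer carried in the return type instead of
in a shared state word. `Lookup`, `InodeCache` and `get_or_create` are
hypothetical; the sketch ignores locking and the wait for in-progress
initialisation that the real function has to handle.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical inode, reduced to its number.
struct Inode {
    ino: u64,
}

/// The caller learns from the variant whether it must initialise the inode.
enum Lookup {
    /// Found in the cache; already initialised.
    Cached(Arc<Inode>),
    /// Just created; the caller is responsible for initialising it.
    New(Arc<Inode>),
}

/// Hypothetical inode cache keyed by inode number.
struct InodeCache {
    map: HashMap<u64, Arc<Inode>>,
}

impl InodeCache {
    fn new() -> Self {
        InodeCache { map: HashMap::new() }
    }

    /// Returns the cached inode, or inserts a fresh one and says so.
    fn get_or_create(&mut self, ino: u64) -> Lookup {
        if let Some(existing) = self.map.get(&ino) {
            return Lookup::Cached(Arc::clone(existing));
        }
        let inode = Arc::new(Inode { ino });
        self.map.insert(ino, Arc::clone(&inode));
        Lookup::New(inode)
    }
}
```

The caller matches on the result and never reads a state word to find out
which case it is in.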

> > But we want the be very deliberate about these. We don't want to
> > accidentally introduce data races (and therefore potential undefined
> > behaviour).
>
> The stop looking at the C code and all the exceptions we make for
> special case optimisations and just code to the generic rules for
> safe access to given fields. Yes, rust will then have to give up the
> optimisations we make in the C code, but there's always a price for
> safety...

I'm not trying to do clever optimisations at all. I'm trying to figure
out how to do things by looking at imperfect documentation in
filesystems/porting.rst (which, BTW, checks I_NEW without a lock) and
the functions I call. So I look at what existing filesystems do to
learn the hopefully most up to date way of doing things. If you have a
recommendation on how to do this more efficiently, I'm all ears!

Thanks,
-Wedson


* Re: [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params`
  2023-10-18 12:25 ` [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params` Wedson Almeida Filho
  2023-10-18 16:34   ` Benno Lossin
  2023-10-20 15:04   ` Ariel Miculas (amiculas)
@ 2024-01-03 12:25   ` Andreas Hindborg (Samsung)
  2 siblings, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 12:25 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

<snip>

> +    unsafe extern "C" fn fill_super_callback(
> +        sb_ptr: *mut bindings::super_block,
> +        _fc: *mut bindings::fs_context,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> +            // newly-created superblock.
> +            let sb = unsafe { &mut *sb_ptr.cast() };
> +            let params = T::super_params(sb)?;
> +
> +            sb.0.s_magic = params.magic as _;

I would prefer an explicit target type for the cast.

BR Andreas



* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2023-10-28 18:00   ` Alice Ryhl
@ 2024-01-03 12:45     ` Andreas Hindborg (Samsung)
  0 siblings, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 12:45 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Matthew Wilcox, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho


Alice Ryhl <alice@ryhl.io> writes:

> On 10/18/23 14:25, Wedson Almeida Filho wrote:
>> +    /// Returns the super-block that owns the inode.
>> +    pub fn super_block(&self) -> &SuperBlock<T> {
>> +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
>> +        // shared reference (&self) to it.
>> +        unsafe { &*(*self.0.get()).i_sb.cast() }
>> +    }
>
> This makes me a bit nervous. I had to look up whether this field was a pointer
> to a superblock, or just a superblock embedded directly in `struct inode`. It
> does look like it's correct as-is, but I'd feel more confident about it if it
> doesn't use a cast to completely ignore the type going in to the pointer cast.
>
> Could you define a `from_raw` on `SuperBlock` and change this to:
>
>     unsafe { &*SuperBlock::from_raw((*self.0.get()).i_sb) }
>
> or perhaps add a type annotation like this:
>
>     let i_sb: *mut super_block = unsafe { (*self.0.get()).i_sb };
>     i_sb.cast()

I think it would also be nice to make the cast explicit:

  i_sb.cast::<SuperBlock<T>>()

Otherwise the cast is no different from `as _`, with all the caveats that
come with it.
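
A small userspace sketch of the combined suggestion, with both the source
and the target type of the cast spelled out. `RawSuperBlock`, `RawInode` and
this `SuperBlock` wrapper are hypothetical stand-ins for the bindings types
and carry no generic parameter.

```rust
/// Hypothetical stand-in for the C `struct super_block`.
struct RawSuperBlock {
    magic: u32,
}

/// Hypothetical stand-in for the C `struct inode`.
struct RawInode {
    i_sb: *mut RawSuperBlock,
}

/// Wrapper with the same layout as the raw type, so a pointer cast is valid.
#[repr(transparent)]
struct SuperBlock(RawSuperBlock);

impl SuperBlock {
    fn magic(&self) -> u32 {
        self.0.magic
    }
}

/// Returns the super-block that owns the inode.
///
/// # Safety
///
/// `inode.i_sb` must point to a valid `RawSuperBlock` that outlives `inode`.
unsafe fn super_block(inode: &RawInode) -> &SuperBlock {
    // The annotation documents (and checks) the type going into the cast.
    let i_sb: *mut RawSuperBlock = inode.i_sb;
    // The turbofish documents (and checks) the type coming out of it.
    // SAFETY: the caller guarantees the pointee is valid, and `SuperBlock`
    // is a transparent wrapper around `RawSuperBlock`.
    unsafe { &*i_sb.cast::<SuperBlock>() }
}
```

If the field were ever changed to an embedded struct rather than a pointer,
the annotated `let` would stop compiling instead of silently casting the
wrong thing.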

BR Andreas


* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2023-10-18 12:25 ` [RFC PATCH 05/19] rust: fs: introduce `INode<T>` Wedson Almeida Filho
  2023-10-28 18:00   ` Alice Ryhl
@ 2024-01-03 12:54   ` Andreas Hindborg (Samsung)
  2024-01-04  5:20     ` Darrick J. Wong
  2024-01-10  9:45     ` Benno Lossin
  2024-01-04  5:14   ` Darrick J. Wong
  2 siblings, 2 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 12:54 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

> +/// The number of an inode.
> +pub type Ino = u64;

Would it be possible to use a descriptive name such as `INodeNumber`?

> +    /// Returns the super-block that owns the inode.
> +    pub fn super_block(&self) -> &SuperBlock<T> {
> +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
> +        // shared reference (&self) to it.
> +        unsafe { &*(*self.0.get()).i_sb.cast() }
> +    }

I think the safety comment should talk about the pointee rather than the
pointer? "The pointee of `i_sb` is immutable, and ..."

BR Andreas


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2023-10-18 12:25 ` [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
  2023-10-19 14:30   ` Benno Lossin
  2023-10-20  0:30   ` Boqun Feng
@ 2024-01-03 13:29   ` Andreas Hindborg (Samsung)
  2024-01-24  4:07     ` Wedson Almeida Filho
  2 siblings, 1 reply; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 13:29 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

[...]

>  
> +/// An inode that is locked and hasn't been initialised yet.
> +#[repr(transparent)]
> +pub struct NewINode<T: FileSystem + ?Sized>(ARef<INode<T>>);
> +
> +impl<T: FileSystem + ?Sized> NewINode<T> {
> +    /// Initialises the new inode with the given parameters.
> +    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
> +        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
> +        let inode = unsafe { &mut *self.0 .0.get() };

Perhaps it would make sense to have a `UniqueARef` that guarantees
uniqueness, in line with `alloc::rc::UniqueRc`?
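
To illustrate the idea, here is a rough userspace sketch built on `std::rc::Rc`
rather than the kernel's `ARef`. The `Unique` type and its methods are
hypothetical names made up for this example. The point is only that a wrapper
which never hands out a second reference can offer safe mutable access, with
no `unsafe` at the use site.

```rust
use std::ops::{Deref, DerefMut};
use std::rc::Rc;

/// Hypothetical wrapper that holds the only reference to a fresh value.
pub struct Unique<T>(Rc<T>);

impl<T> Unique<T> {
    /// Creates the value; no other `Rc` to it exists at this point.
    pub fn new(value: T) -> Self {
        Unique(Rc::new(value))
    }

    /// Gives up uniqueness and returns an ordinary shared reference.
    pub fn into_shared(self) -> Rc<T> {
        self.0
    }
}

impl<T> Deref for Unique<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> DerefMut for Unique<T> {
    fn deref_mut(&mut self) -> &mut T {
        // `Unique` never clones or exposes the inner `Rc`, so the count is
        // always one and `get_mut` cannot fail here.
        Rc::get_mut(&mut self.0).expect("`Unique` never shares its `Rc`")
    }
}
```

A `NewINode`-like type could follow the same pattern: mutate through the
unique handle during initialisation, then convert to the shared form once.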

[...]

>  
> +impl<T: FileSystem + ?Sized> SuperBlock<T> {
> +    /// Tries to get an existing inode or create a new one if it doesn't exist yet.
> +    pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, NewINode<T>>> {
> +        // SAFETY: The only initialisation missing from the superblock is the root, and this
> +        // function is needed to create the root, so it's safe to call it.
> +        let inode =
> +            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;

I can't parse this safety comment properly.

> +
> +        // SAFETY: `inode` is valid for read, but there could be concurrent writers (e.g., if it's
> +        // an already-initialised inode), so we use `read_volatile` to read its current state.
> +        let state = unsafe { ptr::read_volatile(ptr::addr_of!((*inode.as_ptr()).i_state)) };
> +        if state & u64::from(bindings::I_NEW) == 0 {
> +            // The inode is cached. Just return it.
> +            //
> +            // SAFETY: `inode` had its refcount incremented by `iget_locked`; this increment is now
> +            // owned by `ARef`.
> +            Ok(Either::Left(unsafe { ARef::from_raw(inode.cast()) }))
> +        } else {
> +            // SAFETY: The new inode is valid but not fully initialised yet, so it's ok to create a
> +            // `NewINode`.
> +            Ok(Either::Right(NewINode(unsafe {
> +                ARef::from_raw(inode.cast())

I would suggest making the destination type explicit for the cast.

> +            })))
> +        }
> +    }
> +}
> +
>  /// Required superblock parameters.
>  ///
>  /// This is returned by implementations of [`FileSystem::super_params`].
> @@ -215,41 +345,28 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>              sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
>              sb.0.s_flags |= bindings::SB_RDONLY;
>  
> -            // The following is scaffolding code that will be removed in a subsequent patch. It is
> -            // needed to build a root dentry, otherwise core code will BUG().
> -            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
> -            let inode = unsafe { bindings::new_inode(&mut sb.0) };
> -            if inode.is_null() {
> -                return Err(ENOMEM);
> -            }
> -
> -            // SAFETY: `inode` is valid for write.
> -            unsafe { bindings::set_nlink(inode, 2) };
> -
> -            {
> -                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> -                // safe to mutably dereference it.
> -                let inode = unsafe { &mut *inode };
> -                inode.i_ino = 1;
> -                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
> -
> -                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> -                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> +            // newly-created (and initialised above) superblock.
> +            let sb = unsafe { &mut *sb_ptr.cast() };

Again, I would suggest an explicit destination type for the cast.

> +            let root = T::init_root(sb)?;
>  
> -                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> -                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> +            // Reject root inode if it belongs to a different superblock.

I am curious how this would happen?

BR Andreas

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir`
  2023-10-18 12:25 ` [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir` Wedson Almeida Filho
  2023-10-21  8:33   ` Benno Lossin
@ 2024-01-03 14:09   ` Andreas Hindborg (Samsung)
  2024-01-21 21:00   ` Askar Safin
  2 siblings, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 14:09 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

[...]

> +    unsafe extern "C" fn read_dir_callback(
> +        file: *mut bindings::file,
> +        ctx_ptr: *mut bindings::dir_context,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The C API guarantees that `file` is valid for read. And since `f_inode` is
> +            // immutable, we can read it directly.

Should this be "the pointee of `f_inode` is immutable" instead?

[...]

> +    pub fn emit(&mut self, pos_inc: i64, name: &[u8], ino: Ino, etype: DirEntryType) -> bool {
> +        let Ok(name_len) = i32::try_from(name.len()) else {
> +            return false;
> +        };
> +
> +        let Some(actor) = self.0.actor else {
> +            return false;
> +        };
> +
> +        let Some(new_pos) = self.0.pos.checked_add(pos_inc) else {
> +            return false;
> +        };
> +
> +        // SAFETY: `name` is valid at least for the duration of the `actor` call.
> +        let ret = unsafe {
> +            actor(
> +                &mut self.0,
> +                name.as_ptr().cast(),
> +                name_len,
> +                self.0.pos,
> +                ino,
> +                etype as _,

I would prefer an explicit target type here.
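
For example, something along these lines. This is a standalone sketch: the
enum below only mirrors a few of the `DT_*` values, and `actor` is a
hypothetical stand-in for the C callback, assumed to take the entry type as
`unsigned int`.

```rust
use std::ffi::c_uint;

/// Cut-down directory entry type (values match `DT_UNKNOWN`, `DT_DIR`, `DT_REG`).
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DirEntryType {
    Unknown = 0,
    Dir = 4,
    Reg = 8,
}

/// Hypothetical stand-in for the C actor; it just echoes the type it is given.
fn actor(d_type: c_uint) -> c_uint {
    d_type
}

pub fn emit_type(etype: DirEntryType) -> c_uint {
    // With `etype as _` the target type silently follows whatever `actor`
    // expects. Naming it means a change in the callee's parameter type shows
    // up as a type error here instead of a quiet truncation.
    actor(etype as c_uint)
}
```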

BR Andreas

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs`
  2023-10-18 12:25 ` [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
@ 2024-01-03 14:13   ` Andreas Hindborg (Samsung)
  2024-01-04  5:33   ` Darrick J. Wong
  1 sibling, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 14:13 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

[...]

> +    unsafe extern "C" fn statfs_callback(
> +        dentry: *mut bindings::dentry,
> +        buf: *mut bindings::kstatfs,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The C API guarantees that `dentry` is valid for read. `d_sb` is
> +            // immutable, so it's safe to read it. The superblock is guaranteed to be valid dor

"valid dor" -> "valid for"

BR Andreas

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 14/19] rust: fs: add per-superblock data
  2023-10-18 12:25 ` [RFC PATCH 14/19] rust: fs: add per-superblock data Wedson Almeida Filho
  2023-10-25 15:51   ` Ariel Miculas (amiculas)
  2023-10-26 13:46   ` Ariel Miculas (amiculas)
@ 2024-01-03 14:16   ` Andreas Hindborg (Samsung)
  2 siblings, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 14:16 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

[...]

> @@ -472,6 +495,9 @@ pub struct SuperParams {
>  
>      /// Granularity of c/m/atime in ns (cannot be worse than a second).
>      pub time_gran: u32,
> +
> +    /// Data to be associated with the superblock.
> +    pub data: T,
>  }
>  
>  /// A superblock that is still being initialised.
> @@ -522,6 +548,9 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>              sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
>              sb.0.s_flags |= bindings::SB_RDONLY;
>  
> +            // N.B.: Even on failure, `kill_sb` is called and frees the data.
> +            sb.0.s_fs_info = params.data.into_foreign().cast_mut();

I would prefer to make the target type of the cast explicit.

BR Andreas


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads
  2023-10-18 12:25 ` [RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads Wedson Almeida Filho
@ 2024-01-03 14:17   ` Andreas Hindborg (Samsung)
  0 siblings, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 14:17 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

[...]

> +// SAFETY: The type invariants guarantee that `INode` is always ref-counted.
> +unsafe impl AlwaysRefCounted for Head {
> +    fn inc_ref(&self) {
> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> +        unsafe { bindings::get_bh(self.0.get()) };
> +    }
> +
> +    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> +        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
> +        unsafe { bindings::put_bh(obj.cast().as_ptr()) }

I would prefer the target type of the cast to be explicit.

> +    }
> +}
> +
> +impl Head {
> +    /// Returns the block data associated with the given buffer head.
> +    pub fn data(&self) -> &[u8] {
> +        let h = self.0.get();
> +        // SAFETY: The existence of a shared reference guarantees that the buffer head is
> +        // available and so we can access its contents.
> +        unsafe { core::slice::from_raw_parts((*h).b_data.cast(), (*h).b_size) }

Same here: I would prefer the target type of the cast to be explicit.

BR Andreas


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device
  2023-10-18 12:25 ` [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
  2023-10-21 13:39   ` Benno Lossin
@ 2024-01-03 14:38   ` Andreas Hindborg (Samsung)
  1 sibling, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 14:38 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

[...]

> @@ -479,6 +500,65 @@ pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, New
>              })))
>          }
>      }
> +
> +    /// Reads a block from the block device.
> +    #[cfg(CONFIG_BUFFER_HEAD)]
> +    pub fn bread(&self, block: u64) -> Result<ARef<buffer::Head>> {
> +        // Fail requests for non-blockdev file systems. This is a compile-time check.
> +        match T::SUPER_TYPE {
> +            Super::BlockDev => {}
> +            _ => return Err(EIO),
> +        }
> +
> +        // SAFETY: This function is only valid after the `NeedsInit` typestate, so the block size
> +        // is known and the superblock can be used to read blocks.
> +        let ptr =
> +            ptr::NonNull::new(unsafe { bindings::sb_bread(self.0.get(), block) }).ok_or(EIO)?;
> +        // SAFETY: `sb_bread` returns a referenced buffer head. Ownership of the increment is
> +        // passed to the `ARef` instance.
> +        Ok(unsafe { ARef::from_raw(ptr.cast()) })

I would prefer the target of the cast to be explicit.

BR Andreas

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 17/19] rust: fs: allow per-inode data
  2023-10-18 12:25 ` [RFC PATCH 17/19] rust: fs: allow per-inode data Wedson Almeida Filho
  2023-10-21 13:57   ` Benno Lossin
@ 2024-01-03 14:39   ` Andreas Hindborg (Samsung)
  1 sibling, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-03 14:39 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho


Wedson Almeida Filho <wedsonaf@gmail.com> writes:

[...]

> @@ -239,6 +255,16 @@ pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<
>              unsafe { T::Data::from_foreign(ptr) };
>          }
>      }
> +
> +    unsafe extern "C" fn inode_init_once_callback<T: FileSystem + ?Sized>(
> +        outer_inode: *mut core::ffi::c_void,
> +    ) {

A docstring with intended use for this function would be nice.

> +        let ptr = outer_inode.cast::<INodeWithData<T::INodeData>>();
> +
> +        // SAFETY: This is only used in `new`, so we know that we have a valid `INodeWithData`
> +        // instance whose inode part can be initialised.

What does "This" refer to here?

> +        unsafe { bindings::inode_init_once(ptr::addr_of_mut!((*ptr).inode)) };
> +    }
>  }
>  
>  #[pinned_drop]
> @@ -280,6 +306,15 @@ pub fn super_block(&self) -> &SuperBlock<T> {
>          unsafe { &*(*self.0.get()).i_sb.cast() }

I would prefer the target type of the cast to be explicit.

[...]

>  
>  impl<T: FileSystem + ?Sized> NewINode<T> {
>      /// Initialises the new inode with the given parameters.
> -    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
> -        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
> -        let inode = unsafe { &mut *self.0 .0.get() };
> +    pub fn init(self, params: INodeParams<T::INodeData>) -> Result<ARef<INode<T>>> {
> +        let outerp = container_of!(self.0 .0.get(), INodeWithData<T::INodeData>, inode);
> +
> +        // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> +        // safe to mutably dereference it.
> +        let outer = unsafe { &mut *outerp.cast_mut() };

Same

[...]

> @@ -766,6 +822,61 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>          shutdown: None,
>      };
>  
> +    unsafe extern "C" fn alloc_inode_callback(
> +        sb: *mut bindings::super_block,
> +    ) -> *mut bindings::inode {
> +        // SAFETY: The callback contract guarantees that `sb` is valid for read.
> +        let super_type = unsafe { (*sb).s_type };
> +
> +        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
> +        // embedded in a `Registration`, which is guaranteed to be valid because it has a
> +        // superblock associated to it.
> +        let reg = unsafe { &*container_of!(super_type, Registration, fs) };
> +
> +        // SAFETY: `sb` and `cache` are guaranteed to be valid by the callback contract and by
> +        // the existence of a superblock respectively.
> +        let ptr = unsafe {
> +            bindings::alloc_inode_sb(sb, MemCache::ptr(&reg.inode_cache), bindings::GFP_KERNEL)
> +        }
> +        .cast::<INodeWithData<T::INodeData>>();
> +        if ptr.is_null() {
> +            return ptr::null_mut();
> +        }
> +        ptr::addr_of_mut!((*ptr).inode)
> +    }
> +
> +    unsafe extern "C" fn destroy_inode_callback(inode: *mut bindings::inode) {
> +        // SAFETY: By the C contract, `inode` is a valid pointer.
> +        let is_bad = unsafe { bindings::is_bad_inode(inode) };
> +
> +        // SAFETY: The inode is guaranteed to be valid by the callback contract. Additionally, the
> +        // superblock is also guaranteed to still be valid by the inode existence.
> +        let super_type = unsafe { (*(*inode).i_sb).s_type };
> +
> +        // SAFETY: This callback is only used in `Registration`, so `super_type` is necessarily
> +        // embedded in a `Registration`, which is guaranteed to be valid because it has a
> +        // superblock associated to it.
> +        let reg = unsafe { &*container_of!(super_type, Registration, fs) };
> +        let ptr = container_of!(inode, INodeWithData<T::INodeData>, inode).cast_mut();

Same

> +
> +        if !is_bad {
> +            // SAFETY: The code either initialises the data or marks the inode as bad. Since the
> +            // inode is not bad, the data is initialised, and thus safe to drop.
> +            unsafe { ptr::drop_in_place((*ptr).data.as_mut_ptr()) };
> +        }
> +
> +        if size_of::<T::INodeData>() == 0 {
> +            // SAFETY: When the size of `INodeData` is zero, we don't use a separate mem_cache, so
> +            // it is allocated from the regular mem_cache, which is what `free_inode_nonrcu` uses
> +            // to free the inode.
> +            unsafe { bindings::free_inode_nonrcu(inode) };
> +        } else {
> +            // The callback contract guarantees that the inode was previously allocated via the
> +            // `alloc_inode_callback` callback, so it is safe to free it back to the cache.
> +            unsafe { bindings::kmem_cache_free(MemCache::ptr(&reg.inode_cache), ptr.cast()) };

Same here: I would prefer the target type of the cast to be explicit.

BR Andreas


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2023-10-31 20:14   ` Wedson Almeida Filho
@ 2024-01-03 18:02     ` Matthew Wilcox
  2024-01-03 19:04       ` Wedson Almeida Filho
                         ` (2 more replies)
  0 siblings, 3 replies; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-03 18:02 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, Oct 31, 2023 at 05:14:08PM -0300, Wedson Almeida Filho wrote:
> > Also, I see you're passing an inode to read_dir.  Why did you decide to
> > do that?  There's information in the struct file that's either necessary
> > or useful to have in the filesystem.  Maybe not in toy filesystems, but eg
> > network filesystems need user credentials to do readdir, which are stored
> > in struct file.  Block filesystems store readahead data in struct file.
> 
> Because the two file systems we have don't use anything from `struct
> file` beyond the inode.
> 
> Passing a `file` to `read_dir` would require us to introduce an
> unnecessary abstraction that no one uses, which we've been told not to
> do.
> 
> There is no technical reason that makes it impractical though. We can
> add it when the need arises.

Then we shouldn't merge any of this, or even send it out for review
again until there is at least one non-toy filesystem implemented.
Either stick to the object orientation we've already defined (ie
separate aops, iops, fops, ... with substantially similar arguments)
or propose changes to the ones we have in C.  Dealing only with toy
filesystems is leading you to bad architecture.
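
To make the shape under discussion concrete, here is a rough standalone
sketch of a `read_dir` that receives the open file rather than the bare
inode. Every type and name below is a hypothetical stand-in invented for
illustration; none of it is the API from the series or from the C side.

```rust
/// Hypothetical stand-ins; none of these types come from the series.
pub struct INode {
    pub ino: u64,
}

pub struct Credentials {
    pub uid: u32,
}

/// Per-open state: which inode is open, who opened it, and the position.
pub struct File<'a> {
    pub inode: &'a INode,
    pub cred: &'a Credentials,
    pub pos: i64,
}

pub trait FileSystem {
    /// Takes the open file, so per-open state is available to those who need it.
    fn read_dir(file: &mut File<'_>, emit: &mut dyn FnMut(&[u8], u64) -> bool) -> Result<(), i32>;
}

/// A simple file system that only looks at the inode.
pub struct SimpleFs;

impl FileSystem for SimpleFs {
    fn read_dir(file: &mut File<'_>, emit: &mut dyn FnMut(&[u8], u64) -> bool) -> Result<(), i32> {
        if file.pos == 0 && emit(b"hello", file.inode.ino + 1) {
            file.pos += 1;
        }
        Ok(())
    }
}

/// A network-style file system that consults the opener's credentials.
pub struct NetFs;

impl FileSystem for NetFs {
    fn read_dir(file: &mut File<'_>, emit: &mut dyn FnMut(&[u8], u64) -> bool) -> Result<(), i32> {
        if file.cred.uid != 0 {
            return Err(13); // EACCES
        }
        if file.pos == 0 && emit(b"remote", file.inode.ino + 1) {
            file.pos += 1;
        }
        Ok(())
    }
}
```

In this shape a file system that only needs the inode ignores the rest of the
file, while one that needs credentials can reach them without a signature
change.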

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 18:02     ` Matthew Wilcox
@ 2024-01-03 19:04       ` Wedson Almeida Filho
  2024-01-03 19:53         ` Al Viro
  2024-01-04  1:49         ` Matthew Wilcox
  2024-01-03 19:14       ` Kent Overstreet
  2024-01-05  0:04       ` David Howells
  2 siblings, 2 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-03 19:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 3 Jan 2024 at 15:02, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Oct 31, 2023 at 05:14:08PM -0300, Wedson Almeida Filho wrote:
> > > Also, I see you're passing an inode to read_dir.  Why did you decide to
> > > do that?  There's information in the struct file that's either necessary
> > > or useful to have in the filesystem.  Maybe not in toy filesystems, but eg
> > > network filesystems need user credentials to do readdir, which are stored
> > > in struct file.  Block filesystems store readahead data in struct file.
> >
> > Because the two file systems we have don't use anything from `struct
> > file` beyond the inode.
> >
> > Passing a `file` to `read_dir` would require us to introduce an
> > unnecessary abstraction that no one uses, which we've been told not to
> > do.
> >
> > There is no technical reason that makes it impractical though. We can
> > add it when the need arises.
>
> Then we shouldn't merge any of this, or even send it out for review
> again until there is at least one non-toy filesystem implemented.

What makes you characterize these filesystems as toys? The fact that
they only use the file's inode in iterate_shared?

> Either stick to the object orientation we've already defined (ie
> separate aops, iops, fops, ... with substantially similar arguments)
> or propose changes to the ones we have in C.  Dealing only with toy
> filesystems is leading you to bad architecture.

I'm trying to understand the argument here. Are you saying that Rust
cannot have different APIs with the same performance characteristics
as C's, unless we also fix the C apis?

That isn't even a requirement when introducing new C apis, why would
it be a requirement for Rust apis?

Cheers,
-Wedson

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 18:02     ` Matthew Wilcox
  2024-01-03 19:04       ` Wedson Almeida Filho
@ 2024-01-03 19:14       ` Kent Overstreet
  2024-01-03 20:41         ` Al Viro
  2024-01-05  0:04       ` David Howells
  2 siblings, 1 reply; 125+ messages in thread
From: Kent Overstreet @ 2024-01-03 19:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Jan 03, 2024 at 06:02:40PM +0000, Matthew Wilcox wrote:
> On Tue, Oct 31, 2023 at 05:14:08PM -0300, Wedson Almeida Filho wrote:
> > > Also, I see you're passing an inode to read_dir.  Why did you decide to
> > > do that?  There's information in the struct file that's either necessary
> > > or useful to have in the filesystem.  Maybe not in toy filesystems, but eg
> > > network filesystems need user credentials to do readdir, which are stored
> > > in struct file.  Block filesystems store readahead data in struct file.
> > 
> > Because the two file systems we have don't use anything from `struct
> > file` beyond the inode.
> > 
> > Passing a `file` to `read_dir` would require us to introduce an
> > unnecessary abstraction that no one uses, which we've been told not to
> > do.
> > 
> > There is no technical reason that makes it impractical though. We can
> > add it when the need arises.
> 
> Then we shouldn't merge any of this, or even send it out for review
> again until there is at least one non-toy filesystem implemented.
> Either stick to the object orientation we've already defined (ie
> separate aops, iops, fops, ... with substantially similar arguments)
> or propose changes to the ones we have in C.  Dealing only with toy
> filesystems is leading you to bad architecture.

Not sure I agree - this is a "waterfall vs. incremental" question, and
personally I would go with doing things incrementally here.

We don't need to copy the C interface as is; we can use this as an
opportunity to incrementally design a new API that will obviously take
lessons from the C API (since it's wrapping it), but it doesn't have to
do things the same and it doesn't have to do everything all at once.

Anyways, like you alluded to, the C side is a bit of a mess w.r.t. what's
in a_ops vs. i_ops, and cleaning that up on the C side is a giant hassle
because then you have to fix _everything_ that implements or consumes
those interfaces at the same time.

So instead, it would seem easier to me to do the cleaner version on the
Rust side, and then once we know what that looks like, maybe we update
the C version to match - or maybe we light it all on fire and continue
with rewriting everything in Rust... *shrug*

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 19:04       ` Wedson Almeida Filho
@ 2024-01-03 19:53         ` Al Viro
  2024-01-03 20:38           ` Kent Overstreet
  2024-01-04  1:49         ` Matthew Wilcox
  1 sibling, 1 reply; 125+ messages in thread
From: Al Viro @ 2024-01-03 19:53 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Matthew Wilcox, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Jan 03, 2024 at 04:04:26PM -0300, Wedson Almeida Filho wrote:

> > Either stick to the object orientation we've already defined (ie
> > separate aops, iops, fops, ... with substantially similar arguments)
> > or propose changes to the ones we have in C.  Dealing only with toy
> > filesystems is leading you to bad architecture.
> 
> I'm trying to understand the argument here. Are you saying that Rust
> cannot have different APIs with the same performance characteristics
> as C's, unless we also fix the C apis?

Different expressive power, not performance characteristics.

It's *NOT* about C vs Rust; we have an existing system of objects and
properties of such.  Independent from the language being used to work
with them.

If we have to keep a separate system for your language, feel free to fork
the kernel and do whatever you want with it.  Just don't expect anybody
else to play with your toy.

In case it's not entirely obvious - your arguments about not needing
something or other for the instances you have tried to work with so far
do not hold water.  At all.

The only acceptable way to use Rust in that space is to treat the existing
set of objects and operations as externally given; we *can* change those,
with good enough reasons, but "the instances in Rust-using filesystems 
don't need this and that" doesn't cut it.

Changes do happen in that area.  Often enough.  And the cost of figuring
out whether they break things shouldn't be doubled because Rust folks
want a universe of their own - the benefits of Rust are not worth that
kind of bother.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 19:53         ` Al Viro
@ 2024-01-03 20:38           ` Kent Overstreet
  0 siblings, 0 replies; 125+ messages in thread
From: Kent Overstreet @ 2024-01-03 20:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Wedson Almeida Filho, Matthew Wilcox, Christian Brauner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Jan 03, 2024 at 07:53:58PM +0000, Al Viro wrote:
> On Wed, Jan 03, 2024 at 04:04:26PM -0300, Wedson Almeida Filho wrote:
> 
> > > Either stick to the object orientation we've already defined (ie
> > > separate aops, iops, fops, ... with substantially similar arguments)
> > > or propose changes to the ones we have in C.  Dealing only with toy
> > > filesystems is leading you to bad architecture.
> > 
> > I'm trying to understand the argument here. Are you saying that Rust
> > cannot have different APIs with the same performance characteristics
> > as C's, unless we also fix the C apis?
> 
> Different expressive power, not performance characteristics.
> 
> It's *NOT* about C vs Rust; we have an existing system of objects and
> properties of such.  Independent from the language being used to work
> with them.
> 
> If we have to keep a separate system for your language, feel free to fork
> the kernel and do whatever you want with it.  Just don't expect anybody
> else to play with your toy.

The Rust people have been getting conflicting advice, and your response
is to tell them to fork the kernel and go away?

> In case it's not entirely obvious - your arguments about not needing
> something or other for the instances you have tried to work with so far
> do not hold water.  At all.
> 
> The only acceptable way to use Rust in that space is to treat the existing
> set of objects and operations as externally given; we *can* change those,
> with good enough reasons, but "the instances in Rust-using filesystems 
> don't need this and that" doesn't cut it.

The question was just about whether to add something now that isn't used
on the Rust side yet, or wait until later when it is.

I think this has gone a bit afield, and gotten a bit dramatic.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 19:14       ` Kent Overstreet
@ 2024-01-03 20:41         ` Al Viro
  2024-01-09 19:13           ` Wedson Almeida Filho
  0 siblings, 1 reply; 125+ messages in thread
From: Al Viro @ 2024-01-03 20:41 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, Wedson Almeida Filho, Christian Brauner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Jan 03, 2024 at 02:14:34PM -0500, Kent Overstreet wrote:

> We don't need to copy the C interface as is; we can use this as an
> opportunity to incrementally design a new API that will obviously take
> lessons from the C API (since it's wrapping it), but it doesn't have to
> do things the same and it doesn't have to do everything all at once.
> 
> Anyways, like you alluded to, the C side is a bit of a mess w.r.t. what's
> in a_ops vs. i_ops, and cleaning that up on the C side is a giant hassle
> because then you have to fix _everything_ that implements or consumes
> those interfaces at the same time.
> 
> So instead, it would seem easier to me to do the cleaner version on the
> Rust side, and then once we know what that looks like, maybe we update
> the C version to match - or maybe we light it all on fire and continue
> with rewriting everything in Rust... *shrug*

No.  This "cleaner version on the Rust side" is nothing of that sort;
this "readdir doesn't need any state that might be different for different
file instances beyond the current position, because none of our examples
have needed that so far" is a good example of the garbage we really do
not need to deal with.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 19:04       ` Wedson Almeida Filho
  2024-01-03 19:53         ` Al Viro
@ 2024-01-04  1:49         ` Matthew Wilcox
  2024-01-09 18:25           ` Wedson Almeida Filho
  1 sibling, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-04  1:49 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Jan 03, 2024 at 04:04:26PM -0300, Wedson Almeida Filho wrote:
> On Wed, 3 Jan 2024 at 15:02, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Oct 31, 2023 at 05:14:08PM -0300, Wedson Almeida Filho wrote:
> > > > Also, I see you're passing an inode to read_dir.  Why did you decide to
> > > > do that?  There's information in the struct file that's either necessary
> > > > or useful to have in the filesystem.  Maybe not in toy filesystems, but eg
> > > > network filesystems need user credentials to do readdir, which are stored
> > > > in struct file.  Block filesystems store readahead data in struct file.
> > >
> > > Because the two file systems we have don't use anything from `struct
> > > file` beyond the inode.
> > >
> > > Passing a `file` to `read_dir` would require us to introduce an
> > > unnecessary abstraction that no one uses, which we've been told not to
> > > do.
> > >
> > > There is no technical reason that makes it impractical though. We can
> > > add it when the need arises.
> >
> > Then we shouldn't merge any of this, or even send it out for review
> > > again until there is at least one non-toy filesystem implemented.
> 
> What makes you characterize these filesystems as toys? The fact that
> they only use the file's inode in iterate_shared?

They're not real filesystems.  You can't put, eg, root or your home
directory on one of these filesystems.

> > Either stick to the object orientation we've already defined (ie
> > separate aops, iops, fops, ... with substantially similar arguments)
> > or propose changes to the ones we have in C.  Dealing only with toy
> > filesystems is leading you to bad architecture.
> 
> I'm trying to understand the argument here. Are you saying that Rust
> cannot have different APIs with the same performance characteristics
> as C's, unless we also fix the C apis?
> 
> That isn't even a requirement when introducing new C apis, why would
> it be a requirement for Rust apis?

I'm saying that we have the current object orientation (eg each inode
is an object with inode methods) for a reason.  Don't change it without
understanding what that reason is.  And moving, eg iterate_shared() from
file_operations to struct file_system_type (effectively what you've done)
is something we obviously wouldn't want to do.

* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2023-10-18 12:25 ` [RFC PATCH 05/19] rust: fs: introduce `INode<T>` Wedson Almeida Filho
  2023-10-28 18:00   ` Alice Ryhl
  2024-01-03 12:54   ` Andreas Hindborg (Samsung)
@ 2024-01-04  5:14   ` Darrick J. Wong
  2024-01-24 18:17     ` Wedson Almeida Filho
  2 siblings, 1 reply; 125+ messages in thread
From: Darrick J. Wong @ 2024-01-04  5:14 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:25:04AM -0300, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to handle typed and ref-counted inodes.
> 
> This is in preparation for creating new inodes (for example, to create
> the root inode of a new superblock), which comes in the next patch in
> the series.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/helpers.c    |  7 +++++++
>  rust/kernel/fs.rs | 53 +++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/rust/helpers.c b/rust/helpers.c
> index 4c86fe4a7e05..fe45f8ddb31f 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -25,6 +25,7 @@
>  #include <linux/build_bug.h>
>  #include <linux/err.h>
>  #include <linux/errname.h>
> +#include <linux/fs.h>
>  #include <linux/mutex.h>
>  #include <linux/refcount.h>
>  #include <linux/sched/signal.h>
> @@ -144,6 +145,12 @@ struct kunit *rust_helper_kunit_get_current_test(void)
>  }
>  EXPORT_SYMBOL_GPL(rust_helper_kunit_get_current_test);
>  
> +off_t rust_helper_i_size_read(const struct inode *inode)

i_size_read returns a loff_t (aka __kernel_loff_t (aka long long)),
but this returns    an off_t (aka __kernel_off_t (aka long)).

Won't that cause truncation issues for files larger than 4GB on some
architectures?
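
A minimal user-space sketch of the truncation in question, assuming a target
where C `long` is 32 bits (modelled here as `i32`) and `long long` is 64 bits
(modelled as `i64`). The function name `narrow_to_long` is made up for
illustration and is not part of the patch.

```rust
// Illustration only: models returning a 64-bit size through a 32-bit `long`.
fn narrow_to_long(size: i64) -> i32 {
    // `as` silently keeps the low 32 bits, much like an implicit C conversion.
    size as i32
}

fn main() {
    let five_gib: i64 = 5 * 1024 * 1024 * 1024;
    let truncated = narrow_to_long(five_gib);
    println!("{} bytes reported as {}", five_gib, truncated);

    // The high bits are lost: 5 GiB comes back as 1 GiB.
    assert_eq!(truncated, 1 << 30);
    // A checked conversion would have reported the problem instead.
    assert!(i32::try_from(five_gib).is_err());
}
```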

> +{
> +	return i_size_read(inode);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_i_size_read);
> +
>  /*
>   * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
>   * use it in contexts where Rust expects a `usize` like slice (array) indices.
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index 31cf643aaded..30fa1f312f33 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -7,9 +7,9 @@
>  //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
>  
>  use crate::error::{code::*, from_result, to_result, Error, Result};
> -use crate::types::Opaque;
> +use crate::types::{AlwaysRefCounted, Opaque};
>  use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
> -use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin};
> +use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin, ptr};
>  use macros::{pin_data, pinned_drop};
>  
>  /// Maximum size of an inode.
> @@ -94,6 +94,55 @@ fn drop(self: Pin<&mut Self>) {
>      }
>  }
>  
> +/// The number of an inode.
> +pub type Ino = u64;
> +
> +/// A node in the file system index (inode).
> +///
> +/// Wraps the kernel's `struct inode`.
> +///
> +/// # Invariants
> +///
> +/// Instances of this type are always ref-counted, that is, a call to `ihold` ensures that the
> +/// allocation remains valid at least until the matching call to `iput`.
> +#[repr(transparent)]
> +pub struct INode<T: FileSystem + ?Sized>(Opaque<bindings::inode>, PhantomData<T>);
> +
> +impl<T: FileSystem + ?Sized> INode<T> {
> +    /// Returns the number of the inode.
> +    pub fn ino(&self) -> Ino {
> +        // SAFETY: `i_ino` is immutable, and `self` is guaranteed to be valid by the existence of a
> +        // shared reference (&self) to it.
> +        unsafe { (*self.0.get()).i_ino }

Is "*self.0.get()" the means by which the Rust bindings get at the
actual C object?

(Forgive me, I've barely finished drying the primer coat on my rust-fu.)

> +    }
> +
> +    /// Returns the super-block that owns the inode.
> +    pub fn super_block(&self) -> &SuperBlock<T> {
> +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
> +        // shared reference (&self) to it.
> +        unsafe { &*(*self.0.get()).i_sb.cast() }
> +    }
> +
> +    /// Returns the size of the inode contents.
> +    pub fn size(&self) -> i64 {

I'm a little surprised I didn't see a

pub type loff_t = i64

followed by this function returning a loff_t.  Or maybe it would be
better to define it as:

struct loff_t(i64);

So that dopey fs developers like me cannot so easily assign a file
position (bytes) to a pgoff_t (page index) without either supplying an
actual conversion operator or seeing complaints from the compiler.
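
A rough sketch of what such newtypes could look like. The names `Offset`,
`PageIndex`, and `page_index`, and the 4 KiB page size, are assumptions made
for this example; nothing here is taken from the series.

```rust
const PAGE_SHIFT: u32 = 12; // assumption: 4 KiB pages

/// A byte position in a file (the role `loff_t` plays in C).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Offset(i64);

/// A page index (the role `pgoff_t` plays in C).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct PageIndex(u64);

impl Offset {
    /// Rejects negative positions up front.
    pub fn new(bytes: i64) -> Option<Self> {
        if bytes >= 0 { Some(Offset(bytes)) } else { None }
    }

    pub fn bytes(self) -> i64 {
        self.0
    }

    /// The only way to get from bytes to pages is this explicit conversion.
    pub fn page_index(self) -> PageIndex {
        PageIndex((self.0 as u64) >> PAGE_SHIFT)
    }
}

fn main() {
    let pos = Offset::new(2 * 4096 + 100).unwrap();
    assert_eq!(pos.page_index(), PageIndex(2));

    // let idx: PageIndex = pos; // rejected by the compiler: mismatched types
}
```

With a plain `pub type loff_t = i64` alias the commented-out assignment
would be accepted for any other `i64`-based alias, which is the mistake the
wrapper struct is meant to catch.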

> +        // SAFETY: `self` is guaranteed to be valid by the existence of a shared reference.
> +        unsafe { bindings::i_size_read(self.0.get()) }

It's confusing that rust_helper_i_size_read returns a long but on the
Rust side, INode::size returns an i64.

--D

> +    }
> +}
> +
> +// SAFETY: The type invariants guarantee that `INode` is always ref-counted.
> +unsafe impl<T: FileSystem + ?Sized> AlwaysRefCounted for INode<T> {
> +    fn inc_ref(&self) {
> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> +        unsafe { bindings::ihold(self.0.get()) };
> +    }
> +
> +    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> +        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
> +        unsafe { bindings::iput(obj.cast().as_ptr()) }
> +    }
> +}
> +
>  /// A file system super block.
>  ///
>  /// Wraps the kernel's `struct super_block`.
> -- 
> 2.34.1
> 
> 

* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2024-01-03 12:54   ` Andreas Hindborg (Samsung)
@ 2024-01-04  5:20     ` Darrick J. Wong
  2024-01-04  9:57       ` Andreas Hindborg (Samsung)
  2024-01-10  9:45     ` Benno Lossin
  1 sibling, 1 reply; 125+ messages in thread
From: Darrick J. Wong @ 2024-01-04  5:20 UTC (permalink / raw)
  To: Andreas Hindborg (Samsung)
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Matthew Wilcox, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho

On Wed, Jan 03, 2024 at 01:54:34PM +0100, Andreas Hindborg (Samsung) wrote:
> 
> Wedson Almeida Filho <wedsonaf@gmail.com> writes:
> 
> > +/// The number of an inode.
> > +pub type Ino = u64;
> 
> Would it be possible to use a descriptive name such as `INodeNumber`?

Filesystem programmers are lazy.  Originally the term was "index node
number", which was shortened to "inode number", shortened again to
"inumber", and finally "ino".  The Rust type name might as well mirror
the C type.

(There are probably greyerbeards than I who can quote even more arcane
points of Unix filesystem history.)

> > +    /// Returns the super-block that owns the inode.
> > +    pub fn super_block(&self) -> &SuperBlock<T> {
> > +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
> > +        // shared reference (&self) to it.
> > +        unsafe { &*(*self.0.get()).i_sb.cast() }
> > +    }
> 
> I think the safety comment should talk about the pointee rather than the
> pointer? "The pointee of `i_sb` is immutable, and ..."

inode::i_sb (the pointer) shouldn't be reassigned to a different
superblock during the lifetime of the inode; but the superblock object
itself (the pointee) is very much mutable.
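
The distinction can be modelled in user space as follows. `SuperBlock` and
`Inode` here are toy stand-ins invented for the example, not the kernel's or
the series' types, and the safety argument in the comment applies only to
this toy.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Toy stand-ins; the field names are hypothetical.
struct SuperBlock {
    flags: AtomicU32, // the pointee: changes over the superblock's lifetime
}

struct Inode {
    sb: *const SuperBlock, // the pointer: set once, never reassigned
}

impl Inode {
    fn super_block(&self) -> &SuperBlock {
        // SAFETY (toy model only): `sb` is never reassigned after construction,
        // and callers keep the superblock alive for as long as the inode exists.
        unsafe { &*self.sb }
    }
}

fn main() {
    let sb = SuperBlock { flags: AtomicU32::new(0) };
    let inode = Inode { sb: &sb };

    let before = inode.super_block() as *const SuperBlock;
    sb.flags.store(1, Ordering::Relaxed); // the pointee mutates...
    let after = inode.super_block() as *const SuperBlock;

    assert_eq!(before, after); // ...while the pointer stays the same
    assert_eq!(inode.super_block().flags.load(Ordering::Relaxed), 1);
}
```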

--D

> BR Andreas
> 
> 

* Re: [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs`
  2023-10-18 12:25 ` [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
  2024-01-03 14:13   ` Andreas Hindborg (Samsung)
@ 2024-01-04  5:33   ` Darrick J. Wong
  2024-01-24  4:24     ` Wedson Almeida Filho
  1 sibling, 1 reply; 125+ messages in thread
From: Darrick J. Wong @ 2024-01-04  5:33 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:25:11AM -0300, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> Allow Rust file systems to expose their stats. `overlayfs` requires that
> this be implemented by all file systems that are part of an overlay.
> The planned file systems need to be overlayed with overlayfs, so they
> must be able to implement this.
> 
> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> ---
>  rust/bindings/bindings_helper.h |  1 +
>  rust/kernel/error.rs            |  1 +
>  rust/kernel/fs.rs               | 52 ++++++++++++++++++++++++++++++++-
>  3 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index fa754c5e85a2..e2b2ccc835e3 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -11,6 +11,7 @@
>  #include <linux/fs.h>
>  #include <linux/fs_context.h>
>  #include <linux/slab.h>
> +#include <linux/statfs.h>
>  #include <linux/pagemap.h>
>  #include <linux/refcount.h>
>  #include <linux/wait.h>
> diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
> index 6c167583b275..829756cf6c48 100644
> --- a/rust/kernel/error.rs
> +++ b/rust/kernel/error.rs
> @@ -83,6 +83,7 @@ macro_rules! declare_err {
>      declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
>      declare_err!(ENODATA, "No data available.");
>      declare_err!(EOPNOTSUPP, "Operation not supported on transport endpoint.");
> +    declare_err!(ENOSYS, "Invalid system call number.");
>  }
>  
>  /// Generic integer kernel error.
> diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> index adf9cbee16d2..8f34da50e694 100644
> --- a/rust/kernel/fs.rs
> +++ b/rust/kernel/fs.rs
> @@ -50,6 +50,31 @@ pub trait FileSystem {
>      fn read_xattr(_inode: &INode<Self>, _name: &CStr, _outbuf: &mut [u8]) -> Result<usize> {
>          Err(EOPNOTSUPP)
>      }
> +
> +    /// Get filesystem statistics.
> +    fn statfs(_sb: &SuperBlock<Self>) -> Result<Stat> {
> +        Err(ENOSYS)
> +    }
> +}
> +
> +/// File system stats.
> +///
> +/// A subset of C's `kstatfs`.
> +pub struct Stat {
> +    /// Magic number of the file system.
> +    pub magic: u32,
> +
> +    /// The maximum length of a file name.
> +    pub namelen: i64,

Yikes, I hope I never see an 8EB filename.  The C side doesn't handle
names longer than 255 bytes.

> +
> +    /// Block size.
> +    pub bsize: i64,

Or an 8EB block size.  SMR notwithstanding, I think this could be u32.

Why are these values signed?  Nobody has a -1k block filesystem.
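
One possible shape for that, sketched outside the kernel. The `u32` widths
are an assumption based on the comments above, `KStatFs` is a hand-written
stand-in for the few `struct kstatfs` fields the callback touches (with C
`long` modelled as `i64`, as on 64-bit targets), and `fill` is a hypothetical
helper.

```rust
/// Hypothetical variant of the patch's `Stat` with unsigned fields.
pub struct Stat {
    pub magic: u32,
    pub namelen: u32,
    pub bsize: u32,
    pub files: u64,
    pub blocks: u64,
}

/// Stand-in for the C structure; `f_type`, `f_namelen` and `f_bsize` are `long` there.
#[derive(Default, Debug)]
pub struct KStatFs {
    pub f_type: i64,
    pub f_namelen: i64,
    pub f_bsize: i64,
    pub f_files: u64,
    pub f_blocks: u64,
}

/// Widening u32 -> i64 is lossless, so the signed C fields can never go negative.
pub fn fill(buf: &mut KStatFs, s: &Stat) {
    buf.f_type = i64::from(s.magic);
    buf.f_namelen = i64::from(s.namelen);
    buf.f_bsize = i64::from(s.bsize);
    buf.f_files = s.files;
    buf.f_blocks = s.blocks;
}

fn main() {
    let s = Stat { magic: 0x1234_5678, namelen: 255, bsize: 4096, files: 10, blocks: 100 };
    let mut buf = KStatFs::default();
    fill(&mut buf, &s);
    assert_eq!(buf.f_namelen, 255);
    assert_eq!(buf.f_bsize, 4096);
}
```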

> +    /// Number of files in the file system.
> +    pub files: u64,
> +
> +    /// Number of blocks in the file system.
> +    pub blocks: u64,
>  }
>  
>  /// The types of directory entries reported by [`FileSystem::read_dir`].
> @@ -478,7 +503,7 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>          freeze_fs: None,
>          thaw_super: None,
>          unfreeze_fs: None,
> -        statfs: None,
> +        statfs: Some(Self::statfs_callback),
>          remount_fs: None,
>          umount_begin: None,
>          show_options: None,
> @@ -496,6 +521,31 @@ impl<T: FileSystem + ?Sized> Tables<T> {
>          shutdown: None,
>      };
>  
> +    unsafe extern "C" fn statfs_callback(
> +        dentry: *mut bindings::dentry,
> +        buf: *mut bindings::kstatfs,
> +    ) -> core::ffi::c_int {
> +        from_result(|| {
> +            // SAFETY: The C API guarantees that `dentry` is valid for read. `d_sb` is
> +            // immutable, so it's safe to read it. The superblock is guaranteed to be valid for
> +            // the duration of the call.
> +            let sb = unsafe { &*(*dentry).d_sb.cast::<SuperBlock<T>>() };
> +            let s = T::statfs(sb)?;
> +
> +            // SAFETY: The C API guarantees that `buf` is valid for read and write.
> +            let buf = unsafe { &mut *buf };
> +            buf.f_type = s.magic.into();
> +            buf.f_namelen = s.namelen;
> +            buf.f_bsize = s.bsize;
> +            buf.f_files = s.files;
> +            buf.f_blocks = s.blocks;
> +            buf.f_bfree = 0;
> +            buf.f_bavail = 0;
> +            buf.f_ffree = 0;

Why is it necessary to fill out the C structure with zeroes?
statfs_by_dentry zeroes the buffer contents before calling ->statfs.
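
A toy model of that calling convention, assuming the caller behaves as
described above. The Rust `statfs_by_dentry` below is a simplified stand-in
written for this example (it takes a closure rather than a dentry), and
`KStatFs` lists only a handful of fields.

```rust
#[derive(Default, Debug, PartialEq)]
struct KStatFs {
    f_bsize: i64,
    f_blocks: u64,
    f_bfree: u64,
    f_bavail: u64,
    f_ffree: u64,
}

// Models the caller: zero the buffer first, then hand it to the callback.
fn statfs_by_dentry(callback: impl Fn(&mut KStatFs)) -> KStatFs {
    let mut buf = KStatFs::default(); // plays the role of the memset
    callback(&mut buf);
    buf
}

fn main() {
    // The callback sets only what it knows; it never touches the "free" fields.
    let buf = statfs_by_dentry(|buf| {
        buf.f_bsize = 4096;
        buf.f_blocks = 100;
    });

    assert_eq!(buf.f_bfree, 0);
    assert_eq!(buf.f_bavail, 0);
    assert_eq!(buf.f_ffree, 0);
}
```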

--D

> +            Ok(0)
> +        })
> +    }
> +
>      const XATTR_HANDLERS: [*const bindings::xattr_handler; 2] = [&Self::XATTR_HANDLER, ptr::null()];
>  
>      const XATTR_HANDLER: bindings::xattr_handler = bindings::xattr_handler {
> -- 
> 2.34.1
> 
> 

* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2024-01-04  5:20     ` Darrick J. Wong
@ 2024-01-04  9:57       ` Andreas Hindborg (Samsung)
  0 siblings, 0 replies; 125+ messages in thread
From: Andreas Hindborg (Samsung) @ 2024-01-04  9:57 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Matthew Wilcox, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho


"Darrick J. Wong" <djwong@kernel.org> writes:

[...]

>> > +    /// Returns the super-block that owns the inode.
>> > +    pub fn super_block(&self) -> &SuperBlock<T> {
>> > +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
>> > +        // shared reference (&self) to it.
>> > +        unsafe { &*(*self.0.get()).i_sb.cast() }
>> > +    }
>> 
>> I think the safety comment should talk about the pointee rather than the
>> pointer? "The pointee of `i_sb` is immutable, and ..."
>
> inode::i_sb (the pointer) shouldn't be reassigned to a different
> superblock during the lifetime of the inode; but the superblock object
> itself (the pointee) is very much mutable.

Ah, I thought the comment was about why it is sound to create
`&SuperBlock`, but it is referring to why it is sound to read `i_sb`.
Perhaps the comment should state this? Perhaps it is also worth
mentioning why it is OK to construct a shared reference from this
pointer?

Best regards
Andreas

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 18:02     ` Matthew Wilcox
  2024-01-03 19:04       ` Wedson Almeida Filho
  2024-01-03 19:14       ` Kent Overstreet
@ 2024-01-05  0:04       ` David Howells
  2024-01-05 15:54         ` Jarkko Sakkinen
  2 siblings, 1 reply; 125+ messages in thread
From: David Howells @ 2024-01-05  0:04 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: dhowells, Matthew Wilcox, Wedson Almeida Filho, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho

Kent Overstreet <kent.overstreet@linux.dev> wrote:

> So instead, it would seem easier to me to do the cleaner version on the
> Rust side, and then once we know what that looks like, maybe we update
> the C version to match - or maybe we light it all on fire and continue
> with rewriting everything in Rust... *shrug*

Please, no.  Please keep Rust separate and out of the core of the kernel and
subsystems.

David


* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-05  0:04       ` David Howells
@ 2024-01-05 15:54         ` Jarkko Sakkinen
  0 siblings, 0 replies; 125+ messages in thread
From: Jarkko Sakkinen @ 2024-01-05 15:54 UTC (permalink / raw)
  To: David Howells, Kent Overstreet
  Cc: Matthew Wilcox, Wedson Almeida Filho, Alexander Viro,
	Christian Brauner, Kent Overstreet, Greg Kroah-Hartman,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho

On Fri Jan 5, 2024 at 2:04 AM EET, David Howells wrote:
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> > So instead, it would seem easier to me to do the cleaner version on the
> > Rust side, and then once we know what that looks like, maybe we update
> > the C version to match - or maybe we light it all on fire and continue
> > with rewriting everything in Rust... *shrug*
>
> Please, no.  Please keep Rust separate and out of the core of the kernel and
> subsystems.
>
> David

Yeah, even if we ignore that the code has in some cases been field-tested
for literally decades, is critical infrastructure at global scale, and
similar QA metrics, any major Rust update to the core code would be pretty
hard to manage for stable kernels...

Using Rust in core would at the very least require decisions that are
outside the scope of a single patch set.

BR, Jarkko

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-04  1:49         ` Matthew Wilcox
@ 2024-01-09 18:25           ` Wedson Almeida Filho
  2024-01-09 19:30             ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-09 18:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 3 Jan 2024 at 22:49, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 03, 2024 at 04:04:26PM -0300, Wedson Almeida Filho wrote:
> > On Wed, 3 Jan 2024 at 15:02, Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Oct 31, 2023 at 05:14:08PM -0300, Wedson Almeida Filho wrote:
> > > > > Also, I see you're passing an inode to read_dir.  Why did you decide to
> > > > > do that?  There's information in the struct file that's either necessary
> > > > > or useful to have in the filesystem.  Maybe not in toy filesystems, but eg
> > > > > network filesystems need user credentials to do readdir, which are stored
> > > > > in struct file.  Block filesystems store readahead data in struct file.
> > > >
> > > > Because the two file systems we have don't use anything from `struct
> > > > file` beyond the inode.
> > > >
> > > > Passing a `file` to `read_dir` would require us to introduce an
> > > > unnecessary abstraction that no one uses, which we've been told not to
> > > > do.
> > > >
> > > > There is no technical reason that makes it impractical though. We can
> > > > add it when the need arises.
> > >
> > > Then we shouldn't merge any of this, or even send it out for review
> > > > again until there is at least one non-toy filesystem implemented.
> >
> > What makes you characterize these filesystems as toys? The fact that
> > they only use the file's inode in iterate_shared?
>
> They're not real filesystems.  You can't put, eg, root or your home
> directory on one of these filesystems.

tarfs is a real file system, we use it to mount read-only container
layers on top of dm-verity for integrity.

The root of a container is made of potentially several of these
layers, overlaid with overlayfs. We use this in confidential kata
containers where we need to enforce authenticity and integrity of
data: with tarfs, the original tar file is exposed to confidential
VMs, so we can use existing signatures to verify that an attacker
hasn't modified the data before the container starts, and dm-verity
ensures that we catch any attempts by the host to change data after
the container is running.

> > > Either stick to the object orientation we've already defined (ie
> > > separate aops, iops, fops, ... with substantially similar arguments)
> > > or propose changes to the ones we have in C.  Dealing only with toy
> > > filesystems is leading you to bad architecture.
> >
> > > I'm trying to understand the argument here. Are you saying that Rust
> > cannot have different APIs with the same performance characteristics
> > as C's, unless we also fix the C apis?
> >
> > That isn't even a requirement when introducing new C apis, why would
> > it be a requirement for Rust apis?
>
> I'm saying that we have the current object orientation (eg each inode
> is an object with inode methods) for a reason.  Don't change it without
> understanding what that reason is.  And moving, eg iterate_shared() from
> file_operations to struct file_system_type (effectively what you've done)
> is something we obviously wouldn't want to do.

I don't think I'm changing anything. AFAICT, I'm adding a way to write
file systems in Rust. It uses the C API faithfully -- if you find ways
in which it doesn't, I'd be happy to fix them.

To show its usefulness, I'm providing a real file system that uses it,
is simpler than the C version, and contains no unsafe code. So barring
bugs in the abstractions, it contains no memory safety issues.

Why do you feel I need to mimic the unsafe (in the sense that the
compiler doesn't help you prevent safety issues) way C does it _now_?

Cheers,
-Wedson

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-03 20:41         ` Al Viro
@ 2024-01-09 19:13           ` Wedson Almeida Filho
  2024-01-09 19:25             ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-09 19:13 UTC (permalink / raw)
  To: Al Viro, Greg Kroah-Hartman
  Cc: Kent Overstreet, Matthew Wilcox, Christian Brauner,
	Kent Overstreet, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 3 Jan 2024 at 17:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Jan 03, 2024 at 02:14:34PM -0500, Kent Overstreet wrote:
>
> > We don't need to copy the C interface as is; we can use this as an
> > opportunity to incrementally design a new API that will obviously take
> > lessons from the C API (since it's wrapping it), but it doesn't have to
> > do things the same and it doesn't have to do everything all at once.
> >
> > Anyways, like you alluded to the C side is a bit of a mess w.r.t. what's
> > in a_ops vs. i_ops, and cleaning that up on the C side is a giant hassle
> > because then you have to fix _everything_ that implements or consumes
> > those interfaces at the same time.
> >
> > So instead, it would seem easier to me to do the cleaner version on the
> > Rust side, and then once we know what that looks like, maybe we update
> > the C version to match - or maybe we light it all on fire and continue
> > with rewriting everything in Rust... *shrug*
>
> No.  This "cleaner version on the Rust side" is nothing of that sort;
> this "readdir doesn't need any state that might be different for different
> file instances beyond the current position, because none of our examples
> have needed that so far" is a good example of the garbage we really do
> not need to deal with.

What you're calling garbage is what Greg KH asked us to do, namely,
not introduce anything for which there are no users. See a couple of
quotes below.

https://lore.kernel.org/rust-for-linux/2023081411-apache-tubeless-7bb3@gregkh/
The best feedback is "who will use these new interfaces?"  Without that,
it's really hard to review a patchset as it's difficult to see how the
bindings will be used, right?

https://lore.kernel.org/rust-for-linux/2023071049-gigabyte-timing-0673@gregkh/
And I'd recommend that we not take any more bindings without real users,
as there seems to be just a collection of these and it's hard to
actually review them to see how they are used...

Cheers,
-Wedson

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-09 19:13           ` Wedson Almeida Filho
@ 2024-01-09 19:25             ` Matthew Wilcox
  2024-01-09 19:32               ` Greg Kroah-Hartman
  2024-01-09 22:19               ` Dave Chinner
  0 siblings, 2 replies; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-09 19:25 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Al Viro, Greg Kroah-Hartman, Kent Overstreet, Christian Brauner,
	Kent Overstreet, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, Jan 09, 2024 at 04:13:15PM -0300, Wedson Almeida Filho wrote:
> On Wed, 3 Jan 2024 at 17:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > No.  This "cleaner version on the Rust side" is nothing of that sort;
> > this "readdir doesn't need any state that might be different for different
> > file instances beyond the current position, because none of our examples
> > have needed that so far" is a good example of the garbage we really do
> > not need to deal with.
> 
> What you're calling garbage is what Greg KH asked us to do, namely,
> not introduce anything for which there are no users. See a couple of
> quotes below.
> 
> https://lore.kernel.org/rust-for-linux/2023081411-apache-tubeless-7bb3@gregkh/
> The best feedback is "who will use these new interfaces?"  Without that,
> it's really hard to review a patchset as it's difficult to see how the
> bindings will be used, right?
> 
> https://lore.kernel.org/rust-for-linux/2023071049-gigabyte-timing-0673@gregkh/
> And I'd recommend that we not take any more bindings without real users,
> as there seems to be just a collection of these and it's hard to
> actually review them to see how they are used...

You've misunderstood Greg.  He's saying (effectively) "No fs bindings
without a filesystem to use them".  And Al, myself and others are saying
"Your filesystem interfaces are wrong because they're not usable for real
filesystems".  And you're saying "But I'm not allowed to change them".
And that's not true.  Change them to be laid out how a real filesystem
would need them to be.  Or argue that your current interfaces are the
right ones (they aren't).

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-09 18:25           ` Wedson Almeida Filho
@ 2024-01-09 19:30             ` Matthew Wilcox
  0 siblings, 0 replies; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-09 19:30 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, Jan 09, 2024 at 03:25:11PM -0300, Wedson Almeida Filho wrote:
> On Wed, 3 Jan 2024 at 22:49, Matthew Wilcox <willy@infradead.org> wrote:
> > > What makes you characterize these filesystems as toys? The fact that
> > > they only use the file's inode in iterate_shared?
> >
> > They're not real filesystems.  You can't put, eg, root or your home
> > directory on one of these filesystems.
> 
> tarfs is a real file system, we use it to mount read-only container
> layers on top of dm-verity for integrity.

You're using it in production?  Oh dear.

> > > > I'm trying to understand the argument here. Are you saying that Rust
> > > cannot have different APIs with the same performance characteristics
> > > as C's, unless we also fix the C apis?
> > >
> > > That isn't even a requirement when introducing new C apis, why would
> > > it be a requirement for Rust apis?
> >
> > I'm saying that we have the current object orientation (eg each inode
> > is an object with inode methods) for a reason.  Don't change it without
> > understanding what that reason is.  And moving, eg iterate_shared() from
> > file_operations to struct file_system_type (effectively what you've done)
> > is something we obviously wouldn't want to do.
> 
> I don't think I'm changing anything. AFAICT, I'm adding a way to write
> file systems in Rust. It uses the C API faithfully -- if you find ways
> in which it doesn't, I'd be happy to fix them.

You are changing the _object model_.  The C API has separate objects
for inodes, files, filesystems, superblocks, dentries, etc, etc.  You've
just smashed all of it together into a FileSystem which implements all
of the inode, file, address_space, etc, etc ops.  And this is the wrong
approach.
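
For illustration, a user-space sketch of what keeping the objects separate
could look like in Rust. The trait and type names (`InodeOperations`,
`FileOperations`, `DirInode`, `DirFile`) are hypothetical; this is neither
the API in the series nor a proposal from this thread, only one way to
picture per-object operations as opposed to a single `FileSystem` trait.

```rust
// Each kind of object gets its own set of methods, as with i_ops and f_ops in C.
trait InodeOperations {
    fn lookup(&self, name: &str) -> Option<u64>;
}

trait FileOperations {
    fn read_dir(&mut self) -> Option<String>;
}

struct DirInode {
    children: Vec<(String, u64)>, // (name, inode number)
}

// An open file refers to its inode and owns its per-open state.
struct DirFile<'a> {
    inode: &'a DirInode,
    pos: usize,
}

impl InodeOperations for DirInode {
    fn lookup(&self, name: &str) -> Option<u64> {
        self.children
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, ino)| *ino)
    }
}

impl<'a> FileOperations for DirFile<'a> {
    // Returns the next entry name for this open file, or None at the end.
    fn read_dir(&mut self) -> Option<String> {
        let name = self.inode.children.get(self.pos)?.0.clone();
        self.pos += 1;
        Some(name)
    }
}

fn main() {
    let dir = DirInode {
        children: vec![("a".to_string(), 2), ("b".to_string(), 3)],
    };
    assert_eq!(dir.lookup("b"), Some(3));

    // Two opens of the same inode advance independently.
    let mut f1 = DirFile { inode: &dir, pos: 0 };
    let mut f2 = DirFile { inode: &dir, pos: 0 };
    assert_eq!(f1.read_dir().as_deref(), Some("a"));
    assert_eq!(f1.read_dir().as_deref(), Some("b"));
    assert_eq!(f2.read_dir().as_deref(), Some("a"));
}
```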


* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-09 19:25             ` Matthew Wilcox
@ 2024-01-09 19:32               ` Greg Kroah-Hartman
  2024-01-10  7:49                 ` Wedson Almeida Filho
  2024-01-09 22:19               ` Dave Chinner
  1 sibling, 1 reply; 125+ messages in thread
From: Greg Kroah-Hartman @ 2024-01-09 19:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wedson Almeida Filho, Al Viro, Kent Overstreet, Christian Brauner,
	Kent Overstreet, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, Jan 09, 2024 at 07:25:38PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 09, 2024 at 04:13:15PM -0300, Wedson Almeida Filho wrote:
> > On Wed, 3 Jan 2024 at 17:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > No.  This "cleaner version on the Rust side" is nothing of that sort;
> > > this "readdir doesn't need any state that might be different for different
> > > file instances beyond the current position, because none of our examples
> > > have needed that so far" is a good example of the garbage we really do
> > > not need to deal with.
> > 
> > What you're calling garbage is what Greg KH asked us to do, namely,
> > not introduce anything for which there are no users. See a couple of
> > quotes below.
> > 
> > https://lore.kernel.org/rust-for-linux/2023081411-apache-tubeless-7bb3@gregkh/
> > The best feedback is "who will use these new interfaces?"  Without that,
> > it's really hard to review a patchset as it's difficult to see how the
> > bindings will be used, right?
> > 
> > https://lore.kernel.org/rust-for-linux/2023071049-gigabyte-timing-0673@gregkh/
> > And I'd recommend that we not take any more bindings without real users,
> > as there seems to be just a collection of these and it's hard to
> > actually review them to see how they are used...
> 
> You've misunderstood Greg.  He's saying (effectively) "No fs bindings
> without a filesystem to use them".  And Al, myself and others are saying
> "Your filesystem interfaces are wrong because they're not usable for real
> filesystems".  And you're saying "But I'm not allowed to change them".
> And that's not true.  Change them to be laid out how a real filesystem
> would need them to be.

Note, I agree, change them to work how a "real" filesystem would need
them and then, automatically, all of the "fake" filesystems like
currently underway (i.e. tarfs) will work just fine too, right?  That
way we can drop the .c code for binderfs at the same time, also a nice
win.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-09 19:25             ` Matthew Wilcox
  2024-01-09 19:32               ` Greg Kroah-Hartman
@ 2024-01-09 22:19               ` Dave Chinner
  2024-01-10 19:19                 ` Kent Overstreet
  1 sibling, 1 reply; 125+ messages in thread
From: Dave Chinner @ 2024-01-09 22:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wedson Almeida Filho, Al Viro, Greg Kroah-Hartman,
	Kent Overstreet, Christian Brauner, Kent Overstreet,
	linux-fsdevel, rust-for-linux, Wedson Almeida Filho

On Tue, Jan 09, 2024 at 07:25:38PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 09, 2024 at 04:13:15PM -0300, Wedson Almeida Filho wrote:
> > On Wed, 3 Jan 2024 at 17:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > No.  This "cleaner version on the Rust side" is nothing of that sort;
> > > this "readdir doesn't need any state that might be different for different
> > > file instances beyond the current position, because none of our examples
> > > have needed that so far" is a good example of the garbage we really do
> > > not need to deal with.
> > 
> > What you're calling garbage is what Greg KH asked us to do, namely,
> > not introduce anything for which there are no users. See a couple of
> > quotes below.
> > 
> > https://lore.kernel.org/rust-for-linux/2023081411-apache-tubeless-7bb3@gregkh/
> > The best feedback is "who will use these new interfaces?"  Without that,
> > it's really hard to review a patchset as it's difficult to see how the
> > bindings will be used, right?
> > 
> > https://lore.kernel.org/rust-for-linux/2023071049-gigabyte-timing-0673@gregkh/
> > And I'd recommend that we not take any more bindings without real users,
> > as there seems to be just a collection of these and it's hard to
> > actually review them to see how they are used...
> 
> You've misunderstood Greg.  He's saying (effectively) "No fs bindings
> without a filesystem to use them".  And Al, myself and others are saying
> "Your filesystem interfaces are wrong because they're not usable for real
> filesystems".

And that's why I've been saying that the first Rust filesystem that
should be implemented is an ext2 clone. That's our "reference
filesystem" for people who want to learn how filesystems should be
implemented in Linux - it's relatively simple but fully featured and
uses much of the generic abstractions and infrastructure.

At minimum, we need a filesystem implementation that is fully
read-write, supports truncate and rename, and has a fully functional
userspace and test infrastructure so that we can actually verify
that the Rust code does what it says on the label. ext2 ticks all of
these boxes....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-09 19:32               ` Greg Kroah-Hartman
@ 2024-01-10  7:49                 ` Wedson Almeida Filho
  2024-01-10  7:57                   ` Greg Kroah-Hartman
  2024-01-10 12:56                   ` Matthew Wilcox
  0 siblings, 2 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-10  7:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Matthew Wilcox, Al Viro, Kent Overstreet, Christian Brauner,
	Kent Overstreet, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Tue, 9 Jan 2024 at 16:32, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Tue, Jan 09, 2024 at 07:25:38PM +0000, Matthew Wilcox wrote:
> > You've misunderstood Greg.  He's saying (effectively) "No fs bindings
> > without a filesystem to use them".  And Al, myself and others are saying
> > "Your filesystem interfaces are wrong because they're not usable for real
> > filesystems".  And you're saying "But I'm not allowed to change them".
> > And that's not true.  Change them to be laid out how a real filesystem
> > would need them to be.

Ok, then I'll update the code to have 3 additional traits:

FileOperations
INodeOperations
AddressSpaceOperations

When one initialises an inode, one gets to pick all three.

And FileOperations::read_dir will take a File<T> as its first argument
(instead of an INode<T>).

Does this sound reasonable?
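
A minimal userspace sketch of how that split might look. The three trait
names come from the list above; every type, method signature and the
`DemoFs` implementation below are hypothetical stand-ins rather than the
kernel crate's API. The point it illustrates is that `read_dir` receives
the open file, so per-open state such as the position lives there and
not in the inode.

```rust
// Hypothetical stand-ins for the kernel types; not the real bindings.
#[allow(dead_code)]
struct INode {
    ino: u64,
}

#[allow(dead_code)]
struct File<'a> {
    inode: &'a INode,
    pos: usize, // per-open position, distinct for every open file
}

trait FileOperations {
    fn read_dir(file: &mut File<'_>, emit: &mut dyn FnMut(&str, u64) -> bool);
}

trait INodeOperations {
    fn lookup(parent: &INode, name: &str) -> Option<u64>;
}

trait AddressSpaceOperations {
    fn read_folio(inode: &INode, buf: &mut [u8]);
}

// A fixed two-entry directory, purely for illustration.
static ENTRIES: [(&str, u64); 2] = [("a", 2), ("b", 3)];

struct DemoFs;

impl FileOperations for DemoFs {
    fn read_dir(file: &mut File<'_>, emit: &mut dyn FnMut(&str, u64) -> bool) {
        // Resume from this file's own position; stop when the emitter is full.
        while let Some(&(name, ino)) = ENTRIES.get(file.pos) {
            if !emit(name, ino) {
                break;
            }
            file.pos += 1;
        }
    }
}

impl INodeOperations for DemoFs {
    fn lookup(_parent: &INode, name: &str) -> Option<u64> {
        ENTRIES.iter().find(|(n, _)| *n == name).map(|&(_, ino)| ino)
    }
}

impl AddressSpaceOperations for DemoFs {
    fn read_folio(_inode: &INode, buf: &mut [u8]) {
        buf.fill(0);
    }
}

/// Collects at most `max` names from one open file.
fn list<F: FileOperations>(file: &mut File<'_>, max: usize) -> Vec<String> {
    let mut names = Vec::new();
    F::read_dir(file, &mut |name: &str, _ino: u64| {
        if names.len() == max {
            return false;
        }
        names.push(name.to_string());
        true
    });
    names
}

fn main() {
    let inode = INode { ino: 1 };
    let mut first = File { inode: &inode, pos: 0 };
    let mut second = File { inode: &inode, pos: 0 };
    println!("{:?}", list::<DemoFs>(&mut first, 1));
    println!("{:?}", list::<DemoFs>(&mut second, 8));
    let mut buf = [1u8; 4];
    <DemoFs as AddressSpaceOperations>::read_folio(&inode, &mut buf);
    println!("{:?} {:?}", <DemoFs as INodeOperations>::lookup(&inode, "b"), buf);
}
```

Two opens of the same inode advance independently here, which is the
kind of per-file state the inode-only signature could not express.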

> Note, I agree, change them to work how a "real" filesystem would need
> them and then, automatically, all of the "fake" filesystems like
> currently underway (i.e. tarfs) will work just fine too, right?  That
> way we can drop the .c code for binderfs at the same time, also a nice
> win.

Are you volunteering to rewrite binderfs once rust bindings are available? :)

Cheers,
-Wedson

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-10  7:49                 ` Wedson Almeida Filho
@ 2024-01-10  7:57                   ` Greg Kroah-Hartman
  2024-01-10 12:56                   ` Matthew Wilcox
  1 sibling, 0 replies; 125+ messages in thread
From: Greg Kroah-Hartman @ 2024-01-10  7:57 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Matthew Wilcox, Al Viro, Kent Overstreet, Christian Brauner,
	Kent Overstreet, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Jan 10, 2024 at 04:49:02AM -0300, Wedson Almeida Filho wrote:
> > Note, I agree, change them to work how a "real" filesystem would need
> > them and then, automatically, all of the "fake" filesystems like
> > currently underway (i.e. tarfs) will work just fine too, right?  That
> > way we can drop the .c code for binderfs at the same time, also a nice
> > win.
> 
> Are you volunteering to rewrite binderfs once rust bindings are available? :)

Sure, would be glad to do so, after the binder conversion to rust is
merged :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2024-01-03 12:54   ` Andreas Hindborg (Samsung)
  2024-01-04  5:20     ` Darrick J. Wong
@ 2024-01-10  9:45     ` Benno Lossin
  1 sibling, 0 replies; 125+ messages in thread
From: Benno Lossin @ 2024-01-10  9:45 UTC (permalink / raw)
  To: Andreas Hindborg (Samsung), Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 03.01.24 13:54, Andreas Hindborg (Samsung) wrote:
> Wedson Almeida Filho <wedsonaf@gmail.com> writes:
>> +    /// Returns the super-block that owns the inode.
>> +    pub fn super_block(&self) -> &SuperBlock<T> {
>> +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
>> +        // shared reference (&self) to it.
>> +        unsafe { &*(*self.0.get()).i_sb.cast() }
>> +    }
> 
> I think the safety comment should talk about the pointee rather than the
> pointer? "The pointee of `i_sb` is immutable, and ..."

I think in this case it would be a very good idea to just split
the `unsafe` block into two parts. That would solve the issue
of "what does this safety comment justify?".

-- 
Cheers,
Benno


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-10  7:49                 ` Wedson Almeida Filho
  2024-01-10  7:57                   ` Greg Kroah-Hartman
@ 2024-01-10 12:56                   ` Matthew Wilcox
  1 sibling, 0 replies; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-10 12:56 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Greg Kroah-Hartman, Al Viro, Kent Overstreet, Christian Brauner,
	Kent Overstreet, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Jan 10, 2024 at 04:49:02AM -0300, Wedson Almeida Filho wrote:
> On Tue, 9 Jan 2024 at 16:32, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Tue, Jan 09, 2024 at 07:25:38PM +0000, Matthew Wilcox wrote:
> > > You've misunderstood Greg.  He's saying (effectively) "No fs bindings
> > > without a filesystem to use them".  And Al, myself and others are saying
> > > "Your filesystem interfaces are wrong because they're not usable for real
> > > filesystems".  And you're saying "But I'm not allowed to change them".
> > > And that's not true.  Change them to be laid out how a real filesystem
> > > would need them to be.
> 
> Ok, then I'll update the code to have 3 additional traits:
> 
> FileOperations
> INodeOperations
> AddressSpaceOperations
> 
> When one initialises an inode, one gets to pick all three.

That makes sense, yes.

> And FileOperations::read_dir will take a File<T> as its first argument
> (instead of an INode<T>).
> 
> Does this sound reasonable?

yep!


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 03/19] samples: rust: add initial ro file system sample
  2023-10-28 16:18   ` Alice Ryhl
@ 2024-01-10 18:25     ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-10 18:25 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho, Alexander Viro,
	Christian Brauner, Matthew Wilcox

On Sat, 28 Oct 2023 at 13:17, Alice Ryhl <alice@ryhl.io> wrote:
>
> On 10/18/23 14:25, Wedson Almeida Filho wrote:
> > +kernel::module_fs! {
> > +    type: RoFs,
> > +    name: "rust_rofs",
> > +    author: "Rust for Linux Contributors",
> > +    description: "Rust read-only file system sample",
> > +    license: "GPL",
> > +}
> > +
> > +struct RoFs;
> > +impl fs::FileSystem for RoFs {
> > +    const NAME: &'static CStr = c_str!("rust-fs");
> > +}
>
> Why use two different names here?

I actually wanted the same name, but the strings in the module macros
don't accept dashes (they need to be identifiers).

I discussed this with Miguel a couple of years ago but we decided to
wait and see before doing anything. Since then I noticed that Rust
automatically converts dashes to underscores in crate names so that
they can be used as identifiers in the language. So I guess there's a
precedent if we decide to do something similar.

For now I'll change rust-fs to rust_rofs as the fs name.
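
As a sketch of what "something similar" could mean: a hypothetical
helper (not part of the module macros) that maps a dashed name to an
identifier-friendly one, in the same spirit as the dash-to-underscore
mapping used for crate names.

```rust
/// Hypothetical helper: replaces every dash with an underscore so the
/// result can be used where an identifier-like name is required.
fn to_identifier(name: &str) -> String {
    name.replace('-', "_")
}

fn main() {
    println!("{}", to_identifier("rust-fs"));
}
```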

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems
  2023-10-18 15:38   ` Benno Lossin
@ 2024-01-10 18:32     ` Wedson Almeida Filho
  2024-01-25  9:15       ` Benno Lossin
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-10 18:32 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, 18 Oct 2023 at 12:38, Benno Lossin <benno.lossin@proton.me> wrote:
> On 18.10.23 14:25, Wedson Almeida Filho wrote:
> > +/// A registration of a file system.
> > +#[pin_data(PinnedDrop)]
> > +pub struct Registration {
> > +    #[pin]
> > +    fs: Opaque<bindings::file_system_type>,
> > +    #[pin]
> > +    _pin: PhantomPinned,
>
> Note that since commit 0b4e3b6f6b79 ("rust: types: make `Opaque` be
> `!Unpin`") you do not need an extra pinned `PhantomPinned` in your struct
> (if you already have a pinned `Opaque`), since `Opaque` already is
> `!Unpin`.

Will remove in v2.

> > +impl Registration {
> > +    /// Creates the initialiser of a new file system registration.
> > +    pub fn new<T: FileSystem + ?Sized>(module: &'static ThisModule) -> impl PinInit<Self, Error> {
>
> I am a bit curious why you specify `?Sized` here, is it common
> for types that implement `FileSystem` to not be `Sized`?
>
> Or do you want to use `dyn FileSystem`?

No reason beyond `Sized` being a restriction I don't need.

For something I was doing early on in binder, I ended up having to
change a bunch of generic type decls to allow !Sized, so here I'm
doing it preemptively as I don't lose anything.
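
A small standalone illustration of what the relaxed bound buys. The
`FileSystem` trait here is a cut-down stand-in with only a `NAME`
constant, and the `impl ... for str` exists purely to have an unsized
implementor to test with; neither reflects the real kernel trait.

```rust
// Cut-down stand-in for the real trait; only the name is modelled.
trait FileSystem {
    const NAME: &'static str;
}

struct TarFs;

impl FileSystem for TarFs {
    const NAME: &'static str = "tarfs";
}

// An unsized implementor, only to exercise the `?Sized` bound.
impl FileSystem for str {
    const NAME: &'static str = "unsized-demo";
}

// Without `?Sized`, `name_of::<str>()` would be rejected, because type
// parameters are implicitly `Sized`.
fn name_of<T: FileSystem + ?Sized>() -> &'static str {
    T::NAME
}

fn main() {
    println!("{} {}", name_of::<TarFs>(), name_of::<str>());
}
```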

> > +        try_pin_init!(Self {
> > +            _pin: PhantomPinned,
> > +            fs <- Opaque::try_ffi_init(|fs_ptr: *mut bindings::file_system_type| {
> > +                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write.
> > +                unsafe { fs_ptr.write(bindings::file_system_type::default()) };
> > +
> > +                // SAFETY: `try_ffi_init` guarantees that `fs_ptr` is valid for write, and it has
> > +                // just been initialised above, so it's also valid for read.
> > +                let fs = unsafe { &mut *fs_ptr };
> > +                fs.owner = module.0;
> > +                fs.name = T::NAME.as_char_ptr();
> > +                fs.init_fs_context = Some(Self::init_fs_context_callback);
> > +                fs.kill_sb = Some(Self::kill_sb_callback);
> > +                fs.fs_flags = 0;
> > +
> > +                // SAFETY: Pointers stored in `fs` are static so will live for as long as the
> > +                // registration is active (it is undone in `drop`).
> > +                to_result(unsafe { bindings::register_filesystem(fs_ptr) })
> > +            }),
> > +        })
> > +    }
> > +
> > +    unsafe extern "C" fn init_fs_context_callback(
> > +        _fc_ptr: *mut bindings::fs_context,
> > +    ) -> core::ffi::c_int {
> > +        from_result(|| Err(ENOTSUPP))
> > +    }
> > +
> > +    unsafe extern "C" fn kill_sb_callback(_sb_ptr: *mut bindings::super_block) {}
> > +}
> > +
> > +#[pinned_drop]
> > +impl PinnedDrop for Registration {
> > +    fn drop(self: Pin<&mut Self>) {
> > +        // SAFETY: If an instance of `Self` has been successfully created, a call to
> > +        // `register_filesystem` has necessarily succeeded. So it's ok to call
> > +        // `unregister_filesystem` on the previously registered fs.
>
> I would simply add an invariant on `Registration` that `self.fs` is
> registered, then you do not need such a lengthy explanation here.

Since this is the only place I need this explanation, I prefer to
leave it here because it's exactly where I need it.

Thanks,
-Wedson

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-09 22:19               ` Dave Chinner
@ 2024-01-10 19:19                 ` Kent Overstreet
  2024-01-24 13:08                   ` FUJITA Tomonori
  0 siblings, 1 reply; 125+ messages in thread
From: Kent Overstreet @ 2024-01-10 19:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, Wedson Almeida Filho, Al Viro, Greg Kroah-Hartman,
	Christian Brauner, Kent Overstreet, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Jan 10, 2024 at 09:19:37AM +1100, Dave Chinner wrote:
> On Tue, Jan 09, 2024 at 07:25:38PM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 09, 2024 at 04:13:15PM -0300, Wedson Almeida Filho wrote:
> > > On Wed, 3 Jan 2024 at 17:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > No.  This "cleaner version on the Rust side" is nothing of that sort;
> > > > this "readdir doesn't need any state that might be different for different
> > > > file instances beyond the current position, because none of our examples
> > > > have needed that so far" is a good example of the garbage we really do
> > > > not need to deal with.
> > > 
> > > What you're calling garbage is what Greg KH asked us to do, namely,
> > > not introduce anything for which there are no users. See a couple of
> > > quotes below.
> > > 
> > > https://lore.kernel.org/rust-for-linux/2023081411-apache-tubeless-7bb3@gregkh/
> > > The best feedback is "who will use these new interfaces?"  Without that,
> > > it's really hard to review a patchset as it's difficult to see how the
> > > bindings will be used, right?
> > > 
> > > https://lore.kernel.org/rust-for-linux/2023071049-gigabyte-timing-0673@gregkh/
> > > And I'd recommend that we not take any more bindings without real users,
> > > as there seems to be just a collection of these and it's hard to
> > > actually review them to see how they are used...
> > 
> > You've misunderstood Greg.  He's saying (effectively) "No fs bindings
> > without a filesystem to use them".  And Al, myself and others are saying
> > "Your filesystem interfaces are wrong because they're not usable for real
> > filesystems".
> 
> And that's why I've been saying that the first Rust filesystem that
> should be implemented is an ext2 clone. That's our "reference
> filesystem" for people who want to learn how filesystems should be
> implemented in Linux - it's relatively simple but fully featured and
> uses much of the generic abstractions and infrastructure.
> 
> At minimum, we need a filesystem implementation that is fully
> read-write, supports truncate and rename, and has a fully functional
> userspace and test infrastructure so that we can actually verify
> that the Rust code does what it says on the label. ext2 ticks all of
> these boxes....

I think someone was working on that? But I'd prefer that not to be a
condition of merging the VFS interfaces; we've got multiple new Rust
filesystems being implemented and I'm also planning on merging Rust
bcachefs code next merge window.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir`
  2023-10-18 12:25 ` [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir` Wedson Almeida Filho
  2023-10-21  8:33   ` Benno Lossin
  2024-01-03 14:09   ` Andreas Hindborg (Samsung)
@ 2024-01-21 21:00   ` Askar Safin
  2024-01-21 21:51     ` Dave Chinner
  2 siblings, 1 reply; 125+ messages in thread
From: Askar Safin @ 2024-01-21 21:00 UTC (permalink / raw)
  To: wedsonaf
  Cc: brauner, gregkh, kent.overstreet, linux-fsdevel, rust-for-linux,
	viro, walmeida, willy

Wedson Almeida Filho:
> +    /// White-out type.
> +    Wht = bindings::DT_WHT,

As far as I understand, filesystems are not supposed to return
DT_WHT from readdir to user space. But I'm not sure. Please
do an experiment: create a whiteout on ext4 and see what readdir
returns. As far as I understand, it will return DT_CHR.

So, I think DT_WHT should be deleted here.

Askar Safin

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir`
  2024-01-21 21:00   ` Askar Safin
@ 2024-01-21 21:51     ` Dave Chinner
  0 siblings, 0 replies; 125+ messages in thread
From: Dave Chinner @ 2024-01-21 21:51 UTC (permalink / raw)
  To: Askar Safin
  Cc: wedsonaf, brauner, gregkh, kent.overstreet, linux-fsdevel,
	rust-for-linux, viro, walmeida, willy

On Mon, Jan 22, 2024 at 12:00:49AM +0300, Askar Safin wrote:
> Wedson Almeida Filho:
> > +    /// White-out type.
> > +    Wht = bindings::DT_WHT,
> 
> As far as I understand, filesystems are not supposed to return
> DT_WHT from readdir to user space. But I'm not sure. Please
> do an experiment: create a whiteout on ext4 and see what readdir
> returns. As far as I understand, it will return DT_CHR.

DT_WHT is defined in /usr/include/dirent.h, so it is actually
present in the userspace support for readdir. If the kernel returns
DT_WHT to userspace, applications should know what it is.

However, filesystems like ext4 and btrfs don't have DT_WHT on disk
and few userspace applications support it. Way back when overlay
required whiteout support to be added, the magical char device
representation was invented for filesystems without DT_WHT and that
was exposed to userspace.

We're kind of stuck with it now, though there is nothing stopping
filesystems from returning DT_WHT to userspace instead of DT_CHR and
requiring userspace to stat the inode to look at the major/minor
numbers to determine if the dirent is a whiteout or not.  Indeed, it
would be more optimal for overlay if filesystems returned DT_WHT
instead of DT_CHR for whiteouts.

Put simply: DT_WHT is part of the readdir kernel and userspace API
and therefore should be present in the Rust interfaces.
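
To make the two representations concrete, here is a userspace sketch.
The numeric values are the DT_* constants from dirent.h; the enum name,
`from_raw` and `is_whiteout` are hypothetical and not the names used in
the patch. The check treats a character device with device number 0/0
as a whiteout, which is the convention overlayfs uses.

```rust
/// Directory entry types, using the DT_* values from dirent.h.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DirEntryType {
    Unknown = 0,
    Fifo = 1,
    Chr = 2,
    Dir = 4,
    Blk = 6,
    Reg = 8,
    Lnk = 10,
    Sock = 12,
    Wht = 14,
}

impl DirEntryType {
    /// Converts a raw `d_type` value; unknown values yield `None`.
    fn from_raw(raw: u8) -> Option<Self> {
        Some(match raw {
            0 => Self::Unknown,
            1 => Self::Fifo,
            2 => Self::Chr,
            4 => Self::Dir,
            6 => Self::Blk,
            8 => Self::Reg,
            10 => Self::Lnk,
            12 => Self::Sock,
            14 => Self::Wht,
            _ => return None,
        })
    }
}

/// Hypothetical check: a dirent is a whiteout if it is reported as
/// DT_WHT, or as a character device whose (major, minor) is (0, 0).
/// The second case needs a stat of the inode to obtain `rdev`.
fn is_whiteout(ty: DirEntryType, rdev: (u32, u32)) -> bool {
    match ty {
        DirEntryType::Wht => true,
        DirEntryType::Chr => rdev == (0, 0),
        _ => false,
    }
}

fn main() {
    println!("{:?}", DirEntryType::from_raw(14));
    println!("{}", is_whiteout(DirEntryType::Chr, (0, 0)));
}
```

With DT_WHT returned directly, the first match arm answers the question
from the dirent alone, without the extra stat.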

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
  2024-01-03 13:29   ` Andreas Hindborg (Samsung)
@ 2024-01-24  4:07     ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-24  4:07 UTC (permalink / raw)
  To: Andreas Hindborg (Samsung)
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Jan 03, 2024 at 02:29:33PM +0100, Andreas Hindborg (Samsung) wrote:
> 
> Wedson Almeida Filho <wedsonaf@gmail.com> writes:
> 
> [...]
> 
> >  
> > +/// An inode that is locked and hasn't been initialised yet.
> > +#[repr(transparent)]
> > +pub struct NewINode<T: FileSystem + ?Sized>(ARef<INode<T>>);
> > +
> > +impl<T: FileSystem + ?Sized> NewINode<T> {
> > +    /// Initialises the new inode with the given parameters.
> > +    pub fn init(self, params: INodeParams) -> Result<ARef<INode<T>>> {
> > +        // SAFETY: This is a new inode, so it's safe to manipulate it mutably.
> > +        let inode = unsafe { &mut *self.0 .0.get() };
> 
> Perhaps it would make sense with a `UniqueARef` that guarantees
> uniqueness, in line with `alloc::UniqueRc`?

We do have something like that in the kernel crate for Rust-allocated
ref-counted memory, namely, UniqueArc.

But in this case, this is slightly different: the ref-count may be >1, it's just
that the other holders of pointers will refrain from accessing the object (for
some unspecified reason). We do have another case like this for folios. Perhaps
it does make sense to generalise the concept with a type; I'll look into this.

> 
> [...]
> 
> >  
> > +impl<T: FileSystem + ?Sized> SuperBlock<T> {
> > +    /// Tries to get an existing inode or create a new one if it doesn't exist yet.
> > +    pub fn get_or_create_inode(&self, ino: Ino) -> Result<Either<ARef<INode<T>>, NewINode<T>>> {
> > +        // SAFETY: The only initialisation missing from the superblock is the root, and this
> > +        // function is needed to create the root, so it's safe to call it.
> > +        let inode =
> > +            ptr::NonNull::new(unsafe { bindings::iget_locked(self.0.get(), ino) }).ok_or(ENOMEM)?;
> 
> I can't parse this safety comment properly.

Fixed in v2.

> > +
> > +        // SAFETY: `inode` is valid for read, but there could be concurrent writers (e.g., if it's
> > +        // an already-initialised inode), so we use `read_volatile` to read its current state.
> > +        let state = unsafe { ptr::read_volatile(ptr::addr_of!((*inode.as_ptr()).i_state)) };
> > +        if state & u64::from(bindings::I_NEW) == 0 {
> > +            // The inode is cached. Just return it.
> > +            //
> > +            // SAFETY: `inode` had its refcount incremented by `iget_locked`; this increment is now
> > +            // owned by `ARef`.
> > +            Ok(Either::Left(unsafe { ARef::from_raw(inode.cast()) }))
> > +        } else {
> > +            // SAFETY: The new inode is valid but not fully initialised yet, so it's ok to create a
> > +            // `NewINode`.
> > +            Ok(Either::Right(NewINode(unsafe {
> > +                ARef::from_raw(inode.cast())
> 
> I would suggest making the destination type explicit for the cast.

Done in v2.

> 
> > +            })))
> > +        }
> > +    }
> > +}
> > +
> >  /// Required superblock parameters.
> >  ///
> >  /// This is returned by implementations of [`FileSystem::super_params`].
> > @@ -215,41 +345,28 @@ impl<T: FileSystem + ?Sized> Tables<T> {
> >              sb.0.s_blocksize = 1 << sb.0.s_blocksize_bits;
> >              sb.0.s_flags |= bindings::SB_RDONLY;
> >  
> > -            // The following is scaffolding code that will be removed in a subsequent patch. It is
> > -            // needed to build a root dentry, otherwise core code will BUG().
> > -            // SAFETY: `sb` is the superblock being initialised, it is valid for read and write.
> > -            let inode = unsafe { bindings::new_inode(&mut sb.0) };
> > -            if inode.is_null() {
> > -                return Err(ENOMEM);
> > -            }
> > -
> > -            // SAFETY: `inode` is valid for write.
> > -            unsafe { bindings::set_nlink(inode, 2) };
> > -
> > -            {
> > -                // SAFETY: This is a newly-created inode. No other references to it exist, so it is
> > -                // safe to mutably dereference it.
> > -                let inode = unsafe { &mut *inode };
> > -                inode.i_ino = 1;
> > -                inode.i_mode = (bindings::S_IFDIR | 0o755) as _;
> > -
> > -                // SAFETY: `simple_dir_operations` never changes, it's safe to reference it.
> > -                inode.__bindgen_anon_3.i_fop = unsafe { &bindings::simple_dir_operations };
> > +            // SAFETY: The callback contract guarantees that `sb_ptr` is a unique pointer to a
> > +            // newly-created (and initialised above) superblock.
> > +            let sb = unsafe { &mut *sb_ptr.cast() };
> 
> Again, I would suggest an explicit destination type for the cast.

Done in v2.

> 
> > +            let root = T::init_root(sb)?;
> >  
> > -                // SAFETY: `simple_dir_inode_operations` never changes, it's safe to reference it.
> > -                inode.i_op = unsafe { &bindings::simple_dir_inode_operations };
> > +            // Reject root inode if it belongs to a different superblock.
> 
> I am curious how this would happen?

This can happen if a user mounts two instances of a file system and the
implementation allocates root inodes and swaps them before returning. The types
will match because they are the same file system, but they'll have the wrong
super-block.
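
A userspace sketch of that check, with hypothetical stand-in types and a
string in place of an errno. The type system cannot tell the two
superblocks apart, so the check compares identity by address.

```rust
// Hypothetical stand-ins; the real types wrap C structures.
#[allow(dead_code)]
struct SuperBlock {
    id: u32,
}

#[allow(dead_code)]
struct INode<'sb> {
    sb: &'sb SuperBlock,
    ino: u64,
}

/// Accepts `root` only if it was allocated from `sb` itself.
fn check_root<'a>(sb: &SuperBlock, root: INode<'a>) -> Result<INode<'a>, &'static str> {
    if std::ptr::eq(root.sb, sb) {
        Ok(root)
    } else {
        Err("root inode belongs to a different superblock")
    }
}

fn main() {
    let first = SuperBlock { id: 1 };
    let second = SuperBlock { id: 2 };
    // Same file system type, but the root came from the other mount.
    let swapped = INode { sb: &second, ino: 1 };
    println!("{}", check_root(&first, swapped).is_err());
}
```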

Thanks,
-Wedson

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device
  2023-10-21 13:39   ` Benno Lossin
@ 2024-01-24  4:14     ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-24  4:14 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Sat, Oct 21, 2023 at 01:39:20PM +0000, Benno Lossin wrote:
> On 18.10.23 14:25, Wedson Almeida Filho wrote:
> > +    /// Reads a block from the block device.
> > +    #[cfg(CONFIG_BUFFER_HEAD)]
> > +    pub fn bread(&self, block: u64) -> Result<ARef<buffer::Head>> {
> > +        // Fail requests for non-blockdev file systems. This is a compile-time check.
> > +        match T::SUPER_TYPE {
> > +            Super::BlockDev => {}
> > +            _ => return Err(EIO),
> > +        }
> 
> Would it make sense to use `build_error` instead of returning an error
> here?

Yes, I changed these to `build_error` in v2.

> Also, do you think that separating this into a trait, `BlockDevFS` would
> make sense?

We actually have several types; we happen to only support two cases now, but will add more over time.

> > +
> > +        // SAFETY: This function is only valid after the `NeedsInit` typestate, so the block size
> > +        // is known and the superblock can be used to read blocks.
> 
> Stale SAFETY comment, there are no typestates in this patch?

Fixed in v2.

> 
> > +        let ptr =
> > +            ptr::NonNull::new(unsafe { bindings::sb_bread(self.0.get(), block) }).ok_or(EIO)?;
> > +        // SAFETY: `sb_bread` returns a referenced buffer head. Ownership of the increment is
> > +        // passed to the `ARef` instance.
> > +        Ok(unsafe { ARef::from_raw(ptr.cast()) })
> > +    }
> > +
> > +    /// Reads `size` bytes starting from `offset` bytes.
> > +    ///
> > +    /// Returns an iterator that returns slices based on blocks.
> > +    #[cfg(CONFIG_BUFFER_HEAD)]
> > +    pub fn read(
> > +        &self,
> > +        offset: u64,
> > +        size: u64,
> > +    ) -> Result<impl Iterator<Item = Result<buffer::View>> + '_> {
> > +        struct BlockIter<'a, T: FileSystem + ?Sized> {
> > +            sb: &'a SuperBlock<T>,
> > +            next_offset: u64,
> > +            end: u64,
> > +        }
> > +        impl<'a, T: FileSystem + ?Sized> Iterator for BlockIter<'a, T> {
> > +            type Item = Result<buffer::View>;
> > +
> > +            fn next(&mut self) -> Option<Self::Item> {
> > +                if self.next_offset >= self.end {
> > +                    return None;
> > +                }
> > +
> > +                // SAFETY: The superblock is valid and has had its block size initialised.
> > +                let block_size = unsafe { (*self.sb.0.get()).s_blocksize };
> > +                let bh = match self.sb.bread(self.next_offset / block_size) {
> > +                    Ok(bh) => bh,
> > +                    Err(e) => return Some(Err(e)),
> > +                };
> > +                let boffset = self.next_offset & (block_size - 1);
> > +                let bsize = core::cmp::min(self.end - self.next_offset, block_size - boffset);
> > +                self.next_offset += bsize;
> > +                Some(Ok(buffer::View::new(bh, boffset as usize, bsize as usize)))
> > +            }
> > +        }
> > +        Ok(BlockIter {
> > +            sb: self,
> > +            next_offset: offset,
> > +            end: offset.checked_add(size).ok_or(ERANGE)?,
> > +        })
> > +    }
> >   }
> > 
> >   /// Required superblock parameters.
> > @@ -511,6 +591,70 @@ pub struct SuperParams<T: ForeignOwnable + Send + Sync> {
> >   #[repr(transparent)]
> >   pub struct NewSuperBlock<T: FileSystem + ?Sized>(bindings::super_block, PhantomData<T>);
> > 
> > +impl<T: FileSystem + ?Sized> NewSuperBlock<T> {
> > +    /// Reads sectors.
> > +    ///
> > +    /// `count` must be such that the total size doesn't exceed a page.
> > +    pub fn sread(&self, sector: u64, count: usize, folio: &mut UniqueFolio) -> Result {
> > +        // Fail requests for non-blockdev file systems. This is a compile-time check.
> > +        match T::SUPER_TYPE {
> > +            // The superblock is valid and given that it's a blockdev superblock it must have a
> > +            // valid `s_bdev`.
> > +            Super::BlockDev => {}
> > +            _ => return Err(EIO),
> > +        }
> > +
> > +        crate::build_assert!(count * (bindings::SECTOR_SIZE as usize) <= bindings::PAGE_SIZE);
> 
> Maybe add an error message that explains why this is not ok?
> 
> > +
> > +        // Read the sectors.
> > +        let mut bio = bindings::bio::default();
> > +        let bvec = Opaque::<bindings::bio_vec>::uninit();
> > +
> > +        // SAFETY: `bio` and `bvec` are allocated on the stack, they're both valid.
> > +        unsafe {
> > +            bindings::bio_init(
> > +                &mut bio,
> > +                self.0.s_bdev,
> > +                bvec.get(),
> > +                1,
> > +                bindings::req_op_REQ_OP_READ,
> > +            )
> > +        };
> > +
> > +        // SAFETY: `bio` was just initialised with `bio_init` above, so it's safe to call
> > +        // `bio_uninit` on the way out.
> > +        let mut bio =
> > +            ScopeGuard::new_with_data(bio, |mut b| unsafe { bindings::bio_uninit(&mut b) });
> > +
> > +        // SAFETY: We have one free `bvec` (initialised above). We also know that size won't exceed
> > +        // a page size (build_assert above).
> 
> I think you should move the `build_assert` above this line.

Sure, moved in v2.

Thanks,
-Wedson


* Re: [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs`
  2024-01-04  5:33   ` Darrick J. Wong
@ 2024-01-24  4:24     ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-24  4:24 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Jan 03, 2024 at 09:33:15PM -0800, Darrick J. Wong wrote:
> On Wed, Oct 18, 2023 at 09:25:11AM -0300, Wedson Almeida Filho wrote:
> > From: Wedson Almeida Filho <walmeida@microsoft.com>
> > 
> > +/// File system stats.
> > +///
> > +/// A subset of C's `kstatfs`.
> > +pub struct Stat {
> > +    /// Magic number of the file system.
> > +    pub magic: u32,
> > +
> > +    /// The maximum length of a file name.
> > +    pub namelen: i64,
> 
> Yikes, I hope I never see an 8EB filename.  The C side doesn't handle
> names longer than 255 bytes.

kstatfs::f_namelen is defined as a long in C.

> 
> > +
> > +    /// Block size.
> > +    pub bsize: i64,
> 
> Or an 8EB block size.  SMR notwithstanding, I think this could be u32.
> 
> Why are these values signed?  Nobody has a -1k block filesystem.

I agree, but they're signed in C, I'm just mimicking that. See kstatfs::f_bsize
for this particular case, it's also a long.

> 
> > +    /// Number of files in the file system.
> > +    pub files: u64,
> > +
> > +    /// Number of blocks in the file system.
> > +    pub blocks: u64,
> >  }
> >  
> > +    unsafe extern "C" fn statfs_callback(
> > +        dentry: *mut bindings::dentry,
> > +        buf: *mut bindings::kstatfs,
> > +    ) -> core::ffi::c_int {
> > +        from_result(|| {
> > +            // SAFETY: The C API guarantees that `dentry` is valid for read. `d_sb` is
> > +            // immutable, so it's safe to read it. The superblock is guaranteed to be valid for
> > +            // the duration of the call.
> > +            let sb = unsafe { &*(*dentry).d_sb.cast::<SuperBlock<T>>() };
> > +            let s = T::statfs(sb)?;
> > +
> > +            // SAFETY: The C API guarantees that `buf` is valid for read and write.
> > +            let buf = unsafe { &mut *buf };
> > +            buf.f_type = s.magic.into();
> > +            buf.f_namelen = s.namelen;
> > +            buf.f_bsize = s.bsize;
> > +            buf.f_files = s.files;
> > +            buf.f_blocks = s.blocks;
> > +            buf.f_bfree = 0;
> > +            buf.f_bavail = 0;
> > +            buf.f_ffree = 0;
> 
> Why is it necessary to fill out the C structure with zeroes?
> statfs_by_dentry zeroes the buffer contents before calling ->statfs.

I didn't know they were zeroed before calling statfs. Removed this from v2.

Thanks,
-Wedson


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2023-10-18 12:25 ` [RFC PATCH 19/19] tarfs: introduce tar fs Wedson Almeida Filho
  2023-10-18 16:57   ` Matthew Wilcox
@ 2024-01-24  5:05   ` Matthew Wilcox
  2024-01-24  5:23     ` Matthew Wilcox
  2024-01-24  5:34     ` Gao Xiang
  1 sibling, 2 replies; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-24  5:05 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> +config TARFS_FS
> +	tristate "TAR file system support"
> +	depends on RUST && BLOCK
> +	select BUFFER_HEAD

I didn't spot anywhere in this that actually uses buffer_heads.  Why
did you add this select?


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2024-01-24  5:05   ` Matthew Wilcox
@ 2024-01-24  5:23     ` Matthew Wilcox
  2024-01-24 18:26       ` Wedson Almeida Filho
  2024-01-24  5:34     ` Gao Xiang
  1 sibling, 1 reply; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-24  5:23 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, Jan 24, 2024 at 05:05:43AM +0000, Matthew Wilcox wrote:
> On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> > +config TARFS_FS
> > +	tristate "TAR file system support"
> > +	depends on RUST && BLOCK
> > +	select BUFFER_HEAD
> 
> I didn't spot anywhere in this that actually uses buffer_heads.  Why
> did you add this select?

Oh, never mind.  I found bread().

I'm not thrilled that you're adding buffer_head wrappers.  We're trying
to move away from buffer_heads.  Any chance you could use the page cache
directly to read your superblock?


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2024-01-24  5:05   ` Matthew Wilcox
  2024-01-24  5:23     ` Matthew Wilcox
@ 2024-01-24  5:34     ` Gao Xiang
  1 sibling, 0 replies; 125+ messages in thread
From: Gao Xiang @ 2024-01-24  5:34 UTC (permalink / raw)
  To: Matthew Wilcox, Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

Hi,

On 2024/1/24 13:05, Matthew Wilcox wrote:
> On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
>> +config TARFS_FS
>> +	tristate "TAR file system support"
>> +	depends on RUST && BLOCK
>> +	select BUFFER_HEAD
> 
> I didn't spot anywhere in this that actually uses buffer_heads.  Why
> did you add this select?

Side note: I think sb.read() relies on
"[RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads"

More background:

I don't intend to join the discussion about the new language
interface, on which I'm pretty neutral. But I should add some
background about "tarfs" itself, since I was indirectly involved
in some of the discussion.

The TarFS use case has been discussed many times over almost a
year in the "confidential containers" community meetings, as a
way to pass OCI images through to guests, but without reaching
an agreement.

And this "TarFS" once had a C version which was archived at
https://github.com/kata-containers/tardev-snapshotter/blob/main/tarfs/tarfs.c

and the discussion was directly in a Kata container PR:
https://github.com/kata-containers/kata-containers/pull/7106#issuecomment-1592192981

IMHO, this "tarfs" implementation has no relationship with the
real tar on-disk format, since it defines a new customized index
format rather than just parsing tar in the kernel.

IOW, a "tarfs" mode has been directly supported by EROFS since
Linux v6.3 through 512-byte block size addressing, see
https://git.kernel.org/torvalds/c/61d325dcbc05

And I think any local fs which supports 512-byte block size can
have a "tarfs" mode without any compatibility issue.

BTW, in addition to being incompatible with the on-disk tar format,
this tarfs does not seem to support tar xattrs either.

Thanks,
Gao Xiang

> 


* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-10 19:19                 ` Kent Overstreet
@ 2024-01-24 13:08                   ` FUJITA Tomonori
  2024-01-24 19:49                     ` Kent Overstreet
  0 siblings, 1 reply; 125+ messages in thread
From: FUJITA Tomonori @ 2024-01-24 13:08 UTC (permalink / raw)
  To: kent.overstreet
  Cc: david, willy, wedsonaf, viro, gregkh, brauner, kent.overstreet,
	linux-fsdevel, rust-for-linux, walmeida

On Wed, 10 Jan 2024 14:19:41 -0500
Kent Overstreet <kent.overstreet@linux.dev> wrote:

> On Wed, Jan 10, 2024 at 09:19:37AM +1100, Dave Chinner wrote:
>> On Tue, Jan 09, 2024 at 07:25:38PM +0000, Matthew Wilcox wrote:
>> > On Tue, Jan 09, 2024 at 04:13:15PM -0300, Wedson Almeida Filho wrote:
>> > > On Wed, 3 Jan 2024 at 17:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > > > No.  This "cleaner version on the Rust side" is nothing of that sort;
>> > > > this "readdir doesn't need any state that might be different for different
>> > > > file instances beyond the current position, because none of our examples
>> > > > have needed that so far" is a good example of the garbage we really do
>> > > > not need to deal with.
>> > > 
>> > > What you're calling garbage is what Greg KH asked us to do, namely,
>> > > not introduce anything for which there are no users. See a couple of
>> > > quotes below.
>> > > 
>> > > https://lore.kernel.org/rust-for-linux/2023081411-apache-tubeless-7bb3@gregkh/
>> > > The best feedback is "who will use these new interfaces?"  Without that,
>> > > it's really hard to review a patchset as it's difficult to see how the
>> > > bindings will be used, right?
>> > > 
>> > > https://lore.kernel.org/rust-for-linux/2023071049-gigabyte-timing-0673@gregkh/
>> > > And I'd recommend that we not take any more bindings without real users,
>> > > as there seems to be just a collection of these and it's hard to
>> > > actually review them to see how they are used...
>> > 
>> > You've misunderstood Greg.  He's saying (effectively) "No fs bindings
>> > without a filesystem to use them".  And Al, myself and others are saying
>> > "Your filesystem interfaces are wrong because they're not usable for real
>> > filesystems".
>> 
>> And that's why I've been saying that the first Rust filesystem that
>> should be implemented is an ext2 clone. That's our "reference
>> filesystem" for people who want to learn how filesystems should be
>> implemented in Linux - it's relatively simple but fully featured and
>> uses much of the generic abstractions and infrastructure.
>> 
>> At minimum, we need a filesystem implementation that is fully
>> read-write, supports truncate and rename, and has a fully functional
>> userspace and test infrastructure so that we can actually verify
>> that the Rust code does what it says on the label. ext2 ticks all of
>> these boxes....
> 
> I think someone was working on that? But I'd prefer that not to be a
> condition of merging the VFS interfaces; we've got multiple new Rust
> filesystems being implemented and I'm also planning on merging Rust
> bcachefs code next merge window.

It's very far from a fully functional clone of ext2 but the following
can do simple read-write to/from files and directories:

https://github.com/fujita/linux/tree/ext2-rust/fs/ext2rust

For now, all of the code is unsafe Rust, using C structures directly
but I could update the code to see how well Rust VFS abstractions for
real file systems work.


* Re: [RFC PATCH 05/19] rust: fs: introduce `INode<T>`
  2024-01-04  5:14   ` Darrick J. Wong
@ 2024-01-24 18:17     ` Wedson Almeida Filho
  0 siblings, 0 replies; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-24 18:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Thu, 4 Jan 2024 at 02:14, Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Wed, Oct 18, 2023 at 09:25:04AM -0300, Wedson Almeida Filho wrote:
> > From: Wedson Almeida Filho <walmeida@microsoft.com>
> >
> > Allow Rust file systems to handle typed and ref-counted inodes.
> >
> > This is in preparation for creating new inodes (for example, to create
> > the root inode of a new superblock), which comes in the next patch in
> > the series.
> >
> > Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
> > ---
> >  rust/helpers.c    |  7 +++++++
> >  rust/kernel/fs.rs | 53 +++++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 58 insertions(+), 2 deletions(-)
> >
> > diff --git a/rust/helpers.c b/rust/helpers.c
> > index 4c86fe4a7e05..fe45f8ddb31f 100644
> > --- a/rust/helpers.c
> > +++ b/rust/helpers.c
> > @@ -25,6 +25,7 @@
> >  #include <linux/build_bug.h>
> >  #include <linux/err.h>
> >  #include <linux/errname.h>
> > +#include <linux/fs.h>
> >  #include <linux/mutex.h>
> >  #include <linux/refcount.h>
> >  #include <linux/sched/signal.h>
> > @@ -144,6 +145,12 @@ struct kunit *rust_helper_kunit_get_current_test(void)
> >  }
> >  EXPORT_SYMBOL_GPL(rust_helper_kunit_get_current_test);
> >
> > +off_t rust_helper_i_size_read(const struct inode *inode)
>
> i_size_read returns a loff_t (aka __kernel_loff_t (aka long long)),
> > but this returns an off_t (aka __kernel_off_t (aka long)).
>
> Won't that cause truncation issues for files larger than 4GB on some
> architectures?

This is indeed a bug, thanks for catching it! Fixed in v2.
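
To illustrate the truncation being discussed, here is a minimal
userspace sketch in Rust. It assumes a 32-bit target where `off_t`
(a C `long`) is 32 bits wide while `loff_t` is 64 bits wide. The
function name is hypothetical and is not part of the patch; it only
models what happens when a 64-bit size passes through a 32-bit
return type.

```rust
/// Models returning a 64-bit `loff_t` through a 32-bit `off_t`.
/// Hypothetical helper for illustration only; assumes `long` is 32 bits.
fn through_32bit_off_t(size: i64) -> i64 {
    // The `as i32` cast keeps only the low 32 bits of the value,
    // then the result is widened back to 64 bits for the caller.
    (size as i32) as i64
}

fn main() {
    let five_gib: i64 = 5 << 30;
    let three_gib: i64 = 3 << 30;
    // A 5 GiB file appears to be 1 GiB; a 3 GiB file appears negative.
    println!("{} -> {}", five_gib, through_32bit_off_t(five_gib));
    println!("{} -> {}", three_gib, through_32bit_off_t(three_gib));
}
```

Sizes that fit in 31 bits pass through unchanged, which is why this
kind of bug can go unnoticed until a large file shows up.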

> > +{
> > +     return i_size_read(inode);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_i_size_read);
> > +
> >  /*
> >   * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
> >   * use it in contexts where Rust expects a `usize` like slice (array) indices.
> > diff --git a/rust/kernel/fs.rs b/rust/kernel/fs.rs
> > index 31cf643aaded..30fa1f312f33 100644
> > --- a/rust/kernel/fs.rs
> > +++ b/rust/kernel/fs.rs
> > @@ -7,9 +7,9 @@
> >  //! C headers: [`include/linux/fs.h`](../../include/linux/fs.h)
> >
> >  use crate::error::{code::*, from_result, to_result, Error, Result};
> > -use crate::types::Opaque;
> > +use crate::types::{AlwaysRefCounted, Opaque};
> >  use crate::{bindings, init::PinInit, str::CStr, try_pin_init, ThisModule};
> > -use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin};
> > +use core::{marker::PhantomData, marker::PhantomPinned, pin::Pin, ptr};
> >  use macros::{pin_data, pinned_drop};
> >
> >  /// Maximum size of an inode.
> > @@ -94,6 +94,55 @@ fn drop(self: Pin<&mut Self>) {
> >      }
> >  }
> >
> > +/// The number of an inode.
> > +pub type Ino = u64;
> > +
> > +/// A node in the file system index (inode).
> > +///
> > +/// Wraps the kernel's `struct inode`.
> > +///
> > +/// # Invariants
> > +///
> > +/// Instances of this type are always ref-counted, that is, a call to `ihold` ensures that the
> > +/// allocation remains valid at least until the matching call to `iput`.
> > +#[repr(transparent)]
> > +pub struct INode<T: FileSystem + ?Sized>(Opaque<bindings::inode>, PhantomData<T>);
> > +
> > +impl<T: FileSystem + ?Sized> INode<T> {
> > +    /// Returns the number of the inode.
> > +    pub fn ino(&self) -> Ino {
> > +        // SAFETY: `i_ino` is immutable, and `self` is guaranteed to be valid by the existence of a
> > +        // shared reference (&self) to it.
> > +        unsafe { (*self.0.get()).i_ino }
>
> Is "*self.0.get()" the means by which the Rust bindings get at the
> actual C object?

It gets a pointer to the C object.

self's type is INode, which is a pair. `self.0` gets the first element
of the pair, which has type `Opaque<bindings::inode>`. It has a method
called `get` that returns a pointer to the object it wraps.
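
The access pattern described above can be sketched in userspace
Rust as follows. This is a simplified stand-in, not the kernel
code: `RawInode` is a hypothetical placeholder for
`bindings::inode`, and this `Opaque` wraps a plain `UnsafeCell`
purely for illustration.

```rust
use core::cell::UnsafeCell;
use core::marker::PhantomData;

/// Hypothetical stand-in for the C `struct inode` binding.
#[repr(C)]
struct RawInode {
    i_ino: u64,
}

/// Simplified stand-in for the kernel's `Opaque<T>` wrapper.
#[repr(transparent)]
struct Opaque<T>(UnsafeCell<T>);

impl<T> Opaque<T> {
    fn new(value: T) -> Self {
        Opaque(UnsafeCell::new(value))
    }

    /// Returns a raw pointer to the wrapped object.
    fn get(&self) -> *mut T {
        self.0.get()
    }
}

/// A tuple struct ("pair"): field 0 is the wrapped C object,
/// field 1 is a zero-sized marker for the file system type.
struct INode<T>(Opaque<RawInode>, PhantomData<T>);

impl<T> INode<T> {
    fn ino(&self) -> u64 {
        // SAFETY: `self.0.get()` points into `self`, which is alive for
        // the duration of this call, and nothing mutates `i_ino` here.
        unsafe { (*self.0.get()).i_ino }
    }
}

fn main() {
    let inode: INode<()> = INode(Opaque::new(RawInode { i_ino: 42 }), PhantomData);
    println!("ino = {}", inode.ino());
}
```

So `self.0` selects the `Opaque` field, `.get()` yields a `*mut`
pointer to the wrapped object, and `*` dereferences that pointer
inside the `unsafe` block.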

> (Forgive me, I've barely finished drying the primer coat on my rust-fu.)
>
> > +    }
> > +
> > +    /// Returns the super-block that owns the inode.
> > +    pub fn super_block(&self) -> &SuperBlock<T> {
> > +        // SAFETY: `i_sb` is immutable, and `self` is guaranteed to be valid by the existence of a
> > +        // shared reference (&self) to it.
> > +        unsafe { &*(*self.0.get()).i_sb.cast() }
> > +    }
> > +
> > +    /// Returns the size of the inode contents.
> > +    pub fn size(&self) -> i64 {
>
> I'm a little surprised I didn't see a
>
> pub type loff_t = i64
>
> followed by this function returning a loff_t.  Or maybe it would be
> better to define it as:

This was suggested by Matthew Wilcox as well, which I did for v2,
though I call it `Offset`.

> struct loff_t(i64);
>
> So that dopey fs developers like me cannot so easily assign a file
> position (bytes) to a pgoff_t (page index) without either supplying an
> actual conversion operator or seeing complaints from the compiler.

We may want to eventually do this, but for now I'm only doing the type alias.

The disadvantage of doing a new type is that we lose all arithmetic
operators as well, though we can redefine them by implementing the
appropriate traits.
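
As a rough sketch of the trade-off, the following userspace Rust
shows what a new type with hand-written operators could look
like. Everything here is an assumption made for illustration: the
`PageIndex` type, the `page_index` method and the 4 KiB page size
are hypothetical, and v2 of the series uses a plain type alias
instead.

```rust
use std::ops::{Add, Sub};

/// Assumption for this sketch: 4 KiB pages.
const PAGE_SHIFT: u32 = 12;

/// A byte offset into a file (hypothetical newtype over `i64`).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Offset(i64);

/// A page-cache index (hypothetical newtype over `u64`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PageIndex(u64);

// Arithmetic is lost when wrapping `i64`, so it is restored by
// implementing the operator traits explicitly.
impl Add for Offset {
    type Output = Offset;
    fn add(self, rhs: Offset) -> Offset {
        Offset(self.0 + rhs.0)
    }
}

impl Sub for Offset {
    type Output = Offset;
    fn sub(self, rhs: Offset) -> Offset {
        Offset(self.0 - rhs.0)
    }
}

impl Offset {
    /// The only way to get from bytes to a page index is this explicit
    /// conversion; it returns `None` for negative offsets.
    fn page_index(self) -> Option<PageIndex> {
        if self.0 < 0 {
            None
        } else {
            Some(PageIndex((self.0 as u64) >> PAGE_SHIFT))
        }
    }
}

fn main() {
    let pos = Offset(8192) + Offset(100);
    println!("{:?} is in page {:?}", pos, pos.page_index());
    // let idx: PageIndex = pos; // would not compile: mismatched types
}
```

With this shape, assigning an `Offset` where a `PageIndex` is
expected is a compile error, at the cost of one `impl` per operator.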

Thanks,
-Wedson


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2024-01-24  5:23     ` Matthew Wilcox
@ 2024-01-24 18:26       ` Wedson Almeida Filho
  2024-01-24 21:05         ` Dave Chinner
  0 siblings, 1 reply; 125+ messages in thread
From: Wedson Almeida Filho @ 2024-01-24 18:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Christian Brauner, Kent Overstreet,
	Greg Kroah-Hartman, linux-fsdevel, rust-for-linux,
	Wedson Almeida Filho

On Wed, 24 Jan 2024 at 02:23, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 24, 2024 at 05:05:43AM +0000, Matthew Wilcox wrote:
> > On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> > > +config TARFS_FS
> > > +   tristate "TAR file system support"
> > > +   depends on RUST && BLOCK
> > > +   select BUFFER_HEAD
> >
> > I didn't spot anywhere in this that actually uses buffer_heads.  Why
> > did you add this select?
>
> Oh, never mind.  I found bread().
>
> I'm not thrilled that you're adding buffer_head wrappers.  We're trying
> to move away from buffer_heads.  Any chance you could use the page cache
> directly to read your superblock?

I used it because I saw it in ext4 and assumed that it was the
recommended way of doing it. I'm fine to remove it.

So what is the recommended way? Which file systems are using it (so I
can do something similar)?

Cheers,
-Wedson


* Re: [RFC PATCH 00/19] Rust abstractions for VFS
  2024-01-24 13:08                   ` FUJITA Tomonori
@ 2024-01-24 19:49                     ` Kent Overstreet
  0 siblings, 0 replies; 125+ messages in thread
From: Kent Overstreet @ 2024-01-24 19:49 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: david, willy, wedsonaf, viro, gregkh, brauner, kent.overstreet,
	linux-fsdevel, rust-for-linux, walmeida

On Wed, Jan 24, 2024 at 10:08:35PM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Jan 2024 14:19:41 -0500
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> > On Wed, Jan 10, 2024 at 09:19:37AM +1100, Dave Chinner wrote:
> >> On Tue, Jan 09, 2024 at 07:25:38PM +0000, Matthew Wilcox wrote:
> >> > On Tue, Jan 09, 2024 at 04:13:15PM -0300, Wedson Almeida Filho wrote:
> >> > > On Wed, 3 Jan 2024 at 17:41, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> > > > No.  This "cleaner version on the Rust side" is nothing of that sort;
> >> > > > this "readdir doesn't need any state that might be different for different
> >> > > > file instances beyond the current position, because none of our examples
> >> > > > have needed that so far" is a good example of the garbage we really do
> >> > > > not need to deal with.
> >> > > 
> >> > > What you're calling garbage is what Greg KH asked us to do, namely,
> >> > > not introduce anything for which there are no users. See a couple of
> >> > > quotes below.
> >> > > 
> >> > > https://lore.kernel.org/rust-for-linux/2023081411-apache-tubeless-7bb3@gregkh/
> >> > > The best feedback is "who will use these new interfaces?"  Without that,
> >> > > it's really hard to review a patchset as it's difficult to see how the
> >> > > bindings will be used, right?
> >> > > 
> >> > > https://lore.kernel.org/rust-for-linux/2023071049-gigabyte-timing-0673@gregkh/
> >> > > And I'd recommend that we not take any more bindings without real users,
> >> > > as there seems to be just a collection of these and it's hard to
> >> > > actually review them to see how they are used...
> >> > 
> >> > You've misunderstood Greg.  He's saying (effectively) "No fs bindings
> >> > without a filesystem to use them".  And Al, myself and others are saying
> >> > "Your filesystem interfaces are wrong because they're not usable for real
> >> > filesystems".
> >> 
> >> And that's why I've been saying that the first Rust filesystem that
> >> should be implemented is an ext2 clone. That's our "reference
> >> filesystem" for people who want to learn how filesystems should be
> >> implemented in Linux - it's relatively simple but fully featured and
> >> uses much of the generic abstractions and infrastructure.
> >> 
> >> At minimum, we need a filesystem implementation that is fully
> >> read-write, supports truncate and rename, and has a fully functional
> >> userspace and test infrastructure so that we can actually verify
> >> that the Rust code does what it says on the label. ext2 ticks all of
> >> these boxes....
> > 
> > I think someone was working on that? But I'd prefer that not to be a
> > condition of merging the VFS interfaces; we've got multiple new Rust
> > filesystems being implemented and I'm also planning on merging Rust
> > bcachefs code next merge window.
> 
> It's very far from a fully functional clone of ext2 but the following
> can do simple read-write to/from files and directories:
> 
> https://github.com/fujita/linux/tree/ext2-rust/fs/ext2rust
> 
> For now, all of the code is unsafe Rust, using C structures directly
> but I could update the code to see how well Rust VFS abstractions for
> real file systems work.

I think that would be well received. I think the biggest hurdle for a
lot of people is going to be figuring out the patterns for expressing
old idioms in safe rust - a version of ext2 in safe Rust would be the
perfect gentle introduction for filesystem people.

And if it achieved feature parity with fs/ext2, there'd be a strong
argument for it eventually replacing fs/ext2 so that we can more safely
mount untrusted filesystem images.


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2024-01-24 18:26       ` Wedson Almeida Filho
@ 2024-01-24 21:05         ` Dave Chinner
  2024-01-24 21:28           ` Matthew Wilcox
  0 siblings, 1 reply; 125+ messages in thread
From: Dave Chinner @ 2024-01-24 21:05 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Matthew Wilcox, Alexander Viro, Christian Brauner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Wed, Jan 24, 2024 at 03:26:03PM -0300, Wedson Almeida Filho wrote:
> On Wed, 24 Jan 2024 at 02:23, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Jan 24, 2024 at 05:05:43AM +0000, Matthew Wilcox wrote:
> > > On Wed, Oct 18, 2023 at 09:25:18AM -0300, Wedson Almeida Filho wrote:
> > > > +config TARFS_FS
> > > > +   tristate "TAR file system support"
> > > > +   depends on RUST && BLOCK
> > > > +   select BUFFER_HEAD
> > >
> > > I didn't spot anywhere in this that actually uses buffer_heads.  Why
> > > did you add this select?
> >
> > Oh, never mind.  I found bread().
> >
> > I'm not thrilled that you're adding buffer_head wrappers.  We're trying
> > to move away from buffer_heads.  Any chance you could use the page cache
> > directly to read your superblock?
> 
> I used it because I saw it in ext4 and assumed that it was the
> recommended way of doing it. I'm fine to remove it.
> 
> So what is the recommended way? Which file systems are using it (so I
> can do something similar)?

e.g. btrfs_read_dev_one_super(). Essentially, if your superblock is
at block zero in the block device:

	struct address_space *mapping = bdev->bd_inode->i_mapping;

	......

	page = read_cache_page_gfp(mapping, 0, GFP_NOFS);
	if (IS_ERR(page))
		return ERR_CAST(page);

	super = page_address(page);

And now you have a pointer to your in memory buffer containing the
on-disk superblock. If the superblock is not at block zero, then
replace the '0' passed to read_cache_page_gfp() with whatever page
cache index the superblock can be found at....

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 19/19] tarfs: introduce tar fs
  2024-01-24 21:05         ` Dave Chinner
@ 2024-01-24 21:28           ` Matthew Wilcox
  0 siblings, 0 replies; 125+ messages in thread
From: Matthew Wilcox @ 2024-01-24 21:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Wedson Almeida Filho, Alexander Viro, Christian Brauner,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On Thu, Jan 25, 2024 at 08:05:25AM +1100, Dave Chinner wrote:
> On Wed, Jan 24, 2024 at 03:26:03PM -0300, Wedson Almeida Filho wrote:
> > So what is the recommended way? Which file systems are using it (so I
> > can do something similar)?
> 
> e.g. btrfs_read_dev_one_super(). Essentially, if your superblock is
> at block zero in the block device:
> 
> 	struct address_space *mapping = bdev->bd_inode->i_mapping;
> 
> 	......
> 
> 	page = read_cache_page_gfp(mapping, 0, GFP_NOFS);
>         if (IS_ERR(page))
>                 return ERR_CAST(page);
> 
>         super = page_address(page);
> 
> And now you have a pointer to your in memory buffer containing the
> on-disk superblock. If the superblock is not at block zero, then
> replace the '0' passed to read_cache_page_gfp() with whatever page
> cache index the superblock can be found at....

Just to modify this slightly ...

	folio = read_mapping_folio(mapping, pos / PAGE_SIZE);
	if (IS_ERR(folio))
		return ERR_CAST(folio);
	super = folio_address(folio) + offset_in_folio(folio, pos);

... and then in your shutdown path, you'll need to call folio_put().
Maybe that's easiest done in Rust by "leaking" the folio into the
in-memory super block so it doesn't get dropped at the end of the
function?

I don't think you need the GFP_NOFS.  We don't have a superblock yet, so
we can't call back into the filesystem.
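
The index and offset arithmetic in the snippet above can be
checked with a small userspace sketch in Rust. It assumes 4 KiB
pages and a folio that covers exactly one page; the `locate`
function is a hypothetical name used only for this illustration
and is not a kernel API.

```rust
/// Assumption for this sketch: 4 KiB pages, single-page folios.
const PAGE_SIZE: u64 = 4096;

/// Splits a byte position on the block device into the page-cache index
/// to read and the byte offset of the data within that page.
/// Hypothetical helper mirroring `pos / PAGE_SIZE` and, for a
/// single-page folio, `offset_in_folio(folio, pos)`.
fn locate(pos: u64) -> (u64, usize) {
    (pos / PAGE_SIZE, (pos % PAGE_SIZE) as usize)
}

fn main() {
    // A superblock at byte 1024 lives in page 0, 1024 bytes in.
    println!("{:?}", locate(1024));
    // A superblock at byte 65536 starts exactly at page 16.
    println!("{:?}", locate(65536));
}
```

For larger folios the offset would be taken relative to the start
of the folio rather than the page, so this sketch only covers the
simplest case.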


* Re: [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems
  2024-01-10 18:32     ` Wedson Almeida Filho
@ 2024-01-25  9:15       ` Benno Lossin
  0 siblings, 0 replies; 125+ messages in thread
From: Benno Lossin @ 2024-01-25  9:15 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox,
	Kent Overstreet, Greg Kroah-Hartman, linux-fsdevel,
	rust-for-linux, Wedson Almeida Filho

On 10.01.24 19:32, Wedson Almeida Filho wrote:
>>> +#[pinned_drop]
>>> +impl PinnedDrop for Registration {
>>> +    fn drop(self: Pin<&mut Self>) {
>>> +        // SAFETY: If an instance of `Self` has been successfully created, a call to
>>> +        // `register_filesystem` has necessarily succeeded. So it's ok to call
>>> +        // `unregister_filesystem` on the previously registered fs.
>>
>> I would simply add an invariant on `Registration` that `self.fs` is
>> registered, then you do not need such a lengthy explanation here.
> 
> Since this is the only place I need this explanation, I prefer to
> leave it here because it's exactly where I need it.

I get why you want this, but consider this: someone adds another
`new` function, but forgets to call `register_filesystem`. They have
no indication, except for this comment in the `Drop` impl, that they
are doing something wrong.

I took a look at the implementation of `unregister_filesystem` and
found that you can pass an unregistered filesystem; in that case
the function just returns an error. I think the only safety
requirement of `unregister_filesystem` is that if the supplied
pointer is a registered filesystem, the pointee is valid.
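
To make the suggestion concrete, here is a minimal userspace
sketch in Rust of stating the requirement as a type invariant.
The `Registry` type and its counter are hypothetical stand-ins for
the kernel's list of registered file systems; nothing below is the
actual `kernel::fs` API.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the kernel's list of registered file systems.
struct Registry {
    live: Cell<usize>,
}

/// A registration handle.
///
/// # Invariants
///
/// While a `Registration` exists, it is counted in `registry.live`.
struct Registration<'a> {
    registry: &'a Registry,
}

impl<'a> Registration<'a> {
    fn new(registry: &'a Registry) -> Self {
        registry.live.set(registry.live.get() + 1);
        // INVARIANT: the registration was recorded just above, so every
        // constructor has to justify the invariant at this point.
        Registration { registry }
    }
}

impl Drop for Registration<'_> {
    fn drop(&mut self) {
        // By the type invariant, this instance is registered, so it is
        // correct to undo the registration here.
        self.registry.live.set(self.registry.live.get() - 1);
    }
}

fn main() {
    let registry = Registry { live: Cell::new(0) };
    {
        let _reg = Registration::new(&registry);
        println!("live while held: {}", registry.live.get());
    }
    println!("live after drop: {}", registry.live.get());
}
```

The point of this layout is that anyone adding a second
constructor sees the `# Invariants` section on the type and has to
write their own `INVARIANT` justification, rather than having to
find the comment in `drop`.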

-- 
Cheers,
Benno




end of thread, other threads:[~2024-01-25  9:16 UTC | newest]

Thread overview: 125+ messages
2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
2023-10-18 15:38   ` Benno Lossin
2024-01-10 18:32     ` Wedson Almeida Filho
2024-01-25  9:15       ` Benno Lossin
2023-10-18 12:25 ` [RFC PATCH 02/19] rust: fs: introduce the `module_fs` macro Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 03/19] samples: rust: add initial ro file system sample Wedson Almeida Filho
2023-10-28 16:18   ` Alice Ryhl
2024-01-10 18:25     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params` Wedson Almeida Filho
2023-10-18 16:34   ` Benno Lossin
2023-10-28 16:39     ` Alice Ryhl
2023-10-30  8:21       ` Benno Lossin
2023-10-30 21:36         ` Alice Ryhl
2023-10-20 15:04   ` Ariel Miculas (amiculas)
2024-01-03 12:25   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 05/19] rust: fs: introduce `INode<T>` Wedson Almeida Filho
2023-10-28 18:00   ` Alice Ryhl
2024-01-03 12:45     ` Andreas Hindborg (Samsung)
2024-01-03 12:54   ` Andreas Hindborg (Samsung)
2024-01-04  5:20     ` Darrick J. Wong
2024-01-04  9:57       ` Andreas Hindborg (Samsung)
2024-01-10  9:45     ` Benno Lossin
2024-01-04  5:14   ` Darrick J. Wong
2024-01-24 18:17     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
2023-10-19 14:30   ` Benno Lossin
2023-10-20  0:52     ` Boqun Feng
2023-10-21 13:48       ` Benno Lossin
2023-10-21 15:57         ` Boqun Feng
2023-10-21 17:01           ` Matthew Wilcox
2023-10-21 19:33             ` Boqun Feng
2023-10-23  5:29               ` Dave Chinner
2023-10-23 12:55                 ` Wedson Almeida Filho
2023-10-30  2:29                   ` Dave Chinner
2023-10-31 20:49                     ` Wedson Almeida Filho
2023-11-08  4:54                       ` Dave Chinner
2023-11-08  6:15                         ` Wedson Almeida Filho
2023-10-20  0:30   ` Boqun Feng
2023-10-23 12:36     ` Wedson Almeida Filho
2024-01-03 13:29   ` Andreas Hindborg (Samsung)
2024-01-24  4:07     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir` Wedson Almeida Filho
2023-10-21  8:33   ` Benno Lossin
2024-01-03 14:09   ` Andreas Hindborg (Samsung)
2024-01-21 21:00   ` Askar Safin
2024-01-21 21:51     ` Dave Chinner
2023-10-18 12:25 ` [RFC PATCH 08/19] rust: fs: introduce `FileSystem::lookup` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 09/19] rust: folio: introduce basic support for folios Wedson Almeida Filho
2023-10-18 17:17   ` Matthew Wilcox
2023-10-18 18:32     ` Wedson Almeida Filho
2023-10-18 19:21       ` Matthew Wilcox
2023-10-19 13:25         ` Wedson Almeida Filho
2023-10-20  4:11           ` Matthew Wilcox
2023-10-20 15:17             ` Matthew Wilcox
2023-10-23 12:32               ` Wedson Almeida Filho
2023-10-23 10:48             ` Andreas Hindborg (Samsung)
2023-10-23 14:28               ` Matthew Wilcox
2023-10-24 15:04                 ` Ariel Miculas (amiculas)
2023-10-23 12:29             ` Wedson Almeida Filho
2023-10-21  9:21   ` Benno Lossin
2023-10-18 12:25 ` [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio` Wedson Almeida Filho
2023-11-07 22:18   ` Matthew Wilcox
2023-11-07 22:22     ` Al Viro
2023-11-08  0:35       ` Wedson Almeida Filho
2023-11-08  0:56         ` Al Viro
2023-11-08  2:39           ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
2023-10-18 13:06   ` Ariel Miculas (amiculas)
2023-10-19 13:35     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
2024-01-03 14:13   ` Andreas Hindborg (Samsung)
2024-01-04  5:33   ` Darrick J. Wong
2024-01-24  4:24     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 13/19] rust: fs: introduce more inode types Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 14/19] rust: fs: add per-superblock data Wedson Almeida Filho
2023-10-25 15:51   ` Ariel Miculas (amiculas)
2023-10-26 13:46   ` Ariel Miculas (amiculas)
2024-01-03 14:16   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads Wedson Almeida Filho
2024-01-03 14:17   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
2023-10-21 13:39   ` Benno Lossin
2024-01-24  4:14     ` Wedson Almeida Filho
2024-01-03 14:38   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 17/19] rust: fs: allow per-inode data Wedson Almeida Filho
2023-10-21 13:57   ` Benno Lossin
2024-01-03 14:39   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 18/19] rust: fs: export file type from mode constants Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 19/19] tarfs: introduce tar fs Wedson Almeida Filho
2023-10-18 16:57   ` Matthew Wilcox
2023-10-18 17:05     ` Wedson Almeida Filho
2023-10-18 17:20       ` Matthew Wilcox
2023-10-18 18:07         ` Wedson Almeida Filho
2024-01-24  5:05   ` Matthew Wilcox
2024-01-24  5:23     ` Matthew Wilcox
2024-01-24 18:26       ` Wedson Almeida Filho
2024-01-24 21:05         ` Dave Chinner
2024-01-24 21:28           ` Matthew Wilcox
2024-01-24  5:34     ` Gao Xiang
2023-10-18 13:40 ` [RFC PATCH 00/19] Rust abstractions for VFS Ariel Miculas (amiculas)
2023-10-18 17:12   ` Wedson Almeida Filho
2023-10-29 20:31 ` Matthew Wilcox
2023-10-31 20:14   ` Wedson Almeida Filho
2024-01-03 18:02     ` Matthew Wilcox
2024-01-03 19:04       ` Wedson Almeida Filho
2024-01-03 19:53         ` Al Viro
2024-01-03 20:38           ` Kent Overstreet
2024-01-04  1:49         ` Matthew Wilcox
2024-01-09 18:25           ` Wedson Almeida Filho
2024-01-09 19:30             ` Matthew Wilcox
2024-01-03 19:14       ` Kent Overstreet
2024-01-03 20:41         ` Al Viro
2024-01-09 19:13           ` Wedson Almeida Filho
2024-01-09 19:25             ` Matthew Wilcox
2024-01-09 19:32               ` Greg Kroah-Hartman
2024-01-10  7:49                 ` Wedson Almeida Filho
2024-01-10  7:57                   ` Greg Kroah-Hartman
2024-01-10 12:56                   ` Matthew Wilcox
2024-01-09 22:19               ` Dave Chinner
2024-01-10 19:19                 ` Kent Overstreet
2024-01-24 13:08                   ` FUJITA Tomonori
2024-01-24 19:49                     ` Kent Overstreet
2024-01-05  0:04       ` David Howells
2024-01-05 15:54         ` Jarkko Sakkinen
