rust-for-linux.vger.kernel.org archive mirror
* [RFC PATCH 00/24] erofs: introduce Rust implementation
@ 2024-09-16 13:56 Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 01/24] erofs: lift up erofs_fill_inode to global Yiyang Wu
                   ` (23 more replies)
  0 siblings, 24 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Greetings,

So here is a patchset to add Rust skeleton code to the current EROFS
implementation. The implementation is deeply inspired by the current C
implementation, and it is based on a generic erofs_sys crate[1] written
by me. The purpose is to potentially replace some of the C code to make
full use of Rust's safety features and better optimization guarantees.

Many of the features (like compressed inodes) still fall back to the C
implementation because of my limited time and the lack of Rust
counterparts. However, Extended Attributes are handled purely in Rust.

The following changes are made to the original C code:
1) Some superblock operations are modified slightly to make sure
memory allocation and deallocation are done correctly when interacting
with Rust.
2) A new rust_helpers.c file is introduced to help Rust deal with
EROFS's own internal API without exporting types that Rust cannot
interpret.
3) A new rust_bindings.h is introduced to provide extern declarations
of the Rust functions, with the same signatures as on the Rust side,
so that C code can call them easily.
4) CONFIG_EROFS_FS_RUST is added in dir.c, inode.c, super.c, data.c,
and xattr.c to allow the C code to be opted out in favor of the Rust
implementation.
5) Some macros and function signatures in internal.h are tweaked
according to the compilation options.

Note that since there is currently no mature Rust VFS implementation
upstream, this patchset only uses C bindings internally, and each unsafe
operation has been examined. This implementation only provides
C-ABI-compatible function implementations and exposes them to the
original C implementation as either hooks or function pointers.
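As a rough illustration of the hook/function-pointer approach, here is a self-contained sketch. The table type `ErofsRustOps` and the function `erofs_rust_demo_lookup` are hypothetical names, not taken from the patchset; a real export would also need `#[no_mangle]` and a matching declaration in rust_bindings.h.

```rust
/// Hypothetical table of optional hooks, laid out like a C struct of
/// function pointers so that C code could call through it.
#[repr(C)]
pub struct ErofsRustOps {
    /// `None` tells the caller to fall back to the C implementation.
    pub lookup_nid: Option<extern "C" fn(parent_nid: u64, name_len: usize) -> i64>,
}

/// A C-ABI-compatible function implemented in Rust (hypothetical).
/// A real export would also carry #[no_mangle] so C can link to it by name.
pub extern "C" fn erofs_rust_demo_lookup(parent_nid: u64, name_len: usize) -> i64 {
    if name_len == 0 {
        // Kernel convention: report failure as a negative errno.
        return -22; // -EINVAL
    }
    // Toy arithmetic standing in for a real directory lookup.
    (parent_nid + name_len as u64) as i64
}

/// The table C code would be handed, with the Rust hook filled in.
pub static RUST_OPS: ErofsRustOps = ErofsRustOps {
    lookup_nid: Some(erofs_rust_demo_lookup),
};

/// Caller-side dispatch: use the hook when present, else fall back.
pub fn dispatch(ops: &ErofsRustOps, parent_nid: u64, name_len: usize) -> i64 {
    match ops.lookup_nid {
        Some(hook) => hook(parent_nid, name_len),
        // -EOPNOTSUPP here stands for "run the C path instead".
        None => -95,
    }
}
```

The `Option` wrapper keeps the fallback decision on the caller side, which matches the idea of C code opting into Rust per operation.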

Also note that this patchset only uses the existing internal EROFS API,
and it uses as few bindgen-generated C bindings as possible: to be
precise, only the inode, dentry, file and dir_context related ones.

Since the EROFS community is pretty interested in giving Rust a try,
I think this patchset will be a good start for Rust EROFS.

This patchset is based on the latest EROFS development tree.
The current codebase can also be found in my GitHub repo[2].

[1]: https://github.com/ToolmanP/erofs-rs
[2]: https://github.com/ToolmanP/erofs-rs-linux

Yiyang Wu (24):
  erofs: lift up erofs_fill_inode to global
  erofs: add superblock data structure in Rust
  erofs: add Errno in Rust
  erofs: add xattrs data structure in Rust
  erofs: add inode data structure in Rust
  erofs: add alloc_helper in Rust
  erofs: add data abstraction in Rust
  erofs: add device data structure in Rust
  erofs: add continuous iterators in Rust
  erofs: add device_infos implementation in Rust
  erofs: add map data structure in Rust
  erofs: add directory entry data structure in Rust
  erofs: add runtime filesystem and inode in Rust
  erofs: add block mapping capability in Rust
  erofs: add iter methods in filesystem in Rust
  erofs: implement dir and inode operations in Rust
  erofs: introduce Rust SBI to C
  erofs: introduce iget alternative to C
  erofs: introduce namei alternative to C
  erofs: introduce readdir alternative to C
  erofs: introduce erofs_map_blocks alternative to C
  erofs: add skippable iters in Rust
  erofs: implement xattrs operations in Rust
  erofs: introduce xattrs replacement to C

 fs/erofs/Kconfig                              |  10 +
 fs/erofs/Makefile                             |   4 +
 fs/erofs/data.c                               |   5 +
 fs/erofs/data_rs.rs                           |  63 +++
 fs/erofs/dir.c                                |   2 +
 fs/erofs/dir_rs.rs                            |  57 ++
 fs/erofs/inode.c                              |  10 +-
 fs/erofs/inode_rs.rs                          |  64 +++
 fs/erofs/internal.h                           |  47 ++
 fs/erofs/namei.c                              |   2 +
 fs/erofs/namei_rs.rs                          |  56 ++
 fs/erofs/rust/erofs_sys.rs                    |  47 ++
 fs/erofs/rust/erofs_sys/alloc_helper.rs       |  35 ++
 fs/erofs/rust/erofs_sys/data.rs               |  70 +++
 fs/erofs/rust/erofs_sys/data/backends.rs      |   4 +
 .../erofs_sys/data/backends/uncompressed.rs   |  39 ++
 fs/erofs/rust/erofs_sys/data/raw_iters.rs     | 127 +++++
 .../rust/erofs_sys/data/raw_iters/ref_iter.rs | 131 +++++
 .../rust/erofs_sys/data/raw_iters/traits.rs   |  17 +
 fs/erofs/rust/erofs_sys/devices.rs            |  75 +++
 fs/erofs/rust/erofs_sys/dir.rs                |  98 ++++
 fs/erofs/rust/erofs_sys/errnos.rs             | 191 +++++++
 fs/erofs/rust/erofs_sys/inode.rs              | 398 ++++++++++++++
 fs/erofs/rust/erofs_sys/map.rs                |  99 ++++
 fs/erofs/rust/erofs_sys/operations.rs         |  62 +++
 fs/erofs/rust/erofs_sys/superblock.rs         | 514 ++++++++++++++++++
 fs/erofs/rust/erofs_sys/superblock/mem.rs     |  94 ++++
 fs/erofs/rust/erofs_sys/xattrs.rs             | 272 +++++++++
 fs/erofs/rust/kinode.rs                       |  76 +++
 fs/erofs/rust/ksources.rs                     |  66 +++
 fs/erofs/rust/ksuperblock.rs                  |  30 +
 fs/erofs/rust/mod.rs                          |   7 +
 fs/erofs/rust_bindings.h                      |  39 ++
 fs/erofs/rust_helpers.c                       |  86 +++
 fs/erofs/rust_helpers.h                       |  23 +
 fs/erofs/super.c                              |  51 +-
 fs/erofs/super_rs.rs                          |  59 ++
 fs/erofs/xattr.c                              |  31 +-
 fs/erofs/xattr.h                              |   7 +
 fs/erofs/xattr_rs.rs                          | 106 ++++
 40 files changed, 3153 insertions(+), 21 deletions(-)
 create mode 100644 fs/erofs/data_rs.rs
 create mode 100644 fs/erofs/dir_rs.rs
 create mode 100644 fs/erofs/inode_rs.rs
 create mode 100644 fs/erofs/namei_rs.rs
 create mode 100644 fs/erofs/rust/erofs_sys.rs
 create mode 100644 fs/erofs/rust/erofs_sys/alloc_helper.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/backends.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/backends/uncompressed.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/raw_iters.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs
 create mode 100644 fs/erofs/rust/erofs_sys/devices.rs
 create mode 100644 fs/erofs/rust/erofs_sys/dir.rs
 create mode 100644 fs/erofs/rust/erofs_sys/errnos.rs
 create mode 100644 fs/erofs/rust/erofs_sys/inode.rs
 create mode 100644 fs/erofs/rust/erofs_sys/map.rs
 create mode 100644 fs/erofs/rust/erofs_sys/operations.rs
 create mode 100644 fs/erofs/rust/erofs_sys/superblock.rs
 create mode 100644 fs/erofs/rust/erofs_sys/superblock/mem.rs
 create mode 100644 fs/erofs/rust/erofs_sys/xattrs.rs
 create mode 100644 fs/erofs/rust/kinode.rs
 create mode 100644 fs/erofs/rust/ksources.rs
 create mode 100644 fs/erofs/rust/ksuperblock.rs
 create mode 100644 fs/erofs/rust/mod.rs
 create mode 100644 fs/erofs/rust_bindings.h
 create mode 100644 fs/erofs/rust_helpers.c
 create mode 100644 fs/erofs/rust_helpers.h
 create mode 100644 fs/erofs/super_rs.rs
 create mode 100644 fs/erofs/xattr_rs.rs

-- 
2.46.0



* [RFC PATCH 01/24] erofs: lift up erofs_fill_inode to global
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 02/24] erofs: add superblock data structure in Rust Yiyang Wu
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Lift up erofs_fill_inode as a global symbol so that
rust_helpers can use it for better compatibility.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/inode.c    | 2 +-
 fs/erofs/internal.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index db29190656eb..d2fd51fcebd2 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -196,7 +196,7 @@ static int erofs_read_inode(struct inode *inode)
 	return err;
 }
 
-static int erofs_fill_inode(struct inode *inode)
+int erofs_fill_inode(struct inode *inode)
 {
 	struct erofs_inode *vi = EROFS_I(inode);
 	int err;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4efd578d7c62..8674a4cb9d39 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -416,6 +416,7 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map);
 void erofs_onlinefolio_init(struct folio *folio);
 void erofs_onlinefolio_split(struct folio *folio);
 void erofs_onlinefolio_end(struct folio *folio, int err);
+int erofs_fill_inode(struct inode *inode);
 struct inode *erofs_iget(struct super_block *sb, erofs_nid_t nid);
 int erofs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		  struct kstat *stat, u32 request_mask,
-- 
2.46.0



* [RFC PATCH 02/24] erofs: add superblock data structure in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 01/24] erofs: lift up erofs_fill_inode to global Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 17:55   ` Greg KH
  2024-09-16 13:56 ` [RFC PATCH 03/24] erofs: add Errno " Yiyang Wu
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch adds a compilable super_rs.rs and introduces the superblock
data structure in Rust. Note that this patch leaves the C-side code
untouched.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/Kconfig                      |  10 ++
 fs/erofs/Makefile                     |   1 +
 fs/erofs/rust/erofs_sys.rs            |  22 +++++
 fs/erofs/rust/erofs_sys/superblock.rs | 132 ++++++++++++++++++++++++++
 fs/erofs/rust/mod.rs                  |   4 +
 fs/erofs/super_rs.rs                  |   9 ++
 6 files changed, 178 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys.rs
 create mode 100644 fs/erofs/rust/erofs_sys/superblock.rs
 create mode 100644 fs/erofs/rust/mod.rs
 create mode 100644 fs/erofs/super_rs.rs

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 6ea60661fa55..e2883efbf497 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -178,3 +178,13 @@ config EROFS_FS_PCPU_KTHREAD_HIPRI
 	  at higher priority.
 
 	  If unsure, say N.
+
+config EROFS_FS_RUST
+	bool "EROFS Rust replacement (EXPERIMENTAL)"
+	depends on EROFS_FS && RUST
+	help
+	  This permits EROFS to use the experimental Rust implementation
+	  in place of parts of the C code. It should be considered an
+	  experimental feature for now.
+
+	  If unsure, say N.
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 4331d53c7109..fb46a2c7fb50 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -9,3 +9,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
 erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
+erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o
diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
new file mode 100644
index 000000000000..0f1400175fc2
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -0,0 +1,22 @@
+#![allow(dead_code)]
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+//! A pure Rust implementation of the EROFS filesystem.
+//! Technical Details are documented in the [EROFS Documentation](https://erofs.docs.kernel.org/en/latest/)
+
+// Importing alloc is unavoidable here: there are many backends, and exporting the Filesystem
+// pointer as a trait object requires the alloc crate.
+
+#[cfg(not(CONFIG_EROFS_FS = "y"))]
+extern crate alloc;
+
+/// Erofs requires block index to be a 32-bit unsigned integer.
+pub(crate) type Blk = u32;
+/// Erofs requires normal offset to be a 64-bit unsigned integer.
+pub(crate) type Off = u64;
+/// Erofs requires inode nid to be a 64-bit unsigned integer.
+pub(crate) type Nid = u64;
+/// Erofs Super Offset to read the ondisk superblock
+pub(crate) const EROFS_SUPER_OFFSET: Off = 1024;
+pub(crate) mod superblock;
diff --git a/fs/erofs/rust/erofs_sys/superblock.rs b/fs/erofs/rust/erofs_sys/superblock.rs
new file mode 100644
index 000000000000..213be6dbc553
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/superblock.rs
@@ -0,0 +1,132 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::*;
+use core::mem::size_of;
+
+/// The on-disk superblock structure.
+#[derive(Debug, Clone, Copy, Default)]
+#[repr(C)]
+pub(crate) struct SuperBlock {
+    pub(crate) magic: u32,
+    pub(crate) checksum: i32,
+    pub(crate) feature_compat: i32,
+    pub(crate) blkszbits: u8,
+    pub(crate) sb_extslots: u8,
+    pub(crate) root_nid: i16,
+    pub(crate) inos: i64,
+    pub(crate) build_time: i64,
+    pub(crate) build_time_nsec: i32,
+    pub(crate) blocks: i32,
+    pub(crate) meta_blkaddr: u32,
+    pub(crate) xattr_blkaddr: u32,
+    pub(crate) uuid: [u8; 16],
+    pub(crate) volume_name: [u8; 16],
+    pub(crate) feature_incompat: i32,
+    pub(crate) compression: i16,
+    pub(crate) extra_devices: i16,
+    pub(crate) devt_slotoff: i16,
+    pub(crate) dirblkbits: u8,
+    pub(crate) xattr_prefix_count: u8,
+    pub(crate) xattr_prefix_start: i32,
+    pub(crate) packed_nid: i64,
+    pub(crate) xattr_filter_reserved: u8,
+    pub(crate) reserved: [u8; 23],
+}
+
+impl TryFrom<&[u8]> for SuperBlock {
+    type Error = core::array::TryFromSliceError;
+    fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
+        <[u8; 128]>::try_from(value.get(..128).unwrap_or(value)).map(Self::from)
+    }
+}
+
+impl From<[u8; 128]> for SuperBlock {
+    fn from(value: [u8; 128]) -> Self {
+        Self {
+            magic: u32::from_le_bytes([value[0], value[1], value[2], value[3]]),
+            checksum: i32::from_le_bytes([value[4], value[5], value[6], value[7]]),
+            feature_compat: i32::from_le_bytes([value[8], value[9], value[10], value[11]]),
+            blkszbits: value[12],
+            sb_extslots: value[13],
+            root_nid: i16::from_le_bytes([value[14], value[15]]),
+            inos: i64::from_le_bytes([
+                value[16], value[17], value[18], value[19], value[20], value[21], value[22],
+                value[23],
+            ]),
+            build_time: i64::from_le_bytes([
+                value[24], value[25], value[26], value[27], value[28], value[29], value[30],
+                value[31],
+            ]),
+            build_time_nsec: i32::from_le_bytes([value[32], value[33], value[34], value[35]]),
+            blocks: i32::from_le_bytes([value[36], value[37], value[38], value[39]]),
+            meta_blkaddr: u32::from_le_bytes([value[40], value[41], value[42], value[43]]),
+            xattr_blkaddr: u32::from_le_bytes([value[44], value[45], value[46], value[47]]),
+            uuid: value[48..64].try_into().unwrap(),
+            volume_name: value[64..80].try_into().unwrap(),
+            feature_incompat: i32::from_le_bytes([value[80], value[81], value[82], value[83]]),
+            compression: i16::from_le_bytes([value[84], value[85]]),
+            extra_devices: i16::from_le_bytes([value[86], value[87]]),
+            devt_slotoff: i16::from_le_bytes([value[88], value[89]]),
+            dirblkbits: value[90],
+            xattr_prefix_count: value[91],
+            xattr_prefix_start: i32::from_le_bytes([value[92], value[93], value[94], value[95]]),
+            packed_nid: i64::from_le_bytes([
+                value[96], value[97], value[98], value[99], value[100], value[101], value[102],
+                value[103],
+            ]),
+            xattr_filter_reserved: value[104],
+            reserved: value[105..128].try_into().unwrap(),
+        }
+    }
+}
+
+pub(crate) type SuperBlockBuf = [u8; size_of::<SuperBlock>()];
+pub(crate) const SUPERBLOCK_EMPTY_BUF: SuperBlockBuf = [0; size_of::<SuperBlock>()];
+
+/// Splits an address into block base, in-block offset, bytes left in the block and block number.
+pub(crate) struct Accessor {
+    pub(crate) base: Off,
+    pub(crate) off: Off,
+    pub(crate) len: Off,
+    pub(crate) nr: Off,
+}
+
+impl Accessor {
+    pub(crate) fn new(address: Off, bits: Off) -> Self {
+        let sz = 1 << bits;
+        let mask = sz - 1;
+        Accessor {
+            base: (address >> bits) << bits,
+            off: address & mask,
+            len: sz - (address & mask),
+            nr: address >> bits,
+        }
+    }
+}
+
+impl SuperBlock {
+    pub(crate) fn blk_access(&self, address: Off) -> Accessor {
+        Accessor::new(address, self.blkszbits as Off)
+    }
+
+    pub(crate) fn blknr(&self, pos: Off) -> Blk {
+        (pos >> self.blkszbits) as Blk
+    }
+
+    pub(crate) fn blkpos(&self, blk: Blk) -> Off {
+        (blk as Off) << self.blkszbits
+    }
+
+    pub(crate) fn blksz(&self) -> Off {
+        1 << self.blkszbits
+    }
+
+    pub(crate) fn blk_round_up(&self, addr: Off) -> Blk {
+        ((addr + self.blksz() - 1) >> self.blkszbits) as Blk
+    }
+
+    pub(crate) fn iloc(&self, nid: Nid) -> Off {
+        self.blkpos(self.meta_blkaddr) + ((nid as Off) << (5 as Off))
+    }
+}
diff --git a/fs/erofs/rust/mod.rs b/fs/erofs/rust/mod.rs
new file mode 100644
index 000000000000..e6c0731f2533
--- /dev/null
+++ b/fs/erofs/rust/mod.rs
@@ -0,0 +1,4 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+pub(crate) mod erofs_sys;
diff --git a/fs/erofs/super_rs.rs b/fs/erofs/super_rs.rs
new file mode 100644
index 000000000000..4b8cbef507e3
--- /dev/null
+++ b/fs/erofs/super_rs.rs
@@ -0,0 +1,9 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+//! EROFS Rust Kernel Module Helpers Implementation
+//! This is only for experimental purposes. Feedback is always welcome.
+
+#[allow(dead_code)]
+#[allow(missing_docs)]
+pub(crate) mod rust;
-- 
2.46.0

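The byte-by-byte decoding in `From<[u8; 128]>` can also be done defensively. Below is a hedged sketch that decodes only a few fields at the same offsets, returns `None` instead of panicking on a short image, and (as an assumption, for 4 KiB pages) accepts block sizes from 512 B to 4 KiB. It reads root_nid as unsigned, matching the `__le16` type of the C on-disk definition; `SbHead` and `parse_sb` are illustrative names.

```rust
use std::convert::TryInto;

/// EROFS on-disk superblock magic (EROFS_SUPER_MAGIC_V1).
const EROFS_SUPER_MAGIC_V1: u32 = 0xE0F5_E1E2;
/// Byte offset of the superblock within the image, as in the patch.
const EROFS_SUPER_OFFSET: usize = 1024;
const EROFS_SB_SIZE: usize = 128;

/// A small subset of the superblock fields, for illustration only.
#[derive(Debug, PartialEq)]
pub struct SbHead {
    pub magic: u32,
    pub blkszbits: u8,
    pub root_nid: u16,
    pub meta_blkaddr: u32,
}

fn le_u16(b: &[u8; EROFS_SB_SIZE], at: usize) -> u16 {
    u16::from_le_bytes([b[at], b[at + 1]])
}

fn le_u32(b: &[u8; EROFS_SB_SIZE], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

pub fn parse_sb(image: &[u8]) -> Option<SbHead> {
    // `get` returns None instead of panicking when the image is too short.
    let bytes = image.get(EROFS_SUPER_OFFSET..EROFS_SUPER_OFFSET + EROFS_SB_SIZE)?;
    let raw: &[u8; EROFS_SB_SIZE] = bytes.try_into().ok()?;
    let sb = SbHead {
        magic: le_u32(raw, 0),
        blkszbits: raw[12],
        root_nid: le_u16(raw, 14),
        meta_blkaddr: le_u32(raw, 40),
    };
    // Assumed sanity checks: correct magic and 512 B..4 KiB blocks.
    if sb.magic != EROFS_SUPER_MAGIC_V1 || !(9..=12).contains(&sb.blkszbits) {
        return None;
    }
    Some(sb)
}
```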

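The block arithmetic in `Accessor` and the `SuperBlock` helpers can be checked with concrete numbers. The sketch below restates them over plain integers, with `blk_round_up` and `iloc` as free functions (my simplification, not the patch's API), for a 4 KiB block size (blkszbits = 12).

```rust
/// Restatement of the patch's Accessor: block-aligned base, offset within
/// the block, bytes remaining in the block, and block number.
#[derive(Debug, PartialEq)]
pub struct Accessor {
    pub base: u64,
    pub off: u64,
    pub len: u64,
    pub nr: u64,
}

impl Accessor {
    pub fn new(address: u64, bits: u64) -> Self {
        let sz = 1u64 << bits;
        let mask = sz - 1;
        Accessor {
            base: address & !mask, // same as (address >> bits) << bits
            off: address & mask,
            len: sz - (address & mask),
            nr: address >> bits,
        }
    }
}

/// `addr` divided by the block size, rounded up.
pub fn blk_round_up(addr: u64, blkszbits: u32) -> u32 {
    ((addr + (1u64 << blkszbits) - 1) >> blkszbits) as u32
}

/// Byte position of inode `nid`: start of the metadata area plus
/// nid * 32, since inode slots are 32 bytes (the `<< 5` in iloc()).
pub fn iloc(meta_blkaddr: u32, blkszbits: u32, nid: u64) -> u64 {
    ((meta_blkaddr as u64) << blkszbits) + (nid << 5)
}
```

For address 5000 this gives base 4096, off 904, len 3192 and nr 1; `iloc(1, 12, 36)` is 4096 + 36 * 32 = 5248.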

* [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 01/24] erofs: lift up erofs_fill_inode to global Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 02/24] erofs: add superblock data structure in Rust Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 17:51   ` Greg KH
  2024-09-16 20:01   ` Gary Guo
  2024-09-16 13:56 ` [RFC PATCH 04/24] erofs: add xattrs data structure " Yiyang Wu
                   ` (20 subsequent siblings)
  23 siblings, 2 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Introduce Errno to the Rust-side code. Note that in the current Rust for
Linux, errnos are implemented as core::ffi::c_uint unit structs.
However, EUCLEAN, a.k.a. EFSCORRUPTED, is missing from the error crate.

Since errno_base hasn't changed for over 13 years, this patch merely
serves as a temporary workaround for the missing errno in Rust for
Linux.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs        |   6 +
 fs/erofs/rust/erofs_sys/errnos.rs | 191 ++++++++++++++++++++++++++++++
 2 files changed, 197 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/errnos.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 0f1400175fc2..2bd1381da5ab 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -19,4 +19,10 @@
 pub(crate) type Nid = u64;
 /// Erofs Super Offset to read the ondisk superblock
 pub(crate) const EROFS_SUPER_OFFSET: Off = 1024;
+/// PosixResult is named distinctly from kernel::error::Result
+/// to avoid naming conflicts.
+pub(crate) type PosixResult<T> = Result<T, Errno>;
+
+pub(crate) mod errnos;
 pub(crate) mod superblock;
+pub(crate) use errnos::Errno;
diff --git a/fs/erofs/rust/erofs_sys/errnos.rs b/fs/erofs/rust/erofs_sys/errnos.rs
new file mode 100644
index 000000000000..40e5cdbcb353
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/errnos.rs
@@ -0,0 +1,191 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+#[repr(i32)]
+#[non_exhaustive]
+#[allow(clippy::upper_case_acronyms)]
+#[derive(Debug, Copy, Clone, PartialEq)]
+pub(crate) enum Errno {
+    NONE = 0,
+    EPERM,
+    ENOENT,
+    ESRCH,
+    EINTR,
+    EIO,
+    ENXIO,
+    E2BIG,
+    ENOEXEC,
+    EBADF,
+    ECHILD,
+    EAGAIN,
+    ENOMEM,
+    EACCES,
+    EFAULT,
+    ENOTBLK,
+    EBUSY,
+    EEXIST,
+    EXDEV,
+    ENODEV,
+    ENOTDIR,
+    EISDIR,
+    EINVAL,
+    ENFILE,
+    EMFILE,
+    ENOTTY,
+    ETXTBSY,
+    EFBIG,
+    ENOSPC,
+    ESPIPE,
+    EROFS,
+    EMLINK,
+    EPIPE,
+    EDOM,
+    ERANGE,
+    EDEADLK,
+    ENAMETOOLONG,
+    ENOLCK,
+    ENOSYS,
+    ENOTEMPTY,
+    ELOOP,
+    ENOMSG = 42,
+    EIDRM,
+    ECHRNG,
+    EL2NSYNC,
+    EL3HLT,
+    EL3RST,
+    ELNRNG,
+    EUNATCH,
+    ENOCSI,
+    EL2HLT,
+    EBADE,
+    EBADR,
+    EXFULL,
+    ENOANO,
+    EBADRQC,
+    EBADSLT,
+    EBFONT = 59,
+    ENOSTR,
+    ENODATA,
+    ETIME,
+    ENOSR,
+    ENONET,
+    ENOPKG,
+    EREMOTE,
+    ENOLINK,
+    EADV,
+    ESRMNT,
+    ECOMM,
+    EPROTO,
+    EMULTIHOP,
+    EDOTDOT,
+    EBADMSG,
+    EOVERFLOW,
+    ENOTUNIQ,
+    EBADFD,
+    EREMCHG,
+    ELIBACC,
+    ELIBBAD,
+    ELIBSCN,
+    ELIBMAX,
+    ELIBEXEC,
+    EILSEQ,
+    ERESTART,
+    ESTRPIPE,
+    EUSERS,
+    ENOTSOCK,
+    EDESTADDRREQ,
+    EMSGSIZE,
+    EPROTOTYPE,
+    ENOPROTOOPT,
+    EPROTONOSUPPORT,
+    ESOCKTNOSUPPORT,
+    EOPNOTSUPP,
+    EPFNOSUPPORT,
+    EAFNOSUPPORT,
+    EADDRINUSE,
+    EADDRNOTAVAIL,
+    ENETDOWN,
+    ENETUNREACH,
+    ENETRESET,
+    ECONNABORTED,
+    ECONNRESET,
+    ENOBUFS,
+    EISCONN,
+    ENOTCONN,
+    ESHUTDOWN,
+    ETOOMANYREFS,
+    ETIMEDOUT,
+    ECONNREFUSED,
+    EHOSTDOWN,
+    EHOSTUNREACH,
+    EALREADY,
+    EINPROGRESS,
+    ESTALE,
+    EUCLEAN,
+    ENOTNAM,
+    ENAVAIL,
+    EISNAM,
+    EREMOTEIO,
+    EDQUOT,
+    ENOMEDIUM,
+    EMEDIUMTYPE,
+    ECANCELED,
+    ENOKEY,
+    EKEYEXPIRED,
+    EKEYREVOKED,
+    EKEYREJECTED,
+    EOWNERDEAD,
+    ENOTRECOVERABLE,
+    ERFKILL,
+    EHWPOISON,
+    EUNKNOWN,
+}
+
+impl From<i32> for Errno {
+    fn from(value: i32) -> Self {
+        let errno = value.wrapping_neg();
+        if errno <= 0 || errno > Errno::EUNKNOWN as i32 || errno == 41 || errno == 58 {
+            Errno::EUNKNOWN
+        } else {
+            // Safety: errno is in 1..=EUNKNOWN and not a gap (41, 58), so it is a valid discriminant.
+            unsafe { core::mem::transmute(errno) }
+        }
+    }
+}
+
+impl From<Errno> for i32 {
+    fn from(value: Errno) -> Self {
+        -(value as i32)
+    }
+}
+
+/// Replacement for ERR_PTR in Linux Kernel.
+impl From<Errno> for *const core::ffi::c_void {
+    fn from(value: Errno) -> Self {
+        (-(value as core::ffi::c_long)) as *const core::ffi::c_void
+    }
+}
+
+impl From<Errno> for *mut core::ffi::c_void {
+    fn from(value: Errno) -> Self {
+        (-(value as core::ffi::c_long)) as *mut core::ffi::c_void
+    }
+}
+
+/// Replacement for PTR_ERR in Linux Kernel.
+impl From<*const core::ffi::c_void> for Errno {
+    fn from(value: *const core::ffi::c_void) -> Self {
+        (value as i32).into()
+    }
+}
+
+impl From<*mut core::ffi::c_void> for Errno {
+    fn from(value: *mut core::ffi::c_void) -> Self {
+        (value as i32).into()
+    }
+}
+/// Replacement for IS_ERR in Linux Kernel.
+#[inline(always)]
+pub(crate) fn is_value_err(value: *const core::ffi::c_void) -> bool {
+    (value as core::ffi::c_ulong) >= (-4095 as core::ffi::c_long) as core::ffi::c_ulong
+}
-- 
2.46.0

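The `transmute` in `From<i32>` is only sound when every accepted integer is a declared discriminant, and the enum skips 41 and 58 (ELOOP = 40, ENOMSG = 42, EBFONT = 59). A `match` avoids `unsafe` entirely; the sketch below uses a trimmed, illustrative subset of variants with the Linux generic errno values.

```rust
/// Trimmed, illustrative subset of the patch's Errno enum.
#[repr(i32)]
#[allow(clippy::upper_case_acronyms)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Errno {
    ENOENT = 2,
    EIO = 5,
    ENOMEM = 12,
    EINVAL = 22,
    EUCLEAN = 117,
    EUNKNOWN = 134,
}

impl From<i32> for Errno {
    fn from(value: i32) -> Self {
        // Kernel-style return values are negative errnos; wrapping_neg
        // avoids overflow on i32::MIN. Unmapped values become EUNKNOWN.
        match value.wrapping_neg() {
            2 => Errno::ENOENT,
            5 => Errno::EIO,
            12 => Errno::ENOMEM,
            22 => Errno::EINVAL,
            117 => Errno::EUCLEAN,
            _ => Errno::EUNKNOWN,
        }
    }
}

impl From<Errno> for i32 {
    fn from(e: Errno) -> Self {
        -(e as i32)
    }
}
```

The full enum would need one arm per variant, which is verbose, but any gap in the numbering then degrades to EUNKNOWN instead of undefined behaviour.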

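The pointer conversions mirror ERR_PTR, IS_ERR and PTR_ERR. Assuming the usual kernel encoding, where the top MAX_ERRNO (4095) addresses carry negative errnos, the round trip can be sketched over `isize`/`usize` as follows (function names are illustrative). PTR_ERR yields the negative errno directly, so it needs no extra negation before being passed to a conversion that expects negative values.

```rust
use core::ffi::c_void;

/// Largest errno encodable in a pointer, as in the kernel's MAX_ERRNO.
const MAX_ERRNO: isize = 4095;

/// ERR_PTR: encode a negative errno as a pointer value.
pub fn err_ptr(err: i32) -> *mut c_void {
    err as isize as usize as *mut c_void
}

/// IS_ERR: true if the pointer lies in the top MAX_ERRNO addresses.
pub fn is_err(p: *const c_void) -> bool {
    (p as usize) >= (-MAX_ERRNO) as usize
}

/// PTR_ERR: recover the negative errno; only meaningful if is_err(p).
pub fn ptr_err(p: *const c_void) -> i32 {
    p as isize as i32
}
```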

* [RFC PATCH 04/24] erofs: add xattrs data structure in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (2 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 03/24] erofs: add Errno " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 05/24] erofs: add inode " Yiyang Wu
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces the on-disk and runtime data structures of the
Extended Attributes implementation in the erofs_sys crate. These will
later be used to implement the op handlers.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs        |  12 +++
 fs/erofs/rust/erofs_sys/xattrs.rs | 124 ++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/xattrs.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 2bd1381da5ab..6f3c12665ed6 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -25,4 +25,16 @@
 
 pub(crate) mod errnos;
 pub(crate) mod superblock;
+pub(crate) mod xattrs;
 pub(crate) use errnos::Errno;
+
+/// Helper macro to round up or down a number.
+#[macro_export]
+macro_rules! round {
+    (UP, $x: expr, $y: expr) => {
+        ($x + $y - 1) / $y * $y
+    };
+    (DOWN, $x: expr, $y: expr) => {
+        ($x / $y) * $y
+    };
+}
diff --git a/fs/erofs/rust/erofs_sys/xattrs.rs b/fs/erofs/rust/erofs_sys/xattrs.rs
new file mode 100644
index 000000000000..d1a110ef10dd
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/xattrs.rs
@@ -0,0 +1,124 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use alloc::vec::Vec;
+
+/// The header of the xattr entry index.
+/// This is used to describe the superblock's xattrs collection.
+#[derive(Clone, Copy)]
+#[repr(C)]
+pub(crate) struct XAttrSharedEntrySummary {
+    pub(crate) name_filter: u32,
+    pub(crate) shared_count: u8,
+    pub(crate) reserved: [u8; 7],
+}
+
+impl From<[u8; 12]> for XAttrSharedEntrySummary {
+    fn from(value: [u8; 12]) -> Self {
+        Self {
+            name_filter: u32::from_le_bytes([value[0], value[1], value[2], value[3]]),
+            shared_count: value[4],
+            reserved: value[5..12].try_into().unwrap(),
+        }
+    }
+}
+
+pub(crate) const XATTR_ENTRY_SUMMARY_BUF: [u8; 12] = [0u8; 12];
+
+/// In-memory representation of the xattr entry index header, used by SuperBlockInfo.
+pub(crate) struct XAttrSharedEntries {
+    pub(crate) name_filter: u32,
+    pub(crate) shared_indexes: Vec<u32>,
+}
+
+/// Represents the name index for infixes or prefixes.
+#[repr(C)]
+#[derive(Clone, Copy)]
+pub(crate) struct XattrNameIndex(u8);
+
+impl core::cmp::PartialEq<u8> for XattrNameIndex {
+    fn eq(&self, other: &u8) -> bool {
+        if self.0 & EROFS_XATTR_LONG_PREFIX != 0 {
+            self.0 & EROFS_XATTR_LONG_MASK == *other
+        } else {
+            self.0 == *other
+        }
+    }
+}
+
+impl XattrNameIndex {
+    pub(crate) fn is_long(&self) -> bool {
+        self.0 & EROFS_XATTR_LONG_PREFIX != 0
+    }
+}
+
+impl From<u8> for XattrNameIndex {
+    fn from(value: u8) -> Self {
+        Self(value)
+    }
+}
+
+#[allow(clippy::from_over_into)]
+impl Into<usize> for XattrNameIndex {
+    fn into(self) -> usize {
+        if self.0 & EROFS_XATTR_LONG_PREFIX != 0 {
+            (self.0 & EROFS_XATTR_LONG_MASK) as usize
+        } else {
+            self.0 as usize
+        }
+    }
+}
+
+/// This is the on-disk representation of an xattr entry header.
+/// This is used to describe one extended attribute.
+#[repr(C)]
+#[derive(Clone, Copy)]
+pub(crate) struct XAttrEntryHeader {
+    pub(crate) suffix_len: u8,
+    pub(crate) name_index: XattrNameIndex,
+    pub(crate) value_len: u16,
+}
+
+impl From<[u8; 4]> for XAttrEntryHeader {
+    fn from(value: [u8; 4]) -> Self {
+        Self {
+            suffix_len: value[0],
+            name_index: value[1].into(),
+            value_len: u16::from_le_bytes(value[2..4].try_into().unwrap()),
+        }
+    }
+}
+
+/// Xattr Common Infix holds the prefix index in the first byte and all the common infix data in
+/// the rest of the bytes.
+pub(crate) struct XAttrInfix(pub(crate) Vec<u8>);
+
+impl XAttrInfix {
+    fn prefix_index(&self) -> u8 {
+        self.0[0]
+    }
+    fn name(&self) -> &[u8] {
+        &self.0[1..]
+    }
+}
+
+pub(crate) const EROFS_XATTR_LONG_PREFIX: u8 = 0x80;
+pub(crate) const EROFS_XATTR_LONG_MASK: u8 = EROFS_XATTR_LONG_PREFIX - 1;
+
+/// Supported xattr prefixes
+pub(crate) const EROFS_XATTRS_PREFIXS: [&[u8]; 7] = [
+    b"",
+    b"user.",
+    b"system.posix_acl_access",
+    b"system.posix_acl_default",
+    b"trusted.",
+    b"",
+    b"security.",
+];
+
+/// Represents the value of an xattr entry or the size of it if the buffer is present in the query.
+#[derive(Debug)]
+pub(crate) enum XAttrValue {
+    Buffer(usize),
+    Vec(Vec<u8>),
+}
-- 
2.46.0

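The `round!` macro and `XAttrEntryHeader` combine naturally when walking inline xattrs. A hedged sketch, assuming each entry (4-byte header, name suffix, value) is padded to a 4-byte boundary as the C EROFS_XATTR_ALIGN() does; `xattr_entry_size` and `parse_header` are hypothetical helpers.

```rust
/// Same definition as the patch's round! macro.
macro_rules! round {
    (UP, $x: expr, $y: expr) => {
        ($x + $y - 1) / $y * $y
    };
    (DOWN, $x: expr, $y: expr) => {
        ($x / $y) * $y
    };
}

/// Size of the on-disk header: suffix_len (u8), name_index (u8), value_len (le16).
const XATTR_ENTRY_HDR: usize = 4;

/// Hypothetical helper: bytes one inline entry occupies, assuming 4-byte padding.
fn xattr_entry_size(suffix_len: u8, value_len: u16) -> usize {
    round!(UP, XATTR_ENTRY_HDR + suffix_len as usize + value_len as usize, 4)
}

/// Decode (suffix_len, name_index, value_len) like XAttrEntryHeader::from.
fn parse_header(b: [u8; 4]) -> (u8, u8, u16) {
    (b[0], b[1], u16::from_le_bytes([b[2], b[3]]))
}
```

An entry with a 5-byte suffix and a 3-byte value occupies 4 + 5 + 3 = 12 bytes, already 4-byte aligned.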

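`XattrNameIndex` separates built-in prefixes from long (custom) prefixes via the 0x80 bit. A sketch of name resolution for the built-in case, reusing the same prefix table; for long prefixes the low 7 bits select an entry in the superblock's long-prefix table (described by xattr_prefix_start/xattr_prefix_count), and this sketch only decodes that slot. `full_name` and `long_prefix_slot` are hypothetical names.

```rust
const EROFS_XATTR_LONG_PREFIX: u8 = 0x80;
const EROFS_XATTR_LONG_MASK: u8 = EROFS_XATTR_LONG_PREFIX - 1;

/// Built-in prefixes, indexed by name_index (same table as the patch).
const PREFIXES: [&[u8]; 7] = [
    b"",
    b"user.",
    b"system.posix_acl_access",
    b"system.posix_acl_default",
    b"trusted.",
    b"",
    b"security.",
];

/// Full xattr name for a built-in prefix index; None for long prefixes
/// or out-of-range indexes.
fn full_name(name_index: u8, suffix: &[u8]) -> Option<Vec<u8>> {
    if name_index & EROFS_XATTR_LONG_PREFIX != 0 {
        return None;
    }
    let prefix = PREFIXES.get(name_index as usize)?;
    let mut name = prefix.to_vec();
    name.extend_from_slice(suffix);
    Some(name)
}

/// Slot in the long-prefix table, if the long-prefix bit is set.
fn long_prefix_slot(name_index: u8) -> Option<usize> {
    if name_index & EROFS_XATTR_LONG_PREFIX != 0 {
        Some((name_index & EROFS_XATTR_LONG_MASK) as usize)
    } else {
        None
    }
}
```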

* [RFC PATCH 05/24] erofs: add inode data structure in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (3 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 04/24] erofs: add xattrs data structure " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-18 13:04   ` [External Mail][RFC " Huang Jianan
  2024-09-16 13:56 ` [RFC PATCH 06/24] erofs: add alloc_helper " Yiyang Wu
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces the same on-disk EROFS inode data structures
in Rust, along with multiple helpers for the inode i_format and
chunk_indexing that can later be used to implement map_blocks.
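To make the i_format decoding concrete: bit 0 selects compact vs extended and bits 1-3 select the data layout. A standalone sketch with the same constants; the on-disk sizes of 32 bytes (compact) and 64 bytes (extended) come from the EROFS on-disk format documentation, and `inode_size` is a hypothetical helper.

```rust
const INODE_VERSION_BIT: u16 = 0;
const INODE_VERSION_MASK: u16 = 0x1;
const INODE_LAYOUT_BIT: u16 = 1;
const INODE_LAYOUT_MASK: u16 = 0x7;

#[derive(Debug, PartialEq)]
enum Layout {
    FlatPlain,
    CompressedFull,
    FlatInline,
    CompressedCompact,
    Chunk,
    Unknown,
}

/// Same operation as the patch's extract! macro.
fn extract(v: u16, bit: u16, mask: u16) -> u16 {
    (v >> bit) & mask
}

fn is_extended(i_format: u16) -> bool {
    extract(i_format, INODE_VERSION_BIT, INODE_VERSION_MASK) == 1
}

fn layout(i_format: u16) -> Layout {
    match extract(i_format, INODE_LAYOUT_BIT, INODE_LAYOUT_MASK) {
        0 => Layout::FlatPlain,
        1 => Layout::CompressedFull,
        2 => Layout::FlatInline,
        3 => Layout::CompressedCompact,
        4 => Layout::Chunk,
        _ => Layout::Unknown,
    }
}

/// Hypothetical helper: on-disk inode size for the given version.
fn inode_size(i_format: u16) -> usize {
    if is_extended(i_format) { 64 } else { 32 }
}
```

For example, i_format = 9 (0b1001) is an extended inode with the chunk-based layout.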

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs       |   1 +
 fs/erofs/rust/erofs_sys/inode.rs | 291 +++++++++++++++++++++++++++++++
 2 files changed, 292 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/inode.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 6f3c12665ed6..34267ec7772d 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -24,6 +24,7 @@
 pub(crate) type PosixResult<T> = Result<T, Errno>;
 
 pub(crate) mod errnos;
+pub(crate) mod inode;
 pub(crate) mod superblock;
 pub(crate) mod xattrs;
 pub(crate) use errnos::Errno;
diff --git a/fs/erofs/rust/erofs_sys/inode.rs b/fs/erofs/rust/erofs_sys/inode.rs
new file mode 100644
index 000000000000..1762023e97f8
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/inode.rs
@@ -0,0 +1,291 @@
+use super::xattrs::*;
+use super::*;
+use core::ffi::*;
+use core::mem::size_of;
+
+/// Represents the compact bitfield of the Erofs Inode format.
+#[repr(transparent)]
+#[derive(Clone, Copy)]
+pub(crate) struct Format(u16);
+
+pub(crate) const INODE_VERSION_MASK: u16 = 0x1;
+pub(crate) const INODE_VERSION_BIT: u16 = 0;
+
+pub(crate) const INODE_LAYOUT_BIT: u16 = 1;
+pub(crate) const INODE_LAYOUT_MASK: u16 = 0x7;
+
+/// Helper macro to extract property from the bitfield.
+macro_rules! extract {
+    ($name: expr, $bit: expr, $mask: expr) => {
+        ($name >> $bit) & ($mask)
+    };
+}
+
+/// The version of the inode, which indicates whether this inode is compact or extended.
+/// Extended inodes carry more information, such as a 32-bit nlink and mtime.
+/// This is documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inodes
+#[repr(C)]
+#[derive(Clone, Copy)]
+pub(crate) enum Version {
+    Compat,
+    Extended,
+    Unknown,
+}
+
+/// Represents the data layout backed by the Inode.
+/// As documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inode-data-layouts
+#[repr(C)]
+#[derive(Clone, Copy, PartialEq)]
+pub(crate) enum Layout {
+    FlatPlain,
+    CompressedFull,
+    FlatInline,
+    CompressedCompact,
+    Chunk,
+    Unknown,
+}
+
+#[repr(C)]
+#[allow(non_camel_case_types)]
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub(crate) enum Type {
+    Regular,
+    Directory,
+    Link,
+    Character,
+    Block,
+    Fifo,
+    Socket,
+    Unknown,
+}
+
+/// This is the format extracted from the i_format bit representation.
+/// This includes various infos and specs about the inode.
+impl Format {
+    pub(crate) fn version(&self) -> Version {
+        match extract!(self.0, INODE_VERSION_BIT, INODE_VERSION_MASK) {
+            0 => Version::Compat,
+            1 => Version::Extended,
+            _ => Version::Unknown,
+        }
+    }
+
+    pub(crate) fn layout(&self) -> Layout {
+        match extract!(self.0, INODE_LAYOUT_BIT, INODE_LAYOUT_MASK) {
+            0 => Layout::FlatPlain,
+            1 => Layout::CompressedFull,
+            2 => Layout::FlatInline,
+            3 => Layout::CompressedCompact,
+            4 => Layout::Chunk,
+            _ => Layout::Unknown,
+        }
+    }
+}
+
+/// Represents the compact inode which resides on-disk.
+/// This is documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inodes
+#[repr(C)]
+#[derive(Clone, Copy)]
+pub(crate) struct CompactInodeInfo {
+    pub(crate) i_format: Format,
+    pub(crate) i_xattr_icount: u16,
+    pub(crate) i_mode: u16,
+    pub(crate) i_nlink: u16,
+    pub(crate) i_size: u32,
+    pub(crate) i_reserved: [u8; 4],
+    pub(crate) i_u: [u8; 4],
+    pub(crate) i_ino: u32,
+    pub(crate) i_uid: u16,
+    pub(crate) i_gid: u16,
+    pub(crate) i_reserved2: [u8; 4],
+}
+
+/// Represents the extended inode which resides on-disk.
+/// This is documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inodes
+#[repr(C)]
+#[derive(Clone, Copy)]
+pub(crate) struct ExtendedInodeInfo {
+    pub(crate) i_format: Format,
+    pub(crate) i_xattr_icount: u16,
+    pub(crate) i_mode: u16,
+    pub(crate) i_reserved: [u8; 2],
+    pub(crate) i_size: u64,
+    pub(crate) i_u: [u8; 4],
+    pub(crate) i_ino: u32,
+    pub(crate) i_uid: u32,
+    pub(crate) i_gid: u32,
+    pub(crate) i_mtime: u64,
+    pub(crate) i_mtime_nsec: u32,
+    pub(crate) i_nlink: u32,
+    pub(crate) i_reserved2: [u8; 16],
+}
+
+/// Represents the inode info which is either compact or extended.
+#[derive(Clone, Copy)]
+pub(crate) enum InodeInfo {
+    Extended(ExtendedInodeInfo),
+    Compact(CompactInodeInfo),
+}
+
+pub(crate) const CHUNK_BLKBITS_MASK: u16 = 0x1f;
+pub(crate) const CHUNK_FORMAT_INDEX_BIT: u16 = 0x20;
+
+/// Represents on-disk chunk index of the file backing inode.
+#[repr(C)]
+#[derive(Clone, Copy, Debug)]
+pub(crate) struct ChunkIndex {
+    pub(crate) advise: u16,
+    pub(crate) device_id: u16,
+    pub(crate) blkaddr: u32,
+}
+
+impl From<[u8; 8]> for ChunkIndex {
+    fn from(u: [u8; 8]) -> Self {
+        let advise = u16::from_le_bytes([u[0], u[1]]);
+        let device_id = u16::from_le_bytes([u[2], u[3]]);
+        let blkaddr = u32::from_le_bytes([u[4], u[5], u[6], u[7]]);
+        ChunkIndex {
+            advise,
+            device_id,
+            blkaddr,
+        }
+    }
+}
+
+/// Chunk format used for indicating the chunkbits and chunkindex.
+#[repr(C)]
+#[derive(Clone, Copy, Debug)]
+pub(crate) struct ChunkFormat(pub(crate) u16);
+
+impl ChunkFormat {
+    pub(crate) fn is_chunkindex(&self) -> bool {
+        self.0 & CHUNK_FORMAT_INDEX_BIT != 0
+    }
+    pub(crate) fn chunkbits(&self) -> u16 {
+        self.0 & CHUNK_BLKBITS_MASK
+    }
+}
+
+/// Represents the inode spec which is either data or device.
+#[derive(Clone, Copy, Debug)]
+#[repr(u32)]
+pub(crate) enum Spec {
+    Chunk(ChunkFormat),
+    RawBlk(u32),
+    Device(u32),
+    CompressedBlocks(u32),
+    Unknown,
+}
+
+/// Convert the spec from the format of the inode based on the layout.
+impl From<(&[u8; 4], Layout)> for Spec {
+    fn from(value: (&[u8; 4], Layout)) -> Self {
+        match value.1 {
+            Layout::FlatInline | Layout::FlatPlain => Spec::RawBlk(u32::from_le_bytes(*value.0)),
+            Layout::CompressedFull | Layout::CompressedCompact => {
+                Spec::CompressedBlocks(u32::from_le_bytes(*value.0))
+            }
+            Layout::Chunk => Self::Chunk(ChunkFormat(u16::from_le_bytes([value.0[0], value.0[1]]))),
+            // We don't support compressed inlines or compressed chunks currently.
+            _ => Spec::Unknown,
+        }
+    }
+}
+
+/// Helper functions for Inode Info.
+impl InodeInfo {
+    const S_IFMT: u16 = 0o170000;
+    const S_IFSOCK: u16 = 0o140000;
+    const S_IFLNK: u16 = 0o120000;
+    const S_IFREG: u16 = 0o100000;
+    const S_IFBLK: u16 = 0o60000;
+    const S_IFDIR: u16 = 0o40000;
+    const S_IFCHR: u16 = 0o20000;
+    const S_IFIFO: u16 = 0o10000;
+    const S_ISUID: u16 = 0o4000;
+    const S_ISGID: u16 = 0o2000;
+    const S_ISVTX: u16 = 0o1000;
+    pub(crate) fn ino(&self) -> u32 {
+        match self {
+            Self::Extended(extended) => extended.i_ino,
+            Self::Compact(compact) => compact.i_ino,
+        }
+    }
+
+    pub(crate) fn format(&self) -> Format {
+        match self {
+            Self::Extended(extended) => extended.i_format,
+            Self::Compact(compact) => compact.i_format,
+        }
+    }
+
+    pub(crate) fn file_size(&self) -> Off {
+        match self {
+            Self::Extended(extended) => extended.i_size,
+            Self::Compact(compact) => compact.i_size as u64,
+        }
+    }
+
+    pub(crate) fn inode_size(&self) -> Off {
+        match self {
+            Self::Extended(_) => 64,
+            Self::Compact(_) => 32,
+        }
+    }
+
+    pub(crate) fn spec(&self) -> Spec {
+        let mode = match self {
+            Self::Extended(extended) => extended.i_mode,
+            Self::Compact(compact) => compact.i_mode,
+        };
+
+        let u = match self {
+            Self::Extended(extended) => &extended.i_u,
+            Self::Compact(compact) => &compact.i_u,
+        };
+
+        match mode & 0o170000 {
+            0o40000 | 0o100000 | 0o120000 => Spec::from((u, self.format().layout())),
+            // We don't support device inodes currently.
+            _ => Spec::Unknown,
+        }
+    }
+
+    pub(crate) fn inode_type(&self) -> Type {
+        let mode = match self {
+            Self::Extended(extended) => extended.i_mode,
+            Self::Compact(compact) => compact.i_mode,
+        };
+        match mode & Self::S_IFMT {
+            Self::S_IFDIR => Type::Directory, // Directory
+            Self::S_IFREG => Type::Regular,   // Regular File
+            Self::S_IFLNK => Type::Link,      // Symbolic Link
+            Self::S_IFIFO => Type::Fifo,      // FIFO
+            Self::S_IFSOCK => Type::Socket,   // Socket
+            Self::S_IFBLK => Type::Block,     // Block
+            Self::S_IFCHR => Type::Character, // Character
+            _ => Type::Unknown,
+        }
+    }
+
+    pub(crate) fn xattr_size(&self) -> Off {
+        match self.xattr_count() {
+            0 => 0,
+            icount => {
+                size_of::<XAttrSharedEntrySummary>() as Off
+                    + (size_of::<c_int>() as Off) * (icount as Off - 1)
+            }
+        }
+    }
+
+    pub(crate) fn xattr_count(&self) -> u16 {
+        match self {
+            Self::Extended(extended) => extended.i_xattr_icount,
+            Self::Compact(compact) => compact.i_xattr_icount,
+        }
+    }
+}
+
+pub(crate) type CompactInodeInfoBuf = [u8; size_of::<CompactInodeInfo>()];
+pub(crate) type ExtendedInodeInfoBuf = [u8; size_of::<ExtendedInodeInfo>()];
+pub(crate) const DEFAULT_INODE_BUF: ExtendedInodeInfoBuf = [0; size_of::<ExtendedInodeInfo>()];
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [RFC PATCH 06/24] erofs: add alloc_helper in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (4 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 05/24] erofs: add inode " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 07/24] erofs: add data abstraction " Yiyang Wu
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

In userspace Rust, heap operations are infallible: they do not return
errors, and the program aborts (or panics) if an allocation fails. In
the kernel, allocations are fallible and return AllocError instead.
This module bridges the gap and returns Errno uniformly.
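
Here is a minimal userspace sketch of the same idea. It assumes std's Vec::try_reserve as a stand-in for the GFP_KERNEL-taking Vec methods used in this patch. Space is reserved fallibly first, so the following push or extend does not need to allocate, and any failure is mapped to ENOMEM. The Errno enum here is a simplified placeholder for the crate's own type.

```rust
use std::collections::TryReserveError;

/// Simplified placeholder for the crate's Errno type.
#[derive(Debug, PartialEq)]
enum Errno {
    ENOMEM,
}

// Every allocation failure is reported as ENOMEM.
impl From<TryReserveError> for Errno {
    fn from(_: TryReserveError) -> Self {
        Errno::ENOMEM
    }
}

/// Fallible push: reserve one slot first, so push() itself cannot allocate.
fn push_vec<T>(v: &mut Vec<T>, value: T) -> Result<(), Errno> {
    v.try_reserve(1)?;
    v.push(value);
    Ok(())
}

/// Fallible extend: reserve room for the whole slice before cloning into it.
fn extend_from_slice<T: Clone>(v: &mut Vec<T>, slice: &[T]) -> Result<(), Errno> {
    v.try_reserve(slice.len())?;
    v.extend_from_slice(slice);
    Ok(())
}

/// Fallible with_capacity built on try_reserve_exact.
fn vec_with_capacity<T>(capacity: usize) -> Result<Vec<T>, Errno> {
    let mut v = Vec::new();
    v.try_reserve_exact(capacity)?;
    Ok(v)
}
```

An impossible request, such as usize::MAX bytes, returns Errno::ENOMEM instead of aborting.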

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs              |  1 +
 fs/erofs/rust/erofs_sys/alloc_helper.rs | 35 +++++++++++++++++++++++++
 2 files changed, 36 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/alloc_helper.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 34267ec7772d..c6fd7f78ac97 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -23,6 +23,7 @@
 /// to avoid naming conflicts.
 pub(crate) type PosixResult<T> = Result<T, Errno>;
 
+pub(crate) mod alloc_helper;
 pub(crate) mod errnos;
 pub(crate) mod inode;
 pub(crate) mod superblock;
diff --git a/fs/erofs/rust/erofs_sys/alloc_helper.rs b/fs/erofs/rust/erofs_sys/alloc_helper.rs
new file mode 100644
index 000000000000..05ef2018d379
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/alloc_helper.rs
@@ -0,0 +1,35 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+/// This module provides helper functions for the alloc crate
+/// Note that in the Linux kernel, allocation is fallible, whereas in userland it is not.
+/// Since most of the functions depend on infallible allocation, here we provide helper functions
+/// so that most of the code doesn't need to be changed.
+
+#[cfg(CONFIG_EROFS_FS = "y")]
+use kernel::prelude::*;
+
+#[cfg(not(CONFIG_EROFS_FS = "y"))]
+use alloc::vec;
+
+use super::*;
+use alloc::boxed::Box;
+use alloc::vec::Vec;
+
+pub(crate) fn push_vec<T>(v: &mut Vec<T>, value: T) -> PosixResult<()> {
+    v.push(value, GFP_KERNEL)
+        .map_or_else(|_| Err(Errno::ENOMEM), |_| Ok(()))
+}
+
+pub(crate) fn extend_from_slice<T: Clone>(v: &mut Vec<T>, slice: &[T]) -> PosixResult<()> {
+    v.extend_from_slice(slice, GFP_KERNEL)
+        .map_or_else(|_| Err(Errno::ENOMEM), |_| Ok(()))
+}
+
+pub(crate) fn heap_alloc<T>(value: T) -> PosixResult<Box<T>> {
+    Box::new(value, GFP_KERNEL).map_or_else(|_| Err(Errno::ENOMEM), |v| Ok(v))
+}
+
+pub(crate) fn vec_with_capacity<T: Default + Clone>(capacity: usize) -> PosixResult<Vec<T>> {
+    Vec::with_capacity(capacity, GFP_KERNEL).map_or_else(|_| Err(Errno::ENOMEM), |v| Ok(v))
+}
-- 
2.46.0


* [RFC PATCH 07/24] erofs: add data abstraction in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (5 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 06/24] erofs: add alloc_helper " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 08/24] erofs: add device data structure " Yiyang Wu
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Introduce the Buffer, Source and Backend traits.

Implement UncompressedBackend and RefBuffer for use in future data
operations.
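
The layering can be sketched like this. A Source knows how to read raw bytes, and UncompressedBackend simply forwards requests to it. This leaves room for compressed backends that decompress into temporary buffers later. The sketch keeps only the fill() method and uses a hypothetical in-memory MemSource plus a simplified Errno. RefBuffer and as_buf() are omitted.

```rust
use std::convert::TryFrom;

type Off = u64;

/// Simplified placeholder for the crate's Errno type.
#[derive(Debug, PartialEq)]
enum Errno {
    EIO,
}

/// Reduced Source trait: only the fill() method from the patch.
trait Source {
    fn fill(&self, data: &mut [u8], offset: Off) -> Result<u64, Errno>;
}

/// Hypothetical source backed by an image held in memory.
struct MemSource(Vec<u8>);

impl Source for MemSource {
    fn fill(&self, data: &mut [u8], offset: Off) -> Result<u64, Errno> {
        let start = usize::try_from(offset).map_err(|_| Errno::EIO)?;
        if start > self.0.len() {
            return Err(Errno::EIO);
        }
        // Copy as many bytes as are available; a short read is not an error.
        let n = data.len().min(self.0.len() - start);
        data[..n].copy_from_slice(&self.0[start..start + n]);
        Ok(n as u64)
    }
}

/// Reduced Backend trait with the same signature as Source::fill().
trait Backend {
    fn fill(&self, data: &mut [u8], offset: Off) -> Result<u64, Errno>;
}

/// Uncompressed backend: forwards every request unchanged to its source.
struct UncompressedBackend<T: Source> {
    source: T,
}

impl<T: Source> Backend for UncompressedBackend<T> {
    fn fill(&self, data: &mut [u8], offset: Off) -> Result<u64, Errno> {
        self.source.fill(data, offset)
    }
}

impl<T: Source> From<T> for UncompressedBackend<T> {
    fn from(source: T) -> Self {
        Self { source }
    }
}
```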

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs                    |  1 +
 fs/erofs/rust/erofs_sys/data.rs               | 62 +++++++++++++++++++
 fs/erofs/rust/erofs_sys/data/backends.rs      |  4 ++
 .../erofs_sys/data/backends/uncompressed.rs   | 39 ++++++++++++
 4 files changed, 106 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/data.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/backends.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/backends/uncompressed.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index c6fd7f78ac97..8cca2cd9b75f 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -24,6 +24,7 @@
 pub(crate) type PosixResult<T> = Result<T, Errno>;
 
 pub(crate) mod alloc_helper;
+pub(crate) mod data;
 pub(crate) mod errnos;
 pub(crate) mod inode;
 pub(crate) mod superblock;
diff --git a/fs/erofs/rust/erofs_sys/data.rs b/fs/erofs/rust/erofs_sys/data.rs
new file mode 100644
index 000000000000..284c8b1f3bd4
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/data.rs
@@ -0,0 +1,62 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+pub(crate) mod backends;
+use super::*;
+
+/// Represent some sort of generic data source. This cound be file, memory or even network.
+/// Note that users should never use this directly; use backends instead.
+pub(crate) trait Source {
+    fn fill(&self, data: &mut [u8], offset: Off) -> PosixResult<u64>;
+    fn as_buf<'a>(&'a self, offset: Off, len: Off) -> PosixResult<RefBuffer<'a>>;
+}
+
+/// Represents a generic data access backend that is backed by some sort of data source.
+/// This often has temporary buffers to decompress the data from the data source.
+/// The method signatures are the same as those of the Source trait.
+pub(crate) trait Backend {
+    fn fill(&self, data: &mut [u8], offset: Off) -> PosixResult<u64>;
+    fn as_buf<'a>(&'a self, offset: Off, len: Off) -> PosixResult<RefBuffer<'a>>;
+}
+
+/// Represents a buffer trait which can yield its internal reference or be cast as an iterator of
+/// DirEntries.
+pub(crate) trait Buffer {
+    fn content(&self) -> &[u8];
+}
+
+/// Represents a buffer that holds a reference to a slice of data that
+/// is borrowed from the thin air.
+pub(crate) struct RefBuffer<'a> {
+    buf: &'a [u8],
+    start: usize,
+    len: usize,
+    put_buf: fn(*mut core::ffi::c_void),
+}
+
+impl<'a> Buffer for RefBuffer<'a> {
+    fn content(&self) -> &[u8] {
+        &self.buf[self.start..self.start + self.len]
+    }
+}
+
+impl<'a> RefBuffer<'a> {
+    pub(crate) fn new(
+        buf: &'a [u8],
+        start: usize,
+        len: usize,
+        put_buf: fn(*mut core::ffi::c_void),
+    ) -> Self {
+        Self {
+            buf,
+            start,
+            len,
+            put_buf,
+        }
+    }
+}
+
+impl<'a> Drop for RefBuffer<'a> {
+    fn drop(&mut self) {
+        (self.put_buf)(self.buf.as_ptr() as *mut core::ffi::c_void)
+    }
+}
diff --git a/fs/erofs/rust/erofs_sys/data/backends.rs b/fs/erofs/rust/erofs_sys/data/backends.rs
new file mode 100644
index 000000000000..3249f1af8be7
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/data/backends.rs
@@ -0,0 +1,4 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+pub(crate) mod uncompressed;
diff --git a/fs/erofs/rust/erofs_sys/data/backends/uncompressed.rs b/fs/erofs/rust/erofs_sys/data/backends/uncompressed.rs
new file mode 100644
index 000000000000..c1b1a60258f8
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/data/backends/uncompressed.rs
@@ -0,0 +1,39 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::super::*;
+
+pub(crate) struct UncompressedBackend<T>
+where
+    T: Source,
+{
+    source: T,
+}
+
+impl<T> Backend for UncompressedBackend<T>
+where
+    T: Source,
+{
+    fn fill(&self, data: &mut [u8], offset: Off) -> PosixResult<u64> {
+        self.source.fill(data, offset)
+    }
+
+    fn as_buf<'a>(&'a self, offset: Off, len: Off) -> PosixResult<RefBuffer<'a>> {
+        self.source.as_buf(offset, len)
+    }
+}
+
+impl<T: Source> UncompressedBackend<T> {
+    pub(crate) fn new(source: T) -> Self {
+        Self { source }
+    }
+}
+
+impl<T> From<T> for UncompressedBackend<T>
+where
+    T: Source,
+{
+    fn from(value: T) -> Self {
+        Self::new(value)
+    }
+}
-- 
2.46.0


* [RFC PATCH 08/24] erofs: add device data structure in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (6 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 07/24] erofs: add data abstraction " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 09/24] erofs: add continuous iterators " Yiyang Wu
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces the device data structures in Rust. They will
later be used to support chunk-based block maps.
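
DeviceSlot is #[repr(C)], and its fields add up to exactly 128 bytes (64 + 4 + 4 + 56) with no padding. The sketch below checks that size. It then decodes one slot field by field from little-endian bytes, which keeps the result independent of host endianness. parse_slot is a hypothetical name. A later patch in this series adds an equivalent From<[u8; 128]> impl.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Mirrors the patch's repr(C) DeviceSlot: 64 + 4 + 4 + 56 = 128 bytes.
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct DeviceSlot {
    tags: [u8; 64],
    blocks: u32,
    mapped_blocks: u32,
    reserved: [u8; 56],
}

/// Decodes one on-disk slot field by field instead of transmuting the bytes.
fn parse_slot(data: &[u8; 128]) -> DeviceSlot {
    DeviceSlot {
        tags: data[0..64].try_into().unwrap(),
        blocks: u32::from_le_bytes(data[64..68].try_into().unwrap()),
        mapped_blocks: u32::from_le_bytes(data[68..72].try_into().unwrap()),
        reserved: data[72..128].try_into().unwrap(),
    }
}
```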

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs         |  1 +
 fs/erofs/rust/erofs_sys/devices.rs | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/devices.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 8cca2cd9b75f..f1a1e491caec 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -25,6 +25,7 @@
 
 pub(crate) mod alloc_helper;
 pub(crate) mod data;
+pub(crate) mod devices;
 pub(crate) mod errnos;
 pub(crate) mod inode;
 pub(crate) mod superblock;
diff --git a/fs/erofs/rust/erofs_sys/devices.rs b/fs/erofs/rust/erofs_sys/devices.rs
new file mode 100644
index 000000000000..097676ee8720
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/devices.rs
@@ -0,0 +1,28 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use alloc::vec::Vec;
+
+/// Device specification.
+#[derive(Copy, Clone, Debug)]
+pub(crate) struct DeviceSpec {
+    pub(crate) tags: [u8; 64],
+    pub(crate) blocks: u32,
+    pub(crate) mapped_blocks: u32,
+}
+
+/// Device slot.
+#[derive(Copy, Clone, Debug)]
+#[repr(C)]
+pub(crate) struct DeviceSlot {
+    tags: [u8; 64],
+    blocks: u32,
+    mapped_blocks: u32,
+    reserved: [u8; 56],
+}
+
+/// Device information.
+pub(crate) struct DeviceInfo {
+    pub(crate) mask: u16,
+    pub(crate) specs: Vec<DeviceSpec>,
+}
-- 
2.46.0


* [RFC PATCH 09/24] erofs: add continuous iterators in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (7 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 08/24] erofs: add device data structure " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 10/24] erofs: add device_infos implementation " Yiyang Wu
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch adds a special iterator that iterates over a memory region
at page granularity. It can later be used to read device buffers or
fast symlinks.
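
Here is a simplified model of the iteration. The requested byte range is cut into pieces that never cross a block boundary, so each piece can be served from a single page-sized buffer. The sketch assumes that SuperBlock::blk_access(offset).len reports the bytes remaining in the block that contains offset. split_at_blocks is a hypothetical helper that returns (offset, len) pairs instead of RefBuffers.

```rust
type Off = u64;

/// Hypothetical helper: splits [offset, offset + len) into pieces that never
/// cross a `block_size` boundary, one piece per iterator step.
fn split_at_blocks(mut offset: Off, mut len: Off, block_size: Off) -> Vec<(Off, Off)> {
    let mut pieces = Vec::new();
    while len > 0 {
        // Bytes left in the block that contains `offset`.
        let in_block = block_size - offset % block_size;
        let step = in_block.min(len);
        pieces.push((offset, step));
        offset += step;
        len -= step;
    }
    pieces
}
```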

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys/data.rs               |  2 +
 fs/erofs/rust/erofs_sys/data/raw_iters.rs     |  6 ++
 .../rust/erofs_sys/data/raw_iters/ref_iter.rs | 68 +++++++++++++++++++
 .../rust/erofs_sys/data/raw_iters/traits.rs   | 13 ++++
 4 files changed, 89 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/data/raw_iters.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs
 create mode 100644 fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs

diff --git a/fs/erofs/rust/erofs_sys/data.rs b/fs/erofs/rust/erofs_sys/data.rs
index 284c8b1f3bd4..483f3204ce42 100644
--- a/fs/erofs/rust/erofs_sys/data.rs
+++ b/fs/erofs/rust/erofs_sys/data.rs
@@ -1,6 +1,8 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 pub(crate) mod backends;
+pub(crate) mod raw_iters;
+use super::superblock::*;
 use super::*;
 
 /// Represent some sort of generic data source. This cound be file, memory or even network.
diff --git a/fs/erofs/rust/erofs_sys/data/raw_iters.rs b/fs/erofs/rust/erofs_sys/data/raw_iters.rs
new file mode 100644
index 000000000000..8f3bd250d252
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/data/raw_iters.rs
@@ -0,0 +1,6 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+pub(crate) mod ref_iter;
+mod traits;
+pub(crate) use traits::*;
diff --git a/fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs b/fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs
new file mode 100644
index 000000000000..5aa2b7f44f3d
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs
@@ -0,0 +1,68 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::super::*;
+use super::*;
+
+/// Continuous Ref Buffer Iterator which iterates over a range of disk addresses within
+/// the temp block size. Since the temp block is always the same size as a page, it will not
+/// overflow.
+pub(crate) struct ContinuousRefIter<'a, B>
+where
+    B: Backend,
+{
+    sb: &'a SuperBlock,
+    backend: &'a B,
+    offset: Off,
+    len: Off,
+}
+
+impl<'a, B> ContinuousRefIter<'a, B>
+where
+    B: Backend,
+{
+    pub(crate) fn new(sb: &'a SuperBlock, backend: &'a B, offset: Off, len: Off) -> Self {
+        Self {
+            sb,
+            backend,
+            offset,
+            len,
+        }
+    }
+}
+
+impl<'a, B> Iterator for ContinuousRefIter<'a, B>
+where
+    B: Backend,
+{
+    type Item = PosixResult<RefBuffer<'a>>;
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.len == 0 {
+            return None;
+        }
+        let accessor = self.sb.blk_access(self.offset);
+        let len = accessor.len.min(self.len);
+        let result: Option<Self::Item> = self.backend.as_buf(self.offset, len).map_or_else(
+            |e| Some(Err(e)),
+            |buf| {
+                self.offset += len;
+                self.len -= len;
+                Some(Ok(buf))
+            },
+        );
+        result
+    }
+}
+
+impl<'a, B> ContinuousBufferIter<'a> for ContinuousRefIter<'a, B>
+where
+    B: Backend,
+{
+    fn advance_off(&mut self, offset: Off) {
+        self.offset += offset;
+        self.len -= offset
+    }
+    fn eof(&self) -> bool {
+        self.len == 0
+    }
+}
diff --git a/fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs b/fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs
new file mode 100644
index 000000000000..90b6a51658a9
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs
@@ -0,0 +1,13 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::super::*;
+
+/// Represents a basic iterator over a range of bytes from data backends.
+/// Note that this is skippable and can be used to move the iterator's cursor forward.
+pub(crate) trait ContinuousBufferIter<'a>:
+    Iterator<Item = PosixResult<RefBuffer<'a>>>
+{
+    fn advance_off(&mut self, offset: Off);
+    fn eof(&self) -> bool;
+}
-- 
2.46.0


* [RFC PATCH 10/24] erofs: add device_infos implementation in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (8 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 09/24] erofs: add continuous iterators " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-21  9:44   ` Jianan Huang
  2024-09-16 13:56 ` [RFC PATCH 11/24] erofs: add map data structure " Yiyang Wu
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Add the device_infos implementation in Rust. The result will later be
stored in SuperblockInfo, where the device mask and specs can be used
for chunk-based block mapping of image files.
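
get_device_infos() computes a device mask that sets every bit up to and including the highest set bit of the device count n. The sketch below reproduces that expression and compares it with an equivalent closed form, (n + 1).next_power_of_two() - 1. Both helper names are hypothetical.

```rust
/// Expression from get_device_infos(): (1 << (ilog2(n) + 1)) - 1 for n > 0.
fn device_mask(n: usize) -> u16 {
    if n == 0 {
        0
    } else {
        (1 << (n.ilog2() + 1)) - 1
    }
}

/// Equivalent form: round n + 1 up to a power of two, then subtract one.
fn device_mask_pow2(n: usize) -> u16 {
    ((n + 1).next_power_of_two() - 1) as u16
}
```

For example, 1 device gives mask 1, 3 devices give 3, and 4 devices give 7.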

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys/devices.rs | 47 ++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/fs/erofs/rust/erofs_sys/devices.rs b/fs/erofs/rust/erofs_sys/devices.rs
index 097676ee8720..7495164c7bd0 100644
--- a/fs/erofs/rust/erofs_sys/devices.rs
+++ b/fs/erofs/rust/erofs_sys/devices.rs
@@ -1,6 +1,10 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
+use super::alloc_helper::*;
+use super::data::raw_iters::*;
+use super::data::*;
+use super::*;
 use alloc::vec::Vec;
 
 /// Device specification.
@@ -21,8 +25,51 @@ pub(crate) struct DeviceSlot {
     reserved: [u8; 56],
 }
 
+impl From<[u8; 128]> for DeviceSlot {
+    fn from(data: [u8; 128]) -> Self {
+        Self {
+            tags: data[0..64].try_into().unwrap(),
+            blocks: u32::from_le_bytes([data[64], data[65], data[66], data[67]]),
+            mapped_blocks: u32::from_le_bytes([data[68], data[69], data[70], data[71]]),
+            reserved: data[72..128].try_into().unwrap(),
+        }
+    }
+}
+
 /// Device information.
 pub(crate) struct DeviceInfo {
     pub(crate) mask: u16,
     pub(crate) specs: Vec<DeviceSpec>,
 }
+
+pub(crate) fn get_device_infos<'a>(
+    iter: &mut (dyn ContinuousBufferIter<'a> + 'a),
+) -> PosixResult<DeviceInfo> {
+    let mut specs = Vec::new();
+    for data in iter {
+        let buffer = data?;
+        let mut cur: usize = 0;
+        let len = buffer.content().len();
+        while cur + 128 <= len {
+            let slot_data: [u8; 128] = buffer.content()[cur..cur + 128].try_into().unwrap();
+            let slot = DeviceSlot::from(slot_data);
+            cur += 128;
+            push_vec(
+                &mut specs,
+                DeviceSpec {
+                    tags: slot.tags,
+                    blocks: slot.blocks,
+                    mapped_blocks: slot.mapped_blocks,
+                },
+            )?;
+        }
+    }
+
+    let mask = if specs.is_empty() {
+        0
+    } else {
+        (1 << (specs.len().ilog2() + 1)) - 1
+    };
+
+    Ok(DeviceInfo { mask, specs })
+}
-- 
2.46.0


* [RFC PATCH 11/24] erofs: add map data structure in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (9 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 10/24] erofs: add device_infos implementation " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 12/24] erofs: add directory entry " Yiyang Wu
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces the core map flags and the runtime map data
structure in Rust. They will later be used for iomap operations.
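
Here is a trimmed-down copy of Map with a hypothetical translate() helper, to show how the logical and physical segments relate. It assumes an uncompressed extent, where both segments have the same length. In that case a file position maps to the same offset within the physical segment. device_id and algorithm_format are left out.

```rust
type Off = u64;

const MAP_MAPPED: u32 = 0x0001;
const MAP_META: u32 = 0x0002;

#[derive(Debug, Default)]
struct Segment {
    start: Off,
    len: Off,
}

#[derive(Debug, Default)]
enum MapType {
    Meta,
    #[default]
    Normal,
}

// Same conversion as in the patch: every map is MAPPED, metadata adds META.
impl From<MapType> for u32 {
    fn from(value: MapType) -> Self {
        match value {
            MapType::Meta => MAP_META | MAP_MAPPED,
            MapType::Normal => MAP_MAPPED,
        }
    }
}

/// Trimmed-down Map: logical (file) extent, physical (device) extent, type.
#[derive(Debug, Default)]
struct Map {
    logical: Segment,
    physical: Segment,
    map_type: MapType,
}

/// Hypothetical helper for an uncompressed extent: maps a file position to
/// its on-disk position, or None if it lies outside the logical segment.
fn translate(map: &Map, pos: Off) -> Option<Off> {
    let delta = pos.checked_sub(map.logical.start)?;
    if delta >= map.logical.len {
        return None;
    }
    Some(map.physical.start + delta)
}
```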

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs     |  1 +
 fs/erofs/rust/erofs_sys/map.rs | 45 ++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/map.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index f1a1e491caec..15ed65866097 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -28,6 +28,7 @@
 pub(crate) mod devices;
 pub(crate) mod errnos;
 pub(crate) mod inode;
+pub(crate) mod map;
 pub(crate) mod superblock;
 pub(crate) mod xattrs;
 pub(crate) use errnos::Errno;
diff --git a/fs/erofs/rust/erofs_sys/map.rs b/fs/erofs/rust/erofs_sys/map.rs
new file mode 100644
index 000000000000..757e8083c8f1
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/map.rs
@@ -0,0 +1,45 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::*;
+pub(crate) const MAP_MAPPED: u32 = 0x0001;
+pub(crate) const MAP_META: u32 = 0x0002;
+pub(crate) const MAP_ENCODED: u32 = 0x0004;
+pub(crate) const MAP_FULL_MAPPED: u32 = 0x0008;
+pub(crate) const MAP_FRAGMENT: u32 = 0x0010;
+pub(crate) const MAP_PARTIAL_REF: u32 = 0x0020;
+
+#[derive(Debug, Default)]
+#[repr(C)]
+pub(crate) struct Segment {
+    pub(crate) start: Off,
+    pub(crate) len: Off,
+}
+
+#[derive(Debug, Default)]
+#[repr(C)]
+pub(crate) struct Map {
+    pub(crate) logical: Segment,
+    pub(crate) physical: Segment,
+    pub(crate) device_id: u16,
+    pub(crate) algorithm_format: u16,
+    pub(crate) map_type: MapType,
+}
+
+#[derive(Debug, Default)]
+pub(crate) enum MapType {
+    Meta,
+    #[default]
+    Normal,
+}
+
+impl From<MapType> for u32 {
+    fn from(value: MapType) -> Self {
+        match value {
+            MapType::Meta => MAP_META | MAP_MAPPED,
+            MapType::Normal => MAP_MAPPED,
+        }
+    }
+}
+
+pub(crate) type MapResult = PosixResult<Map>;
-- 
2.46.0


* [RFC PATCH 12/24] erofs: add directory entry data structure in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (10 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 11/24] erofs: add map data structure " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 13/24] erofs: add runtime filesystem and inode " Yiyang Wu
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch adds DirentDesc and DirCollection in Rust. They will later
be used as helpers for the read_dir and lookup operations.
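
The directory block layout behind DirCollection can be sketched as a bounds-checked parser. Descriptors are 12 bytes each, and the first nameoff divided by 12 gives the entry count. Each name runs up to the next entry's nameoff, and the last one runs to the end of the block. This version uses slice::get so that a corrupted nameoff yields None rather than a panic. parse_dir_block and desc_at are hypothetical names. Trimming at the first NUL assumes the last name may be NUL-padded.

```rust
use std::convert::TryInto;

/// nid (8) + nameoff (2) + file_type (1) + reserved (1) bytes.
const DIRENT_SIZE: usize = 12;

#[derive(Debug, Clone, Copy, PartialEq)]
struct DirentDesc {
    nid: u64,
    nameoff: u16,
    file_type: u8,
}

/// Reads descriptor `index`, or None if it lies past the end of the block.
fn desc_at(block: &[u8], index: usize) -> Option<DirentDesc> {
    let raw = block.get(index * DIRENT_SIZE..(index + 1) * DIRENT_SIZE)?;
    let nid_bytes: [u8; 8] = raw[0..8].try_into().ok()?;
    Some(DirentDesc {
        nid: u64::from_le_bytes(nid_bytes),
        nameoff: u16::from_le_bytes([raw[8], raw[9]]),
        file_type: raw[10],
    })
}

/// Returns (nid, name) pairs for one directory block, or None if any
/// descriptor or name range falls outside the block.
fn parse_dir_block(block: &[u8]) -> Option<Vec<(u64, &[u8])>> {
    // Names start right after the last descriptor, so the first nameoff
    // divided by the descriptor size is the number of entries.
    let total = desc_at(block, 0)?.nameoff as usize / DIRENT_SIZE;
    let mut out = Vec::with_capacity(total);
    for i in 0..total {
        let desc = desc_at(block, i)?;
        let start = desc.nameoff as usize;
        // A name ends where the next one starts; the last runs to the block end.
        let end = if i + 1 == total {
            block.len()
        } else {
            desc_at(block, i + 1)?.nameoff as usize
        };
        let mut name = block.get(start..end)?;
        // Assumed NUL padding after the last name: cut at the first NUL.
        if let Some(nul) = name.iter().position(|&b| b == 0) {
            name = &name[..nul];
        }
        out.push((desc.nid, name));
    }
    Some(out)
}
```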

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs     |  1 +
 fs/erofs/rust/erofs_sys/dir.rs | 98 ++++++++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/dir.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 15ed65866097..65dc563986c3 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -26,6 +26,7 @@
 pub(crate) mod alloc_helper;
 pub(crate) mod data;
 pub(crate) mod devices;
+pub(crate) mod dir;
 pub(crate) mod errnos;
 pub(crate) mod inode;
 pub(crate) mod map;
diff --git a/fs/erofs/rust/erofs_sys/dir.rs b/fs/erofs/rust/erofs_sys/dir.rs
new file mode 100644
index 000000000000..d4255582b7c0
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/dir.rs
@@ -0,0 +1,98 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+/// On-disk Directory Descriptor Format for EROFS
+/// Documented on [EROFS Directory](https://erofs.docs.kernel.org/en/latest/core_ondisk.html#directories)
+use core::mem::size_of;
+
+#[repr(C, packed)]
+#[derive(Debug, Clone, Copy)]
+pub(crate) struct DirentDesc {
+    pub(crate) nid: u64,
+    pub(crate) nameoff: u16,
+    pub(crate) file_type: u8,
+    pub(crate) reserved: u8,
+}
+
+/// In memory representation of a real directory entry.
+#[derive(Debug, Clone, Copy)]
+pub(crate) struct Dirent<'a> {
+    pub(crate) desc: DirentDesc,
+    pub(crate) name: &'a [u8],
+}
+
+impl From<[u8; size_of::<DirentDesc>()]> for DirentDesc {
+    fn from(data: [u8; size_of::<DirentDesc>()]) -> Self {
+        Self {
+            nid: u64::from_le_bytes([
+                data[0], data[1], data[2], data[3], data[4], data[5], data[6], data[7],
+            ]),
+            nameoff: u16::from_le_bytes([data[8], data[9]]),
+            file_type: data[10],
+            reserved: data[11],
+        }
+    }
+}
+
+/// Create a collection of directory entries from a buffer.
+/// This is a helper struct to iterate over directory entries.
+pub(crate) struct DirCollection<'a> {
+    data: &'a [u8],
+    offset: usize,
+    total: usize,
+}
+
+impl<'a> DirCollection<'a> {
+    pub(crate) fn new(buffer: &'a [u8]) -> Self {
+        let desc: &DirentDesc = unsafe { &*(buffer.as_ptr() as *const DirentDesc) };
+        Self {
+            data: buffer,
+            offset: 0,
+            total: desc.nameoff as usize / core::mem::size_of::<DirentDesc>(),
+        }
+    }
+    pub(crate) fn dirent(&self, index: usize) -> Option<Dirent<'a>> {
+        let descs: &'a [[u8; size_of::<DirentDesc>()]] =
+            unsafe { core::slice::from_raw_parts(self.data.as_ptr().cast(), self.total) };
+        if index >= self.total {
+            None
+        } else if index == self.total - 1 {
+            let desc = DirentDesc::from(descs[index]);
+            let len = self.data.len() - desc.nameoff as usize;
+            Some(Dirent {
+                desc,
+                name: &self.data[desc.nameoff as usize..(desc.nameoff as usize) + len],
+            })
+        } else {
+            let desc = DirentDesc::from(descs[index]);
+            let next_desc = DirentDesc::from(descs[index + 1]);
+            let len = (next_desc.nameoff - desc.nameoff) as usize;
+            Some(Dirent {
+                desc,
+                name: &self.data[desc.nameoff as usize..(desc.nameoff as usize) + len],
+            })
+        }
+    }
+    pub(crate) fn skip_dir(&mut self, offset: usize) {
+        self.offset += offset;
+    }
+    pub(crate) fn total(&self) -> usize {
+        self.total
+    }
+}
+
+impl<'a> Iterator for DirCollection<'a> {
+    type Item = Dirent<'a>;
+    fn next(&mut self) -> Option<Self::Item> {
+        self.dirent(self.offset).map(|x| {
+            self.offset += 1;
+            x
+        })
+    }
+}
+
+impl<'a> Dirent<'a> {
+    pub(crate) fn dirname(&self) -> &'a [u8] {
+        self.name
+    }
+}
-- 
2.46.0



* [RFC PATCH 13/24] erofs: add runtime filesystem and inode in Rust
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces the FileSystem and Inode traits in Rust.
It also implements a memory-backed filesystem in Rust, which can later
be hooked up to the metabuf system in erofs.
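
A minimal userspace sketch of what a memory-backed backend can look like, assuming a simplified `Backend` with only a `fill()` method (the patch's backends are also asked for `as_buf()`, omitted here). `MemBackend`, `check_magic` and this `Errno` enum are hypothetical; the magic check is added for illustration and is not taken from KernelFileSystem::try_new(), which only shows the `fill(&mut buf, EROFS_SUPER_OFFSET)` read.

```rust
use std::convert::TryFrom;

/// On-disk offset of the EROFS superblock (EROFS_SUPER_OFFSET in the C headers).
pub const EROFS_SUPER_OFFSET: u64 = 1024;
/// Little-endian magic at the start of the superblock (EROFS_SUPER_MAGIC_V1).
pub const EROFS_SUPER_MAGIC_V1: u32 = 0xE0F5_E1E2;

/// Hypothetical stand-in for the crate's Errno.
#[derive(Debug, PartialEq, Eq)]
pub enum Errno {
    EIO,
    EINVAL,
}

/// Hypothetical, trimmed-down version of the patch's Backend trait.
pub trait Backend {
    /// Copies `data.len()` bytes starting at byte `offset` of the image into `data`.
    fn fill(&self, data: &mut [u8], offset: u64) -> Result<(), Errno>;
}

/// Backend over an image that is already in memory (a mapped file or a test buffer).
pub struct MemBackend<'a> {
    image: &'a [u8],
}

impl<'a> MemBackend<'a> {
    pub fn new(image: &'a [u8]) -> Self {
        MemBackend { image }
    }
}

impl<'a> Backend for MemBackend<'a> {
    fn fill(&self, data: &mut [u8], offset: u64) -> Result<(), Errno> {
        // Reject offsets that overflow usize and ranges past the end of the image.
        let start = usize::try_from(offset).map_err(|_| Errno::EIO)?;
        let end = start.checked_add(data.len()).ok_or(Errno::EIO)?;
        let src = self.image.get(start..end).ok_or(Errno::EIO)?;
        data.copy_from_slice(src);
        Ok(())
    }
}

/// Reads the superblock magic through any backend and checks it.
pub fn check_magic(backend: &dyn Backend) -> Result<(), Errno> {
    let mut buf = [0u8; 4];
    backend.fill(&mut buf, EROFS_SUPER_OFFSET)?;
    if u32::from_le_bytes(buf) == EROFS_SUPER_MAGIC_V1 {
        Ok(())
    } else {
        Err(Errno::EINVAL)
    }
}
```

Keeping the backend behind a trait object, as the patch does with `&dyn Backend`, lets the same filesystem code run over a test buffer here and over metabuf-backed pages later.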

This patch also adds an InodeCollection trait, which can later be
hooked up to iget5_locked.
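
The get-or-create contract that InodeCollection::iget() is meant to provide (and that iget5_locked() provides in the kernel) can be modelled in userspace with a BTreeMap. This is a sketch under simplifying assumptions: the traits below drop the SuperBlock, InodeInfo and FileSystem parameters as well as the error path, and `SimpleInode` and `MapCollection` are hypothetical.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

pub type Nid = u64;

/// Trimmed-down stand-in for the patch's `Inode` trait.
pub trait Inode: Sized {
    fn new(nid: Nid) -> Self;
    fn nid(&self) -> Nid;
}

/// Trimmed-down stand-in for `InodeCollection`: get-or-create by nid.
pub trait InodeCollection {
    type I: Inode;
    fn iget(&mut self, nid: Nid) -> &mut Self::I;
}

pub struct SimpleInode {
    nid: Nid,
    pub opens: u32,
}

impl Inode for SimpleInode {
    fn new(nid: Nid) -> Self {
        SimpleInode { nid, opens: 0 }
    }
    fn nid(&self) -> Nid {
        self.nid
    }
}

/// Userspace cache playing the role of the kernel inode hash behind iget5_locked().
pub struct MapCollection<T> {
    map: BTreeMap<Nid, T>,
    pub misses: usize,
}

impl<T> MapCollection<T> {
    pub fn new() -> Self {
        MapCollection { map: BTreeMap::new(), misses: 0 }
    }
}

impl<T: Inode> InodeCollection for MapCollection<T> {
    type I = T;

    fn iget(&mut self, nid: Nid) -> &mut T {
        match self.map.entry(nid) {
            // Hit: return the inode that is already cached.
            Entry::Occupied(e) => e.into_mut(),
            // Miss: construct it once, cache it, and return it.
            Entry::Vacant(v) => {
                self.misses += 1;
                v.insert(T::new(nid))
            }
        }
    }
}
```

Repeated lookups of the same nid return the same cached object, which is the property the later dir_lookup()/read_inode() helpers rely on.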

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs                |   2 +-
 fs/erofs/rust/erofs_sys/inode.rs          | 106 ++++++++++++++++++++++
 fs/erofs/rust/erofs_sys/superblock.rs     |  42 ++++++++-
 fs/erofs/rust/erofs_sys/superblock/mem.rs |  61 +++++++++++++
 4 files changed, 209 insertions(+), 2 deletions(-)
 create mode 100644 fs/erofs/rust/erofs_sys/superblock/mem.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 65dc563986c3..20c0aa81a800 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -32,7 +32,7 @@
 pub(crate) mod map;
 pub(crate) mod superblock;
 pub(crate) mod xattrs;
-pub(crate) use errnos::Errno;
+pub(crate) use errnos::{Errno, Errno::*};
 
 /// Helper macro to round up or down a number.
 #[macro_export]
diff --git a/fs/erofs/rust/erofs_sys/inode.rs b/fs/erofs/rust/erofs_sys/inode.rs
index 1762023e97f8..1ecd6147a126 100644
--- a/fs/erofs/rust/erofs_sys/inode.rs
+++ b/fs/erofs/rust/erofs_sys/inode.rs
@@ -1,3 +1,7 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::superblock::*;
 use super::xattrs::*;
 use super::*;
 use core::ffi::*;
@@ -289,3 +293,105 @@ pub(crate) fn xattr_count(&self) -> u16 {
 pub(crate) type CompactInodeInfoBuf = [u8; size_of::<CompactInodeInfo>()];
 pub(crate) type ExtendedInodeInfoBuf = [u8; size_of::<ExtendedInodeInfo>()];
 pub(crate) const DEFAULT_INODE_BUF: ExtendedInodeInfoBuf = [0; size_of::<ExtendedInodeInfo>()];
+
+/// The inode trait which represents the inode in the filesystem.
+pub(crate) trait Inode: Sized {
+    fn new(_sb: &SuperBlock, info: InodeInfo, nid: Nid) -> Self;
+    fn info(&self) -> &InodeInfo;
+    fn nid(&self) -> Nid;
+}
+
+/// Represents the error which occurs when trying to convert the inode.
+#[derive(Debug)]
+pub(crate) enum InodeError {
+    VersionError,
+    PosixError(Errno),
+}
+
+impl TryFrom<CompactInodeInfoBuf> for CompactInodeInfo {
+    type Error = InodeError;
+    fn try_from(value: CompactInodeInfoBuf) -> Result<Self, Self::Error> {
+        let inode: CompactInodeInfo = Self {
+            i_format: Format(u16::from_le_bytes([value[0], value[1]])),
+            i_xattr_icount: u16::from_le_bytes([value[2], value[3]]),
+            i_mode: u16::from_le_bytes([value[4], value[5]]),
+            i_nlink: u16::from_le_bytes([value[6], value[7]]),
+            i_size: u32::from_le_bytes([value[8], value[9], value[10], value[11]]),
+            i_reserved: value[12..16].try_into().unwrap(),
+            i_u: value[16..20].try_into().unwrap(),
+            i_ino: u32::from_le_bytes([value[20], value[21], value[22], value[23]]),
+            i_uid: u16::from_le_bytes([value[24], value[25]]),
+            i_gid: u16::from_le_bytes([value[26], value[27]]),
+            i_reserved2: value[28..32].try_into().unwrap(),
+        };
+        let ifmt = &inode.i_format;
+        match ifmt.version() {
+            Version::Compat => Ok(inode),
+            Version::Extended => Err(InodeError::VersionError),
+            _ => Err(InodeError::PosixError(EOPNOTSUPP)),
+        }
+    }
+}
+
+impl<I> TryFrom<(&dyn FileSystem<I>, Nid)> for InodeInfo
+where
+    I: Inode,
+{
+    type Error = Errno;
+    fn try_from(value: (&dyn FileSystem<I>, Nid)) -> Result<Self, Self::Error> {
+        let f = value.0;
+        let sb = f.superblock();
+        let nid = value.1;
+        let offset = sb.iloc(nid);
+        let accessor = sb.blk_access(offset);
+        let mut buf: ExtendedInodeInfoBuf = DEFAULT_INODE_BUF;
+        f.backend().fill(&mut buf[0..32], offset)?;
+        let compact_buf: CompactInodeInfoBuf = buf[0..32].try_into().unwrap();
+        let r: Result<CompactInodeInfo, InodeError> = CompactInodeInfo::try_from(compact_buf);
+        match r {
+            Ok(compact) => Ok(InodeInfo::Compact(compact)),
+            Err(e) => match e {
+                InodeError::VersionError => {
+                    let gotten = (sb.blksz() - accessor.off + 32).min(64);
+                    f.backend()
+                        .fill(&mut buf[32..(32 + gotten).min(64) as usize], offset + 32)?;
+
+                    if gotten < 32 {
+                        f.backend().fill(
+                            &mut buf[(32 + gotten) as usize..64],
+                            sb.blkpos(sb.blknr(offset) + 1),
+                        )?;
+                    }
+                    Ok(InodeInfo::Extended(ExtendedInodeInfo {
+                        i_format: Format(u16::from_le_bytes([buf[0], buf[1]])),
+                        i_xattr_icount: u16::from_le_bytes([buf[2], buf[3]]),
+                        i_mode: u16::from_le_bytes([buf[4], buf[5]]),
+                        i_reserved: buf[6..8].try_into().unwrap(),
+                        i_size: u64::from_le_bytes([
+                            buf[8], buf[9], buf[10], buf[11], buf[12], buf[13], buf[14], buf[15],
+                        ]),
+                        i_u: buf[16..20].try_into().unwrap(),
+                        i_ino: u32::from_le_bytes([buf[20], buf[21], buf[22], buf[23]]),
+                        i_uid: u32::from_le_bytes([buf[24], buf[25], buf[26], buf[27]]),
+                        i_gid: u32::from_le_bytes([buf[28], buf[29], buf[30], buf[31]]),
+                        i_mtime: u64::from_le_bytes([
+                            buf[32], buf[33], buf[34], buf[35], buf[36], buf[37], buf[38], buf[39],
+                        ]),
+                        i_mtime_nsec: u32::from_le_bytes([buf[40], buf[41], buf[42], buf[43]]),
+                        i_nlink: u32::from_le_bytes([buf[44], buf[45], buf[46], buf[47]]),
+                        i_reserved2: buf[48..64].try_into().unwrap(),
+                    }))
+                }
+                InodeError::PosixError(e) => Err(e),
+            },
+        }
+    }
+}
+
+/// Represents the inode collection which is a hashmap of inodes.
+pub(crate) trait InodeCollection {
+    type I: Inode + Sized;
+
+    fn iget(&mut self, nid: Nid, filesystem: &dyn FileSystem<Self::I>)
+        -> PosixResult<&mut Self::I>;
+}
diff --git a/fs/erofs/rust/erofs_sys/superblock.rs b/fs/erofs/rust/erofs_sys/superblock.rs
index 213be6dbc553..940ab0b03a26 100644
--- a/fs/erofs/rust/erofs_sys/superblock.rs
+++ b/fs/erofs/rust/erofs_sys/superblock.rs
@@ -1,9 +1,15 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
-use super::*;
+pub(crate) mod mem;
+use alloc::boxed::Box;
 use core::mem::size_of;
 
+use super::data::*;
+use super::devices::*;
+use super::inode::*;
+use super::*;
+
 /// The ondisk superblock structure.
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(C)]
@@ -130,3 +136,37 @@ pub(crate) fn iloc(&self, nid: Nid) -> Off {
         self.blkpos(self.meta_blkaddr) + ((nid as Off) << (5 as Off))
     }
 }
+
+pub(crate) trait FileSystem<I>
+where
+    I: Inode,
+{
+    fn superblock(&self) -> &SuperBlock;
+    fn backend(&self) -> &dyn Backend;
+    fn as_filesystem(&self) -> &dyn FileSystem<I>;
+    fn device_info(&self) -> &DeviceInfo;
+}
+
+pub(crate) struct SuperblockInfo<I, C, T>
+where
+    I: Inode,
+    C: InodeCollection<I = I>,
+{
+    pub(crate) filesystem: Box<dyn FileSystem<I>>,
+    pub(crate) inodes: C,
+    pub(crate) opaque: T,
+}
+
+impl<I, C, T> SuperblockInfo<I, C, T>
+where
+    I: Inode,
+    C: InodeCollection<I = I>,
+{
+    pub(crate) fn new(fs: Box<dyn FileSystem<I>>, c: C, opaque: T) -> Self {
+        Self {
+            filesystem: fs,
+            inodes: c,
+            opaque,
+        }
+    }
+}
diff --git a/fs/erofs/rust/erofs_sys/superblock/mem.rs b/fs/erofs/rust/erofs_sys/superblock/mem.rs
new file mode 100644
index 000000000000..12bf797bd1e3
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/superblock/mem.rs
@@ -0,0 +1,61 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::data::raw_iters::ref_iter::*;
+use super::*;
+
+// Memory Mapped Device/File so we need to have some external lifetime on the backend trait.
+// Note that we do not want the lifetime to infect the MemFileSystem which may have a impact on
+// the content iter below. Just use HRTB to dodge the borrow checker.
+
+pub(crate) struct KernelFileSystem<B>
+where
+    B: Backend,
+{
+    backend: B,
+    sb: SuperBlock,
+    device_info: DeviceInfo,
+}
+
+impl<I, B> FileSystem<I> for KernelFileSystem<B>
+where
+    B: Backend,
+    I: Inode,
+{
+    fn superblock(&self) -> &SuperBlock {
+        &self.sb
+    }
+    fn backend(&self) -> &dyn Backend {
+        &self.backend
+    }
+
+    fn as_filesystem(&self) -> &dyn FileSystem<I> {
+        self
+    }
+
+    fn device_info(&self) -> &DeviceInfo {
+        &self.device_info
+    }
+}
+
+impl<B> KernelFileSystem<B>
+where
+    B: Backend,
+{
+    pub(crate) fn try_new(backend: B) -> PosixResult<Self> {
+        let mut buf = SUPERBLOCK_EMPTY_BUF;
+        backend.fill(&mut buf, EROFS_SUPER_OFFSET)?;
+        let sb: SuperBlock = buf.into();
+        let device_info = get_device_infos(&mut ContinuousRefIter::new(
+            &sb,
+            &backend,
+            sb.devt_slotoff as Off * 128,
+            sb.extra_devices as Off * 128,
+        ))?;
+        Ok(Self {
+            backend,
+            sb,
+            device_info,
+        })
+    }
+}
-- 
2.46.0
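
On the extended-inode path of InodeInfo::try_from() in the patch above: iloc() places inodes on 32-byte slots (nid << 5), so a 64-byte extended inode can start in the last 32 bytes of a block, and its second half then lives in the next block. The sketch below shows that split with a reader that may only touch one block per call; `tail_split` and `read_extended` are hypothetical helpers. For comparison, `(sb.blksz() - accessor.off + 32).min(64)` in the patch evaluates to 64 for every 32-byte-aligned offset, so its `gotten < 32` branch is never taken; that is harmless with a contiguous memory backend, but a block-based backend would presumably need the count computed as below.

```rust
/// On-disk sizes of the compact and extended inode formats.
const COMPACT_SIZE: u64 = 32;
const EXTENDED_SIZE: u64 = 64;

/// For an inode starting `off_in_block` bytes into a block, returns how many of the
/// last 32 bytes of an extended inode are still in that block, and how many spill
/// into the next one.
pub fn tail_split(off_in_block: u64, blksz: u64) -> (u64, u64) {
    let rest = EXTENDED_SIZE - COMPACT_SIZE;
    let in_block = blksz.saturating_sub(off_in_block + COMPACT_SIZE).min(rest);
    (in_block, rest - in_block)
}

/// Reads a whole extended inode with `read(pos, dst)`, which must not cross a block boundary.
pub fn read_extended<F>(iloc: u64, blkszbits: u32, mut read: F) -> [u8; 64]
where
    F: FnMut(u64, &mut [u8]),
{
    let blksz = 1u64 << blkszbits;
    let off = iloc & (blksz - 1);
    let mut buf = [0u8; 64];
    // The compact-sized head always fits, since inode slots are 32-byte aligned.
    read(iloc, &mut buf[..32]);
    let (same, next) = tail_split(off, blksz);
    let split = 32 + same as usize;
    if same > 0 {
        read(iloc + COMPACT_SIZE, &mut buf[32..split]);
    }
    if next > 0 {
        // The remainder continues at the first byte of the following block.
        let next_block = ((iloc >> blkszbits) + 1) << blkszbits;
        read(next_block, &mut buf[split..]);
    }
    buf
}
```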



* [RFC PATCH 14/24] erofs: add block mapping capability in Rust
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Implement block mapping in Rust, along with map iterators over an
inode, which will be used for data access.
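
The branch structure of flatmap() can be checked in isolation. A hedged sketch with plain integers in place of SuperBlock, Inode and Map; `flat_map`, `Extent` and the `tail_base` parameter (standing for iloc + inode size + xattr size) are hypothetical names. Block addresses are in blocks; offsets and lengths are in bytes.

```rust
/// Simplified stand-in for `Map`: where a logical range of a flat inode lives.
#[derive(Debug, PartialEq, Eq)]
pub struct Extent {
    pub logical: u64,
    pub physical: u64,
    pub len: u64,
    /// True when the bytes are the inline tail stored after the inode metadata.
    pub inline_tail: bool,
}

/// Maps `offset` of a flat (plain or inline) inode, following flatmap()'s branches.
pub fn flat_map(
    file_size: u64,
    blkszbits: u32,
    blkaddr: u64,
    inline: bool,
    tail_base: u64,
    offset: u64,
) -> Option<Extent> {
    if offset >= file_size {
        return None; // MapIter stops before reaching this case.
    }
    let blksz = 1u64 << blkszbits;
    let nblocks = (file_size + blksz - 1) >> blkszbits;
    // With an inline layout, the last block is not stored in the data area.
    let lastblk = if inline { nblocks - 1 } else { nblocks };
    let data_end = lastblk << blkszbits;
    if offset < data_end {
        // Contiguous data blocks starting at `blkaddr`.
        let len = file_size.min(data_end) - offset;
        Some(Extent { logical: offset, physical: (blkaddr << blkszbits) + offset, len, inline_tail: false })
    } else if inline {
        // Inline tail: right after the inode and its xattrs.
        let physical = tail_base + (offset & (blksz - 1));
        Some(Extent { logical: offset, physical, len: file_size - offset, inline_tail: true })
    } else {
        None // the patch returns EUCLEAN here; unreachable for a plain layout
    }
}
```

For a 10000-byte inline file with 4 KiB blocks, offsets below 8192 map into the data blocks and the remaining 1808 bytes come from the metadata area.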

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys/map.rs        |  54 +++++++++++
 fs/erofs/rust/erofs_sys/superblock.rs | 129 ++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)

diff --git a/fs/erofs/rust/erofs_sys/map.rs b/fs/erofs/rust/erofs_sys/map.rs
index 757e8083c8f1..f56f31cefcd5 100644
--- a/fs/erofs/rust/erofs_sys/map.rs
+++ b/fs/erofs/rust/erofs_sys/map.rs
@@ -1,7 +1,10 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
+use super::inode::*;
+use super::superblock::*;
 use super::*;
+
 pub(crate) const MAP_MAPPED: u32 = 0x0001;
 pub(crate) const MAP_META: u32 = 0x0002;
 pub(crate) const MAP_ENCODED: u32 = 0x0004;
@@ -43,3 +46,54 @@ fn from(value: MapType) -> Self {
 }
 
 pub(crate) type MapResult = PosixResult<Map>;
+
+/// Iterates over the data map represented by an inode.
+pub(crate) struct MapIter<'a, 'b, FS, I>
+where
+    FS: FileSystem<I>,
+    I: Inode,
+{
+    fs: &'a FS,
+    inode: &'b I,
+    offset: Off,
+    len: Off,
+}
+
+impl<'a, 'b, FS, I> MapIter<'a, 'b, FS, I>
+where
+    FS: FileSystem<I>,
+    I: Inode,
+{
+    pub(crate) fn new(fs: &'a FS, inode: &'b I, offset: Off) -> Self {
+        Self {
+            fs,
+            inode,
+            offset,
+            len: inode.info().file_size(),
+        }
+    }
+}
+
+impl<'a, 'b, FS, I> Iterator for MapIter<'a, 'b, FS, I>
+where
+    FS: FileSystem<I>,
+    I: Inode,
+{
+    type Item = MapResult;
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.offset >= self.len {
+            None
+        } else {
+            let result = self.fs.map(self.inode, self.offset);
+            match result {
+                Ok(m) => {
+                    let accessor = self.fs.superblock().blk_access(m.physical.start);
+                    let len = m.physical.len.min(accessor.len);
+                    self.offset += len;
+                    Some(Ok(m))
+                }
+                Err(e) => Some(Err(e)),
+            }
+        }
+    }
+}
diff --git a/fs/erofs/rust/erofs_sys/superblock.rs b/fs/erofs/rust/erofs_sys/superblock.rs
index 940ab0b03a26..fc6b3cb00b18 100644
--- a/fs/erofs/rust/erofs_sys/superblock.rs
+++ b/fs/erofs/rust/erofs_sys/superblock.rs
@@ -8,8 +8,11 @@
 use super::data::*;
 use super::devices::*;
 use super::inode::*;
+use super::map::*;
 use super::*;
 
+use crate::round;
+
 /// The ondisk superblock structure.
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(C)]
@@ -135,6 +138,10 @@ pub(crate) fn blk_round_up(&self, addr: Off) -> Blk {
     pub(crate) fn iloc(&self, nid: Nid) -> Off {
         self.blkpos(self.meta_blkaddr) + ((nid as Off) << (5 as Off))
     }
+    pub(crate) fn chunk_access(&self, format: ChunkFormat, address: Off) -> Accessor {
+        let chunkbits = format.chunkbits() + self.blkszbits as u16;
+        Accessor::new(address, chunkbits as Off)
+    }
 }
 
 pub(crate) trait FileSystem<I>
@@ -145,6 +152,128 @@ pub(crate) trait FileSystem<I>
     fn backend(&self) -> &dyn Backend;
     fn as_filesystem(&self) -> &dyn FileSystem<I>;
     fn device_info(&self) -> &DeviceInfo;
+    fn flatmap(&self, inode: &I, offset: Off, inline: bool) -> MapResult {
+        let sb = self.superblock();
+        let nblocks = sb.blk_round_up(inode.info().file_size());
+        let blkaddr = match inode.info().spec() {
+            Spec::RawBlk(blkaddr) => Ok(blkaddr),
+            _ => Err(EUCLEAN),
+        }?;
+
+        let lastblk = if inline { nblocks - 1 } else { nblocks };
+        if offset < sb.blkpos(lastblk) {
+            let len = inode.info().file_size().min(sb.blkpos(lastblk)) - offset;
+            Ok(Map {
+                logical: Segment { start: offset, len },
+                physical: Segment {
+                    start: sb.blkpos(blkaddr) + offset,
+                    len,
+                },
+                algorithm_format: 0,
+                device_id: 0,
+                map_type: MapType::Normal,
+            })
+        } else if inline {
+            let len = inode.info().file_size() - offset;
+            let accessor = sb.blk_access(offset);
+            Ok(Map {
+                logical: Segment { start: offset, len },
+                physical: Segment {
+                    start: sb.iloc(inode.nid())
+                        + inode.info().inode_size()
+                        + inode.info().xattr_size()
+                        + accessor.off,
+                    len,
+                },
+                algorithm_format: 0,
+                device_id: 0,
+                map_type: MapType::Meta,
+            })
+        } else {
+            Err(EUCLEAN)
+        }
+    }
+
+    fn chunk_map(&self, inode: &I, offset: Off) -> MapResult {
+        let sb = self.superblock();
+        let chunkformat = match inode.info().spec() {
+            Spec::Chunk(chunkformat) => Ok(chunkformat),
+            _ => Err(EUCLEAN),
+        }?;
+        let accessor = sb.chunk_access(chunkformat, offset);
+
+        if chunkformat.is_chunkindex() {
+            let unit = size_of::<ChunkIndex>() as Off;
+            let pos = round!(
+                UP,
+                self.superblock().iloc(inode.nid())
+                    + inode.info().inode_size()
+                    + inode.info().xattr_size()
+                    + unit * accessor.nr,
+                unit
+            );
+            let mut buf = [0u8; size_of::<ChunkIndex>()];
+            self.backend().fill(&mut buf, pos)?;
+            let chunk_index = ChunkIndex::from(buf);
+            if chunk_index.blkaddr == u32::MAX {
+                Err(EUCLEAN)
+            } else {
+                Ok(Map {
+                    logical: Segment {
+                        start: accessor.base + accessor.off,
+                        len: accessor.len,
+                    },
+                    physical: Segment {
+                        start: sb.blkpos(chunk_index.blkaddr) + accessor.off,
+                        len: accessor.len,
+                    },
+                    algorithm_format: 0,
+                    device_id: chunk_index.device_id & self.device_info().mask,
+                    map_type: MapType::Normal,
+                })
+            }
+        } else {
+            let unit = 4;
+            let pos = round!(
+                UP,
+                sb.iloc(inode.nid())
+                    + inode.info().inode_size()
+                    + inode.info().xattr_size()
+                    + unit * accessor.nr,
+                unit
+            );
+            let mut buf = [0u8; 4];
+            self.backend().fill(&mut buf, pos)?;
+            let blkaddr = u32::from_le_bytes(buf);
+            let len = accessor.len.min(inode.info().file_size() - offset);
+            if blkaddr == u32::MAX {
+                Err(EUCLEAN)
+            } else {
+                Ok(Map {
+                    logical: Segment {
+                        start: accessor.base + accessor.off,
+                        len,
+                    },
+                    physical: Segment {
+                        start: sb.blkpos(blkaddr) + accessor.off,
+                        len,
+                    },
+                    algorithm_format: 0,
+                    device_id: 0,
+                    map_type: MapType::Normal,
+                })
+            }
+        }
+    }
+
+    fn map(&self, inode: &I, offset: Off) -> MapResult {
+        match inode.info().format().layout() {
+            Layout::FlatInline => self.flatmap(inode, offset, true),
+            Layout::FlatPlain => self.flatmap(inode, offset, false),
+            Layout::Chunk => self.chunk_map(inode, offset),
+            _ => todo!(),
+        }
+    }
 }
 
 pub(crate) struct SuperblockInfo<I, C, T>
-- 
2.46.0
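
For the chunk-based layout handled by chunk_map() in the patch above, the on-disk entry for a chunk sits in a table right after the inode and its xattrs: 8-byte entries for the chunk-index format, 4-byte block addresses otherwise. A hedged arithmetic sketch; `locate_chunk` and `ChunkPos` are hypothetical. The patch rounds up `meta_end + unit * nr`, which gives the same result as rounding up `meta_end` first, since `unit * nr` is already a multiple of `unit`.

```rust
/// One chunk-index entry: advise (u16), device_id (u16), blkaddr (u32).
const CHUNK_INDEX_SIZE: u64 = 8;
/// One plain block-map entry: a 32-bit block address.
const BLOCK_MAP_ENTRY_SIZE: u64 = 4;

#[derive(Debug, PartialEq, Eq)]
pub struct ChunkPos {
    /// Chunk number covering the offset.
    pub nr: u64,
    /// Offset of the byte inside that chunk.
    pub off: u64,
    /// Byte position of the on-disk entry that describes the chunk.
    pub entry_pos: u64,
}

/// `chunkbits` = the format's chunk bits + block size bits (as in chunk_access());
/// `meta_end` = iloc + inode size + xattr size.
pub fn locate_chunk(offset: u64, chunkbits: u32, meta_end: u64, indexed: bool) -> ChunkPos {
    let unit = if indexed { CHUNK_INDEX_SIZE } else { BLOCK_MAP_ENTRY_SIZE };
    let nr = offset >> chunkbits;
    let off = offset & ((1u64 << chunkbits) - 1);
    // The entry table starts at `meta_end` rounded up to the entry size.
    let table = (meta_end + unit - 1) / unit * unit;
    ChunkPos { nr, off, entry_pos: table + unit * nr }
}
```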



* [RFC PATCH 15/24] erofs: add iter methods in filesystem in Rust
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Implement a mapped iterator that uses MapIter to yield data backed by
an EROFS inode.
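
A hedged userspace sketch of the mapped-iterator idea: a closure stands in for FileSystem::map(), a byte slice for the backend, and each item is clipped to the end of the current physical block, which is what the blk_access() clipping in MapIter and RefMapIter appears to do (assuming `Accessor::len` is the distance to the end of the block). `MappedIter`, `Extent` and the string errors are hypothetical. One deliberate difference: this sketch stops after the first error, whereas MapIter::next() does not advance its offset on `Err`, so a caller that keeps polling would see the same error again.

```rust
/// Simplified map result: where the bytes at the queried offset start, and how many follow.
#[derive(Debug, Clone, Copy)]
pub struct Extent {
    pub physical: u64,
    pub len: u64,
}

/// Yields a file's bytes as slices that never cross a physical block boundary.
pub struct MappedIter<'a, F> {
    image: &'a [u8],
    mapper: F,
    offset: u64,
    size: u64,
    blksz: u64,
}

impl<'a, F> MappedIter<'a, F>
where
    F: Fn(u64) -> Option<Extent>,
{
    pub fn new(image: &'a [u8], mapper: F, size: u64, blksz: u64) -> Self {
        MappedIter { image, mapper, offset: 0, size, blksz }
    }
}

impl<'a, F> Iterator for MappedIter<'a, F>
where
    F: Fn(u64) -> Option<Extent>,
{
    type Item = Result<&'a [u8], &'static str>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.size {
            return None;
        }
        let ext = match (self.mapper)(self.offset) {
            Some(e) if e.len > 0 => e,
            _ => {
                // Report the failure once, then end the iteration.
                self.offset = self.size;
                return Some(Err("offset is not mapped"));
            }
        };
        // Clip to the end of the physical block holding `ext.physical`.
        let to_block_end = self.blksz - ext.physical % self.blksz;
        let len = ext.len.min(to_block_end);
        let start = ext.physical as usize;
        let image: &'a [u8] = self.image;
        match image.get(start..start + len as usize) {
            Some(buf) => {
                self.offset += len;
                Some(Ok(buf))
            }
            None => {
                self.offset = self.size;
                Some(Err("extent lies outside the image"))
            }
        }
    }
}
```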

Implement continuous_iter and mapped_iter for the filesystem; each
returns an iterator that yields raw data.
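
The continuous iterator covers a physical range that is not tied to an inode (try_new() in patch 13 uses ContinuousRefIter for the device table). A hedged sketch over an in-memory image, again yielding pieces that presumably stay within one block; `ContinuousIter` and `skip_bytes` are hypothetical, the latter standing in for the "skippable" property documented on ContinuousBufferIter.

```rust
/// Walks the physical byte range [offset, offset + len) in pieces that end at block boundaries.
pub struct ContinuousIter<'a> {
    image: &'a [u8],
    pos: u64,
    end: u64,
    blksz: u64,
}

impl<'a> ContinuousIter<'a> {
    pub fn new(image: &'a [u8], offset: u64, len: u64, blksz: u64) -> Self {
        ContinuousIter { image, pos: offset, end: offset + len, blksz }
    }

    /// Moves the cursor forward without yielding the skipped bytes.
    pub fn skip_bytes(&mut self, n: u64) {
        self.pos = (self.pos + n).min(self.end);
    }
}

impl<'a> Iterator for ContinuousIter<'a> {
    type Item = Result<&'a [u8], &'static str>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos >= self.end {
            return None;
        }
        // Never cross into the next block within a single item.
        let to_block_end = self.blksz - self.pos % self.blksz;
        let len = (self.end - self.pos).min(to_block_end);
        let start = self.pos as usize;
        self.pos += len;
        let image: &'a [u8] = self.image;
        Some(image.get(start..start + len as usize).ok_or("range lies outside the image"))
    }
}
```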

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys/data.rs               |  2 +
 .../rust/erofs_sys/data/raw_iters/ref_iter.rs | 63 +++++++++++++++++++
 .../rust/erofs_sys/data/raw_iters/traits.rs   |  4 ++
 fs/erofs/rust/erofs_sys/superblock.rs         | 13 ++++
 fs/erofs/rust/erofs_sys/superblock/mem.rs     | 22 +++++++
 5 files changed, 104 insertions(+)

diff --git a/fs/erofs/rust/erofs_sys/data.rs b/fs/erofs/rust/erofs_sys/data.rs
index 483f3204ce42..21630673c24e 100644
--- a/fs/erofs/rust/erofs_sys/data.rs
+++ b/fs/erofs/rust/erofs_sys/data.rs
@@ -2,6 +2,8 @@
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 pub(crate) mod backends;
 pub(crate) mod raw_iters;
+use super::inode::*;
+use super::map::*;
 use super::superblock::*;
 use super::*;
 
diff --git a/fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs b/fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs
index 5aa2b7f44f3d..d39c9523b628 100644
--- a/fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs
+++ b/fs/erofs/rust/erofs_sys/data/raw_iters/ref_iter.rs
@@ -4,6 +4,69 @@
 use super::super::*;
 use super::*;
 
+pub(crate) struct RefMapIter<'a, 'b, FS, B, I>
+where
+    FS: FileSystem<I>,
+    B: Backend,
+    I: Inode,
+{
+    sb: &'a SuperBlock,
+    backend: &'a B,
+    map_iter: MapIter<'a, 'b, FS, I>,
+}
+
+impl<'a, 'b, FS, B, I> RefMapIter<'a, 'b, FS, B, I>
+where
+    FS: FileSystem<I>,
+    B: Backend,
+    I: Inode,
+{
+    pub(crate) fn new(
+        sb: &'a SuperBlock,
+        backend: &'a B,
+        map_iter: MapIter<'a, 'b, FS, I>,
+    ) -> Self {
+        Self {
+            sb,
+            backend,
+            map_iter,
+        }
+    }
+}
+
+impl<'a, 'b, FS, B, I> Iterator for RefMapIter<'a, 'b, FS, B, I>
+where
+    FS: FileSystem<I>,
+    B: Backend,
+    I: Inode,
+{
+    type Item = PosixResult<RefBuffer<'a>>;
+    fn next(&mut self) -> Option<Self::Item> {
+        match self.map_iter.next() {
+            Some(map) => match map {
+                Ok(m) => {
+                    let accessor = self.sb.blk_access(m.physical.start);
+                    let len = m.physical.len.min(accessor.len);
+                    match self.backend.as_buf(m.physical.start, len) {
+                        Ok(buf) => Some(Ok(buf)),
+                        Err(e) => Some(Err(e)),
+                    }
+                }
+                Err(e) => Some(Err(e)),
+            },
+            None => None,
+        }
+    }
+}
+
+impl<'a, 'b, FS, B, I> BufferMapIter<'a> for RefMapIter<'a, 'b, FS, B, I>
+where
+    FS: FileSystem<I>,
+    B: Backend,
+    I: Inode,
+{
+}
+
 /// Continous Ref Buffer Iterator which iterates over a range of disk addresses within the
 /// the temp block size. Since the temp block is always the same size as page and it will not
 /// overflow.
diff --git a/fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs b/fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs
index 90b6a51658a9..531e970cdb49 100644
--- a/fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs
+++ b/fs/erofs/rust/erofs_sys/data/raw_iters/traits.rs
@@ -3,6 +3,10 @@
 
 use super::super::*;
 
+/// Represents a basic iterator over a range of bytes from data backends.
+/// The access order is guided by the block maps from the filesystem.
+pub(crate) trait BufferMapIter<'a>: Iterator<Item = PosixResult<RefBuffer<'a>>> {}
+
 /// Represents a basic iterator over a range of bytes from data backends.
 /// Note that this is skippable and can be used to move the iterator's cursor forward.
 pub(crate) trait ContinuousBufferIter<'a>:
diff --git a/fs/erofs/rust/erofs_sys/superblock.rs b/fs/erofs/rust/erofs_sys/superblock.rs
index fc6b3cb00b18..f60657eff3d6 100644
--- a/fs/erofs/rust/erofs_sys/superblock.rs
+++ b/fs/erofs/rust/erofs_sys/superblock.rs
@@ -5,6 +5,7 @@
 use alloc::boxed::Box;
 use core::mem::size_of;
 
+use super::data::raw_iters::*;
 use super::data::*;
 use super::devices::*;
 use super::inode::*;
@@ -274,6 +275,18 @@ fn map(&self, inode: &I, offset: Off) -> MapResult {
             _ => todo!(),
         }
     }
+
+    fn mapped_iter<'b, 'a: 'b>(
+        &'a self,
+        inode: &'b I,
+        offset: Off,
+    ) -> PosixResult<Box<dyn BufferMapIter<'a> + 'b>>;
+
+    fn continuous_iter<'a>(
+        &'a self,
+        offset: Off,
+        len: Off,
+    ) -> PosixResult<Box<dyn ContinuousBufferIter<'a> + 'a>>;
 }
 
 pub(crate) struct SuperblockInfo<I, C, T>
diff --git a/fs/erofs/rust/erofs_sys/superblock/mem.rs b/fs/erofs/rust/erofs_sys/superblock/mem.rs
index 12bf797bd1e3..5756dc08744c 100644
--- a/fs/erofs/rust/erofs_sys/superblock/mem.rs
+++ b/fs/erofs/rust/erofs_sys/superblock/mem.rs
@@ -1,6 +1,7 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
+use super::alloc_helper::*;
 use super::data::raw_iters::ref_iter::*;
 use super::*;
 
@@ -33,6 +34,27 @@ fn as_filesystem(&self) -> &dyn FileSystem<I> {
         self
     }
 
+    fn mapped_iter<'b, 'a: 'b>(
+        &'a self,
+        inode: &'b I,
+        offset: Off,
+    ) -> PosixResult<Box<dyn BufferMapIter<'a> + 'b>> {
+        heap_alloc(RefMapIter::new(
+            &self.sb,
+            &self.backend,
+            MapIter::new(self, inode, offset),
+        ))
+        .map(|v| v as Box<dyn BufferMapIter<'a> + 'b>)
+    }
+    fn continuous_iter<'a>(
+        &'a self,
+        offset: Off,
+        len: Off,
+    ) -> PosixResult<Box<dyn ContinuousBufferIter<'a> + 'a>> {
+        heap_alloc(ContinuousRefIter::new(&self.sb, &self.backend, offset, len))
+            .map(|v| v as Box<dyn ContinuousBufferIter<'a> + 'a>)
+    }
+
     fn device_info(&self) -> &DeviceInfo {
         &self.device_info
     }
-- 
2.46.0



* [RFC PATCH 16/24] erofs: implement dir and inode operations in Rust
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

Implement directory and inode operations in Rust.
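
fill_dentries() treats a readdir position as a byte offset into the directory: the block is the offset rounded down to the block size, and the descriptor index is the in-block offset rounded up to a multiple of size_of::<DirentDesc>() (12 bytes). A hedged sketch of that arithmetic; `resume_point`, `dirent_pos` and `next_block_pos` are hypothetical names.

```rust
/// Size of one on-disk directory descriptor (DirentDesc).
const DIRENT_SIZE: u64 = 12;

/// Splits a readdir position into (block index, descriptor index), rounding a
/// position that lands inside a descriptor up to the next descriptor.
pub fn resume_point(pos: u64, blksz: u64) -> (u64, u64) {
    let in_block = pos % blksz;
    (pos / blksz, (in_block + DIRENT_SIZE - 1) / DIRENT_SIZE)
}

/// Position reported for descriptor `idx` of block `blk`.
pub fn dirent_pos(blk: u64, idx: u64, blksz: u64) -> u64 {
    blk * blksz + idx * DIRENT_SIZE
}

/// Where iteration continues once the entries of a block have been emitted
/// (the `round!(UP, pos, sb.blksz())` step).
pub fn next_block_pos(pos: u64, blksz: u64) -> u64 {
    (pos + blksz - 1) / blksz * blksz
}
```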

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs            |  1 +
 fs/erofs/rust/erofs_sys/data.rs       |  4 ++
 fs/erofs/rust/erofs_sys/operations.rs | 35 ++++++++++++++++
 fs/erofs/rust/erofs_sys/superblock.rs | 59 +++++++++++++++++++++++++++
 4 files changed, 99 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/operations.rs

diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
index 20c0aa81a800..8c08ac347b2b 100644
--- a/fs/erofs/rust/erofs_sys.rs
+++ b/fs/erofs/rust/erofs_sys.rs
@@ -30,6 +30,7 @@
 pub(crate) mod errnos;
 pub(crate) mod inode;
 pub(crate) mod map;
+pub(crate) mod operations;
 pub(crate) mod superblock;
 pub(crate) mod xattrs;
 pub(crate) use errnos::{Errno, Errno::*};
diff --git a/fs/erofs/rust/erofs_sys/data.rs b/fs/erofs/rust/erofs_sys/data.rs
index 21630673c24e..67bb66ce9efb 100644
--- a/fs/erofs/rust/erofs_sys/data.rs
+++ b/fs/erofs/rust/erofs_sys/data.rs
@@ -2,6 +2,7 @@
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 pub(crate) mod backends;
 pub(crate) mod raw_iters;
+use super::dir::*;
 use super::inode::*;
 use super::map::*;
 use super::superblock::*;
@@ -26,6 +27,9 @@ pub(crate) trait Backend {
 /// DirEntries.
 pub(crate) trait Buffer {
     fn content(&self) -> &[u8];
+    fn iter_dir(&self) -> DirCollection<'_> {
+        DirCollection::new(self.content())
+    }
 }
 
 /// Represents a buffer that holds a reference to a slice of data that
diff --git a/fs/erofs/rust/erofs_sys/operations.rs b/fs/erofs/rust/erofs_sys/operations.rs
new file mode 100644
index 000000000000..070ba20908a2
--- /dev/null
+++ b/fs/erofs/rust/erofs_sys/operations.rs
@@ -0,0 +1,35 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use super::inode::*;
+use super::superblock::*;
+use super::*;
+
+pub(crate) fn read_inode<'a, I, C>(
+    filesystem: &'a dyn FileSystem<I>,
+    collection: &'a mut C,
+    nid: Nid,
+) -> PosixResult<&'a mut I>
+where
+    I: Inode,
+    C: InodeCollection<I = I>,
+{
+    collection.iget(nid, filesystem)
+}
+
+pub(crate) fn dir_lookup<'a, I, C>(
+    filesystem: &'a dyn FileSystem<I>,
+    collection: &'a mut C,
+    inode: &I,
+    name: &str,
+) -> PosixResult<&'a mut I>
+where
+    I: Inode,
+    C: InodeCollection<I = I>,
+{
+    filesystem
+        .find_nid(inode, name)?
+        .map_or(Err(Errno::ENOENT), |nid| {
+            read_inode(filesystem, collection, nid)
+        })
+}
diff --git a/fs/erofs/rust/erofs_sys/superblock.rs b/fs/erofs/rust/erofs_sys/superblock.rs
index f60657eff3d6..403ffdeb4573 100644
--- a/fs/erofs/rust/erofs_sys/superblock.rs
+++ b/fs/erofs/rust/erofs_sys/superblock.rs
@@ -8,6 +8,7 @@
 use super::data::raw_iters::*;
 use super::data::*;
 use super::devices::*;
+use super::dir::*;
 use super::inode::*;
 use super::map::*;
 use super::*;
@@ -287,6 +288,64 @@ fn continuous_iter<'a>(
         offset: Off,
         len: Off,
     ) -> PosixResult<Box<dyn ContinuousBufferIter<'a> + 'a>>;
+
+    // Inode related goes here.
+    fn read_inode_info(&self, nid: Nid) -> PosixResult<InodeInfo> {
+        (self.as_filesystem(), nid).try_into()
+    }
+
+    fn find_nid(&self, inode: &I, name: &str) -> PosixResult<Option<Nid>> {
+        for buf in self.mapped_iter(inode, 0)? {
+            for dirent in buf?.iter_dir() {
+                if dirent.dirname() == name.as_bytes() {
+                    return Ok(Some(dirent.desc.nid));
+                }
+            }
+        }
+        Ok(None)
+    }
+
+    // Readdir related goes here.
+    fn fill_dentries(
+        &self,
+        inode: &I,
+        offset: Off,
+        emitter: &mut dyn FnMut(Dirent<'_>, Off),
+    ) -> PosixResult<()> {
+        let sb = self.superblock();
+        let accessor = sb.blk_access(offset);
+        if offset > inode.info().file_size() {
+            return Err(EUCLEAN);
+        }
+
+        let map_offset = round!(DOWN, offset, sb.blksz());
+        let blk_offset = round!(UP, accessor.off, size_of::<DirentDesc>() as Off);
+
+        let mut map_iter = self.mapped_iter(inode, map_offset)?;
+        let first_buf = map_iter.next().unwrap()?;
+        let mut collection = first_buf.iter_dir();
+
+        let mut pos: Off = map_offset + blk_offset;
+
+        if blk_offset as usize / size_of::<DirentDesc>() <= collection.total() {
+            collection.skip_dir(blk_offset as usize / size_of::<DirentDesc>());
+            for dirent in collection {
+                emitter(dirent, pos);
+                pos += size_of::<DirentDesc>() as Off;
+            }
+        }
+
+        pos = round!(UP, pos, sb.blksz());
+
+        for buf in map_iter {
+            for dirent in buf?.iter_dir() {
+                emitter(dirent, pos);
+                pos += size_of::<DirentDesc>() as Off;
+            }
+            pos = round!(UP, pos, sb.blksz());
+        }
+        Ok(())
+    }
 }
 
 pub(crate) struct SuperblockInfo<I, C, T>
-- 
2.46.0


* [RFC PATCH 17/24] erofs: introduce Rust SBI to C
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (15 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 16/24] erofs: implement dir and inode operations " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 18/24] erofs: introduce iget alternative " Yiyang Wu
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch exposes an opaque Rust superblock info to C as s_fs_info.
A pointer to the original erofs_sb_info is stored inside the Rust opaque
type and re-exported by rewriting the original EROFS_SB/EROFS_I_SB macros.
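
To illustrate how the redefined EROFS_SB reaches the C erofs_sb_info, here
is a minimal userspace Rust sketch. CSbInfo, RustSbInfo, SB_INFO_OFFSET and
erofs_sb are hypothetical stand-ins for erofs_sb_info, KernelSuperblockInfo,
EROFS_SB_INFO_OFFSET_RUST and the macro itself; the real layout differs.

```rust
use std::ffi::c_void;
use std::mem::offset_of;

/// Hypothetical stand-in for the C `struct erofs_sb_info`.
pub struct CSbInfo {
    pub blkszbits: u8,
}

/// Hypothetical stand-in for KernelSuperblockInfo: Rust-only state plus
/// an `opaque` pointer to the C-side superblock info.
#[repr(C)]
pub struct RustSbInfo {
    pub inodes: u64,          // placeholder for Rust-only fields
    pub opaque: *mut CSbInfo, // pointer to the C sbi
}

/// Offset that would be exported to C (cf. EROFS_SB_INFO_OFFSET_RUST).
pub const SB_INFO_OFFSET: usize = offset_of!(RustSbInfo, opaque);

/// Rust rendering of the C expression
/// `*(struct erofs_sb_info **)((void *)s_fs_info + OFFSET)`.
///
/// # Safety
/// `s_fs_info` must point to a live, properly aligned `RustSbInfo`.
pub unsafe fn erofs_sb(s_fs_info: *mut c_void) -> *mut CSbInfo {
    unsafe {
        // Step forward OFFSET bytes, then read the stored pointer.
        let field = (s_fs_info as *mut u8).add(SB_INFO_OFFSET) as *mut *mut CSbInfo;
        *field
    }
}
```

The C side only needs the exported offset, so it never has to know the
layout of the Rust type.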

This patch also provides prototypes of KernelInode and
KernelInodeCollection so that the code compiles, and it implements the
Metabuf data source by hooking up the original metabuf API defined in
EROFS.
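
The Metabuf data source follows a map, copy, release pattern. The sketch
below shows that pattern in plain Rust under stated assumptions: MemSource
and MappedBuf are hypothetical, an in-memory Vec<u8> replaces erofs_bread,
a counter replaces erofs_put_metabuf, and, unlike the patch, as_buf clamps
the requested length to the end of the image.

```rust
use std::cell::Cell;

/// Hypothetical RAII guard loosely modelled on RefBuffer: it borrows mapped
/// bytes and runs a release callback (standing in for erofs_put_metabuf) on drop.
pub struct MappedBuf<'a> {
    data: &'a [u8],
    release: Box<dyn Fn() + 'a>,
}

impl MappedBuf<'_> {
    pub fn content(&self) -> &[u8] {
        self.data
    }
}

impl Drop for MappedBuf<'_> {
    fn drop(&mut self) {
        (self.release)();
    }
}

/// Hypothetical in-memory source standing in for the metabuf API.
pub struct MemSource {
    pub image: Vec<u8>,
    pub puts: Cell<u32>, // counts how many buffers were released
}

impl MemSource {
    /// "Map" up to `len` bytes at `offset`; None if `offset` is past the end.
    pub fn as_buf(&self, offset: usize, len: usize) -> Option<MappedBuf<'_>> {
        let end = offset.checked_add(len)?.min(self.image.len());
        if offset > end {
            return None;
        }
        let puts = &self.puts;
        Some(MappedBuf {
            data: &self.image[offset..end],
            release: Box::new(move || puts.set(puts.get() + 1)),
        })
    }

    /// Copy mapped bytes into `out`, returning how many were copied.
    /// The guard is dropped (and the buffer released) when this returns.
    pub fn fill(&self, out: &mut [u8], offset: usize) -> Option<usize> {
        let buf = self.as_buf(offset, out.len())?;
        let n = buf.content().len();
        out[..n].copy_from_slice(buf.content());
        Some(n)
    }
}
```

Tying the release to Drop means every early return still puts the buffer,
which is the property the callback passed to RefBuffer::new aims for.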

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/Makefile            |  2 +-
 fs/erofs/internal.h          | 10 ++++++
 fs/erofs/rust/kinode.rs      | 49 ++++++++++++++++++++++++++
 fs/erofs/rust/ksources.rs    | 66 ++++++++++++++++++++++++++++++++++++
 fs/erofs/rust/ksuperblock.rs | 30 ++++++++++++++++
 fs/erofs/rust/mod.rs         |  3 ++
 fs/erofs/rust_bindings.h     | 12 +++++++
 fs/erofs/rust_helpers.c      | 31 +++++++++++++++++
 fs/erofs/rust_helpers.h      | 21 ++++++++++++
 fs/erofs/super.c             | 15 ++++++--
 fs/erofs/super_rs.rs         | 50 +++++++++++++++++++++++++++
 11 files changed, 285 insertions(+), 4 deletions(-)
 create mode 100644 fs/erofs/rust/kinode.rs
 create mode 100644 fs/erofs/rust/ksources.rs
 create mode 100644 fs/erofs/rust/ksuperblock.rs
 create mode 100644 fs/erofs/rust_bindings.h
 create mode 100644 fs/erofs/rust_helpers.c
 create mode 100644 fs/erofs/rust_helpers.h

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index fb46a2c7fb50..dfa03edbe29a 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -9,4 +9,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
 erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
-erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o
+erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o rust_helpers.o
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 8674a4cb9d39..18e67219fbc8 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -20,6 +20,10 @@
 #include <linux/iomap.h>
 #include "erofs_fs.h"
 
+#ifdef CONFIG_EROFS_FS_RUST
+#include "rust_bindings.h"
+#endif
+
 /* redefine pr_fmt "erofs: " */
 #undef pr_fmt
 #define pr_fmt(fmt) "erofs: " fmt
@@ -178,8 +182,14 @@ struct erofs_sb_info {
 	char *domain_id;
 };
 
+#ifdef CONFIG_EROFS_FS_RUST
+#define EROFS_SB(sb) (*(struct erofs_sb_info **)(((void *)((sb)->s_fs_info)) + \
+		      EROFS_SB_INFO_OFFSET_RUST))
+#define EROFS_I_SB(inode) EROFS_SB((inode)->i_sb)
+#else
 #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
 #define EROFS_I_SB(inode) ((struct erofs_sb_info *)(inode)->i_sb->s_fs_info)
+#endif
 
 /* Mount flags set via mount options or defaults */
 #define EROFS_MOUNT_XATTR_USER		0x00000010
diff --git a/fs/erofs/rust/kinode.rs b/fs/erofs/rust/kinode.rs
new file mode 100644
index 000000000000..df6de40d0594
--- /dev/null
+++ b/fs/erofs/rust/kinode.rs
@@ -0,0 +1,49 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use core::ffi::c_void;
+use core::mem::MaybeUninit;
+use core::ptr::NonNull;
+
+use kernel::bindings::{inode, super_block};
+
+use super::erofs_sys::inode::*;
+use super::erofs_sys::superblock::*;
+use super::erofs_sys::*;
+
+#[repr(C)]
+pub(crate) struct KernelInode {
+    pub(crate) info: MaybeUninit<InodeInfo>,
+    pub(crate) nid: MaybeUninit<Nid>,
+    pub(crate) k_inode: MaybeUninit<inode>,
+    pub(crate) k_opaque: MaybeUninit<*mut c_void>,
+}
+
+impl Inode for KernelInode {
+    fn new(_sb: &SuperBlock, _info: InodeInfo, _nid: Nid) -> Self {
+        unimplemented!();
+    }
+    fn nid(&self) -> Nid {
+        unsafe { self.nid.assume_init() }
+    }
+    fn info(&self) -> &InodeInfo {
+        unsafe { self.info.assume_init_ref() }
+    }
+}
+
+pub(crate) struct KernelInodeCollection {
+    sb: NonNull<super_block>,
+}
+
+impl InodeCollection for KernelInodeCollection {
+    type I = KernelInode;
+    fn iget(&mut self, _nid: Nid, _f: &dyn FileSystem<Self::I>) -> PosixResult<&mut Self::I> {
+        unimplemented!();
+    }
+}
+
+impl KernelInodeCollection {
+    pub(crate) fn new(sb: NonNull<super_block>) -> Self {
+        Self { sb }
+    }
+}
diff --git a/fs/erofs/rust/ksources.rs b/fs/erofs/rust/ksources.rs
new file mode 100644
index 000000000000..08213e11239c
--- /dev/null
+++ b/fs/erofs/rust/ksources.rs
@@ -0,0 +1,66 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+use core::ffi::*;
+use core::ptr::NonNull;
+
+use super::erofs_sys::data::*;
+use super::erofs_sys::errnos::*;
+use super::erofs_sys::*;
+
+use kernel::bindings::super_block;
+
+extern "C" {
+    #[link_name = "erofs_read_metabuf_rust_helper"]
+    pub(crate) fn read_metabuf(
+        sb: NonNull<c_void>,
+        sbi: NonNull<c_void>,
+        offset: c_ulonglong,
+    ) -> *mut c_void;
+    #[link_name = "erofs_put_metabuf_rust_helper"]
+    pub(crate) fn put_metabuf(addr: NonNull<c_void>);
+}
+
+fn try_read_metabuf(
+    sb: NonNull<super_block>,
+    sbi: NonNull<c_void>,
+    offset: c_ulonglong,
+) -> PosixResult<NonNull<c_void>> {
+    let ptr = unsafe { read_metabuf(sb.cast(), sbi.cast(), offset) };
+    if ptr.is_null() {
+        Err(Errno::ENOMEM)
+    } else if is_value_err(ptr) {
+        Err(Errno::from(ptr))
+    } else {
+        Ok(unsafe { NonNull::new_unchecked(ptr) })
+    }
+}
+
+pub(crate) struct MetabufSource {
+    sb: NonNull<super_block>,
+    opaque: NonNull<c_void>,
+}
+
+impl MetabufSource {
+    pub(crate) fn new(sb: NonNull<super_block>, opaque: NonNull<c_void>) -> Self {
+        Self { sb, opaque }
+    }
+}
+
+impl Source for MetabufSource {
+    fn fill(&self, data: &mut [u8], offset: Off) -> PosixResult<u64> {
+        self.as_buf(offset, data.len() as u64).map(|buf| {
+            data[..buf.content().len()].clone_from_slice(buf.content());
+            buf.content().len() as Off
+        })
+    }
+    fn as_buf<'a>(&'a self, offset: Off, len: Off) -> PosixResult<RefBuffer<'a>> {
+        try_read_metabuf(self.sb.clone(), self.opaque.clone(), offset).map(|ptr| {
+            let data: &'a [u8] =
+                unsafe { core::slice::from_raw_parts(ptr.as_ptr() as *const u8, len as usize) };
+            RefBuffer::new(data, 0, len as usize, |ptr| unsafe {
+                put_metabuf(NonNull::new_unchecked(ptr as *mut c_void))
+            })
+        })
+    }
+}
diff --git a/fs/erofs/rust/ksuperblock.rs b/fs/erofs/rust/ksuperblock.rs
new file mode 100644
index 000000000000..c1955fa136c6
--- /dev/null
+++ b/fs/erofs/rust/ksuperblock.rs
@@ -0,0 +1,30 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+use super::erofs_sys::superblock::*;
+use super::kinode::*;
+use alloc::boxed::Box;
+use core::{ffi::c_void, ptr::NonNull};
+use kernel::bindings::super_block;
+use kernel::types::ForeignOwnable;
+
+pub(crate) type KernelOpaque = NonNull<*mut c_void>;
+/// KernelSuperblockInfo defined by embedded Kernel Inode
+pub(crate) type KernelSuperblockInfo =
+    SuperblockInfo<KernelInode, KernelInodeCollection, KernelOpaque>;
+
+/// SAFETY:
+/// Cast the c_void back to KernelSuperblockInfo.
+/// This may look prone to concurrency issues, but in fact only the
+/// KernelInodeCollection field can be mutated, and it is backed by the original
+/// iget5_locked, which already prevents concurrency issues. So it is safe to cast
+/// to a mutable reference here even though it is not backed by Arc/Mutex, instead
+/// of using the generic method from ForeignOwnable, which only provides an
+/// immutable reference and is not enough.
+/// Since the pointer always lives as long as this module exists, it is declared static.
+pub(crate) fn erofs_sbi(sb: NonNull<super_block>) -> &'static mut KernelSuperblockInfo {
+    unsafe { &mut *(sb.as_ref().s_fs_info).cast::<KernelSuperblockInfo>() }
+}
+
+pub(crate) fn free_sbi(sb: NonNull<super_block>) {
+    unsafe { Box::<KernelSuperblockInfo>::from_foreign(sb.as_ref().s_fs_info) };
+}
diff --git a/fs/erofs/rust/mod.rs b/fs/erofs/rust/mod.rs
index e6c0731f2533..a8b66c95261c 100644
--- a/fs/erofs/rust/mod.rs
+++ b/fs/erofs/rust/mod.rs
@@ -2,3 +2,6 @@
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
 pub(crate) mod erofs_sys;
+pub(crate) mod kinode;
+pub(crate) mod ksources;
+pub(crate) mod ksuperblock;
diff --git a/fs/erofs/rust_bindings.h b/fs/erofs/rust_bindings.h
new file mode 100644
index 000000000000..9695c5ed5a7c
--- /dev/null
+++ b/fs/erofs/rust_bindings.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* EROFS Rust bindings, used until Rust VFS abstractions land upstream */
+
+#ifndef __EROFS_RUST_BINDINGS_H
+#define __EROFS_RUST_BINDINGS_H
+
+#include <linux/fs.h>
+
+extern const unsigned long EROFS_SB_INFO_OFFSET_RUST;
+extern void *erofs_alloc_sbi_rust(struct super_block *sb);
+extern void *erofs_free_sbi_rust(struct super_block *sb);
+#endif
diff --git a/fs/erofs/rust_helpers.c b/fs/erofs/rust_helpers.c
new file mode 100644
index 000000000000..5fdc158ed9ef
--- /dev/null
+++ b/fs/erofs/rust_helpers.c
@@ -0,0 +1,31 @@
+#include "rust_helpers.h"
+
+static void erofs_init_metabuf_rust_helper(struct erofs_buf *buf,
+					   struct super_block *sb,
+					   struct erofs_sb_info *sbi)
+{
+	if (erofs_is_fileio_mode(sbi))
+		buf->mapping = file_inode(sbi->fdev)->i_mapping;
+	else if (erofs_is_fscache_mode_rust_helper(sb, sbi))
+		buf->mapping = sbi->s_fscache->inode->i_mapping;
+	else
+		buf->mapping = sb->s_bdev->bd_mapping;
+}
+
+void *erofs_read_metabuf_rust_helper(struct super_block *sb,
+				     struct erofs_sb_info *sbi,
+				     erofs_off_t offset)
+{
+	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
+	erofs_init_metabuf_rust_helper(&buf, sb, sbi);
+	return erofs_bread(&buf, offset, EROFS_KMAP);
+}
+
+void erofs_put_metabuf_rust_helper(void *addr)
+{
+	erofs_put_metabuf(&(struct erofs_buf){
+		.base = addr,
+		.page = kmap_to_page(addr),
+		.kmap_type = EROFS_KMAP,
+	});
+}
diff --git a/fs/erofs/rust_helpers.h b/fs/erofs/rust_helpers.h
new file mode 100644
index 000000000000..158b21438314
--- /dev/null
+++ b/fs/erofs/rust_helpers.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Helpers that work around macros and inline functions bindgen cannot translate */
+
+#ifndef __EROFS_RUST_HELPERS_H
+#define __EROFS_RUST_HELPERS_H
+
+#include "internal.h"
+
+static inline bool erofs_is_fscache_mode_rust_helper(struct super_block *sb,
+						     struct erofs_sb_info *sbi)
+{
+	return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) &&
+	       !erofs_is_fileio_mode(sbi) && !sb->s_bdev;
+}
+
+void *erofs_read_metabuf_rust_helper(struct super_block *sb,
+				     struct erofs_sb_info *sbi,
+				     erofs_off_t offset);
+void erofs_put_metabuf_rust_helper(void *addr);
+
+#endif
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 666873f745da..61f138a7d8e2 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -586,9 +586,12 @@ static void erofs_set_sysfs_name(struct super_block *sb)
 static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	struct inode *inode;
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
+	struct erofs_sb_info *sbi;
 	int err;
-
+#ifdef CONFIG_EROFS_FS_RUST
+	sb->s_fs_info = erofs_alloc_sbi_rust(sb);
+#endif
+	sbi = EROFS_SB(sb);
 	sb->s_magic = EROFS_SUPER_MAGIC;
 	sb->s_flags |= SB_RDONLY | SB_NOATIME;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -809,7 +812,13 @@ static int erofs_init_fs_context(struct fs_context *fc)
 
 static void erofs_kill_sb(struct super_block *sb)
 {
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
+	struct erofs_sb_info *sbi;
+
+#ifdef CONFIG_EROFS_FS_RUST
+	sbi = erofs_free_sbi_rust(sb);
+#else
+	sbi = EROFS_SB(sb);
+#endif
 
 	if ((IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && sbi->fsid) || sbi->fdev)
 		kill_anon_super(sb);
diff --git a/fs/erofs/super_rs.rs b/fs/erofs/super_rs.rs
index 4b8cbef507e3..7041f4011d4c 100644
--- a/fs/erofs/super_rs.rs
+++ b/fs/erofs/super_rs.rs
@@ -7,3 +7,53 @@
 #[allow(dead_code)]
 #[allow(missing_docs)]
 pub(crate) mod rust;
+
+use core::ffi::*;
+use core::mem::offset_of;
+use core::ptr::NonNull;
+use kernel::{bindings::super_block, types::ForeignOwnable};
+use rust::{
+    erofs_sys::{
+        alloc_helper::*,
+        data::backends::uncompressed::*,
+        superblock::{mem::*, *},
+        *,
+    },
+    kinode::*,
+    ksources::*,
+    ksuperblock::*,
+};
+
+fn try_alloc_sbi(sb: NonNull<super_block>) -> PosixResult<*const c_void> {
+    //  We have to use heap_alloc here to erase the signature of MemFileSystem
+    let sbi = heap_alloc(SuperblockInfo::new(
+        heap_alloc(KernelFileSystem::try_new(UncompressedBackend::new(
+            MetabufSource::new(sb, unsafe { NonNull::new_unchecked(sb.as_ref().s_fs_info) }),
+        ))?)?,
+        KernelInodeCollection::new(sb),
+        // SAFETY: The super_block is initialized when erofs_alloc_sbi_rust is called.
+        unsafe { NonNull::new_unchecked(sb.as_ref().s_fs_info) },
+    ))?;
+    Ok(sbi.into_foreign())
+}
+/// Allocates the Rust implementation of the superblock info, returned as an opaque
+/// c_void, when called from fill_super. The original C superblock info still has to
+/// be embedded inside the Rust implementation for compatibility; this is left as is for now.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_alloc_sbi_rust(sb: NonNull<super_block>) -> *const c_void {
+    try_alloc_sbi(sb).unwrap_or_else(|err| err.into())
+}
+
+/// Frees the Rust implementation of the superblock info when called from erofs_kill_sb,
+/// returning the original C superblock info pointer for the outer C code to free.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_free_sbi_rust(sb: NonNull<super_block>) -> *const c_void {
+    let opaque: *const c_void = erofs_sbi(sb).opaque.as_ptr().cast();
+    // This will be freed as it goes out of the scope.
+    free_sbi(sb);
+    opaque
+}
+
+/// Exported offset hint so that EROFS_SB can find the C erofs_sb_info pointer inside s_fs_info.
+#[no_mangle]
+pub static EROFS_SB_INFO_OFFSET_RUST: c_ulong = offset_of!(KernelSuperblockInfo, opaque) as c_ulong;
-- 
2.46.0


* [RFC PATCH 18/24] erofs: introduce iget alternative to C
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (16 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 17/24] erofs: introduce Rust SBI to C Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 19/24] erofs: introduce namei " Yiyang Wu
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces Rust alternatives to iget and fast symlink.
After this patch, erofs_iget can be replaced with erofs_iget_rust.

The iget5 test and set callbacks (erofs_iget5_eq/erofs_iget5_set, along
with erofs_squash_ino) are made non-static in this patch, since
rust_helpers.c also uses them to expose iget5_locked to Rust.
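
Both erofs_iget_rust and the new EROFS_I/EROFS_I_VFS macros rely on the VFS
inode being embedded in KernelInode at a known offset. A minimal userspace
sketch of that container_of step follows; VfsInode, KInode, I_OFFSET and
kinode_from_vfs are hypothetical names standing in for struct inode,
KernelInode, EROFS_I_OFFSET_RUST and the kernel's container_of! usage.

```rust
use std::mem::offset_of;

/// Hypothetical stand-in for the VFS `struct inode`.
#[repr(C)]
pub struct VfsInode {
    pub i_ino: u64,
}

/// Hypothetical layout in the spirit of KernelInode: Rust-side data,
/// the embedded VFS inode, then a pointer to the C-side erofs_inode.
#[repr(C)]
pub struct KInode {
    pub nid: u64,
    pub k_inode: VfsInode,
    pub k_opaque: *mut u8,
}

/// Signed distance from `k_inode` to `k_opaque` (cf. EROFS_I_OFFSET_RUST),
/// which lets C start from a `struct inode *` and find the C-side inode.
pub const I_OFFSET: isize =
    offset_of!(KInode, k_opaque) as isize - offset_of!(KInode, k_inode) as isize;

/// container_of: walk back from the embedded field to the enclosing struct.
///
/// # Safety
/// `inode` must point at the `k_inode` field of a live `KInode` and must
/// have been derived from a pointer to the whole `KInode`.
pub unsafe fn kinode_from_vfs(inode: *mut VfsInode) -> *mut KInode {
    unsafe { (inode as *mut u8).sub(offset_of!(KInode, k_inode)) as *mut KInode }
}
```

Deriving the field pointer from a pointer to the whole struct (for example
with addr_of_mut!) keeps the walk back to the outer struct well-defined.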

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/Makefile        |  2 +-
 fs/erofs/inode.c         |  8 ++++--
 fs/erofs/inode_rs.rs     | 59 ++++++++++++++++++++++++++++++++++++++++
 fs/erofs/internal.h      | 33 ++++++++++++++++++++++
 fs/erofs/rust/kinode.rs  | 29 +++++++++++++++++---
 fs/erofs/rust_bindings.h | 12 ++++++++
 fs/erofs/rust_helpers.c  | 55 +++++++++++++++++++++++++++++++++++++
 fs/erofs/rust_helpers.h  |  4 ++-
 fs/erofs/super.c         | 34 ++++++++++++++++++-----
 9 files changed, 220 insertions(+), 16 deletions(-)
 create mode 100644 fs/erofs/inode_rs.rs

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index dfa03edbe29a..46de6f490ca2 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -9,4 +9,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
 erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
-erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o rust_helpers.o
+erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o rust_helpers.o
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index d2fd51fcebd2..b8467272a670 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -269,7 +269,7 @@ int erofs_fill_inode(struct inode *inode)
  * ino_t is 32-bits on 32-bit arch. We have to squash the 64-bit value down
  * so that it will fit.
  */
-static ino_t erofs_squash_ino(erofs_nid_t nid)
+ino_t erofs_squash_ino(erofs_nid_t nid)
 {
 	ino_t ino = (ino_t)nid;
 
@@ -278,12 +278,12 @@ static ino_t erofs_squash_ino(erofs_nid_t nid)
 	return ino;
 }
 
-static int erofs_iget5_eq(struct inode *inode, void *opaque)
+int erofs_iget5_eq(struct inode *inode, void *opaque)
 {
 	return EROFS_I(inode)->nid == *(erofs_nid_t *)opaque;
 }
 
-static int erofs_iget5_set(struct inode *inode, void *opaque)
+int erofs_iget5_set(struct inode *inode, void *opaque)
 {
 	const erofs_nid_t nid = *(erofs_nid_t *)opaque;
 
@@ -292,6 +292,7 @@ static int erofs_iget5_set(struct inode *inode, void *opaque)
 	return 0;
 }
 
+#ifndef CONFIG_EROFS_FS_RUST
 struct inode *erofs_iget(struct super_block *sb, erofs_nid_t nid)
 {
 	struct inode *inode;
@@ -312,6 +313,7 @@ struct inode *erofs_iget(struct super_block *sb, erofs_nid_t nid)
 	}
 	return inode;
 }
+#endif
 
 int erofs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		  struct kstat *stat, u32 request_mask,
diff --git a/fs/erofs/inode_rs.rs b/fs/erofs/inode_rs.rs
new file mode 100644
index 000000000000..5cca2ae581ac
--- /dev/null
+++ b/fs/erofs/inode_rs.rs
@@ -0,0 +1,59 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+//! EROFS Rust Kernel Module Helpers Implementation
+//! This is only for experimental purposes. Feedback is always welcome.
+
+#[allow(dead_code)]
+#[allow(missing_docs)]
+pub(crate) mod rust;
+
+use core::ffi::*;
+use core::mem::{offset_of, size_of};
+use core::ptr::NonNull;
+use kernel::bindings::{inode, super_block};
+use kernel::container_of;
+use rust::{
+    erofs_sys::{operations::*, *},
+    kinode::*,
+    ksuperblock::erofs_sbi,
+};
+
+/// Used as a size hint to be exported to kmem_cache_create
+#[no_mangle]
+pub static EROFS_INODE_SIZE_RUST: c_uint = size_of::<KernelInode>() as c_uint;
+
+/// Used as a hint offset to be exported so that EROFS_I_VFS can find the embedded VFS inode.
+#[no_mangle]
+pub static EROFS_VFS_INODE_OFFSET_RUST: c_ulong = offset_of!(KernelInode, k_inode) as c_ulong;
+
+/// Used as a hint offset to be exported so that EROFS_I can find the embedded C-side erofs_inode.
+#[no_mangle]
+pub static EROFS_I_OFFSET_RUST: c_long =
+    offset_of!(KernelInode, k_opaque) as c_long - offset_of!(KernelInode, k_inode) as c_long;
+
+/// Exported as iget replacement
+#[no_mangle]
+pub unsafe extern "C" fn erofs_iget_rust(sb: NonNull<super_block>, nid: Nid) -> *mut c_void {
+    // SAFETY: The super_block is initialized when erofs_alloc_sbi_rust is called.
+    let sbi = erofs_sbi(sb);
+    read_inode(sbi.filesystem.as_ref(), &mut sbi.inodes, nid)
+        .map_or_else(|e| e.into(), |inode| inode.k_inode.as_mut_ptr().cast())
+}
+
+fn try_fill_inode(k_inode: NonNull<inode>, nid: Nid) -> PosixResult<()> {
+    // SAFETY: The super_block is initialized when erofs_fill_inode_rust is called.
+    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
+    // SAFETY: k_inode is a part of KernelInode.
+    let erofs_inode: &mut KernelInode = unsafe {
+        &mut *(container_of!(k_inode.as_ptr(), KernelInode, k_inode) as *mut KernelInode)
+    };
+    erofs_inode.info.write(sbi.filesystem.read_inode_info(nid)?);
+    erofs_inode.nid.write(nid);
+    Ok(())
+}
+/// Exported as an additional Rust-side fill step run after erofs_fill_inode
+#[no_mangle]
+pub unsafe extern "C" fn erofs_fill_inode_rust(k_inode: NonNull<inode>, nid: Nid) -> c_int {
+    try_fill_inode(k_inode, nid).map_or_else(|e| i32::from(e) as c_int, |_| 0)
+}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 18e67219fbc8..42ce84783be7 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -306,10 +306,20 @@ struct erofs_inode {
 #endif	/* CONFIG_EROFS_FS_ZIP */
 	};
 	/* the corresponding vfs inode */
+#ifndef CONFIG_EROFS_FS_RUST
 	struct inode vfs_inode;
+#endif
 };
 
+#ifdef CONFIG_EROFS_FS_RUST
+#define EROFS_I(ptr)	(*(struct erofs_inode **)(((void *)(ptr)) + \
+			 EROFS_I_OFFSET_RUST))
+#define EROFS_I_VFS(ptr) ((struct inode *)(((void *)(ptr)) + EROFS_VFS_INODE_OFFSET_RUST))
+#define EROFS_I_RUST(ptr) ((void *)(ptr) - EROFS_VFS_INODE_OFFSET_RUST)
+#else
 #define EROFS_I(ptr)	container_of(ptr, struct erofs_inode, vfs_inode)
+#define EROFS_I_VFS(ptr) (&((struct erofs_inode *)(ptr))->vfs_inode)
+#endif
 
 static inline erofs_off_t erofs_iloc(struct inode *inode)
 {
@@ -427,10 +437,18 @@ void erofs_onlinefolio_init(struct folio *folio);
 void erofs_onlinefolio_split(struct folio *folio);
 void erofs_onlinefolio_end(struct folio *folio, int err);
 int erofs_fill_inode(struct inode *inode);
+ino_t erofs_squash_ino(erofs_nid_t nid);
+int erofs_iget5_eq(struct inode *inode, void *opaque);
+int erofs_iget5_set(struct inode *inode, void *opaque);
+#ifdef CONFIG_EROFS_FS_RUST
+#define erofs_iget erofs_iget_rust
+#else
 struct inode *erofs_iget(struct super_block *sb, erofs_nid_t nid);
+#endif
 int erofs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		  struct kstat *stat, u32 request_mask,
 		  unsigned int query_flags);
+
 int erofs_namei(struct inode *dir, const struct qstr *name,
 		erofs_nid_t *nid, unsigned int *d_type);
 
@@ -538,6 +556,21 @@ static inline struct bio *erofs_fscache_bio_alloc(struct erofs_map_dev *mdev) {
 static inline void erofs_fscache_submit_bio(struct bio *bio) {}
 #endif
 
+#ifdef CONFIG_EROFS_FS_RUST
+extern int erofs_init_rust(void);
+extern void erofs_destroy_rust(void);
+extern void erofs_init_inode_rust(struct inode *inode);
+extern void erofs_free_inode_rust(struct inode *inode);
+#else
+static inline int erofs_init_rust(void)
+{
+	return 0;
+}
+static inline void erofs_destroy_rust(void) {}
+static inline void erofs_init_inode_rust(struct inode *inode) {}
+static inline void erofs_free_inode_rust(struct inode *inode) {}
+#endif
+
 #define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
 
 #endif	/* __EROFS_INTERNAL_H */
diff --git a/fs/erofs/rust/kinode.rs b/fs/erofs/rust/kinode.rs
index df6de40d0594..fac72bd8b6b3 100644
--- a/fs/erofs/rust/kinode.rs
+++ b/fs/erofs/rust/kinode.rs
@@ -1,16 +1,23 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
-use core::ffi::c_void;
+use core::ffi::*;
 use core::mem::MaybeUninit;
 use core::ptr::NonNull;
 
 use kernel::bindings::{inode, super_block};
+use kernel::container_of;
 
+use super::erofs_sys::errnos::*;
 use super::erofs_sys::inode::*;
 use super::erofs_sys::superblock::*;
 use super::erofs_sys::*;
 
+extern "C" {
+    #[link_name = "erofs_iget_locked_rust_helper"]
+    fn iget_locked(sb: NonNull<c_void>, nid: Nid) -> *mut c_void;
+}
+
 #[repr(C)]
 pub(crate) struct KernelInode {
     pub(crate) info: MaybeUninit<InodeInfo>,
@@ -21,7 +28,12 @@ pub(crate) struct KernelInode {
 
 impl Inode for KernelInode {
     fn new(_sb: &SuperBlock, _info: InodeInfo, _nid: Nid) -> Self {
-        unimplemented!();
+        Self {
+            info: MaybeUninit::uninit(),
+            nid: MaybeUninit::uninit(),
+            k_inode: MaybeUninit::uninit(),
+            k_opaque: MaybeUninit::uninit(),
+        }
     }
     fn nid(&self) -> Nid {
         unsafe { self.nid.assume_init() }
@@ -37,8 +49,17 @@ pub(crate) struct KernelInodeCollection {
 
 impl InodeCollection for KernelInodeCollection {
     type I = KernelInode;
-    fn iget(&mut self, _nid: Nid, _f: &dyn FileSystem<Self::I>) -> PosixResult<&mut Self::I> {
-        unimplemented!();
+    fn iget(&mut self, nid: Nid, _f: &dyn FileSystem<Self::I>) -> PosixResult<&mut Self::I> {
+        // SAFETY: iget_locked is safe to call here.
+        let k_inode = unsafe { iget_locked(self.sb.cast(), nid) };
+        if is_value_err(k_inode.cast()) {
+            Err(Errno::from(k_inode as i32))
+        } else {
+            // SAFETY: iget_locked returns a valid pointer to a vfs inode and it's embedded in a KernelInode.
+            let erofs_inode: &mut KernelInode =
+                unsafe { &mut *(container_of!(k_inode, KernelInode, k_inode) as *mut KernelInode) };
+            Ok(erofs_inode)
+        }
     }
 }
 
diff --git a/fs/erofs/rust_bindings.h b/fs/erofs/rust_bindings.h
index 9695c5ed5a7c..657f109dd6e7 100644
--- a/fs/erofs/rust_bindings.h
+++ b/fs/erofs/rust_bindings.h
@@ -6,7 +6,19 @@
 
 #include <linux/fs.h>
 
+
+typedef u64 erofs_nid_t;
+typedef u64 erofs_off_t;
+/* data type for filesystem-wide blocks number */
+typedef u32 erofs_blk_t;
+
 extern const unsigned long EROFS_SB_INFO_OFFSET_RUST;
+extern const unsigned int EROFS_INODE_SIZE_RUST;
+extern const unsigned long EROFS_VFS_INODE_OFFSET_RUST;
+extern const long EROFS_I_OFFSET_RUST;
+
 extern void *erofs_alloc_sbi_rust(struct super_block *sb);
 extern void *erofs_free_sbi_rust(struct super_block *sb);
+extern int erofs_iget5_eq_rust(struct inode *inode, void *opaque);
+extern struct inode *erofs_iget_rust(struct super_block *sb, erofs_nid_t nid);
 #endif
diff --git a/fs/erofs/rust_helpers.c b/fs/erofs/rust_helpers.c
index 5fdc158ed9ef..94e9153fc3ff 100644
--- a/fs/erofs/rust_helpers.c
+++ b/fs/erofs/rust_helpers.c
@@ -1,5 +1,7 @@
 #include "rust_helpers.h"
 
+static struct kmem_cache *erofs_inode_cachep __read_mostly;
+
 static void erofs_init_metabuf_rust_helper(struct erofs_buf *buf,
 					   struct super_block *sb,
 					   struct erofs_sb_info *sbi)
@@ -29,3 +31,56 @@ void erofs_put_metabuf_rust_helper(void *addr)
 		.kmap_type = EROFS_KMAP,
 	});
 }
+
+int erofs_init_rust(void)
+{
+	erofs_inode_cachep = kmem_cache_create("erofs_inode",
+					       sizeof(struct erofs_inode), 0,
+					       SLAB_RECLAIM_ACCOUNT, NULL);
+	if (!erofs_inode_cachep)
+		return -ENOMEM;
+	return 0;
+}
+
+void erofs_destroy_rust(void)
+{
+	if (erofs_inode_cachep)
+		kmem_cache_destroy(erofs_inode_cachep);
+}
+
+void erofs_init_inode_rust(struct inode *inode)
+{
+	EROFS_I(inode) = kmem_cache_alloc(erofs_inode_cachep, GFP_KERNEL);
+}
+
+void erofs_free_inode_rust(struct inode *inode)
+{
+	struct erofs_inode *vi = EROFS_I(inode);
+	if (vi)
+		kmem_cache_free(erofs_inode_cachep, vi);
+}
+
+struct inode *erofs_iget_locked_rust_helper(struct super_block *sb, erofs_nid_t nid)
+{
+	struct inode *inode;
+	int err;
+
+	inode = iget5_locked(sb, erofs_squash_ino(nid), erofs_iget5_eq,
+			     erofs_iget5_set, &nid);
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	err = erofs_fill_inode(inode);
+	if (err)
+		goto err_out;
+
+	err = erofs_fill_inode_rust(inode, nid);
+	if (err)
+		goto err_out;
+
+	return inode;
+err_out:
+	if (err)
+		iget_failed(inode);
+	return ERR_PTR(err);
+}
diff --git a/fs/erofs/rust_helpers.h b/fs/erofs/rust_helpers.h
index 158b21438314..5bcd452f6d82 100644
--- a/fs/erofs/rust_helpers.h
+++ b/fs/erofs/rust_helpers.h
@@ -17,5 +17,7 @@ void *erofs_read_metabuf_rust_helper(struct super_block *sb,
 				     struct erofs_sb_info *sbi,
 				     erofs_off_t offset);
 void erofs_put_metabuf_rust_helper(void *addr);
-
+extern int erofs_fill_inode_rust(struct inode *inode, erofs_nid_t nid);
+struct inode *erofs_iget_locked_rust_helper(struct super_block *sb,
+						   erofs_nid_t nid);
 #endif
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 61f138a7d8e2..659502bdf5fe 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -81,22 +81,23 @@ static int erofs_superblock_csum_verify(struct super_block *sb, void *sbdata)
 
 static void erofs_inode_init_once(void *ptr)
 {
-	struct erofs_inode *vi = ptr;
-
-	inode_init_once(&vi->vfs_inode);
+	inode_init_once(EROFS_I_VFS(ptr));
+	erofs_init_inode_rust(EROFS_I_VFS(ptr));
 }
 
 static struct inode *erofs_alloc_inode(struct super_block *sb)
 {
-	struct erofs_inode *vi =
+	void *ptr =
 		alloc_inode_sb(sb, erofs_inode_cachep, GFP_KERNEL);
 
-	if (!vi)
+	if (!ptr)
 		return NULL;
 
+#ifndef CONFIG_EROFS_FS_RUST
 	/* zero out everything except vfs_inode */
-	memset(vi, 0, offsetof(struct erofs_inode, vfs_inode));
-	return &vi->vfs_inode;
+	memset(ptr, 0, offsetof(struct erofs_inode, vfs_inode));
+#endif
+	return EROFS_I_VFS(ptr);
 }
 
 static void erofs_free_inode(struct inode *inode)
@@ -106,7 +107,12 @@ static void erofs_free_inode(struct inode *inode)
 	if (inode->i_op == &erofs_fast_symlink_iops)
 		kfree(inode->i_link);
 	kfree(vi->xattr_shared_xattrs);
+	erofs_free_inode_rust(inode);
+#ifdef CONFIG_EROFS_FS_RUST
+	kmem_cache_free(erofs_inode_cachep, EROFS_I_RUST(inode));
+#else
 	kmem_cache_free(erofs_inode_cachep, vi);
+#endif
 }
 
 /* read variable-sized metadata, offset will be aligned by 4-byte */
@@ -871,13 +877,25 @@ static int __init erofs_module_init(void)
 
 	erofs_check_ondisk_layout_definitions();
 
+#ifndef CONFIG_EROFS_FS_RUST
 	erofs_inode_cachep = kmem_cache_create("erofs_inode",
 			sizeof(struct erofs_inode), 0,
 			SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 			erofs_inode_init_once);
+#else
+	erofs_inode_cachep = kmem_cache_create("erofs_inode_rust",
+			EROFS_INODE_SIZE_RUST, 0,
+			SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+			erofs_inode_init_once);
+#endif
+
 	if (!erofs_inode_cachep)
 		return -ENOMEM;
 
+	err = erofs_init_rust();
+	if (err)
+		goto rust_err;
+
 	err = erofs_init_shrinker();
 	if (err)
 		goto shrinker_err;
@@ -904,6 +922,8 @@ static int __init erofs_module_init(void)
 	erofs_exit_shrinker();
 shrinker_err:
+	erofs_destroy_rust();
+rust_err:
 	kmem_cache_destroy(erofs_inode_cachep);
 	return err;
 }
 
-- 
2.46.0


* [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (17 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 18/24] erofs: introduce iget alternative " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 17:08   ` Al Viro
  2024-09-16 13:56 ` [RFC PATCH 20/24] erofs: introduce readdir " Yiyang Wu
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces erofs_lookup_rust and erofs_get_parent_rust,
written in Rust, as alternatives to the C erofs_lookup (namei.c) and
erofs_get_parent (super.c).

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/Makefile        |  2 +-
 fs/erofs/internal.h      |  2 ++
 fs/erofs/namei.c         |  2 ++
 fs/erofs/namei_rs.rs     | 56 ++++++++++++++++++++++++++++++++++++++++
 fs/erofs/rust_bindings.h |  4 ++-
 fs/erofs/super.c         |  2 ++
 6 files changed, 66 insertions(+), 2 deletions(-)
 create mode 100644 fs/erofs/namei_rs.rs
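
The Rust lookup turns `d_name` into a `&str` with `from_utf8_unchecked`, but Linux file names are arbitrary byte strings (any bytes except '/' and NUL), so they need not be valid UTF-8, and building a `&str` from invalid bytes is undefined behavior in Rust. A byte-wise lookup avoids that assumption. The userspace sketch below only illustrates the idea: `Dir`, `lookup` and `parent` are hypothetical names, and a directory is modeled as a sorted vector of (name, nid) pairs rather than EROFS on-disk dirent blocks. It binary-searches raw bytes, in the spirit of the sorted-dirent search in the C namei.c, and shows that get_parent reduces to a lookup of "..", as `erofs_get_parent_rust` does.

```rust
/// Hypothetical in-memory model of one directory: (name bytes, nid) pairs,
/// kept sorted by raw name bytes so lookups can binary-search.
pub struct Dir {
    entries: Vec<(Vec<u8>, u64)>,
}

impl Dir {
    /// Builds the model, sorting entries by raw name bytes.
    pub fn new(mut entries: Vec<(Vec<u8>, u64)>) -> Self {
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        Dir { entries }
    }

    /// Looks a name up by raw bytes; no UTF-8 validity is assumed.
    pub fn lookup(&self, name: &[u8]) -> Option<u64> {
        self.entries
            .binary_search_by(|(entry_name, _)| entry_name.as_slice().cmp(name))
            .ok()
            .map(|idx| self.entries[idx].1)
    }

    /// get_parent is simply a lookup of "..".
    pub fn parent(&self) -> Option<u64> {
        self.lookup(b"..")
    }
}
```

Whether plain byte order matches the exact ordering EROFS uses on disk should be checked against namei.c before relying on it.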

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 46de6f490ca2..0f748f3e0ff6 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -9,4 +9,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
 erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
-erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o rust_helpers.o
+erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o namei_rs.o rust_helpers.o
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 42ce84783be7..1d9dfae285d5 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -442,6 +442,8 @@ int erofs_iget5_eq(struct inode *inode, void *opaque);
 int erofs_iget5_set(struct inode *inode, void *opaque);
 #ifdef CONFIG_EROFS_FS_RUST
 #define erofs_iget erofs_iget_rust
+#define erofs_get_parent erofs_get_parent_rust
+#define erofs_lookup erofs_lookup_rust
 #else
 struct inode *erofs_iget(struct super_block *sb, erofs_nid_t nid);
 #endif
diff --git a/fs/erofs/namei.c b/fs/erofs/namei.c
index c94d0c1608a8..f657d475c4a1 100644
--- a/fs/erofs/namei.c
+++ b/fs/erofs/namei.c
@@ -7,6 +7,7 @@
 #include "xattr.h"
 #include <trace/events/erofs.h>
 
+#ifndef CONFIG_EROFS_FS_RUST
 struct erofs_qstr {
 	const unsigned char *name;
 	const unsigned char *end;
@@ -214,6 +215,7 @@ static struct dentry *erofs_lookup(struct inode *dir, struct dentry *dentry,
 		inode = erofs_iget(dir->i_sb, nid);
 	return d_splice_alias(inode, dentry);
 }
+#endif
 
 const struct inode_operations erofs_dir_iops = {
 	.lookup = erofs_lookup,
diff --git a/fs/erofs/namei_rs.rs b/fs/erofs/namei_rs.rs
new file mode 100644
index 000000000000..d73a0a7bee1e
--- /dev/null
+++ b/fs/erofs/namei_rs.rs
@@ -0,0 +1,56 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+//! EROFS Rust lookup and get_parent implementation.
+//! This is only for experimental purposes. Feedback is always welcome.
+
+#[allow(dead_code)]
+#[allow(missing_docs)]
+pub(crate) mod rust;
+use core::ffi::*;
+use core::ptr::NonNull;
+
+use kernel::bindings::{d_obtain_alias, d_splice_alias, dentry, inode};
+use kernel::container_of;
+
+use rust::{erofs_sys::operations::*, kinode::*, ksuperblock::*};
+
+/// Lookup function for dentry-inode lookup replacement.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_lookup_rust(
+    k_inode: NonNull<inode>,
+    dentry: NonNull<dentry>,
+    _flags: c_uint,
+) -> *mut c_void {
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
+    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
+    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
+    // SAFETY: this is backed by qstr which is c representation of a valid slice.
+    let name = unsafe {
+        core::str::from_utf8_unchecked(core::slice::from_raw_parts(
+            dentry.as_ref().d_name.name,
+            dentry.as_ref().d_name.__bindgen_anon_1.__bindgen_anon_1.len as usize,
+        ))
+    };
+    let k_inode: *mut inode =
+        dir_lookup(sbi.filesystem.as_ref(), &mut sbi.inodes, erofs_inode, name)
+            .map_or(core::ptr::null_mut(), |result| result.k_inode.as_mut_ptr());
+
+    // SAFETY: We are sure that the inner k_inode has already been initialized.
+    unsafe { d_splice_alias(k_inode, dentry.as_ptr()).cast() }
+}
+
+/// Exported as a replacement of erofs_get_parent.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_get_parent_rust(child: NonNull<dentry>) -> *mut c_void {
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let k_inode = unsafe { child.as_ref().d_inode };
+    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
+    let sbi = erofs_sbi(unsafe { NonNull::new((*k_inode).i_sb).unwrap() }); // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let inode = unsafe { &*container_of!(k_inode, KernelInode, k_inode) };
+    let k_inode: *mut inode = dir_lookup(sbi.filesystem.as_ref(), &mut sbi.inodes, inode, "..")
+        .map_or(core::ptr::null_mut(), |result| result.k_inode.as_mut_ptr());
+    // SAFETY: We are sure that the inner k_inode has already been initialized
+    unsafe { d_obtain_alias(k_inode).cast() }
+}
diff --git a/fs/erofs/rust_bindings.h b/fs/erofs/rust_bindings.h
index 657f109dd6e7..b35014aa5cae 100644
--- a/fs/erofs/rust_bindings.h
+++ b/fs/erofs/rust_bindings.h
@@ -6,7 +6,6 @@
 
 #include <linux/fs.h>
 
-
 typedef u64 erofs_nid_t;
 typedef u64 erofs_off_t;
 /* data type for filesystem-wide blocks number */
@@ -21,4 +20,7 @@ extern void *erofs_alloc_sbi_rust(struct super_block *sb);
 extern void *erofs_free_sbi_rust(struct super_block *sb);
 extern int erofs_iget5_eq_rust(struct inode *inode, void *opaque);
 extern struct inode *erofs_iget_rust(struct super_block *sb, erofs_nid_t nid);
+extern struct dentry *erofs_lookup_rust(struct inode *inode, struct dentry *dentry,
+			      unsigned int flags);
+extern struct dentry *erofs_get_parent_rust(struct dentry *dentry);
 #endif
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 659502bdf5fe..d49c804acf3d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -554,6 +554,7 @@ static struct dentry *erofs_fh_to_parent(struct super_block *sb,
 				    erofs_nfs_get_inode);
 }
 
+#ifndef CONFIG_EROFS_FS_RUST
 static struct dentry *erofs_get_parent(struct dentry *child)
 {
 	erofs_nid_t nid;
@@ -565,6 +566,7 @@ static struct dentry *erofs_get_parent(struct dentry *child)
 		return ERR_PTR(err);
 	return d_obtain_alias(erofs_iget(child->d_sb, nid));
 }
+#endif
 
 static const struct export_operations erofs_export_ops = {
 	.encode_fh = generic_encode_ino32_fh,
-- 
2.46.0



* [RFC PATCH 20/24] erofs: introduce readdir alternative to C
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (18 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 19/24] erofs: introduce namei " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 21/24] erofs: introduce erofs_map_blocks " Yiyang Wu
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces erofs_readdir_rust, written in Rust, as an
alternative to erofs_readdir.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/Makefile        |  2 +-
 fs/erofs/dir.c           |  2 ++
 fs/erofs/dir_rs.rs       | 57 ++++++++++++++++++++++++++++++++++++++++
 fs/erofs/internal.h      |  1 +
 fs/erofs/rust_bindings.h |  1 +
 5 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 fs/erofs/dir_rs.rs
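
In the C readdir, `dir_emit()` returns false once the caller's buffer is full; erofs_fill_dentries then stops and leaves `ctx->pos` at the entry that was not emitted, so the next getdents call resumes there. The closure in erofs_readdir_rust calls the actor directly and discards its return value, so it may be worth checking whether that stop signal needs to be honored. The sketch below models the contract in userspace; `Dirent`, `DirContext` and `fill_dentries` are hypothetical stand-ins, not the kernel's `struct dir_context` API, and `pos` is an entry index here rather than a byte offset.

```rust
/// Hypothetical stand-in for one directory entry.
pub struct Dirent {
    pub name: Vec<u8>,
    pub nid: u64,
}

/// Hypothetical model of `struct dir_context`: `pos` is the resume position
/// and `actor` returns false once the caller's buffer is full, like dir_emit().
pub struct DirContext<F> {
    pub pos: usize,
    pub actor: F,
}

/// Emits entries starting at `ctx.pos` and stops as soon as the actor
/// refuses one. `pos` only moves past entries that were actually emitted.
pub fn fill_dentries<F>(entries: &[Dirent], ctx: &mut DirContext<F>)
where
    F: FnMut(&[u8], u64) -> bool,
{
    while let Some(de) = entries.get(ctx.pos) {
        if !(ctx.actor)(de.name.as_slice(), de.nid) {
            return; // buffer full: keep pos at this entry
        }
        ctx.pos += 1;
    }
}
```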

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 0f748f3e0ff6..e086487971b6 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -9,4 +9,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
 erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
-erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o namei_rs.o rust_helpers.o
+erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o namei_rs.o dir_rs.o rust_helpers.o
diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
index c3b90abdee37..0f5df8a4169b 100644
--- a/fs/erofs/dir.c
+++ b/fs/erofs/dir.c
@@ -6,6 +6,7 @@
  */
 #include "internal.h"
 
+#ifndef CONFIG_EROFS_FS_RUST
 static int erofs_fill_dentries(struct inode *dir, struct dir_context *ctx,
 			       void *dentry_blk, struct erofs_dirent *de,
 			       unsigned int nameoff0, unsigned int maxsize)
@@ -92,6 +93,7 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
 	erofs_put_metabuf(&buf);
 	return err < 0 ? err : 0;
 }
+#endif
 
 const struct file_operations erofs_dir_fops = {
 	.llseek		= generic_file_llseek,
diff --git a/fs/erofs/dir_rs.rs b/fs/erofs/dir_rs.rs
new file mode 100644
index 000000000000..d965e6076242
--- /dev/null
+++ b/fs/erofs/dir_rs.rs
@@ -0,0 +1,57 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+//! EROFS Rust readdir implementation.
+//! This is only for experimental purposes. Feedback is always welcome.
+
+#[allow(dead_code)]
+#[allow(missing_docs)]
+pub(crate) mod rust;
+use core::ffi::*;
+use core::ptr::NonNull;
+
+use kernel::bindings::{dir_context, file};
+use kernel::container_of;
+
+use rust::{
+    erofs_sys::{inode::*, *},
+    kinode::*,
+    ksuperblock::*,
+};
+
+/// Exported as a replacement of erofs_readdir.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_readdir_rust(
+    f: NonNull<file>,
+    mut ctx: NonNull<dir_context>,
+) -> c_int {
+    // SAFETY: inode is always initialized in file.
+    let k_inode = unsafe { f.as_ref().f_inode };
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let erofs_inode = unsafe { &*container_of!(k_inode, KernelInode, k_inode) };
+    // SAFETY: The super_block is always initialized when calling iget5_locked.
+    let sb = unsafe { (*k_inode).i_sb };
+    let sbi = erofs_sbi(NonNull::new(sb).unwrap());
+    // SAFETY: ctx is nonnull.
+    let offset = unsafe { ctx.as_ref().pos };
+    match sbi
+        .filesystem
+        .fill_dentries(erofs_inode, offset as Off, &mut |dir, pos| unsafe {
+            // inline expansion from dir_emit
+            ctx.as_ref().actor.unwrap()(
+                ctx.as_ptr(),
+                dir.name.as_ptr().cast(),
+                dir.name.len() as i32,
+                pos as i64,
+                dir.desc.nid as u64,
+                dir.desc.file_type as u32,
+            );
+            ctx.as_mut().pos = pos as i64;
+        }) {
+        Ok(_) => {
+            unsafe { ctx.as_mut().pos = erofs_inode.info().file_size() as i64 }
+            0
+        }
+        Err(e) => (i32::from(e)) as c_int,
+    }
+}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 1d9dfae285d5..6f57bb866637 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -444,6 +444,7 @@ int erofs_iget5_set(struct inode *inode, void *opaque);
 #define erofs_iget erofs_iget_rust
 #define erofs_get_parent erofs_get_parent_rust
 #define erofs_lookup erofs_lookup_rust
+#define erofs_readdir erofs_readdir_rust
 #else
 struct inode *erofs_iget(struct super_block *sb, erofs_nid_t nid);
 #endif
diff --git a/fs/erofs/rust_bindings.h b/fs/erofs/rust_bindings.h
index b35014aa5cae..8b71d65e2c0b 100644
--- a/fs/erofs/rust_bindings.h
+++ b/fs/erofs/rust_bindings.h
@@ -23,4 +23,5 @@ extern struct inode *erofs_iget_rust(struct super_block *sb, erofs_nid_t nid);
 extern struct dentry *erofs_lookup_rust(struct inode *inode, struct dentry *dentry,
 			      unsigned int flags);
 extern struct dentry *erofs_get_parent_rust(struct dentry *dentry);
+extern int erofs_readdir_rust(struct file *file, struct dir_context *ctx);
 #endif
-- 
2.46.0



* [RFC PATCH 21/24] erofs: introduce erofs_map_blocks alternative to C
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (19 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 20/24] erofs: introduce readdir " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 22/24] erofs: add skippable iters in Rust Yiyang Wu
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces erofs_map_blocks_rust, an alternative to
erofs_map_blocks written in Rust, which is called from erofs_iomap_begin.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/Makefile        |  2 +-
 fs/erofs/data.c          |  5 ++++
 fs/erofs/data_rs.rs      | 63 ++++++++++++++++++++++++++++++++++++++++
 fs/erofs/rust_bindings.h |  4 +++
 4 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 fs/erofs/data_rs.rs
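
For uncompressed inodes, mapping a logical offset is plain arithmetic, and logical and physical extents have the same length, which is presumably why the patch copies `m.physical.len` into both `m_plen` and `m_llen`. The sketch below illustrates that arithmetic for a flat, non-inline layout only, assuming the data occupies consecutive blocks starting at block `raw_blkaddr`. `Mapping` and `map_flat_plain` are hypothetical names; tail-packed (inline) data and compressed inodes are deliberately left out.

```rust
/// Hypothetical, simplified mirror of the fields erofs_map_blocks_rust fills in.
#[derive(Debug, PartialEq)]
pub struct Mapping {
    pub m_pa: u64,
    pub m_la: u64,
    pub m_plen: u64,
    pub m_llen: u64,
    pub mapped: bool,
}

/// Maps logical offset `la` of a flat, uncompressed, non-inline inode whose
/// data occupies consecutive blocks starting at block `raw_blkaddr`.
/// Assumption: inline tails and compressed inodes are not handled here.
pub fn map_flat_plain(raw_blkaddr: u64, blkszbits: u32, file_size: u64, la: u64) -> Mapping {
    let blksz = 1u64 << blkszbits;
    // Blocks needed to hold the whole file, rounded up.
    let nblocks = (file_size + blksz - 1) / blksz;
    let end = nblocks << blkszbits;
    if la < end {
        // Uncompressed data: logical and physical extents have the same length.
        let len = end - la;
        Mapping {
            m_pa: (raw_blkaddr << blkszbits) + la,
            m_la: la,
            m_plen: len,
            m_llen: len,
            mapped: true,
        }
    } else {
        // Past the last data block: report an unmapped extent.
        Mapping { m_pa: 0, m_la: la, m_plen: 0, m_llen: 0, mapped: false }
    }
}
```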

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index e086487971b6..219ddca0642e 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -9,4 +9,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
 erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
-erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o namei_rs.o dir_rs.o rust_helpers.o
+erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o namei_rs.o dir_rs.o data_rs.o rust_helpers.o
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 61debd799cf9..c9694661136b 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -293,7 +293,12 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	map.m_la = offset;
 	map.m_llen = length;
 
+#ifdef CONFIG_EROFS_FS_RUST
+	ret = erofs_map_blocks_rust(inode, &map);
+#else
 	ret = erofs_map_blocks(inode, &map);
+#endif
+
 	if (ret < 0)
 		return ret;
 
diff --git a/fs/erofs/data_rs.rs b/fs/erofs/data_rs.rs
new file mode 100644
index 000000000000..ac34a9dd2079
--- /dev/null
+++ b/fs/erofs/data_rs.rs
@@ -0,0 +1,63 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+//! EROFS Rust map_blocks implementation.
+//! This is only for experimental purposes. Feedback is always welcome.
+
+#[allow(dead_code)]
+#[allow(missing_docs)]
+pub(crate) mod rust;
+use core::ffi::*;
+use core::ptr::NonNull;
+
+use kernel::bindings::inode;
+use kernel::container_of;
+
+use rust::{erofs_sys::*, kinode::*, ksuperblock::*};
+
+#[repr(C)]
+struct ErofsBuf {
+    mapping: NonNull<c_void>,
+    page: NonNull<c_void>,
+    base: NonNull<c_void>,
+    kmap_type: c_int,
+}
+
+/// A helper struct to map blocks for iomap_begin, because iomap is not generated by bindgen.
+#[repr(C)]
+pub struct ErofsMapBlocks {
+    buf: ErofsBuf,
+    pub(crate) m_pa: u64,
+    pub(crate) m_la: u64,
+    pub(crate) m_plen: u64,
+    pub(crate) m_llen: u64,
+    pub(crate) m_deviceid: u16,
+    pub(crate) m_flags: u32,
+}
+/// Exported as a replacement for erofs_map_blocks.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_map_blocks_rust(
+    k_inode: NonNull<inode>,
+    mut map: NonNull<ErofsMapBlocks>,
+) -> c_int {
+    // SAFETY: super_block and superblockinfo are always initialized in k_inode.
+    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
+    // SAFETY: The map is always initialized in the caller.
+    match sbi
+        .filesystem
+        .map(erofs_inode, unsafe { map.as_ref().m_la } as Off)
+    {
+        Ok(m) => unsafe {
+            map.as_mut().m_pa = m.physical.start;
+            map.as_mut().m_la = map.as_ref().m_la;
+            map.as_mut().m_plen = m.physical.len;
+            map.as_mut().m_llen = m.physical.len;
+            map.as_mut().m_deviceid = m.device_id;
+            map.as_mut().m_flags = m.map_type.into();
+            0
+        },
+        Err(e) => i32::from(e) as c_int,
+    }
+}
diff --git a/fs/erofs/rust_bindings.h b/fs/erofs/rust_bindings.h
index 8b71d65e2c0b..ad9aa75a7a2c 100644
--- a/fs/erofs/rust_bindings.h
+++ b/fs/erofs/rust_bindings.h
@@ -24,4 +24,8 @@ extern struct dentry *erofs_lookup_rust(struct inode *inode, struct dentry *dent
 			      unsigned int flags);
 extern struct dentry *erofs_get_parent_rust(struct dentry *dentry);
 extern int erofs_readdir_rust(struct file *file, struct dir_context *ctx);
+
+struct erofs_map_blocks;
+extern int erofs_map_blocks_rust(struct inode *inode,
+				 struct erofs_map_blocks *map);
 #endif
-- 
2.46.0



* [RFC PATCH 22/24] erofs: add skippable iters in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (20 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 21/24] erofs: introduce erofs_map_blocks " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 23/24] erofs: implement xattrs operations " Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 24/24] erofs: introduce xattrs replacement to C Yiyang Wu
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch introduces self-owned skippable data iterators in Rust.
These iterators will be used later to access extended attributes.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys/data/raw_iters.rs | 121 ++++++++++++++++++++++
 1 file changed, 121 insertions(+)
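
The iterator stitches reads across buffer boundaries, since an xattr key or value may start in one block and finish in the next. The sketch below is a std-only model of the same idea over a list of in-memory chunks, not the patch's `SkippableContinuousIter`: `ChunkCursor`, `walk`, `skip`, `read` and `matches` are hypothetical names, and running out of data is reported as `None` instead of an errno. Routing skip, read and compare through one helper keeps the boundary handling in a single place; in particular, a skip that runs past the current chunk carries the remaining bytes into the following chunks.

```rust
/// Hypothetical cursor over a sequence of byte chunks (e.g. consecutive blocks).
pub struct ChunkCursor<'a> {
    chunks: &'a [&'a [u8]],
    idx: usize, // index of the current chunk
    cur: usize, // offset inside the current chunk
}

impl<'a> ChunkCursor<'a> {
    pub fn new(chunks: &'a [&'a [u8]]) -> Self {
        ChunkCursor { chunks, idx: 0, cur: 0 }
    }

    /// Hands the next `n` bytes to `f` one contiguous piece at a time,
    /// crossing chunk boundaries. Returns Some(false) if `f` stops early,
    /// None if the data runs out first.
    fn walk(&mut self, mut n: usize, mut f: impl FnMut(&[u8]) -> bool) -> Option<bool> {
        while n > 0 {
            let chunk: &'a [u8] = *self.chunks.get(self.idx)?;
            let avail = chunk.len() - self.cur;
            if avail == 0 {
                // Current chunk exhausted: move on to the next one.
                self.idx += 1;
                self.cur = 0;
                continue;
            }
            let step = avail.min(n);
            let piece = &chunk[self.cur..self.cur + step];
            self.cur += step;
            n -= step;
            if !f(piece) {
                return Some(false);
            }
        }
        Some(true)
    }

    /// Skips `n` bytes, possibly spanning several chunks.
    pub fn skip(&mut self, n: usize) -> Option<()> {
        self.walk(n, |_| true).map(|_| ())
    }

    /// Fills `buf` completely, possibly spanning several chunks.
    pub fn read(&mut self, buf: &mut [u8]) -> Option<()> {
        let len = buf.len();
        let mut done = 0;
        self.walk(len, |piece| {
            buf[done..done + piece.len()].copy_from_slice(piece);
            done += piece.len();
            true
        })
        .map(|_| ())
    }

    /// Compares the next `expected.len()` bytes with `expected`, consuming them.
    pub fn matches(&mut self, expected: &[u8]) -> Option<bool> {
        let mut done = 0;
        self.walk(expected.len(), |piece| {
            let ok = piece == &expected[done..done + piece.len()];
            done += piece.len();
            ok
        })
    }
}
```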

diff --git a/fs/erofs/rust/erofs_sys/data/raw_iters.rs b/fs/erofs/rust/erofs_sys/data/raw_iters.rs
index 8f3bd250d252..f1ff0a251596 100644
--- a/fs/erofs/rust/erofs_sys/data/raw_iters.rs
+++ b/fs/erofs/rust/erofs_sys/data/raw_iters.rs
@@ -4,3 +4,124 @@
 pub(crate) mod ref_iter;
 mod traits;
 pub(crate) use traits::*;
+
+use super::*;
+use alloc::boxed::Box;
+
+/// Represents a skippable continuous buffer iterator. This is used primarily for reading
+/// extended attributes, since the key-value pairs are stored flattened in their on-disk format.
+pub(crate) struct SkippableContinuousIter<'a> {
+    iter: Box<dyn ContinuousBufferIter<'a> + 'a>,
+    data: RefBuffer<'a>,
+    cur: usize,
+}
+
+fn cmp_with_cursor_move(
+    lhs: &[u8],
+    rhs: &[u8],
+    lhs_cur: &mut usize,
+    rhs_cur: &mut usize,
+    len: usize,
+) -> bool {
+    let result = lhs[*lhs_cur..(*lhs_cur + len)] == rhs[*rhs_cur..(*rhs_cur + len)];
+    *lhs_cur += len;
+    *rhs_cur += len;
+    result
+}
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub(crate) enum SkipCmpError {
+    PosixError(Errno),
+    NotEqual(Off),
+}
+
+impl From<Errno> for SkipCmpError {
+    fn from(e: Errno) -> Self {
+        SkipCmpError::PosixError(e)
+    }
+}
+
+impl<'a> SkippableContinuousIter<'a> {
+    pub(crate) fn try_new(
+        mut iter: Box<dyn ContinuousBufferIter<'a> + 'a>,
+    ) -> PosixResult<Option<Self>> {
+        if iter.eof() {
+            return Ok(None);
+        }
+        let data = iter.next().unwrap()?;
+        Ok(Some(Self { iter, data, cur: 0 }))
+    }
+    pub(crate) fn skip(&mut self, offset: Off) -> PosixResult<()> {
+        let dlen = self.data.content().len() - self.cur;
+        if offset as usize <= dlen {
+            self.cur += offset as usize;
+        } else {
+            self.cur = 0;
+            self.iter.advance_off(dlen as Off);
+            self.data = self.iter.next().unwrap()?;
+        }
+        Ok(())
+    }
+
+    pub(crate) fn read(&mut self, buf: &mut [u8]) -> PosixResult<()> {
+        let mut dlen = self.data.content().len() - self.cur;
+        let mut bcur = 0_usize;
+        let blen = buf.len();
+        if dlen != 0 && dlen >= blen {
+            buf.clone_from_slice(&self.data.content()[self.cur..(self.cur + blen)]);
+            self.cur += blen;
+        } else {
+            buf[bcur..(bcur + dlen)].copy_from_slice(&self.data.content()[self.cur..]);
+            bcur += dlen;
+            while bcur < blen {
+                self.cur = 0;
+                self.data = self.iter.next().unwrap()?;
+                dlen = self.data.content().len();
+                if dlen >= blen - bcur {
+                    buf[bcur..].copy_from_slice(&self.data.content()[..(blen - bcur)]);
+                    self.cur = blen - bcur;
+                    return Ok(());
+                } else {
+                    buf[bcur..(bcur + dlen)].copy_from_slice(self.data.content());
+                    bcur += dlen;
+                }
+            }
+        }
+        Ok(())
+    }
+
+    pub(crate) fn try_cmp(&mut self, buf: &[u8]) -> Result<(), SkipCmpError> {
+        let dlen = self.data.content().len() - self.cur;
+        let blen = buf.len();
+        let mut bcur = 0_usize;
+
+        if dlen != 0 && dlen >= blen {
+            if cmp_with_cursor_move(self.data.content(), buf, &mut self.cur, &mut bcur, blen) {
+                Ok(())
+            } else {
+                Err(SkipCmpError::NotEqual(bcur as Off))
+            }
+        } else {
+            if dlen != 0 {
+                let clen = dlen.min(blen);
+                if !cmp_with_cursor_move(self.data.content(), buf, &mut self.cur, &mut bcur, clen) {
+                    return Err(SkipCmpError::NotEqual(bcur as Off));
+                }
+            }
+            while bcur < blen {
+                self.cur = 0;
+                self.data = self.iter.next().unwrap()?;
+                let dlen = self.data.content().len();
+                let clen = dlen.min(blen - bcur);
+                if !cmp_with_cursor_move(self.data.content(), buf, &mut self.cur, &mut bcur, clen) {
+                    return Err(SkipCmpError::NotEqual(bcur as Off));
+                }
+            }
+
+            Ok(())
+        }
+    }
+    pub(crate) fn eof(&self) -> bool {
+        self.data.content().len() - self.cur == 0 && self.iter.eof()
+    }
+}
-- 
2.46.0



* [RFC PATCH 23/24] erofs: implement xattrs operations in Rust
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (21 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 22/24] erofs: add skippable iters in Rust Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  2024-09-16 13:56 ` [RFC PATCH 24/24] erofs: introduce xattrs replacement to C Yiyang Wu
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch adds extended attribute (xattr) support to the erofs_sys
crate, which will later be used to implement the xattr handler in Rust.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/inode_rs.rs                      |   7 +-
 fs/erofs/rust/erofs_sys/inode.rs          |   1 +
 fs/erofs/rust/erofs_sys/operations.rs     |  27 ++++
 fs/erofs/rust/erofs_sys/superblock.rs     | 141 +++++++++++++++++++++
 fs/erofs/rust/erofs_sys/superblock/mem.rs |  13 +-
 fs/erofs/rust/erofs_sys/xattrs.rs         | 148 ++++++++++++++++++++++
 fs/erofs/rust/kinode.rs                   |   6 +
 7 files changed, 341 insertions(+), 2 deletions(-)
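
Two on-disk encodings are parsed here: the long xattr name prefixes ("infixes"), read by get_xattr_infixes as records of a little-endian `u16` length followed by that many bytes, each padded to a 4-byte boundary; and the shared-xattr index array after the inline xattr header, a run of little-endian `u32`s. The sketch below parses both from a plain byte slice with explicit bounds checks, so a truncated region yields `None` instead of an out-of-range index, and `u32::from_le_bytes` avoids reinterpreting a possibly unaligned byte pointer as `*const u32`. It computes the index-array length as `shared_count as usize * 4`, so a `u8` count cannot lose bits in the shift. `parse_infixes` and `parse_shared_indexes` are hypothetical helpers, and parsing a fixed number of records (taken, by assumption, from the superblock's `xattr_prefix_count`) is a design choice of the sketch; in the patch the data arrives in block-sized buffers through `ContinuousBufferIter`.

```rust
/// Parses `count` length-prefixed records: a little-endian u16 length, the
/// payload, then padding up to the next 4-byte boundary.
/// Returns None if the region is too short instead of panicking.
pub fn parse_infixes(region: &[u8], count: usize) -> Option<Vec<Vec<u8>>> {
    let mut out = Vec::new();
    let mut cur = 0usize;
    for _ in 0..count {
        let hdr = region.get(cur..cur + 2)?;
        let size = u16::from_le_bytes([hdr[0], hdr[1]]) as usize;
        let payload = region.get(cur + 2..cur + 2 + size)?;
        out.push(payload.to_vec());
        // Round the start of the next record up to a multiple of 4.
        cur = (cur + 2 + size + 3) & !3;
    }
    Some(out)
}

/// Decodes `shared_count` little-endian u32 shared-xattr indexes byte-wise,
/// so neither alignment nor host endianness matters.
pub fn parse_shared_indexes(bytes: &[u8], shared_count: u8) -> Option<Vec<u32>> {
    // Widen before multiplying so a u8 count cannot overflow.
    let len = shared_count as usize * 4;
    let raw = bytes.get(..len)?;
    Some(
        raw.chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```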

diff --git a/fs/erofs/inode_rs.rs b/fs/erofs/inode_rs.rs
index 5cca2ae581ac..a79d1157b910 100644
--- a/fs/erofs/inode_rs.rs
+++ b/fs/erofs/inode_rs.rs
@@ -48,8 +48,13 @@ fn try_fill_inode(k_inode: NonNull<inode>, nid: Nid) -> PosixResult<()> {
     let erofs_inode: &mut KernelInode = unsafe {
         &mut *(container_of!(k_inode.as_ptr(), KernelInode, k_inode) as *mut KernelInode)
     };
-    erofs_inode.info.write(sbi.filesystem.read_inode_info(nid)?);
+    let info = sbi.filesystem.read_inode_info(nid)?;
     erofs_inode.nid.write(nid);
+    erofs_inode.shared_entries.write(
+        sbi.filesystem
+            .read_inode_xattrs_shared_entries(nid, &info)?,
+    );
+    erofs_inode.info.write(info);
     Ok(())
 }
 /// Exported as fill_inode additional fill inode
diff --git a/fs/erofs/rust/erofs_sys/inode.rs b/fs/erofs/rust/erofs_sys/inode.rs
index 1ecd6147a126..eb3c2144cad8 100644
--- a/fs/erofs/rust/erofs_sys/inode.rs
+++ b/fs/erofs/rust/erofs_sys/inode.rs
@@ -299,6 +299,7 @@ pub(crate) trait Inode: Sized {
     fn new(_sb: &SuperBlock, info: InodeInfo, nid: Nid) -> Self;
     fn info(&self) -> &InodeInfo;
     fn nid(&self) -> Nid;
+    fn xattrs_shared_entries(&self) -> &XAttrSharedEntries;
 }
 
 /// Represents the error which occurs when trying to convert the inode.
diff --git a/fs/erofs/rust/erofs_sys/operations.rs b/fs/erofs/rust/erofs_sys/operations.rs
index 070ba20908a2..292bfbc7b72c 100644
--- a/fs/erofs/rust/erofs_sys/operations.rs
+++ b/fs/erofs/rust/erofs_sys/operations.rs
@@ -1,9 +1,16 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
+use super::alloc_helper::*;
+use super::data::raw_iters::*;
+use super::data::*;
 use super::inode::*;
 use super::superblock::*;
+use super::xattrs::*;
 use super::*;
+use alloc::vec::Vec;
+
+use crate::round;
 
 pub(crate) fn read_inode<'a, I, C>(
     filesystem: &'a dyn FileSystem<I>,
@@ -33,3 +40,23 @@ pub(crate) fn dir_lookup<'a, I, C>(
             read_inode(filesystem, collection, nid)
         })
 }
+
+pub(crate) fn get_xattr_infixes<'a>(
+    iter: &mut (dyn ContinuousBufferIter<'a> + 'a),
+) -> PosixResult<Vec<XAttrInfix>> {
+    let mut result: Vec<XAttrInfix> = Vec::new();
+    for data in iter {
+        let buffer = data?;
+        let buf = buffer.content();
+        let len = buf.len();
+        let mut cur: usize = 0;
+        while cur < len {
+            let mut infix: Vec<u8> = Vec::new();
+            let size = u16::from_le_bytes([buf[cur], buf[cur + 1]]) as usize;
+            extend_from_slice(&mut infix, &buf[cur + 2..cur + 2 + size])?;
+            push_vec(&mut result, XAttrInfix(infix))?;
+            cur = round!(UP, cur + 2 + size, 4);
+        }
+    }
+    Ok(result)
+}
diff --git a/fs/erofs/rust/erofs_sys/superblock.rs b/fs/erofs/rust/erofs_sys/superblock.rs
index 403ffdeb4573..6ea59058446e 100644
--- a/fs/erofs/rust/erofs_sys/superblock.rs
+++ b/fs/erofs/rust/erofs_sys/superblock.rs
@@ -3,14 +3,17 @@
 
 pub(crate) mod mem;
 use alloc::boxed::Box;
+use alloc::vec::Vec;
 use core::mem::size_of;
 
+use super::alloc_helper::*;
 use super::data::raw_iters::*;
 use super::data::*;
 use super::devices::*;
 use super::dir::*;
 use super::inode::*;
 use super::map::*;
+use super::xattrs::*;
 use super::*;
 
 use crate::round;
@@ -346,6 +349,144 @@ fn fill_dentries(
         }
         Ok(())
     }
+    // Extended attributes go here.
+    fn xattr_infixes(&self) -> &Vec<XAttrInfix>;
+    // Currently, all xattrs are eagerly initialized.
+    fn read_inode_xattrs_shared_entries(
+        &self,
+        nid: Nid,
+        info: &InodeInfo,
+    ) -> PosixResult<XAttrSharedEntries> {
+        let sb = self.superblock();
+        let mut offset = sb.iloc(nid) + info.inode_size();
+        let mut buf = XATTR_ENTRY_SUMMARY_BUF;
+        let mut indexes: Vec<u32> = Vec::new();
+        self.backend().fill(&mut buf, offset)?;
+
+        let header: XAttrSharedEntrySummary = XAttrSharedEntrySummary::from(buf);
+        offset += size_of::<XAttrSharedEntrySummary>() as Off;
+        for buf in self.continuous_iter(offset, (header.shared_count as Off) << 2)? {
+            let data = buf?;
+            extend_from_slice(&mut indexes, unsafe {
+                core::slice::from_raw_parts(
+                    data.content().as_ptr().cast(),
+                    data.content().len() >> 2,
+                )
+            })?;
+        }
+
+        Ok(XAttrSharedEntries {
+            name_filter: header.name_filter,
+            shared_indexes: indexes,
+        })
+    }
+    fn get_xattr(
+        &self,
+        inode: &I,
+        index: u32,
+        name: &[u8],
+        buffer: &mut Option<&mut [u8]>,
+    ) -> PosixResult<XAttrValue> {
+        let sb = self.superblock();
+        let shared_count = inode.xattrs_shared_entries().shared_indexes.len();
+        let inline_offset = sb.iloc(inode.nid())
+            + inode.info().inode_size() as Off
+            + size_of::<XAttrSharedEntrySummary>() as Off
+            + 4 * shared_count as Off;
+
+        let inline_len = inode.info().xattr_size()
+            - size_of::<XAttrSharedEntrySummary>() as Off
+            - shared_count as Off * 4;
+
+        if let Some(mut inline_provider) =
+            SkippableContinuousIter::try_new(self.continuous_iter(inline_offset, inline_len)?)?
+        {
+            while !inline_provider.eof() {
+                let header = inline_provider.get_entry_header()?;
+                match inline_provider.query_xattr_value(
+                    self.xattr_infixes(),
+                    &header,
+                    name,
+                    index,
+                    buffer,
+                ) {
+                    Ok(value) => return Ok(value),
+                    Err(e) => {
+                        if e != ENODATA {
+                            return Err(e);
+                        }
+                    }
+                }
+            }
+        }
+
+        for entry_index in inode.xattrs_shared_entries().shared_indexes.iter() {
+            let mut shared_provider = SkippableContinuousIter::try_new(self.continuous_iter(
+                sb.blkpos(self.superblock().xattr_blkaddr) + (*entry_index as Off) * 4,
+                u64::MAX,
+            )?)?
+            .unwrap();
+            let header = shared_provider.get_entry_header()?;
+            match shared_provider.query_xattr_value(
+                self.xattr_infixes(),
+                &header,
+                name,
+                index,
+                buffer,
+            ) {
+                Ok(value) => return Ok(value),
+                Err(e) => {
+                    if e != ENODATA {
+                        return Err(e);
+                    }
+                }
+            }
+        }
+
+        Err(ENODATA)
+    }
+
+    fn list_xattrs(&self, inode: &I, buffer: &mut [u8]) -> PosixResult<usize> {
+        let sb = self.superblock();
+        let shared_count = inode.xattrs_shared_entries().shared_indexes.len();
+        let inline_offset = sb.iloc(inode.nid())
+            + inode.info().inode_size() as Off
+            + size_of::<XAttrSharedEntrySummary>() as Off
+            + shared_count as Off * 4;
+        let mut offset = 0;
+        let inline_len = inode.info().xattr_size()
+            - size_of::<XAttrSharedEntrySummary>() as Off
+            - shared_count as Off * 4;
+
+        if let Some(mut inline_provider) =
+            SkippableContinuousIter::try_new(self.continuous_iter(inline_offset, inline_len)?)?
+        {
+            while !inline_provider.eof() {
+                let header = inline_provider.get_entry_header()?;
+                offset += inline_provider.get_xattr_key(
+                    self.xattr_infixes(),
+                    &header,
+                    &mut buffer[offset..],
+                )?;
+                inline_provider.skip_xattr_value(&header)?;
+            }
+        }
+
+        for index in inode.xattrs_shared_entries().shared_indexes.iter() {
+            let mut shared_provider = SkippableContinuousIter::try_new(self.continuous_iter(
+                sb.blkpos(self.superblock().xattr_blkaddr) + (*index as Off) * 4,
+                u64::MAX,
+            )?)?
+            .unwrap();
+            let header = shared_provider.get_entry_header()?;
+            offset += shared_provider.get_xattr_key(
+                self.xattr_infixes(),
+                &header,
+                &mut buffer[offset..],
+            )?;
+        }
+        Ok(offset)
+    }
 }
 
 pub(crate) struct SuperblockInfo<I, C, T>
diff --git a/fs/erofs/rust/erofs_sys/superblock/mem.rs b/fs/erofs/rust/erofs_sys/superblock/mem.rs
index 5756dc08744c..c8af3cb5e56e 100644
--- a/fs/erofs/rust/erofs_sys/superblock/mem.rs
+++ b/fs/erofs/rust/erofs_sys/superblock/mem.rs
@@ -1,8 +1,8 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
-use super::alloc_helper::*;
 use super::data::raw_iters::ref_iter::*;
+use super::operations::*;
 use super::*;
 
 // Memory Mapped Device/File so we need to have some external lifetime on the backend trait.
@@ -16,6 +16,7 @@ pub(crate) struct KernelFileSystem<B>
     backend: B,
     sb: SuperBlock,
     device_info: DeviceInfo,
+    infixes: Vec<XAttrInfix>,
 }
 
 impl<I, B> FileSystem<I> for KernelFileSystem<B>
@@ -58,6 +59,9 @@ fn continuous_iter<'a>(
     fn device_info(&self) -> &DeviceInfo {
         &self.device_info
     }
+    fn xattr_infixes(&self) -> &Vec<XAttrInfix> {
+        &self.infixes
+    }
 }
 
 impl<B> KernelFileSystem<B>
@@ -68,6 +72,12 @@ pub(crate) fn try_new(backend: B) -> PosixResult<Self> {
         let mut buf = SUPERBLOCK_EMPTY_BUF;
         backend.fill(&mut buf, EROFS_SUPER_OFFSET)?;
         let sb: SuperBlock = buf.into();
+        let infixes = get_xattr_infixes(&mut ContinuousRefIter::new(
+            &sb,
+            &backend,
+            sb.xattr_prefix_start as Off,
+            sb.xattr_prefix_count as Off * 4,
+        ))?;
         let device_info = get_device_infos(&mut ContinuousRefIter::new(
             &sb,
             &backend,
@@ -78,6 +88,7 @@ pub(crate) fn try_new(backend: B) -> PosixResult<Self> {
             backend,
             sb,
             device_info,
+            infixes,
         })
     }
 }
diff --git a/fs/erofs/rust/erofs_sys/xattrs.rs b/fs/erofs/rust/erofs_sys/xattrs.rs
index d1a110ef10dd..c97640731562 100644
--- a/fs/erofs/rust/erofs_sys/xattrs.rs
+++ b/fs/erofs/rust/erofs_sys/xattrs.rs
@@ -1,7 +1,13 @@
 // Copyright 2024 Yiyang Wu
 // SPDX-License-Identifier: MIT or GPL-2.0-or-later
 
+use super::alloc_helper::*;
+use super::data::raw_iters::*;
+use super::*;
+use crate::round;
+
 use alloc::vec::Vec;
+use core::mem::size_of;
 
 /// The header of the xattr entry index.
 /// This is used to describe the superblock's xattrs collection.
@@ -122,3 +128,145 @@ pub(crate) enum XAttrValue {
     Buffer(usize),
     Vec(Vec<u8>),
 }
+
+/// A provider that walks xattr entries, comparing each entry's name in turn and reading
+/// the matching value.
+pub(crate) trait XAttrEntriesProvider {
+    fn get_entry_header(&mut self) -> PosixResult<XAttrEntryHeader>;
+    fn get_xattr_key(
+        &mut self,
+        pfs: &[XAttrInfix],
+        header: &XAttrEntryHeader,
+        buffer: &mut [u8],
+    ) -> PosixResult<usize>;
+    fn query_xattr_value(
+        &mut self,
+        pfs: &[XAttrInfix],
+        header: &XAttrEntryHeader,
+        name: &[u8],
+        index: u32,
+        buffer: &mut Option<&mut [u8]>,
+    ) -> PosixResult<XAttrValue>;
+    fn skip_xattr_value(&mut self, header: &XAttrEntryHeader) -> PosixResult<()>;
+}
+impl<'a> XAttrEntriesProvider for SkippableContinuousIter<'a> {
+    fn get_entry_header(&mut self) -> PosixResult<XAttrEntryHeader> {
+        let mut buf: [u8; 4] = [0; 4];
+        self.read(&mut buf).map(|_| XAttrEntryHeader::from(buf))
+    }
+
+    fn get_xattr_key(
+        &mut self,
+        ifs: &[XAttrInfix],
+        header: &XAttrEntryHeader,
+        buffer: &mut [u8],
+    ) -> PosixResult<usize> {
+        let mut cur = if header.name_index.is_long() {
+            let if_index: usize = header.name_index.into();
+            let infix: &XAttrInfix = ifs.get(if_index).unwrap();
+
+            let pf_index = infix.prefix_index();
+            let prefix = EROFS_XATTRS_PREFIXS[pf_index as usize];
+            let plen = prefix.len();
+
+            buffer[..plen].copy_from_slice(&prefix[..plen]);
+            buffer[plen..infix.name().len() + plen].copy_from_slice(infix.name());
+
+            plen + infix.name().len()
+        } else {
+            let pf_index: usize = header.name_index.into();
+            let prefix = EROFS_XATTRS_PREFIXS[pf_index];
+            let plen = prefix.len();
+            buffer[..plen].copy_from_slice(&prefix[..plen]);
+            plen
+        };
+
+        self.read(&mut buffer[cur..cur + header.suffix_len as usize])?;
+        cur += header.suffix_len as usize;
+        buffer[cur] = b'\0';
+        Ok(cur + 1)
+    }
+
+    fn query_xattr_value(
+        &mut self,
+        ifs: &[XAttrInfix],
+        header: &XAttrEntryHeader,
+        name: &[u8],
+        index: u32,
+        buffer: &mut Option<&mut [u8]>,
+    ) -> PosixResult<XAttrValue> {
+        let xattr_size = round!(
+            UP,
+            header.suffix_len as Off + header.value_len as Off,
+            size_of::<XAttrEntryHeader>() as Off
+        );
+
+        let cur = if header.name_index.is_long() {
+            let if_index: usize = header.name_index.into();
+
+            if if_index >= ifs.len() {
+                return Err(ENODATA);
+            }
+
+            let infix = ifs.get(if_index).unwrap();
+            let ilen = infix.name().len();
+
+            let pf_index = infix.prefix_index();
+
+            if pf_index >= EROFS_XATTRS_PREFIXS.len() as u8 {
+                return Err(ENODATA);
+            }
+
+            if index != pf_index as u32
+                || name.len() != ilen + header.suffix_len as usize
+                || name[..ilen] != *infix.name()
+            {
+                return Err(ENODATA);
+            }
+            ilen
+        } else {
+            let pf_index: usize = header.name_index.into();
+            if pf_index >= EROFS_XATTRS_PREFIXS.len() {
+                return Err(ENODATA);
+            }
+
+            if pf_index != index as usize || header.suffix_len as usize != name.len() {
+                return Err(ENODATA);
+            }
+            0
+        };
+
+        match self.try_cmp(&name[cur..]) {
+            Ok(()) => match buffer.as_mut() {
+                Some(b) => {
+                    if b.len() < header.value_len as usize {
+                        return Err(ERANGE);
+                    }
+                    self.read(&mut b[..header.value_len as usize])?;
+                    Ok(XAttrValue::Buffer(header.value_len as usize))
+                }
+                None => {
+                    let mut b: Vec<u8> = vec_with_capacity(header.value_len as usize)?;
+                    self.read(&mut b)?;
+                    Ok(XAttrValue::Vec(b))
+                }
+            },
+            Err(skip_err) => match skip_err {
+                SkipCmpError::NotEqual(nvalue) => {
+                    self.skip(xattr_size - nvalue)?;
+                    Err(ENODATA)
+                }
+                SkipCmpError::PosixError(e) => Err(e),
+            },
+        }
+    }
+    fn skip_xattr_value(&mut self, header: &XAttrEntryHeader) -> PosixResult<()> {
+        self.skip(
+            round!(
+                UP,
+                header.suffix_len as Off + header.value_len as Off,
+                size_of::<XAttrEntryHeader>() as Off
+            ) - header.suffix_len as Off,
+        )
+    }
+}
diff --git a/fs/erofs/rust/kinode.rs b/fs/erofs/rust/kinode.rs
index fac72bd8b6b3..a4bea228ddc0 100644
--- a/fs/erofs/rust/kinode.rs
+++ b/fs/erofs/rust/kinode.rs
@@ -11,6 +11,7 @@
 use super::erofs_sys::errnos::*;
 use super::erofs_sys::inode::*;
 use super::erofs_sys::superblock::*;
+use super::erofs_sys::xattrs::*;
 use super::erofs_sys::*;
 
 extern "C" {
@@ -22,6 +23,7 @@
 pub(crate) struct KernelInode {
     pub(crate) info: MaybeUninit<InodeInfo>,
     pub(crate) nid: MaybeUninit<Nid>,
+    pub(crate) shared_entries: MaybeUninit<XAttrSharedEntries>,
     pub(crate) k_inode: MaybeUninit<inode>,
     pub(crate) k_opaque: MaybeUninit<*mut c_void>,
 }
@@ -31,6 +33,7 @@ fn new(_sb: &SuperBlock, _info: InodeInfo, _nid: Nid) -> Self {
         Self {
             info: MaybeUninit::uninit(),
             nid: MaybeUninit::uninit(),
+            shared_entries: MaybeUninit::uninit(),
             k_inode: MaybeUninit::uninit(),
             k_opaque: MaybeUninit::uninit(),
         }
@@ -41,6 +44,9 @@ fn nid(&self) -> Nid {
     fn info(&self) -> &InodeInfo {
         unsafe { self.info.assume_init_ref() }
     }
+    fn xattrs_shared_entries(&self) -> &XAttrSharedEntries {
+        unsafe { self.shared_entries.assume_init_ref() }
+    }
 }
 
 pub(crate) struct KernelInodeCollection {
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [RFC PATCH 24/24] erofs: introduce xattrs replacement to C
  2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
                   ` (22 preceding siblings ...)
  2024-09-16 13:56 ` [RFC PATCH 23/24] erofs: implement xattrs operations " Yiyang Wu
@ 2024-09-16 13:56 ` Yiyang Wu
  23 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-16 13:56 UTC (permalink / raw)
  To: linux-erofs; +Cc: rust-for-linux, linux-fsdevel, LKML

This patch exposes erofs_getxattr_rust and erofs_listxattr_rust to C so
that they can replace the original xattr logic entirely.

Note that the original ACL implementation is refactored around a lifted
helper, erofs_getxattr_nobuf, to bridge the difference in calling
convention on the Rust side.
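The two calling conventions that erofs_getxattr_nobuf bridges can be illustrated with a small userspace Rust sketch. The buffer-based form (like the C erofs_getxattr()) is called once with no buffer to learn the value length, then again with a caller-allocated buffer, failing with -ERANGE if that buffer is too small. The "nobuf" form does the size query, allocation and fill in one call and hands back an owned buffer. `XattrStore`, `get_into` and `get_nobuf` are hypothetical names and the errno constants are the asm-generic Linux values; none of this is EROFS code.

```rust
use std::collections::HashMap;

// asm-generic errno numbers; errors are returned negated, kernel-style.
const ENODATA: i32 = 61;
const ERANGE: i32 = 34;

/// Hypothetical in-memory table standing in for one inode's xattrs.
struct XattrStore {
    values: HashMap<Vec<u8>, Vec<u8>>,
}

impl XattrStore {
    /// Buffer-based convention: with `None`, only report the value length;
    /// with `Some(buf)`, copy the value in or fail with -ERANGE.
    fn get_into(&self, name: &[u8], buf: Option<&mut [u8]>) -> Result<usize, i32> {
        let value = self.values.get(name).ok_or(-ENODATA)?;
        match buf {
            None => Ok(value.len()),
            Some(b) if b.len() < value.len() => Err(-ERANGE),
            Some(b) => {
                b[..value.len()].copy_from_slice(value);
                Ok(value.len())
            }
        }
    }

    /// "nobuf" convention: query, allocate and fill in one call, returning
    /// an owned buffer that the caller is responsible for freeing.
    fn get_nobuf(&self, name: &[u8]) -> Result<Vec<u8>, i32> {
        let len = self.get_into(name, None)?;
        let mut buf = vec![0u8; len];
        let n = self.get_into(name, Some(buf.as_mut_slice()))?;
        buf.truncate(n);
        Ok(buf)
    }
}
```

In this model get_nobuf is layered on get_into; an implementation that already produces an owned Vec can return it directly instead, which is what erofs_getxattr_nobuf_rust does for the ACL path.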

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/Makefile        |   3 ++
 fs/erofs/rust_bindings.h |   8 +++
 fs/erofs/xattr.c         |  31 +++++++++---
 fs/erofs/xattr.h         |   7 +++
 fs/erofs/xattr_rs.rs     | 106 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 148 insertions(+), 7 deletions(-)
 create mode 100644 fs/erofs/xattr_rs.rs

diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 219ddca0642e..ad0650698f4b 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -10,3 +10,6 @@ erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
 erofs-$(CONFIG_EROFS_FS_RUST) += super_rs.o inode_rs.o namei_rs.o dir_rs.o data_rs.o rust_helpers.o
+ifeq ($(CONFIG_EROFS_FS_XATTR),y)
+erofs-$(CONFIG_EROFS_FS_RUST) += xattr_rs.o
+endif
diff --git a/fs/erofs/rust_bindings.h b/fs/erofs/rust_bindings.h
index ad9aa75a7a2c..e5a879efd9e2 100644
--- a/fs/erofs/rust_bindings.h
+++ b/fs/erofs/rust_bindings.h
@@ -28,4 +28,12 @@ extern int erofs_readdir_rust(struct file *file, struct dir_context *ctx);
 struct erofs_map_blocks;
 extern int erofs_map_blocks_rust(struct inode *inode,
 				 struct erofs_map_blocks *map);
+extern int erofs_getxattr_rust(struct inode *inode, unsigned int flags,
+			       const char *name, void *buffer, size_t size);
+extern ssize_t erofs_listxattr_rust(struct dentry *dentry, char *buffer,
+			       size_t buffer_size);
+#ifdef CONFIG_EROFS_FS_POSIX_ACL
+extern int erofs_getxattr_nobuf_rust(struct inode *inode, int prefix,
+				 const char *name, char **value);
+#endif
 #endif
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index a90d7d649739..0296c5809695 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -8,6 +8,7 @@
 #include <linux/xxhash.h>
 #include "xattr.h"
 
+#ifndef CONFIG_EROFS_FS_RUST
 struct erofs_xattr_iter {
 	struct super_block *sb;
 	struct erofs_buf buf;
@@ -122,6 +123,7 @@ static int erofs_init_inode_xattrs(struct inode *inode)
 	clear_and_wake_up_bit(EROFS_I_BL_XATTR_BIT, &vi->flags);
 	return ret;
 }
+#endif
 
 static bool erofs_xattr_user_list(struct dentry *dentry)
 {
@@ -175,6 +177,7 @@ const struct xattr_handler * const erofs_xattr_handlers[] = {
 	NULL,
 };
 
+#ifndef CONFIG_EROFS_FS_RUST
 static int erofs_xattr_copy_to_buffer(struct erofs_xattr_iter *it,
 				      unsigned int len)
 {
@@ -509,8 +512,28 @@ int erofs_xattr_prefixes_init(struct super_block *sb)
 		erofs_xattr_prefixes_cleanup(sb);
 	return ret;
 }
+#endif
 
 #ifdef CONFIG_EROFS_FS_POSIX_ACL
+#ifndef CONFIG_EROFS_FS_RUST
+static int erofs_getxattr_nobuf(struct inode *inode, int prefix,
+				 const char *name, char **value)
+{
+	int rc;
+	char *buf = NULL;
+	rc = erofs_getxattr(inode, prefix, name, NULL, 0);
+	if (rc > 0) {
+		buf = kmalloc(rc, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+		rc = erofs_getxattr(inode, prefix, name, buf, rc);
+	}
+	*value = buf;
+	return rc;
+}
+#else
+#define erofs_getxattr_nobuf erofs_getxattr_nobuf_rust
+#endif
 struct posix_acl *erofs_get_acl(struct inode *inode, int type, bool rcu)
 {
 	struct posix_acl *acl;
@@ -531,13 +554,7 @@ struct posix_acl *erofs_get_acl(struct inode *inode, int type, bool rcu)
 		return ERR_PTR(-EINVAL);
 	}
 
-	rc = erofs_getxattr(inode, prefix, "", NULL, 0);
-	if (rc > 0) {
-		value = kmalloc(rc, GFP_KERNEL);
-		if (!value)
-			return ERR_PTR(-ENOMEM);
-		rc = erofs_getxattr(inode, prefix, "", value, rc);
-	}
+	rc = erofs_getxattr_nobuf(inode, prefix, "", &value);
 
 	if (rc == -ENOATTR)
 		acl = NULL;
diff --git a/fs/erofs/xattr.h b/fs/erofs/xattr.h
index b246cd0e135e..2b934c25e991 100644
--- a/fs/erofs/xattr.h
+++ b/fs/erofs/xattr.h
@@ -46,10 +46,17 @@ static inline const char *erofs_xattr_prefix(unsigned int idx,
 
 extern const struct xattr_handler * const erofs_xattr_handlers[];
 
+#ifdef CONFIG_EROFS_FS_RUST
+#define erofs_getxattr erofs_getxattr_rust
+#define erofs_listxattr erofs_listxattr_rust
+static inline int erofs_xattr_prefixes_init(struct super_block *sb) { return 0; }
+static inline void erofs_xattr_prefixes_cleanup(struct super_block *sb) {}
+#else
 int erofs_xattr_prefixes_init(struct super_block *sb);
 void erofs_xattr_prefixes_cleanup(struct super_block *sb);
 int erofs_getxattr(struct inode *, int, const char *, void *, size_t);
 ssize_t erofs_listxattr(struct dentry *, char *, size_t);
+#endif
 #else
 static inline int erofs_xattr_prefixes_init(struct super_block *sb) { return 0; }
 static inline void erofs_xattr_prefixes_cleanup(struct super_block *sb) {}
diff --git a/fs/erofs/xattr_rs.rs b/fs/erofs/xattr_rs.rs
new file mode 100644
index 000000000000..9429507089f6
--- /dev/null
+++ b/fs/erofs/xattr_rs.rs
@@ -0,0 +1,106 @@
+// Copyright 2024 Yiyang Wu
+// SPDX-License-Identifier: MIT or GPL-2.0-or-later
+
+//! EROFS Rust Kernel Module Helpers Implementation
+//! This is only for experimental purposes. Feedback is always welcome.
+
+#[allow(dead_code)]
+#[allow(missing_docs)]
+pub(crate) mod rust;
+use core::ffi::*;
+use core::ptr::NonNull;
+
+use kernel::bindings::{dentry, inode};
+use kernel::container_of;
+
+use rust::{erofs_sys::xattrs::*, kinode::*, ksuperblock::*};
+
+/// Used as a replacement for erofs_getxattr.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_getxattr_rust(
+    k_inode: NonNull<inode>,
+    index: c_uint,
+    name: NonNull<u8>,
+    buffer: NonNull<u8>,
+    size: usize,
+) -> c_int {
+    // SAFETY: super_block and superblockinfo are always initialized in k_inode.
+    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
+    // SAFETY: buffer is always initialized in the caller and name is a NUL-terminated C string.
+    unsafe {
+        match sbi.filesystem.get_xattr(
+            erofs_inode,
+            index,
+            core::ffi::CStr::from_ptr(name.as_ptr().cast()).to_bytes(),
+            &mut Some(core::slice::from_raw_parts_mut(
+                buffer.as_ptr().cast(),
+                size,
+            )),
+        ) {
+            Ok(value) => match value {
+                XAttrValue::Buffer(x) => x as c_int,
+                _ => unreachable!(),
+            },
+            Err(e) => i32::from(e) as c_int,
+        }
+    }
+}
+
+/// Used as a replacement for erofs_getxattr_nobuf.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_getxattr_nobuf_rust(
+    k_inode: NonNull<inode>,
+    index: u32,
+    name: NonNull<u8>,
+    mut value: NonNull<*mut u8>,
+) -> c_int {
+    // SAFETY: super_block and superblockinfo are always initialized in k_inode.
+    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
+    // SAFETY: value is a valid out-pointer from the caller and name is a NUL-terminated C string.
+    unsafe {
+        match sbi.filesystem.get_xattr(
+            erofs_inode,
+            index,
+            core::ffi::CStr::from_ptr(name.as_ptr().cast()).to_bytes(),
+            &mut None,
+        ) {
+            Ok(xattr_value) => match xattr_value {
+                XAttrValue::Vec(v) => {
+                    let rc = v.len() as c_int;
+                    *value.as_mut() = v.leak().as_mut_ptr().cast();
+                    rc
+                }
+
+                _ => unreachable!(),
+            },
+            Err(e) => i32::from(e) as c_int,
+        }
+    }
+}
+
+/// Used as a replacement for erofs_listxattr.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_listxattr_rust(
+    dentry: NonNull<dentry>,
+    buffer: NonNull<u8>,
+    size: usize,
+) -> c_long {
+    // SAFETY: dentry is always initialized in the caller.
+    let k_inode = unsafe { dentry.as_ref().d_inode };
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called.
+    let erofs_inode = unsafe { &*container_of!(k_inode, KernelInode, k_inode) };
+    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
+    let sbi = erofs_sbi(unsafe { NonNull::new((*k_inode).i_sb).unwrap() });
+    match sbi.filesystem.list_xattrs(
+        erofs_inode,
+        // SAFETY: buffer is always initialized in the caller.
+        unsafe { core::slice::from_raw_parts_mut(buffer.as_ptr().cast(), size) },
+    ) {
+        Ok(value) => value as c_long,
+        Err(e) => i32::from(e) as c_long,
+    }
+}
-- 
2.46.0


* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-16 13:56 ` [RFC PATCH 19/24] erofs: introduce namei " Yiyang Wu
@ 2024-09-16 17:08   ` Al Viro
  2024-09-17  6:48     ` Yiyang Wu
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2024-09-16 17:08 UTC (permalink / raw)
  To: Yiyang Wu; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, Sep 16, 2024 at 09:56:29PM +0800, Yiyang Wu wrote:
> +/// Lookup function for dentry-inode lookup replacement.
> +#[no_mangle]
> +pub unsafe extern "C" fn erofs_lookup_rust(
> +    k_inode: NonNull<inode>,
> +    dentry: NonNull<dentry>,
> +    _flags: c_uint,
> +) -> *mut c_void {
> +    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
> +    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };

	Ummm...  A wrapper would be highly useful.  And the reason why
it's safe is different - your function is called only via ->i_op->lookup,
there is only one instance of inode_operations that has that ->lookup
method, and the only place where an inode gets ->i_op set to that
is erofs_fill_inode().  Which is always passed erofs_inode::vfs_inode.

> +    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
> +    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });

	Again, that calls for a wrapper - this time not erofs-specific;
inode->i_sb is *always* non-NULL, is assign-once and always points
to live struct super_block instance at least until the call of
destroy_inode().
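The wrappers suggested above can be modelled in userspace Rust as a minimal sketch. The simplified types `SuperBlock`, `Inode` and `ErofsInode` and the helpers `erofs_i()` and `inode_sb()` are hypothetical stand-ins, not kernel-crate APIs. The point is that the container_of-style step and the i_sb access each carry their safety argument in exactly one place, instead of being repeated at every extern "C" entry point.

```rust
use core::mem::offset_of;

/// Simplified stand-in for `struct super_block`.
pub struct SuperBlock {
    pub blocksize: u32,
}

/// Simplified stand-in for `struct inode`; `i_sb` is assumed non-NULL and
/// assign-once, as described above.
pub struct Inode {
    pub i_sb: *const SuperBlock,
}

/// Filesystem-private inode embedding the VFS inode (like erofs_inode).
pub struct ErofsInode {
    pub nid: u64,
    pub vfs_inode: Inode,
}

/// Recover the containing `ErofsInode` from a pointer to its `vfs_inode`.
///
/// # Safety
/// `inode` must point to the `vfs_inode` field of a live `ErofsInode`, e.g.
/// because the only inode_operations reaching the caller are installed by
/// this filesystem's fill_inode on its own inodes.
pub unsafe fn erofs_i<'a>(inode: *const Inode) -> &'a ErofsInode {
    // SAFETY: per the contract, stepping back by the field offset stays in
    // the same allocation and lands on a valid ErofsInode.
    unsafe {
        &*inode
            .cast::<u8>()
            .sub(offset_of!(ErofsInode, vfs_inode))
            .cast::<ErofsInode>()
    }
}

/// Generic (not filesystem-specific) access to an inode's superblock.
///
/// # Safety
/// `inode` must point to a live inode; its `i_sb` must outlive the
/// returned reference (in the kernel: at least until destroy_inode()).
pub unsafe fn inode_sb<'a>(inode: *const Inode) -> &'a SuperBlock {
    // SAFETY: i_sb is never NULL and points to a live super_block.
    unsafe { &*(*inode).i_sb }
}
```

The helper takes a raw pointer rather than `&Inode` because a reference to the field alone is not guaranteed to grant access to the surrounding struct.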

> +    // SAFETY: this is backed by qstr which is c representation of a valid slice.

	What is that sentence supposed to mean?  Never mind "why is it correct"...

> +    let name = unsafe {
> +        core::str::from_utf8_unchecked(core::slice::from_raw_parts(
> +            dentry.as_ref().d_name.name,
> +            dentry.as_ref().d_name.__bindgen_anon_1.__bindgen_anon_1.len as usize,

	Is that supposed to be an example of idiomatic Rust?  I'm not
trying to be snide, but my interest here is mostly about safety of
access to VFS data structures.  And ->d_name is _very_ unpleasant in
that respect; the locking rules required for its stability are subtle
and hard to verify on manual code audit.

	Current erofs_lookup() (and your version as well) *is* indeed
safe in that respect, but the proof (from filesystem POV) is that "it's
called only as ->lookup() instance, so dentry is initially unhashed
negative and will remain such until it's passed to d_splice_alias();
until that point it is guaranteed to have ->d_name and ->d_parent stable".

	Note that once you _have_ called d_splice_alias(), you can't
count upon the ->d_name stability - or, indeed, upon ->d_name.name you've
sampled still pointing to allocated memory.

	For directory-modifying methods it's "stable, since parent is held
exclusive".  Some internal function called from different environments?
Well...  Swear, look through the call graph and see what can be proven
for each.

	Expressing that kind of fun in any kind of annotations (Rust type
system included) is not pleasant.  _Probably_ might be handled by a type
that would be a dentry pointer with annotation along the lines "->d_name
and ->d_parent of that one are stable".  Then e.g. ->lookup() would
take that thing as an argument and d_splice_alias() would consume it.
->mkdir() would get the same thing, etc.  I hadn't tried to get that
all way through (the amount of annotation churn in existing filesystems
would be high and hard to split into reviewable patch series), so there
might be dragons - and there definitely are places where the stability is
proven in different ways (e.g. if dentry->d_lock is held, we have the damn
thing stable; then there's a "take a safe snapshot of name" API; etc.).

	I want to reduce the PITA of regular code audits.  If, at
some point, Rust use in parts of the tree reduces that - wonderful.
But then we'd better make sure that Rust-side uses _are_ safe, accurately
annotated and easy to grep for.  Because we'll almost certainly need to
change method calling conventions at some points through all of that.
Even if it's just the annotation-level, any such contract change (it
is doable and quite a few had been done) will require going through
the instances and checking how much massage will be needed in those.
Opaque chunks like the above promise to make that very painful...
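The annotation described above, a dentry handle meaning "->d_name and ->d_parent are stable" that ->lookup() receives and d_splice_alias() consumes, maps naturally onto a move-only Rust type. This is a userspace sketch under that assumption only: `Dentry`, `StableDentry`, `lookup()` and `d_splice_alias()` are simplified hypothetical types and functions, not kernel or kernel-crate APIs, and the model ignores d_parent and locking entirely.

```rust
/// Simplified stand-in for `struct dentry`.
pub struct Dentry {
    name: Vec<u8>,
    inode: Option<u64>,
}

/// Move-only proof that the dentry's name is stable. Deliberately neither
/// `Clone` nor `Copy`, so once consumed it cannot be used to read the name.
pub struct StableDentry<'a> {
    dentry: &'a mut Dentry,
}

impl<'a> StableDentry<'a> {
    /// Hypothetical constructor: only a path that can vouch for stability
    /// (e.g. ->lookup() on an unhashed negative dentry) should create this.
    pub fn new_for_lookup(dentry: &'a mut Dentry) -> Self {
        StableDentry { dentry }
    }

    /// Name access is only offered while the token is alive.
    pub fn name(&self) -> &[u8] {
        &self.dentry.name
    }
}

/// Consumes the token: after this call the compiler rejects further use of
/// `d`, mirroring "->d_name may change after d_splice_alias()".
pub fn d_splice_alias(d: StableDentry<'_>, inode: Option<u64>) -> Option<u64> {
    d.dentry.inode = inode;
    inode
}

/// A lookup method written against the token instead of a raw pointer.
pub fn lookup(d: StableDentry<'_>, dir: &[(&[u8], u64)]) -> Option<u64> {
    let found = dir
        .iter()
        .find(|(n, _)| *n == d.name())
        .map(|(_, ino)| *ino);
    d_splice_alias(d, found)
}
```

Because d_splice_alias() takes the token by value, calling `d.name()` after the splice inside lookup() fails to compile, which is the kind of contract that stays easy to grep for and audit when calling conventions change.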

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-16 13:56 ` [RFC PATCH 03/24] erofs: add Errno " Yiyang Wu
@ 2024-09-16 17:51   ` Greg KH
  2024-09-16 23:45     ` Gao Xiang
                       ` (2 more replies)
  2024-09-16 20:01   ` Gary Guo
  1 sibling, 3 replies; 69+ messages in thread
From: Greg KH @ 2024-09-16 17:51 UTC (permalink / raw)
  To: Yiyang Wu; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, Sep 16, 2024 at 09:56:13PM +0800, Yiyang Wu wrote:
> Introduce Errno to Rust side code. Note that in current Rust For Linux,
> Errnos are implemented as core::ffi::c_uint unit structs.
> However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
> 
> Since the errno_base hasn't changed for over 13 years,
> This patch merely serves as a temporary workaround for the missing
> errno in the Rust For Linux.

Why not just add the missing errno to the core rust code instead?  No
need to define a whole new one for this.

thanks,

greg k-h
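Adding a missing errno centrally, as suggested, is typically a one-line change when the constants are generated by a macro. The sketch below models that shape in userspace Rust: an `Error` newtype over a negative errno and a macro loosely named after the kernel crate's `declare_err!`. The macro syntax, the `code` module and `to_errno()` here are illustrative assumptions, not the kernel crate's actual definitions; the numbers are the asm-generic Linux errno values.

```rust
/// Userspace stand-in for a kernel-style error: a negative errno value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Error(i32);

impl Error {
    /// Raw negative errno, as returned across the C ABI.
    pub fn to_errno(self) -> i32 {
        self.0
    }
}

/// Hypothetical macro: one line per errno, so a missing code such as
/// EUCLEAN is added once for every user instead of per driver.
macro_rules! declare_err {
    ($name:ident = $val:expr, $doc:expr) => {
        #[doc = $doc]
        pub const $name: Error = Error(-$val);
    };
}

pub mod code {
    use super::Error;

    // asm-generic errno numbers.
    declare_err!(EIO = 5, "I/O error.");
    declare_err!(ENOMEM = 12, "Out of memory.");
    declare_err!(ERANGE = 34, "Math result not representable.");
    declare_err!(ENODATA = 61, "No data available.");
    declare_err!(EUCLEAN = 117, "Structure needs cleaning.");

    /// Filesystem corruption; filesystems spell EUCLEAN this way in C.
    pub const EFSCORRUPTED: Error = EUCLEAN;
}
```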

* Re: [RFC PATCH 02/24] erofs: add superblock data structure in Rust
  2024-09-16 13:56 ` [RFC PATCH 02/24] erofs: add superblock data structure in Rust Yiyang Wu
@ 2024-09-16 17:55   ` Greg KH
  2024-09-17  0:18     ` Gao Xiang
                       ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Greg KH @ 2024-09-16 17:55 UTC (permalink / raw)
  To: Yiyang Wu; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, Sep 16, 2024 at 09:56:12PM +0800, Yiyang Wu wrote:
> diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
> new file mode 100644
> index 000000000000..0f1400175fc2
> --- /dev/null
> +++ b/fs/erofs/rust/erofs_sys.rs
> @@ -0,0 +1,22 @@
> +#![allow(dead_code)]
> +// Copyright 2024 Yiyang Wu
> +// SPDX-License-Identifier: MIT or GPL-2.0-or-later

Sorry, but I have to ask, why a dual license here?  You are only linking
to GPL-2.0-only code, so why the different license?  Especially if you
used the GPL-2.0-only code to "translate" from.

If you REALLY REALLY want to use a dual license, please get your
lawyers to document why this is needed and put it in the changelog for
the next time you submit this series when adding files with dual
licenses so I don't have to ask again :)

thanks,

greg k-h

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-16 13:56 ` [RFC PATCH 03/24] erofs: add Errno " Yiyang Wu
  2024-09-16 17:51   ` Greg KH
@ 2024-09-16 20:01   ` Gary Guo
  2024-09-16 23:58     ` Gao Xiang
  1 sibling, 1 reply; 69+ messages in thread
From: Gary Guo @ 2024-09-16 20:01 UTC (permalink / raw)
  To: Yiyang Wu; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, 16 Sep 2024 21:56:13 +0800
Yiyang Wu <toolmanp@tlmp.cc> wrote:

> Introduce Errno to Rust side code. Note that in current Rust For Linux,
> Errnos are implemented as core::ffi::c_uint unit structs.
> However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
> 
> Since the errno_base hasn't changed for over 13 years,
> This patch merely serves as a temporary workaround for the missing
> errno in the Rust For Linux.
> 
> Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>

As Greg said, please add missing errno that you need to kernel crate
instead.

Also, it seems that you're building abstractions into EROFS directly
without building a generic abstraction. We have been avoiding that. If
there's an abstraction that you need and missing, please add that
abstraction. In fact, there're a bunch of people trying to add FS
support, please coordinate instead of rolling your own.

You also have been referencing `kernel::bindings::` directly in various
places in the patch series. The module is marked as `#[doc(hidden)]`
for a reason -- it's not supposed to referenced directly. It's only
exposed so that macros can reference them. In fact, we have a policy
that direct reference to raw bindings are not allowed from drivers.

There're a few issues with this patch itself that I pointed out below,
although as already said this would require big changes so most points
are probably moot anyway.

Thanks,
Gary

> ---
>  fs/erofs/rust/erofs_sys.rs        |   6 +
>  fs/erofs/rust/erofs_sys/errnos.rs | 191 ++++++++++++++++++++++++++++++
>  2 files changed, 197 insertions(+)
>  create mode 100644 fs/erofs/rust/erofs_sys/errnos.rs
> 
> diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
> index 0f1400175fc2..2bd1381da5ab 100644
> --- a/fs/erofs/rust/erofs_sys.rs
> +++ b/fs/erofs/rust/erofs_sys.rs
> @@ -19,4 +19,10 @@
>  pub(crate) type Nid = u64;
>  /// Erofs Super Offset to read the ondisk superblock
>  pub(crate) const EROFS_SUPER_OFFSET: Off = 1024;
> +/// PosixResult as a type alias to kernel::error::Result
> +/// to avoid naming conflicts.
> +pub(crate) type PosixResult<T> = Result<T, Errno>;
> +
> +pub(crate) mod errnos;
>  pub(crate) mod superblock;
> +pub(crate) use errnos::Errno;
> diff --git a/fs/erofs/rust/erofs_sys/errnos.rs b/fs/erofs/rust/erofs_sys/errnos.rs
> new file mode 100644
> index 000000000000..40e5cdbcb353
> --- /dev/null
> +++ b/fs/erofs/rust/erofs_sys/errnos.rs
> @@ -0,0 +1,191 @@
> +// Copyright 2024 Yiyang Wu
> +// SPDX-License-Identifier: MIT or GPL-2.0-or-later
> +
> +#[repr(i32)]
> +#[non_exhaustive]
> +#[allow(clippy::upper_case_acronyms)]
> +#[derive(Debug, Copy, Clone, PartialEq)]
> +pub(crate) enum Errno {
> +    NONE = 0,

Why is NONE an error? No "error: operation completed successfully"
please.

> +    EPERM,
> +    ENOENT,
> +    ESRCH,
> +    EINTR,
> +    EIO,
> +    ENXIO,
> +    E2BIG,
> +    ENOEXEC,
> +    EBADF,
> +    ECHILD,
> +    EAGAIN,
> +    ENOMEM,
> +    EACCES,
> +    EFAULT,
> +    ENOTBLK,
> +    EBUSY,
> +    EEXIST,
> +    EXDEV,
> +    ENODEV,
> +    ENOTDIR,
> +    EISDIR,
> +    EINVAL,
> +    ENFILE,
> +    EMFILE,
> +    ENOTTY,
> +    ETXTBSY,
> +    EFBIG,
> +    ENOSPC,
> +    ESPIPE,
> +    EROFS,
> +    EMLINK,
> +    EPIPE,
> +    EDOM,
> +    ERANGE,
> +    EDEADLK,
> +    ENAMETOOLONG,
> +    ENOLCK,
> +    ENOSYS,
> +    ENOTEMPTY,
> +    ELOOP,
> +    ENOMSG = 42,

This looks like a very fragile way to maintain an enum.

> +    EIDRM,
> +    ECHRNG,
> +    EL2NSYNC,
> +    EL3HLT,
> +    EL3RST,
> +    ELNRNG,
> +    EUNATCH,
> +    ENOCSI,
> +    EL2HLT,
> +    EBADE,
> +    EBADR,
> +    EXFULL,
> +    ENOANO,
> +    EBADRQC,
> +    EBADSLT,
> +    EBFONT = 59,
> +    ENOSTR,
> +    ENODATA,
> +    ETIME,
> +    ENOSR,
> +    ENONET,
> +    ENOPKG,
> +    EREMOTE,
> +    ENOLINK,
> +    EADV,
> +    ESRMNT,
> +    ECOMM,
> +    EPROTO,
> +    EMULTIHOP,
> +    EDOTDOT,
> +    EBADMSG,
> +    EOVERFLOW,
> +    ENOTUNIQ,
> +    EBADFD,
> +    EREMCHG,
> +    ELIBACC,
> +    ELIBBAD,
> +    ELIBSCN,
> +    ELIBMAX,
> +    ELIBEXEC,
> +    EILSEQ,
> +    ERESTART,
> +    ESTRPIPE,
> +    EUSERS,
> +    ENOTSOCK,
> +    EDESTADDRREQ,
> +    EMSGSIZE,
> +    EPROTOTYPE,
> +    ENOPROTOOPT,
> +    EPROTONOSUPPORT,
> +    ESOCKTNOSUPPORT,
> +    EOPNOTSUPP,
> +    EPFNOSUPPORT,
> +    EAFNOSUPPORT,
> +    EADDRINUSE,
> +    EADDRNOTAVAIL,
> +    ENETDOWN,
> +    ENETUNREACH,
> +    ENETRESET,
> +    ECONNABORTED,
> +    ECONNRESET,
> +    ENOBUFS,
> +    EISCONN,
> +    ENOTCONN,
> +    ESHUTDOWN,
> +    ETOOMANYREFS,
> +    ETIMEDOUT,
> +    ECONNREFUSED,
> +    EHOSTDOWN,
> +    EHOSTUNREACH,
> +    EALREADY,
> +    EINPROGRESS,
> +    ESTALE,
> +    EUCLEAN,
> +    ENOTNAM,
> +    ENAVAIL,
> +    EISNAM,
> +    EREMOTEIO,
> +    EDQUOT,
> +    ENOMEDIUM,
> +    EMEDIUMTYPE,
> +    ECANCELED,
> +    ENOKEY,
> +    EKEYEXPIRED,
> +    EKEYREVOKED,
> +    EKEYREJECTED,
> +    EOWNERDEAD,
> +    ENOTRECOVERABLE,
> +    ERFKILL,
> +    EHWPOISON,
> +    EUNKNOWN,
> +}
> +
> +impl From<i32> for Errno {
> +    fn from(value: i32) -> Self {
> +        if (-value) <= 0 || (-value) > Errno::EUNKNOWN as i32 {
> +            Errno::EUNKNOWN
> +        } else {
> +            // Safety: The value is guaranteed to be a valid errno and the memory
> +            // layout is the same for both types.
> +            unsafe { core::mem::transmute(value) }

This is just unsound. As evident from the fact that you need to manually
specify a few constants, the errno enum doesn't cover all values from 1
to EUNKNOWN.

> +        }
> +    }
> +}
> +
> +impl From<Errno> for i32 {
> +    fn from(value: Errno) -> Self {
> +        -(value as i32)
> +    }
> +}
> +
> +/// Replacement for ERR_PTR in Linux Kernel.
> +impl From<Errno> for *const core::ffi::c_void {
> +    fn from(value: Errno) -> Self {
> +        (-(value as core::ffi::c_long)) as *const core::ffi::c_void
> +    }
> +}
> +
> +impl From<Errno> for *mut core::ffi::c_void {
> +    fn from(value: Errno) -> Self {
> +        (-(value as core::ffi::c_long)) as *mut core::ffi::c_void
> +    }
> +}
> +
> +/// Replacement for PTR_ERR in Linux Kernel.
> +impl From<*const core::ffi::c_void> for Errno {
> +    fn from(value: *const core::ffi::c_void) -> Self {
> +        (-(value as i32)).into()
> +    }
> +}
> +
> +impl From<*mut core::ffi::c_void> for Errno {
> +    fn from(value: *mut core::ffi::c_void) -> Self {
> +        (-(value as i32)).into()
> +    }
> +}
> +/// Replacement for IS_ERR in Linux Kernel.
> +#[inline(always)]
> +pub(crate) fn is_value_err(value: *const core::ffi::c_void) -> bool {
> +    (value as core::ffi::c_ulong) >= (-4095 as core::ffi::c_long) as core::ffi::c_ulong
> +}
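
A minimal sketch of a sound alternative follows. It assumes a `#[repr(i32)]` enum with explicit discriminants and shows only a hypothetical subset of values. Every accepted number is listed in a `match`, so gaps in the numbering can never produce an invalid enum value. Note also that, as quoted, the `else` branch transmutes `value` rather than `-value`, which is the negative number.

```rust
/// A hypothetical subset of errno values. Discriminants follow the
/// asm-generic numbering used by e.g. x86 and arm64 (EUCLEAN is 117 there).
#[allow(non_camel_case_types)]
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Errno {
    ENOENT = 2,
    EIO = 5,
    ENOMEM = 12,
    EINVAL = 22,
    EUCLEAN = 117,
}

impl Errno {
    /// Maps a negative kernel return code (e.g. -117) to an `Errno`.
    /// Every accepted value is listed explicitly, so holes in the numbering
    /// (or non-negative input) yield `None` instead of an invalid enum value.
    pub fn from_neg(value: i32) -> Option<Self> {
        // checked_neg() returns None for i32::MIN instead of overflowing.
        match value.checked_neg()? {
            2 => Some(Errno::ENOENT),
            5 => Some(Errno::EIO),
            12 => Some(Errno::ENOMEM),
            22 => Some(Errno::EINVAL),
            117 => Some(Errno::EUCLEAN),
            _ => None,
        }
    }

    /// Converts back to the kernel's negative return-code convention.
    pub fn to_neg(self) -> i32 {
        -(self as i32)
    }
}
```

In-tree, adding the missing EUCLEAN to the `kernel` crate's existing error type, as suggested in the replies below, would make a private enum unnecessary.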

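The quoted `From<*const c_void>` and `From<*mut c_void>` impls for `Errno` appear to negate twice. `value as i32` already yields the negative errno (e.g. -2), and negating it gives 2. `From<i32>` then sees `-2 <= 0` and returns `EUNKNOWN` for every error pointer. Below is a minimal sketch of the three helpers that works on pointer-sized integers and returns the negative errno unchanged. `err_ptr`, `is_err` and `ptr_err` are hypothetical names, not existing kernel-crate APIs.

```rust
use core::ffi::c_void;

/// Mirrors the kernel's MAX_ERRNO: the top 4095 addresses encode errors.
const MAX_ERRNO: isize = 4095;

/// ERR_PTR() analogue: encodes a negative errno (e.g. -2) as a pointer.
pub fn err_ptr(neg_errno: isize) -> *mut c_void {
    debug_assert!((-MAX_ERRNO..0).contains(&neg_errno));
    neg_errno as *mut c_void
}

/// IS_ERR() analogue: true iff the address lies in the reserved error range.
pub fn is_err(ptr: *const c_void) -> bool {
    (ptr as usize) >= (-MAX_ERRNO) as usize
}

/// PTR_ERR() analogue: returns the negative errno as-is, with no second negation.
pub fn ptr_err(ptr: *const c_void) -> isize {
    ptr as isize
}
```
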

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-16 17:51   ` Greg KH
@ 2024-09-16 23:45     ` Gao Xiang
  2024-09-20  2:49     ` [PATCH RESEND 0/1] rust: introduce declare_err! autogeneration Yiyang Wu
  2024-09-20  2:57     ` [RFC PATCH 03/24] erofs: add Errno in Rust Yiyang Wu
  2 siblings, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-16 23:45 UTC (permalink / raw)
  To: Greg KH, Yiyang Wu; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

Hi Greg,

On 2024/9/17 01:51, Greg KH wrote:
> On Mon, Sep 16, 2024 at 09:56:13PM +0800, Yiyang Wu wrote:
>> Introduce Errno to Rust side code. Note that in current Rust For Linux,
>> Errnos are implemented as core::ffi::c_uint unit structs.
>> However, EUCLEAN, a.k.a. EFSCORRUPTED, is missing from the error crate.
>>
>> Since the errno_base hasn't changed for over 13 years,
>> this patch merely serves as a temporary workaround for the missing
>> errno in the Rust For Linux.
> 
> Why not just add the missing errno to the core rust code instead?  No
> need to define a whole new one for this.

I discussed this with Yiyang last week.  I also tend to avoid
our own errno.

The main reason is that the Rust errno list lacks EUCLEAN. TBH, I
don't know why all kernel-supported errnos weren't just introduced
for Rust in one shot.

I guess it's just because no Rust user needs the other errnos?  But I
think it's odd for users to have to add their own errnos.

Thanks,
Gao Xiang

> 
> thanks,
> 
> greg k-h



* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-16 20:01   ` Gary Guo
@ 2024-09-16 23:58     ` Gao Xiang
  2024-09-19 13:45       ` Benno Lossin
  0 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-16 23:58 UTC (permalink / raw)
  To: Gary Guo, Yiyang Wu
  Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML, Al Viro

Hi Gary,

On 2024/9/17 04:01, Gary Guo wrote:
> On Mon, 16 Sep 2024 21:56:13 +0800
> Yiyang Wu <toolmanp@tlmp.cc> wrote:
> 
>> Introduce Errno to Rust side code. Note that in current Rust For Linux,
>> Errnos are implemented as core::ffi::c_uint unit structs.
>> However, EUCLEAN, a.k.a. EFSCORRUPTED, is missing from the error crate.
>>
>> Since the errno_base hasn't changed for over 13 years,
>> this patch merely serves as a temporary workaround for the missing
>> errno in the Rust For Linux.
>>
>> Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
> 
> As Greg said, please add missing errno that you need to kernel crate
> instead.

I've answered Greg about this in another email.

> 
> Also, it seems that you're building abstractions into EROFS directly
> without building a generic abstraction. We have been avoiding that. If
> there's an abstraction that you need and missing, please add that
> abstraction. In fact, there're a bunch of people trying to add FS

No, I'd like to first try replacing some EROFS C logic with Rust (by
using the EROFS C API interfaces) and see whether Rust is really useful
for a real in-tree filesystem.  If Rust can improve EROFS security or
performance (although I'm sceptical about performance), then as an
EROFS maintainer I'm totally fine with accepting EROFS Rust logic to
make the whole filesystem better.

The Rust VFS abstraction is a different and independent story; Yiyang
doesn't have any bandwidth for it due to his limited time.
And I _also_ don't think an incomplete ROFS VFS Rust abstraction
is useful to the Linux community (because IMO generic interface
design needs a global vision for all filesystems instead of
just ROFSes.  The absence of other users is no excuse for an
incomplete abstraction.)

If a reasonable Rust VFS abstraction lands, I think we will switch
to it, but as I said, these are two completely separate stories.

> support, please coordinate instead of rolling your own.
> 
> You also have been referencing `kernel::bindings::` directly in various
> places in the patch series. The module is marked as `#[doc(hidden)]`
> for a reason -- it's not supposed to be referenced directly. It's only
> exposed so that macros can reference them. In fact, we have a policy
> that direct references to raw bindings are not allowed from drivers.

This patch can be avoided if EUCLEAN is added to errno.

Thanks,
Gao Xiang


* Re: [RFC PATCH 02/24] erofs: add superblock data structure in Rust
  2024-09-16 17:55   ` Greg KH
@ 2024-09-17  0:18     ` Gao Xiang
  2024-09-17  5:34       ` Greg KH
  2024-09-17  5:27     ` Yiyang Wu
  2024-09-17  5:39     ` Yiyang Wu
  2 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-17  0:18 UTC (permalink / raw)
  To: Greg KH, Yiyang Wu; +Cc: linux-fsdevel, linux-erofs, LKML, rust-for-linux

Hi Greg,

On 2024/9/17 01:55, Greg KH wrote:
> On Mon, Sep 16, 2024 at 09:56:12PM +0800, Yiyang Wu wrote:
>> diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
>> new file mode 100644
>> index 000000000000..0f1400175fc2
>> --- /dev/null
>> +++ b/fs/erofs/rust/erofs_sys.rs
>> @@ -0,0 +1,22 @@
>> +#![allow(dead_code)]
>> +// Copyright 2024 Yiyang Wu
>> +// SPDX-License-Identifier: MIT or GPL-2.0-or-later
> 
> Sorry, but I have to ask, why a dual license here?  You are only linking
> to GPL-2.0-only code, so why the different license?  Especially if you
> used the GPL-2.0-only code to "translate" from.
> 
> If you REALLY REALLY want to use a dual license, please get your
> lawyers to document why this is needed and put it in the changelog for
> the next time you submit this series when adding files with dual
> licenses so I don't have to ask again :)

As a new Rust kernel developer, Yiyang is also working on an EROFS
Rust userspace implementation.

I think he would just like to share the common Rust logic between
kernel and userspace.  On the userspace side, Apache-2.0
or even MIT is friendlier for third-party applications (especially
cloud-native applications), so the dual license is proposed here.
If you don't have a strong opinion, I will ask Yiyang to document this
in the next version.  Or we're fine with dropping MIT too.

Thanks,
Gao Xiang

> 
> thanks,
> 
> greg k-h



* Re: [RFC PATCH 02/24] erofs: add superblock data structure in Rust
  2024-09-16 17:55   ` Greg KH
  2024-09-17  0:18     ` Gao Xiang
@ 2024-09-17  5:27     ` Yiyang Wu
  2024-09-17  5:39     ` Yiyang Wu
  2 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-17  5:27 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, Sep 16, 2024 at 07:55:43PM GMT, Greg KH wrote:
> On Mon, Sep 16, 2024 at 09:56:12PM +0800, Yiyang Wu wrote:
> > diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
> > new file mode 100644
> > index 000000000000..0f1400175fc2
> > --- /dev/null
> > +++ b/fs/erofs/rust/erofs_sys.rs
> > @@ -0,0 +1,22 @@
> > +#![allow(dead_code)]
> > +// Copyright 2024 Yiyang Wu
> > +// SPDX-License-Identifier: MIT or GPL-2.0-or-later
> 
> Sorry, but I have to ask, why a dual license here?  You are only linking
> to GPL-2.0-only code, so why the different license?  Especially if you
> used the GPL-2.0-only code to "translate" from.
> 
> If you REALLY REALLY want to use a dual license, please get your
> lawyers to document why this is needed and put it in the changelog for
> the next time you submit this series when adding files with dual
> licenses so I don't have to ask again :)
> 
> thanks,
> 
> greg k-h

C'mon, I just don't want this discussion to be heated.

I mean, my original code is licensed under MIT, and I've learned
that Linux is under GPL-2.0. So I originally thought dual-licensing it
could help address incompatibility issues. To quote Wikipedia: "When
software is multi-licensed, recipients can typically choose the terms
under which they want to use or distribute the software, but the simple
presence of multiple licenses in a software package or library does not
necessarily indicate that the recipient can freely choose one or the
other."[1] Based on this, I believed putting this code in a GPL-2.0
project should be OK, since it would effectively be licensed **only**
under GPL-2.0.

Since I haven't been involved in kernel development before, I just
don't know your attitudes towards this kind of thing. If you're not
happy with this, I can just switch back to GPL-2.0; it's no big deal
for me.

Best Regards,
Yiyang Wu.


* Re: [RFC PATCH 02/24] erofs: add superblock data structure in Rust
  2024-09-17  0:18     ` Gao Xiang
@ 2024-09-17  5:34       ` Greg KH
  2024-09-17  5:45         ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Greg KH @ 2024-09-17  5:34 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Yiyang Wu, linux-fsdevel, linux-erofs, LKML, rust-for-linux

On Tue, Sep 17, 2024 at 08:18:06AM +0800, Gao Xiang wrote:
> Hi Greg,
> 
> On 2024/9/17 01:55, Greg KH wrote:
> > On Mon, Sep 16, 2024 at 09:56:12PM +0800, Yiyang Wu wrote:
> > > diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
> > > new file mode 100644
> > > index 000000000000..0f1400175fc2
> > > --- /dev/null
> > > +++ b/fs/erofs/rust/erofs_sys.rs
> > > @@ -0,0 +1,22 @@
> > > +#![allow(dead_code)]
> > > +// Copyright 2024 Yiyang Wu
> > > +// SPDX-License-Identifier: MIT or GPL-2.0-or-later
> > 
> > Sorry, but I have to ask, why a dual license here?  You are only linking
> > to GPL-2.0-only code, so why the different license?  Especially if you
> > used the GPL-2.0-only code to "translate" from.
> > 
> > If you REALLY REALLY want to use a dual license, please get your
> > lawyers to document why this is needed and put it in the changelog for
> > the next time you submit this series when adding files with dual
> > licenses so I don't have to ask again :)
> 
> As a new Rust kernel developer, Yiyang is also working on an EROFS
> Rust userspace implementation.
> 
> I think he would just like to share the common Rust logic between
> kernel and userspace.

Is that actually possible here?  This is very kernel-specific code from
what I can tell, and again, it's based on the existing GPL-v2 code, so
you are kind of changing the license in the transformation to a
different language, right?

> On the userspace side, Apache-2.0
> or even MIT is friendlier for third-party applications (especially
> cloud-native applications), so the dual license is proposed here.
> If you don't have a strong opinion, I will ask Yiyang to document this
> in the next version.  Or we're fine with dropping MIT too.

If you do not have explicit reasons to do this, AND legal approval with
the understanding of how to do dual license kernel code properly, I
would not do it at all as it's a lot of extra work.  Again, talk to your
lawyers about this please.  And if you come up with the "we really want
to do this," great, just document it properly as to what is going on
here and why this decision is made.

thanks,

greg k-h


* Re: [RFC PATCH 02/24] erofs: add superblock data structure in Rust
  2024-09-16 17:55   ` Greg KH
  2024-09-17  0:18     ` Gao Xiang
  2024-09-17  5:27     ` Yiyang Wu
@ 2024-09-17  5:39     ` Yiyang Wu
  2 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-17  5:39 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, Sep 16, 2024 at 07:55:43PM GMT, Greg KH wrote:
> On Mon, Sep 16, 2024 at 09:56:12PM +0800, Yiyang Wu wrote:
> > diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
> > new file mode 100644
> > index 000000000000..0f1400175fc2
> > --- /dev/null
> > +++ b/fs/erofs/rust/erofs_sys.rs
> > @@ -0,0 +1,22 @@
> > +#![allow(dead_code)]
> > +// Copyright 2024 Yiyang Wu
> > +// SPDX-License-Identifier: MIT or GPL-2.0-or-later
> 
> Sorry, but I have to ask, why a dual license here?  You are only linking
> to GPL-2.0-only code, so why the different license?  Especially if you
> used the GPL-2.0-only code to "translate" from.
> 
> If you REALLY REALLY want to use a dual license, please get your
> lawyers to document why this is needed and put it in the changelog for
> the next time you submit this series when adding files with dual
> licenses so I don't have to ask again :)
> 
> thanks,
> 
> greg k-h

C'mon, I have no intention of making this discussion heated.

I mean, my original code is under MIT and I've learned that Linux
is GPL-2.0, so I naively thought it was OK to dual-license this for
flexibility. To quote Wikipedia: "When software is multi-licensed,
recipients can typically choose the terms under which they want to use
or distribute the software, but the simple presence of multiple
licenses in a software package or library does not necessarily
indicate that the recipient can freely choose one or the other."[1]
Since it says multiple licenses do not necessarily mean that the
recipient can freely choose one or the other, I thought the strictest
license applies here, which would be GPL-2.0-only in this case.

I don't have any previous experience in kernel development, so I
really had no idea about your attitude towards this kind of issue.
If you insist on switching back to GPL-2.0-only, that's fine with me
and I'll change it in the next version. Again, I didn't have this kind
of knowledge in advance; if multi-licensing is judged case by case,
project by project, I will take note and not make this kind of
mistake again.

Best Regards,
Yiyang Wu.


* Re: [RFC PATCH 02/24] erofs: add superblock data structure in Rust
  2024-09-17  5:34       ` Greg KH
@ 2024-09-17  5:45         ` Gao Xiang
  0 siblings, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-17  5:45 UTC (permalink / raw)
  To: Greg KH; +Cc: Yiyang Wu, linux-fsdevel, linux-erofs, LKML, rust-for-linux



On 2024/9/17 13:34, Greg KH wrote:
> On Tue, Sep 17, 2024 at 08:18:06AM +0800, Gao Xiang wrote:
>> Hi Greg,
>>
>> On 2024/9/17 01:55, Greg KH wrote:
>>> On Mon, Sep 16, 2024 at 09:56:12PM +0800, Yiyang Wu wrote:
>>>> diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
>>>> new file mode 100644
>>>> index 000000000000..0f1400175fc2
>>>> --- /dev/null
>>>> +++ b/fs/erofs/rust/erofs_sys.rs
>>>> @@ -0,0 +1,22 @@
>>>> +#![allow(dead_code)]
>>>> +// Copyright 2024 Yiyang Wu
>>>> +// SPDX-License-Identifier: MIT or GPL-2.0-or-later
>>>
>>> Sorry, but I have to ask, why a dual license here?  You are only linking
>>> to GPL-2.0-only code, so why the different license?  Especially if you
>>> used the GPL-2.0-only code to "translate" from.
>>>
>>> If you REALLY REALLY want to use a dual license, please get your
>>> lawyers to document why this is needed and put it in the changelog for
>>> the next time you submit this series when adding files with dual
>>> licenses so I don't have to ask again :)
>>
>> As a new Rust kernel developer, Yiyang is also working on an EROFS
>> Rust userspace implementation.
>>
>> I think he would just like to share the common Rust logic between
>> kernel and userspace.
> 
> Is that actually possible here?  This is very kernel-specific code from
> what I can tell, and again, it's based on the existing GPL-v2 code, so
> you are kind of changing the license in the transformation to a
> different language, right?

It's possible. Yiyang implemented a purely userspace Rust crate
to parse the EROFS format with limited APIs:

https://github.com/ToolmanP/erofs-rs

To take another example in C, kernel XFS (fs/xfs/libxfs) and xfsprogs
(userspace) share the same codebase, although both use only the GPL
license.

> 
>> On the userspace side, Apache-2.0
>> or even MIT is friendlier for third-party applications (especially
>> cloud-native applications), so the dual license is proposed here.
>> If you don't have a strong opinion, I will ask Yiyang to document this
>> in the next version.  Or we're fine with dropping MIT too.
> 
> If you do not have explicit reasons to do this, AND legal approval with
> the understanding of how to do dual license kernel code properly, I
> would not do it at all as it's a lot of extra work.  Again, talk to your
> lawyers about this please.  And if you come up with the "we really want
> to do this," great, just document it properly as to what is going on
> here and why this decision is made.

OK, then let's stay with GPL only.  Although, as I mentioned,
cloud-native applications are happier with Apache-2.0 or MIT, which
means the kernel and userspace Rust code could diverge too.

Thanks,
Gao Xiang

> 
> thanks,
> 
> greg k-h



* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-16 17:08   ` Al Viro
@ 2024-09-17  6:48     ` Yiyang Wu
  2024-09-17  7:14       ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Yiyang Wu @ 2024-09-17  6:48 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, Sep 16, 2024 at 06:08:01PM GMT, Al Viro wrote:
> On Mon, Sep 16, 2024 at 09:56:29PM +0800, Yiyang Wu wrote:
> > +/// Lookup function for dentry-inode lookup replacement.
> > +#[no_mangle]
> > +pub unsafe extern "C" fn erofs_lookup_rust(
> > +    k_inode: NonNull<inode>,
> > +    dentry: NonNull<dentry>,
> > +    _flags: c_uint,
> > +) -> *mut c_void {
> > +    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
> > +    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
> 
> 	Ummm...  A wrapper would be highly useful.  And the reason why
> it's safe is different - your function is called only via ->i_op->lookup,
> there is only one instance of inode_operations that has that ->lookup
> method, and the only place where an inode gets ->i_op set to that
> is erofs_fill_inode().  Which is always passed erofs_inode::vfs_inode.
> 
My original reasoning was that all vfs_inodes come from erofs_iget()
and are always initialized there, so this just follows the same
convention. I can document this more precisely.
> > +    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
> > +    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
> 
> 	Again, that calls for a wrapper - this time not erofs-specific;
> inode->i_sb is *always* non-NULL, is assign-once and always points
> to live struct super_block instance at least until the call of
> destroy_inode().
>

I'll fix this. I'm not a native speaker and just couldn't find a
better way; I'll take note of this.
> > +    // SAFETY: this is backed by qstr which is c representation of a valid slice.
> 
> 	What is that sentence supposed to mean?  Nevermind "why is it correct"...
> 
> > +    let name = unsafe {
> > +        core::str::from_utf8_unchecked(core::slice::from_raw_parts(
> > +            dentry.as_ref().d_name.name,
> > +            dentry.as_ref().d_name.__bindgen_anon_1.__bindgen_anon_1.len as usize,
> 
> 	Is that supposed to be an example of idiomatic Rust?  I'm not
> trying to be snide, but my interest here is mostly about safety of
> access to VFS data structures.	And ->d_name is _very_ unpleasant in
> that respect; the locking rules required for its stability are subtle
> and hard to verify on manual code audit.
> 
Yeah, this code is pretty messy. I just couldn't find a better
way to use the qstr in the dentry. The original C qstr is:

```c
struct qstr {
	union {
		struct {
			HASH_LEN_DECLARE;
		};
		u64 hash_len;
	};
	const unsigned char *name;
};

```
The original C code can use this pretty easily.

```C
int erofs_namei(struct inode *dir, const struct qstr *name, erofs_nid_t *nid,
		unsigned int *d_type)
{
	int ndirents;
	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
	struct erofs_dirent *de;
	struct erofs_qstr qn;

	if (!dir->i_size)
		return -ENOENT;

	qn.name = name->name;
	qn.end = name->name + name->len;
	buf.mapping = dir->i_mapping;

```

But since Rust does not support anonymous unions, bindgen converts
it into this:

```rust
#[repr(C)]
#[derive(Copy, Clone)]
pub struct qstr {
    pub __bindgen_anon_1: qstr__bindgen_ty_1,
    pub name: *const core::ffi::c_uchar,
}

#[repr(C)]
#[derive(Copy, Clone)]
pub union qstr__bindgen_ty_1 {
    pub __bindgen_anon_1: qstr__bindgen_ty_1__bindgen_ty_1,
    pub hash_len: u64_,
}

```
And it just somehow degrades to this pure mess. :(
I know this is stupid :(
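
One way to hide the anonymous-union plumbing is sketched below with hypothetical types. They mirror the bindgen output above for a little-endian build, where `HASH_LEN_DECLARE` is `u32 hash; u32 len`. The sketch also returns `&[u8]` rather than `&str`. Filenames are arbitrary bytes, so calling `from_utf8_unchecked` on a name that is not valid UTF-8 is undefined behaviour. Name stability is left to the caller, e.g. inside ->lookup() before d_splice_alias().

```rust
/// Hypothetical mirror of `struct qstr` on a little-endian build.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct QstrHashLen {
    pub hash: u32,
    pub len: u32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub union QstrUnion {
    pub s: QstrHashLen,
    pub hash_len: u64,
}

#[repr(C)]
pub struct Qstr {
    pub u: QstrUnion,
    pub name: *const u8,
}

impl Qstr {
    /// Name length, read through the union in one place.
    pub fn len(&self) -> usize {
        // SAFETY: both union members are plain integers covering the same
        // 8 bytes, so reading `s` always sees initialized data.
        unsafe { self.u.s.len as usize }
    }

    /// Name as raw bytes (not necessarily UTF-8).
    ///
    /// # Safety
    /// `name` must be non-NULL and point to `len()` readable bytes that stay
    /// unchanged for the returned lifetime.
    pub unsafe fn as_bytes(&self) -> &[u8] {
        unsafe { core::slice::from_raw_parts(self.name, self.len()) }
    }
}
```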

> 	Current erofs_lookup() (and your version as well) *is* indeed
> safe in that respect, but the proof (from filesystem POV) is that "it's
> called only as ->lookup() instance, so dentry is initially unhashed
> negative and will remain such until it's passed to d_splice_alias();
> until that point it is guaranteed to have ->d_name and ->d_parent stable".
> 
> 	Note that once you _have_ called d_splice_alias(), you can't
> count upon the ->d_name stability - or, indeed, upon ->d_name.name you've
> sampled still pointing to allocated memory.
> 
> 	For directory-modifying methods it's "stable, since parent is held
> exclusive".  Some internal function called from different environments?
> Well...  Swear, look through the call graph and see what can be proven
> for each.

Sorry for my ignorance.
I just took the code from fs/erofs/namei.c and translated it
directly into Rust. If this is a problem, it might also exist in the
original, working C code.

> 	Expressing that kind of fun in any kind of annotations (Rust type
> system included) is not pleasant.  _Probably_ might be handled by a type
> that would be a dentry pointer with annotation along the lines "->d_name
> and ->d_parent of that one are stable".  Then e.g. ->lookup() would
> take that thing as an argument and d_splice_alias() would consume it.
> ->mkdir() would get the same thing, etc.  I hadn't tried to get that
> all way through (the amount of annotation churn in existing filesystems
> would be high and hard to split into reviewable patch series), so there
> might be dragons - and there definitely are places where the stability is
> proven in different ways (e.g. if dentry->d_lock is held, we have the damn
> thing stable; then there's a "take a safe snapshot of name" API; etc.).

That's interesting; I originally thought that the VFS would make sure
d_name / d_parent are stable in the first place.
Again, I don't have a full picture or understanding of the VFS, and my
code is just a basic translation of the original C code. Maybe we can
address this later.

> 	I want to reduce the PITA of regular code audits.  If, at
> some point, Rust use in parts of the tree reduces that - wonderful.
> But then we'd better make sure that Rust-side uses _are_ safe, accurately
> annotated and easy to grep for.

Yeah, even abstracting a small portion of the VFS, like the inode and
dentry data structures, could make things work much more smoothly. From
my understanding, Rust can turn some of the stability issues you have
mentioned into compilation errors if they are correctly annotated.

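The sketch below illustrates the kind of annotation Al describes. It uses hypothetical stand-in types (`Dentry`, `StableDentry`, and a mock `d_splice_alias`), not the real kernel API. The idea is a token proving that ->d_name is stable: ->lookup() would receive it and d_splice_alias() would consume it, so the borrow checker rejects reading the name afterwards.

```rust
/// Hypothetical stand-in for `struct dentry`, reduced to its name.
pub struct Dentry {
    name: Vec<u8>,
}

/// Hypothetical token: "->d_name and ->d_parent of this dentry are stable".
/// It is deliberately neither Clone nor Copy.
pub struct StableDentry<'a> {
    dentry: &'a Dentry,
}

impl<'a> StableDentry<'a> {
    /// # Safety
    /// Only code that can prove stability (e.g. the VFS calling ->lookup()
    /// on an unhashed negative dentry) may create the token.
    pub unsafe fn new(dentry: &'a Dentry) -> Self {
        StableDentry { dentry }
    }

    /// The name is tied to the token's borrow, not to `'a`, so it cannot
    /// outlive the token.
    pub fn name(&self) -> &[u8] {
        &self.dentry.name
    }
}

/// Mock d_splice_alias(): takes the token by value, ending the guarantee.
pub fn d_splice_alias<'a>(stable: StableDentry<'a>) -> &'a Dentry {
    stable.dentry
}
```

Suppose code binds `let n = tok.name();` before `d_splice_alias(tok)` and then uses `n` afterwards. The compiler rejects it, because `tok` is moved while borrowed. As Al notes, stability is proven in several different ways across the tree, so a single token type like this would not cover every case.
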
> Because we'll almost certainly need to
> change method calling conventions at some points through all of that.
> Even if it's just the annotation-level, any such contract change (it
> is doable and quite a few had been done) will require going through
> the instances and checking how much massage will be needed in those.
> Opaque chunks like the above promise to make that very painful...

That certainly needs a lot of energy and effort.
We just need someone who has a full grasp of many filesystems and of
Rust to solve this altogether, so that the abstraction doesn't degrade
into a toy concept.


* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-17  6:48     ` Yiyang Wu
@ 2024-09-17  7:14       ` Gao Xiang
  2024-09-17  7:31         ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-17  7:14 UTC (permalink / raw)
  To: Yiyang Wu, Al Viro; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML



On 2024/9/17 14:48, Yiyang Wu wrote:
> On Mon, Sep 16, 2024 at 06:08:01PM GMT, Al Viro wrote:
>> On Mon, Sep 16, 2024 at 09:56:29PM +0800, Yiyang Wu wrote:
>>> +/// Lookup function for dentry-inode lookup replacement.
>>> +#[no_mangle]
>>> +pub unsafe extern "C" fn erofs_lookup_rust(
>>> +    k_inode: NonNull<inode>,
>>> +    dentry: NonNull<dentry>,
>>> +    _flags: c_uint,
>>> +) -> *mut c_void {
>>> +    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
>>> +    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
>>
>> 	Ummm...  A wrapper would be highly useful.  And the reason why
>> it's safe is different - your function is called only via ->i_op->lookup,
>> there is only one instance of inode_operations that has that ->lookup
>> method, and the only place where an inode gets ->i_op set to that
>> is erofs_fill_inode().  Which is always passed erofs_inode::vfs_inode.
>>
> My original reasoning was that all vfs_inodes come from erofs_iget()
> and are always initialized there, so this just follows the same
> convention. I can document this more precisely.

I think Al just would like a wrapper here, like the current C EROFS_I().
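
For illustration, here is a minimal sketch of such an `EROFS_I()`-style wrapper. It uses hypothetical stand-in types (`Inode`, `KernelInode`) instead of the real bindings. `core::mem::offset_of!` (stable since Rust 1.77) takes the place of the kernel's `container_of!`. The safety argument Al gives is stated once on the wrapper instead of at every call site.

```rust
use core::mem::offset_of;
use core::ptr::NonNull;

/// Hypothetical stand-in for the VFS `struct inode`.
#[repr(C)]
pub struct Inode {
    pub i_ino: u64,
}

/// Hypothetical stand-in for the EROFS inode embedding the VFS inode,
/// the way `struct erofs_inode` embeds `vfs_inode`.
#[repr(C)]
pub struct KernelInode {
    pub nid: u64,
    pub k_inode: Inode,
}

/// `EROFS_I()`-style accessor: recovers the `KernelInode` containing `inode`.
///
/// # Safety
/// `inode` must point to the `k_inode` field of a live `KernelInode`, e.g.
/// because the only inode_operations with this ->lookup is installed by
/// erofs_fill_inode() on an embedded inode.
pub unsafe fn erofs_i<'a>(inode: NonNull<Inode>) -> &'a KernelInode {
    let offset = offset_of!(KernelInode, k_inode);
    // SAFETY: per the contract, stepping back by the field offset stays
    // inside the same KernelInode allocation.
    unsafe { &*inode.as_ptr().cast::<u8>().sub(offset).cast::<KernelInode>() }
}
```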

>>> +    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
>>> +    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
>>
>> 	Again, that calls for a wrapper - this time not erofs-specific;
>> inode->i_sb is *always* non-NULL, is assign-once and always points
>> to live struct super_block instance at least until the call of
>> destroy_inode().
>>
> 
> I'll fix this. I'm not a native speaker and just couldn't find a
> better way; I'll take note of this.

Same here, like the current EROFS_I_SB().
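
Likewise, here is a minimal sketch of the generic (not EROFS-specific) accessor Al describes, using hypothetical stand-in types. The invariant lives in one place, so callers no longer need `NonNull::new(...).unwrap()`. The sketch assumes that only the VFS constructs an `Inode`, which is what makes a safe `sb()` reasonable. An `EROFS_I_SB()` analogue could then build on it.

```rust
/// Hypothetical stand-in for `struct super_block`.
pub struct SuperBlock {
    pub s_magic: u32,
}

/// Hypothetical stand-in for `struct inode`, reduced to ->i_sb as bindgen
/// would expose it (a raw pointer).
pub struct Inode {
    i_sb: *mut SuperBlock,
}

impl Inode {
    /// Generic ->i_sb accessor.
    pub fn sb(&self) -> &SuperBlock {
        // SAFETY: relies on the VFS rule quoted above: i_sb is assigned once,
        // is never NULL, and points to a live super_block at least until
        // destroy_inode(), which cannot happen while `self` is borrowed.
        unsafe { &*self.i_sb }
    }
}
```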

>>> +    // SAFETY: this is backed by qstr which is c representation of a valid slice.
>>
>> 	What is that sentence supposed to mean?  Nevermind "why is it correct"...
>>
>>> +    let name = unsafe {
>>> +        core::str::from_utf8_unchecked(core::slice::from_raw_parts(
>>> +            dentry.as_ref().d_name.name,
>>> +            dentry.as_ref().d_name.__bindgen_anon_1.__bindgen_anon_1.len as usize,
>>

...

> 
>> 	Current erofs_lookup() (and your version as well) *is* indeed
>> safe in that respect, but the proof (from filesystem POV) is that "it's
>> called only as ->lookup() instance, so dentry is initially unhashed
>> negative and will remain such until it's passed to d_splice_alias();
>> until that point it is guaranteed to have ->d_name and ->d_parent stable".

Agreed.

>>
>> 	Note that once you _have_ called d_splice_alias(), you can't
>> count upon the ->d_name stability - or, indeed, upon ->d_name.name you've
>> sampled still pointing to allocated memory.
>>
>> 	For directory-modifying methods it's "stable, since parent is held
>> exclusive".  Some internal function called from different environments?
>> Well...  Swear, look through the call graph and see what can be proven
>> for each.
> 
> Sorry for my ignorance.
> I just took the code from fs/erofs/namei.c and translated it
> directly into Rust. If this is a problem, it might also exist in the
> original, working C code.

As for EROFS (an immutable fs), I think d_name is still stable after
d_splice_alias() (since we don't have rename or similar semantics for now).

But from the generic filesystem POV, d_name access is indeed tricky in
the RCU walk path.

> 
>> 	Expressing that kind of fun in any kind of annotations (Rust type
>> system included) is not pleasant.  _Probably_ might be handled by a type
>> that would be a dentry pointer with annotation along the lines "->d_name
>> and ->d_parent of that one are stable".  Then e.g. ->lookup() would
>> take that thing as an argument and d_splice_alias() would consume it.
>> ->mkdir() would get the same thing, etc.  I hadn't tried to get that
>> all way through (the amount of annotation churn in existing filesystems
>> would be high and hard to split into reviewable patch series), so there
>> might be dragons - and there definitely are places where the stability is
>> proven in different ways (e.g. if dentry->d_lock is held, we have the damn
>> thing stable; then there's a "take a safe snapshot of name" API; etc.).
> 
> That's interesting; I originally thought that the VFS would make sure
> d_name / d_parent are stable in the first place.
> Again, I don't have a full picture or understanding of the VFS, and my
> code is just a basic translation of the original C code. Maybe we can
> address this later.

d_alloc() allocates an unhashed dentry that is almost invisible to the
VFS dcache (so d_name is of course stable).

After d_splice_alias() or d_add(), rename() could change d_name.  So
either we take d_lock or use rcu_read_lock() to take a snapshot of
d_name in the RCU walk path.  That is my overall understanding.

But since EROFS doesn't have rename, it doesn't matter.

Thanks,
Gao Xiang


* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-17  7:14       ` Gao Xiang
@ 2024-09-17  7:31         ` Al Viro
  2024-09-17  7:44           ` Al Viro
  2024-09-17  8:06           ` Gao Xiang
  0 siblings, 2 replies; 69+ messages in thread
From: Al Viro @ 2024-09-17  7:31 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Yiyang Wu, linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Tue, Sep 17, 2024 at 03:14:58PM +0800, Gao Xiang wrote:

> > Sorry for my ignorance.
> > I just took the code from fs/erofs/namei.c and translated it
> > directly into Rust. If this is a problem, it might also exist in the
> > original, working C code.
> 
> As for EROFS (an immutable fs), I think d_name is still stable after
> d_splice_alias() (since we don't have rename or similar semantics for now).

Even on corrupted images?  If you have two directories with entries that
act as hardlinks to the same subdirectory, and keep hitting them on lookups,
it will have to transplant the subtree between the parents.

> But from the generic filesystem POV, d_name access is indeed tricky in
> the RCU walk path.

->lookup() is never called in RCU mode.

> > That's interesting; I originally thought that the VFS would make sure
> > d_name / d_parent are stable in the first place.
> > Again, I don't have a full picture or understanding of the VFS, and my
> > code is just a basic translation of the original C code. Maybe we can
> > address this later.
> 
> d_alloc() allocates an unhashed dentry that is almost invisible to the
> VFS dcache (so d_name is of course stable).
>
> After d_splice_alias() or d_add(), rename() could change d_name.  So
> either we take d_lock or use rcu_read_lock() to take a snapshot of
> d_name in the RCU walk path.  That is my overall understanding.

No, it's more complicated than that, sadly.  ->d_name and ->d_parent are
the trickiest parts of dentry field stability.

> But since EROFS doesn't have rename, it doesn't matter.

See above.  IF we could guarantee that all filesystem images are valid
and will remain so, life would be much simpler.


* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-17  7:31         ` Al Viro
@ 2024-09-17  7:44           ` Al Viro
  2024-09-17  8:08             ` Gao Xiang
  2024-09-17 22:22             ` Al Viro
  2024-09-17  8:06           ` Gao Xiang
  1 sibling, 2 replies; 69+ messages in thread
From: Al Viro @ 2024-09-17  7:44 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Yiyang Wu, linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Tue, Sep 17, 2024 at 08:31:49AM +0100, Al Viro wrote:

> > After d_splice_alias() and d_add(), rename() could change d_name.  So
> > either we take d_lock or with rcu_read_lock() to take a snapshot of
> > d_name in the RCU walk path.  That is my overall understanding.
> 
> No, it's more complicated than that, sadly.  ->d_name and ->d_parent are
> the trickiest parts of dentry field stability.
> 
> > But for EROFS, since we don't have rename, so it doesn't matter.
> 
> See above.  IF we could guarantee that all filesystem images are valid
> and will remain so, life would be much simpler.

In any case, currently it is safe - d_splice_alias() is the last thing
done by erofs_lookup().  Just don't assume that names can't change in
there - and the fewer places in filesystem touch ->d_name, the better.

In practice, for ->lookup() you are safe until after d_splice_alias()
and for directory-modifying operations you are safe unless you start
playing insane games with unlocking and relocking the parent directories
(apparmorfs does; the locking is really obnoxious there).  That covers
the majority of ->d_name and ->d_parent accesses in filesystem code.

->d_hash() and ->d_compare() are separate story; I've posted a text on
that last year (or this winter - not sure, will check once I get some
sleep).

d_path() et al. take care to do the right thing; those (and the %pd
format) can be used safely.

Anyway, I'm half-asleep at the moment and I'd rather leave writing these
rules up until tomorrow.  Sorry...


* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-17  7:31         ` Al Viro
  2024-09-17  7:44           ` Al Viro
@ 2024-09-17  8:06           ` Gao Xiang
  1 sibling, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-17  8:06 UTC (permalink / raw)
  To: Al Viro; +Cc: Yiyang Wu, linux-erofs, rust-for-linux, linux-fsdevel, LKML



On 2024/9/17 15:31, Al Viro wrote:
> On Tue, Sep 17, 2024 at 03:14:58PM +0800, Gao Xiang wrote:
> 
>>> Sorry for my ignorance.
>>> I mean i just borrowed the code from the fs/erofs/namei.c and i directly
>>> translated that into Rust code. That might be a problem that also
>>> exists in original working C code.
>>
>> As for EROFS (an immutable fs), I think after d_splice_alias(), d_name is
>> still stable (since we don't have rename semantics likewise for now).
> 
> Even on corrupted images?  If you have two directories with entries that
> act as hardlinks to the same subdirectory, and keep hitting them on lookups,
> it will have to transplant the subtree between the parents.

Oh, I missed unexpected directory hardlink corrupted cases.

> 
>> But as the generic filesystem POV, d_name access is actually tricky under
>> RCU walk path indeed.
> 
> ->lookup() is never called in RCU mode.

I know, I just said d_name access is tricky in RCU walk.

->lookup() is for the real lookup, not for the fast cached dcache search
done in RCU context.

Thanks,
Gao Xiang



* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-17  7:44           ` Al Viro
@ 2024-09-17  8:08             ` Gao Xiang
  2024-09-17 22:22             ` Al Viro
  1 sibling, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-17  8:08 UTC (permalink / raw)
  To: Al Viro; +Cc: Yiyang Wu, linux-erofs, rust-for-linux, linux-fsdevel, LKML



On 2024/9/17 15:44, Al Viro wrote:
> On Tue, Sep 17, 2024 at 08:31:49AM +0100, Al Viro wrote:
> 
>>> After d_splice_alias() and d_add(), rename() could change d_name.  So
>>> either we take d_lock or with rcu_read_lock() to take a snapshot of
>>> d_name in the RCU walk path.  That is my overall understanding.
>>
>> No, it's more complicated than that, sadly.  ->d_name and ->d_parent are
>> the trickiest parts of dentry field stability.
>>
>>> But for EROFS, since we don't have rename, so it doesn't matter.
>>
>> See above.  IF we could guarantee that all filesystem images are valid
>> and will remain so, life would be much simpler.
> 
> In any case, currently it is safe - d_splice_alias() is the last thing
> done by erofs_lookup().  Just don't assume that names can't change in
> there - and the fewer places in filesystem touch ->d_name, the better.
> 
> In practice, for ->lookup() you are safe until after d_splice_alias()
> and for directory-modifying operations you are safe unless you start
> playing insane games with unlocking and relocking the parent directories
> (apparmorfs does; the locking is really obnoxious there).  That covers
> the majority of ->d_name and ->d_parent accesses in filesystem code.
> 
> ->d_hash() and ->d_compare() are separate story; I've posted a text on
> that last year (or this winter - not sure, will check once I get some
> sleep).
> 
> d_path() et al. take care to do the right thing; those (and the %pd
> format) can be used safely.
> 
> Anyway, I'm half-asleep at the moment and I'd rather leave writing these
> rules up until tomorrow.  Sorry...

Agreed, thanks for writing so many words on this!

Thanks,
Gao Xiang




* Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
  2024-09-17  7:44           ` Al Viro
  2024-09-17  8:08             ` Gao Xiang
@ 2024-09-17 22:22             ` Al Viro
  1 sibling, 0 replies; 69+ messages in thread
From: Al Viro @ 2024-09-17 22:22 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Yiyang Wu, linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Tue, Sep 17, 2024 at 08:44:29AM +0100, Al Viro wrote:

> Anyway, I'm half-asleep at the moment and I'd rather leave writing these
> rules up until tomorrow.  Sorry...


[Below are the bits of my notes related to d_name and d_parent,
with most of the unprintable parts thrown out and minimal markup added.
Probably not all relevant notes are here - this has been culled from
a bunch of files sitting around and I might've missed some]

NOTE: ->d_parent and ->d_name are by far the worst parts of dentry
wrt stability rules.  Code audits are not pleasant, to put it mildly.
This covers the bits outside of VFS proper - verifying the locking
rules is a separate story.


A Really Blunt Tool You Should Not Use.
======================================

All changes of ->d_parent are serialized on rename_lock.  It's *NOT*
something you want the stuff outside of core VFS to touch, though.
It's a seqlock, and write_seqlock() on it is limited to fs/dcache.c
alone.  Reader side is allowed, but it's still not something you
want to use lightly - outside of fs/dcache.c, fs/d_path.c and fs/namei.c
there are only 3 users (ceph_mdsc_build_path(), nfs_path() and
auditsc handle_path()).  Don't add more without a discussion on fsdevel
and detailed ACKs; it's quite likely that a better solution will be
found.

With one exception (see the discussion of d_mark_tmpfile() in the end),
->d_name is also stabilized by that.
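For readers unfamiliar with the pattern, "reader side" here means the usual seqlock retry loop: sample the sequence count, read the protected data, and retry if a writer was active or the count moved in the meantime. The following is a userspace Rust sketch of that pattern only; `SeqLock` is a hypothetical toy type built on std atomics (not the kernel's `read_seqbegin()`/`read_seqretry()` API), and a single, already-serialized writer is assumed.

```rust
use std::sync::atomic::{fence, AtomicU64, Ordering};

/// Hypothetical toy seqlock protecting two words of data (think "parent id"
/// and "name id"). An odd sequence number means an update is in progress.
struct SeqLock {
    seq: AtomicU64,
    parent: AtomicU64,
    name: AtomicU64,
}

impl SeqLock {
    fn new(parent: u64, name: u64) -> Self {
        SeqLock {
            seq: AtomicU64::new(0),
            parent: AtomicU64::new(parent),
            name: AtomicU64::new(name),
        }
    }

    /// Writer side; callers must already be serialized against each other.
    fn write(&self, parent: u64, name: u64) {
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s + 1, Ordering::Relaxed); // mark "update in progress"
        fence(Ordering::Release);
        self.parent.store(parent, Ordering::Relaxed);
        self.name.store(name, Ordering::Relaxed);
        self.seq.store(s + 2, Ordering::Release); // mark "update done"
    }

    /// Reader side: loop until a consistent snapshot has been observed.
    fn read(&self) -> (u64, u64) {
        loop {
            let start = self.seq.load(Ordering::Acquire);
            if start & 1 == 1 {
                std::hint::spin_loop(); // writer active, try again
                continue;
            }
            let parent = self.parent.load(Ordering::Relaxed);
            let name = self.name.load(Ordering::Relaxed);
            fence(Ordering::Acquire);
            if self.seq.load(Ordering::Relaxed) == start {
                return (parent, name); // nothing changed while we read
            }
        }
    }
}
```

The reader never blocks the writer; it only retries, so the cost of contention lands on the readers.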


Slightly Less Blunt Tool You Still Should Not Use.
==================================================

->s_vfs_rename_mutex will stabilize ->d_parent.  The trouble is,
while it's not system-wide like rename_lock, it's fs-wide, so there's
plenty of contention to run into *AND* if you try that while
->i_rwsem is held on some directory in that filesystem, you are fucked -
lock_rename() (and rename(2), and...) will deadlock on you.
Nothing outside of fs/{namei,dcache}.c touches it directly; there is
an indirect use (lock_rename()), but that should be done only around
the call of cross-directory rename on _another_ filesystem - overlayfs
moving stuff within the writable layer, ecryptfs doing rename on
underlying filesystem, nfsd handling rename request from client, etc.

Anyone trying to use that hammer without a good reason will be very
sorry - that's a promise.  The pain will begin with the request to
adjust the proof of deadlock avoidance in directory locking and it
will only go downwards from there...


Parent's ->i_rwsem Held Exclusive.
==================================

That stabilizes ->d_parent and ->d_name.  To be more precise,

holding parent->d_inode->i_rwsem exclusive stabilizes the result
of (dentry->d_parent == parent).  That is to say, dentry
that is a child of parent will remain such and dentry that isn't
a child won't become a child.

holding parent->d_inode->i_rwsem exclusive stabilizes ->d_name of
all children.  In other words, if you've locked the parent exclusive
and found something to be its child while keeping the parent locked,
child will have ->d_parent and ->d_name stable until you unlock the
parent.

That covers most of the directory-modifying methods - stuff like
->mkdir(), ->unlink(), ->rename(), etc. can access ->d_parent and
->d_name of the dentry argument(s) without any extra locks; the
caller is already holding enough.  Well, unless you are special
(*cough* apparmor *cough*) and feel like dropping and regaining
the lock on parent inside your ->mkdir()...  Don't do that, please -
you might have no renames, but there are plenty of other headaches
you can get into that way.


Negatives.
==========

Negative dentry doesn't change ->d_parent or ->d_name.  Of course,
that is only worth something if you are guaranteed that it won't
become positive under you - if that happens, all bets are off.

Holding the inode of parent locked (at least) shared is enough to
guarantee that.  That takes care of ->lookup() instances - their
dentry argument has ->d_parent and ->d_name stable until you make
it positive (normally - by d_splice_alias()).  Once you've done
d_splice_alias(), you'd better be careful with access to those;
you won't get hit by concurrent rename() (it locks parent(s)
exclusive), but if your inode is a directory and d_splice_alias()
*elsewhere* picks the same inode (fs image corruption, network
filesystem with rename done from another client behind your back,
etc.), you'll see the sucker moved.

In practice, d_splice_alias() is the last thing done by most of ->lookup()
instances - whatever it has returned gets returned by ->lookup() itself,
possibly after freeing some temporary allocations, etc.  The rest needs
to watch out for accesses to ->d_name and ->d_parent downstream of
d_splice_alias() return.

Another case where we are guaranteed that dentry is negative and
will stay so is ->d_release() - it's called for a dentry that is
well on the way to becoming an ex-parrot; it's already marked
dead, unhashed and negative.  So a ->d_release() instance doesn't
have to worry about ->d_name and ->d_parent - both are valid and
stable.


sprintf().
==========

%pd prints dentry name, safely.  %p2d - parent_name/dentry_name, etc. up
to %p4d.  %pD .. %p4D  do the same by file reference.  Any time you see
pr_warn("Some weird bollocks with %s (%d)\n", dentry->d_name, err);
it should've been
pr_warn("Some weird bollocks with %pd (%d)\n", dentry, err)...

d_path() and friends are there for a purpose - don't open-code those without
a damn good reason.


Checking if one dentry is an ancestor of another.
=================================================

Use the damn is_subdir(), don't open-code it.
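Purely to show what the helper computes (kernel code should call is_subdir() itself), here is a Rust sketch over a hypothetical toy tree in which each node stores its parent's index and the root is its own parent. The real is_subdir() additionally has to cope with concurrent renames, which is exactly why it should not be open-coded.

```rust
/// Toy dentry tree: `parent[i]` is the parent of node `i`;
/// the root is its own parent. The tree is assumed to be acyclic.
struct Tree {
    parent: Vec<usize>,
}

impl Tree {
    /// Returns true if `new` is `old` itself or lies somewhere below it,
    /// mirroring the meaning of is_subdir(new, old).
    fn is_subdir(&self, new: usize, old: usize) -> bool {
        let mut cur = new;
        loop {
            if cur == old {
                return true;
            }
            let up = self.parent[cur];
            if up == cur {
                return false; // reached the root without meeting `old`
            }
            cur = up;
        }
    }
}
```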


Spinlocks.
==========

dentry->d_lock stabilizes ->d_parent and ->d_name (as well as almost
everything else about dentry); downside is that it's a spinlock
*and* nesting it is not to be attempted easily; you are allowed
to lock child while holding lock on parent, but very few places
have any business doing that (only 2 outside of VFS - tree-walking
in autofs, which might eventually get redone avoiding that and
fsnotify_set_children_dentry_flags(), which just might get moved to
fs/dcache.c itself; we'll see)

Note that "almost everything" includes refcount; that is to say, dget()
and dput() will spin if you are holding ->d_lock, so you can't dget()
the parent under ->d_lock on child - that's a locking order violation
that can easily deadlock on you.  dget_parent() does that kind of thing
safely, and a look at it might be instructive.  Try to open-code something
of that sort, and you'll be hurt.

That said, dget_parent() is overused - it has legitimate uses, but
more often than not it's the wrong tool.  In particular, while you
grab a reference to something that was the parent at some point during
dget_parent(), it might not be the parent anymore by the time it is
returned to caller.
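The race described above can be illustrated with a toy Rust model (hypothetical types, not kernel code): the parent link sits behind a lock, a dget_parent()-like helper clones a counted reference to whatever the parent is at that moment, and nothing stops a later move from changing the parent after the helper has returned.

```rust
use std::sync::{Arc, Mutex};

/// Toy dentry: just a name and a lock-protected parent link.
struct Dentry {
    name: String,
    parent: Mutex<Option<Arc<Dentry>>>,
}

impl Dentry {
    fn new(name: &str, parent: Option<Arc<Dentry>>) -> Arc<Dentry> {
        Arc::new(Dentry {
            name: name.to_string(),
            parent: Mutex::new(parent),
        })
    }
}

/// dget_parent()-like helper: returns a counted reference to the parent as
/// it was while the lock was held. The returned object stays alive, but it
/// is not guaranteed to still be the parent once the lock is dropped.
fn get_parent(d: &Dentry) -> Option<Arc<Dentry>> {
    d.parent.lock().unwrap().clone()
}

/// Toy "rename": move `d` under `new_parent`.
fn move_to(d: &Dentry, new_parent: Arc<Dentry>) {
    *d.parent.lock().unwrap() = Some(new_parent);
}
```

After a `move_to()`, a reference obtained earlier from `get_parent()` still points at the old directory, which is the stale-parent situation described above.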

Most of the dget_parent() uses are due to bad calling conventions of
->d_revalidate().  When that gets sanitized, those will be gone.

The methods that are called with ->d_lock get the protection - that
would be ->d_delete() and ->d_prune().

->d_lock on parent is also sufficient; similar to exclusive ->i_rwsem,
parent->d_lock stabilizes (dentry->d_parent == parent) and if child
has been observed to have child->d_parent equal to parent after
you've locked parent->d_lock, you know that child->d_{parent,name}
will remain stable until you unlock the parent.


Name Snapshots.
===============

There's take_dentry_name_snapshot()/release_dentry_name_snapshot().
That stuff eats about 64 bytes on the stack (longer names _are_ handled
correctly; no allocations are needed, we can simply grab an extra
reference to external name and hold it).  Can't be done under
->d_lock, won't do anything about ->d_parent *and* there's nothing
to prevent dentry being renamed while you are looking at the name
snapshot.  Sometimes it's useful...
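The inline-or-shared idea behind name snapshots can be sketched as a Rust enum. This is a hypothetical userspace model, not the kernel's `struct name_snapshot`; `INLINE_LEN` is an assumed stand-in for the real inline-name capacity, and the locking caveats listed above (not under ->d_lock, no ->d_parent protection, no protection against a later rename) are not modeled.

```rust
use std::sync::Arc;

/// Assumed inline capacity; the kernel's real value differs by architecture.
const INLINE_LEN: usize = 40;

/// Toy dentry name: short names are stored inline, long ones in a shared,
/// reference-counted buffer.
#[derive(Clone)]
enum DName {
    Inline { buf: [u8; INLINE_LEN], len: usize },
    External(Arc<[u8]>),
}

impl DName {
    fn from_bytes(s: &[u8]) -> DName {
        if s.len() <= INLINE_LEN {
            let mut buf = [0u8; INLINE_LEN];
            buf[..s.len()].copy_from_slice(s);
            DName::Inline { buf, len: s.len() }
        } else {
            DName::External(Arc::from(s))
        }
    }

    fn as_bytes(&self) -> &[u8] {
        match self {
            DName::Inline { buf, len } => &buf[..*len],
            DName::External(ext) => &ext[..],
        }
    }
}

/// Taking a snapshot copies an inline name, or takes one more reference to
/// an external name; neither case allocates.
fn take_snapshot(name: &DName) -> DName {
    name.clone()
}
```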


RCU Headaches.
==============

See e.g. https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=fixes.pathwalk-rcu-2&id=8d0a75eba81813cbb00beb73a67783e1cde9982f
(NB: ought to repost that)


[*] d_mark_tmpfile() is pretty much a delayed bit of constructor.  There is
a possible intermediate state of dentry ("will be tmpfile one"); dentries
in that state are all created in vfs_tmpfile(), get passed to ->tmpfile()
where they transition to normal unhashed positive dentries.  The reason why
name is not set from the very beginning is that at that point we do not know
the inumber of inode they are going to get (that becomes known inside
->tmpfile() instance) and we want that inumber seen in their names (for
/proc/*/fd/*, basically).  Name change is done while holding ->d_lock both
on dentry itself and on its parent.
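A toy Rust model of that final step, with hypothetical types: the name is filled in from the inode number only while both the parent's and the dentry's locks are held, parent first, matching the parent-before-child order described under "Spinlocks" above. The `#<inumber>` form is assumed here to match the kernel's tmpfile naming; everything else is illustration.

```rust
use std::sync::Mutex;

/// Toy dentry; the Mutex plays the role of ->d_lock and guards the name.
struct Dentry {
    name: Mutex<String>,
}

/// Assign the tmpfile name once the inode number is known. The parent's lock
/// is taken before the child's to keep a consistent lock order.
fn mark_tmpfile(parent: &Dentry, child: &Dentry, ino: u64) {
    let _parent_guard = parent.name.lock().unwrap();
    let mut child_name = child.name.lock().unwrap();
    *child_name = format!("#{ino}");
}
```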


* Re: [External Mail][RFC PATCH 05/24] erofs: add inode data structure in Rust
  2024-09-16 13:56 ` [RFC PATCH 05/24] erofs: add inode " Yiyang Wu
@ 2024-09-18 13:04   ` Huang Jianan
  0 siblings, 0 replies; 69+ messages in thread
From: Huang Jianan @ 2024-09-18 13:04 UTC (permalink / raw)
  To: Yiyang Wu, linux-erofs@lists.ozlabs.org
  Cc: linux-fsdevel@vger.kernel.org, LKML,
	rust-for-linux@vger.kernel.org

On 2024/9/16 21:56, Yiyang Wu via Linux-erofs wrote:
> 
> This patch introduces the same on-disk erofs data structure
> in rust and also introduces multiple helpers for inode i_format
> and chunk_indexing and later can be used to implement map_blocks.
> 
> Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
> ---
>   fs/erofs/rust/erofs_sys.rs       |   1 +
>   fs/erofs/rust/erofs_sys/inode.rs | 291 +++++++++++++++++++++++++++++++
>   2 files changed, 292 insertions(+)
>   create mode 100644 fs/erofs/rust/erofs_sys/inode.rs
> 
> diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
> index 6f3c12665ed6..34267ec7772d 100644
> --- a/fs/erofs/rust/erofs_sys.rs
> +++ b/fs/erofs/rust/erofs_sys.rs
> @@ -24,6 +24,7 @@
>   pub(crate) type PosixResult<T> = Result<T, Errno>;
> 
>   pub(crate) mod errnos;
> +pub(crate) mod inode;
>   pub(crate) mod superblock;
>   pub(crate) mod xattrs;
>   pub(crate) use errnos::Errno;
> diff --git a/fs/erofs/rust/erofs_sys/inode.rs b/fs/erofs/rust/erofs_sys/inode.rs
> new file mode 100644
> index 000000000000..1762023e97f8
> --- /dev/null
> +++ b/fs/erofs/rust/erofs_sys/inode.rs
> @@ -0,0 +1,291 @@
> +use super::xattrs::*;
> +use super::*;
> +use core::ffi::*;
> +use core::mem::size_of;
> +
> +/// Represents the compact bitfield of the Erofs Inode format.
> +#[repr(transparent)]
> +#[derive(Clone, Copy)]
> +pub(crate) struct Format(u16);
> +
> +pub(crate) const INODE_VERSION_MASK: u16 = 0x1;
> +pub(crate) const INODE_VERSION_BIT: u16 = 0;
> +
> +pub(crate) const INODE_LAYOUT_BIT: u16 = 1;
> +pub(crate) const INODE_LAYOUT_MASK: u16 = 0x7;
> +
> +/// Helper macro to extract property from the bitfield.
> +macro_rules! extract {
> +    ($name: expr, $bit: expr, $mask: expr) => {
> +        ($name >> $bit) & ($mask)
> +    };
> +}
> +
> +/// The Version of the Inode which represents whether this inode is extended or compact.
> +/// Extended inodes have more infos about nlinks + mtime.
> +/// This is documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inodes
> +#[repr(C)]
> +#[derive(Clone, Copy)]
> +pub(crate) enum Version {
> +    Compat,
> +    Extended,
> +    Unknown,
> +}
> +
> +/// Represents the data layout backed by the Inode.
> +/// As Documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inode-data-layouts
> +#[repr(C)]
> +#[derive(Clone, Copy, PartialEq)]
> +pub(crate) enum Layout {
> +    FlatPlain,
> +    CompressedFull,
> +    FlatInline,
> +    CompressedCompact,
> +    Chunk,
> +    Unknown,
> +}
> +
> +#[repr(C)]
> +#[allow(non_camel_case_types)]
> +#[derive(Clone, Copy, Debug, PartialEq)]
> +pub(crate) enum Type {
> +    Regular,
> +    Directory,
> +    Link,
> +    Character,
> +    Block,
> +    Fifo,
> +    Socket,
> +    Unknown,
> +}
> +
> +/// This is format extracted from i_format bit representation.
> +/// This includes various infos and specs about the inode.
> +impl Format {
> +    pub(crate) fn version(&self) -> Version {
> +        match extract!(self.0, INODE_VERSION_BIT, INODE_VERSION_MASK) {
> +            0 => Version::Compat,
> +            1 => Version::Extended,
> +            _ => Version::Unknown,
> +        }
> +    }
> +
> +    pub(crate) fn layout(&self) -> Layout {
> +        match extract!(self.0, INODE_LAYOUT_BIT, INODE_LAYOUT_MASK) {
> +            0 => Layout::FlatPlain,
> +            1 => Layout::CompressedFull,
> +            2 => Layout::FlatInline,
> +            3 => Layout::CompressedCompact,
> +            4 => Layout::Chunk,
> +            _ => Layout::Unknown,
> +        }
> +    }
> +}
> +
> +/// Represents the compact inode which resides on-disk.
> +/// This is documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inodes
> +#[repr(C)]
> +#[derive(Clone, Copy)]
> +pub(crate) struct CompactInodeInfo {
> +    pub(crate) i_format: Format,
> +    pub(crate) i_xattr_icount: u16,
> +    pub(crate) i_mode: u16,
> +    pub(crate) i_nlink: u16,
> +    pub(crate) i_size: u32,
> +    pub(crate) i_reserved: [u8; 4],
> +    pub(crate) i_u: [u8; 4],
> +    pub(crate) i_ino: u32,
> +    pub(crate) i_uid: u16,
> +    pub(crate) i_gid: u16,
> +    pub(crate) i_reserved2: [u8; 4],
> +}
> +
> +/// Represents the extended inode which resides on-disk.
> +/// This is documented in https://erofs.docs.kernel.org/en/latest/core_ondisk.html#inodes
> +#[repr(C)]
> +#[derive(Clone, Copy)]
> +pub(crate) struct ExtendedInodeInfo {
> +    pub(crate) i_format: Format,
> +    pub(crate) i_xattr_icount: u16,
> +    pub(crate) i_mode: u16,
> +    pub(crate) i_reserved: [u8; 2],
> +    pub(crate) i_size: u64,
> +    pub(crate) i_u: [u8; 4],
> +    pub(crate) i_ino: u32,
> +    pub(crate) i_uid: u32,
> +    pub(crate) i_gid: u32,
> +    pub(crate) i_mtime: u64,
> +    pub(crate) i_mtime_nsec: u32,
> +    pub(crate) i_nlink: u32,
> +    pub(crate) i_reserved2: [u8; 16],
> +}
> +
> +/// Represents the inode info which is either compact or extended.
> +#[derive(Clone, Copy)]
> +pub(crate) enum InodeInfo {
> +    Extended(ExtendedInodeInfo),
> +    Compact(CompactInodeInfo),
> +}
> +
> +pub(crate) const CHUNK_BLKBITS_MASK: u16 = 0x1f;
> +pub(crate) const CHUNK_FORMAT_INDEX_BIT: u16 = 0x20;
> +
> +/// Represents on-disk chunk index of the file backing inode.
> +#[repr(C)]
> +#[derive(Clone, Copy, Debug)]
> +pub(crate) struct ChunkIndex {
> +    pub(crate) advise: u16,
> +    pub(crate) device_id: u16,
> +    pub(crate) blkaddr: u32,
> +}
> +
> +impl From<[u8; 8]> for ChunkIndex {
> +    fn from(u: [u8; 8]) -> Self {
> +        let advise = u16::from_le_bytes([u[0], u[1]]);
> +        let device_id = u16::from_le_bytes([u[2], u[3]]);
> +        let blkaddr = u32::from_le_bytes([u[4], u[5], u[6], u[7]]);
> +        ChunkIndex {
> +            advise,
> +            device_id,
> +            blkaddr,
> +        }
> +    }
> +}
> +
> +/// Chunk format used for indicating the chunkbits and chunkindex.
> +#[repr(C)]
> +#[derive(Clone, Copy, Debug)]
> +pub(crate) struct ChunkFormat(pub(crate) u16);
> +
> +impl ChunkFormat {
> +    pub(crate) fn is_chunkindex(&self) -> bool {
> +        self.0 & CHUNK_FORMAT_INDEX_BIT != 0
> +    }
> +    pub(crate) fn chunkbits(&self) -> u16 {
> +        self.0 & CHUNK_BLKBITS_MASK
> +    }

I'd recommend adding blank lines between functions. This problem
exists in many places in this patch set.

> +}
> +
> +/// Represents the inode spec which is either data or device.
> +#[derive(Clone, Copy, Debug)]
> +#[repr(u32)]
> +pub(crate) enum Spec {
> +    Chunk(ChunkFormat),
> +    RawBlk(u32),
> +    Device(u32),
> +    CompressedBlocks(u32),
> +    Unknown,
> +}
> +
> +/// Convert the spec from the format of the inode based on the layout.
> +impl From<(&[u8; 4], Layout)> for Spec {
> +    fn from(value: (&[u8; 4], Layout)) -> Self {
> +        match value.1 {
> +            Layout::FlatInline | Layout::FlatPlain => Spec::RawBlk(u32::from_le_bytes(*value.0)),
> +            Layout::CompressedFull | Layout::CompressedCompact => {
> +                Spec::CompressedBlocks(u32::from_le_bytes(*value.0))
> +            }
> +            Layout::Chunk => Self::Chunk(ChunkFormat(u16::from_le_bytes([value.0[0], value.0[1]]))),
> +            // We don't support compressed inlines or compressed chunks currently.
> +            _ => Spec::Unknown,
> +        }
> +    }
> +}
> +
> +/// Helper functions for Inode Info.
> +impl InodeInfo {
> +    const S_IFMT: u16 = 0o170000;
> +    const S_IFSOCK: u16 = 0o140000;
> +    const S_IFLNK: u16 = 0o120000;
> +    const S_IFREG: u16 = 0o100000;
> +    const S_IFBLK: u16 = 0o60000;
> +    const S_IFDIR: u16 = 0o40000;
> +    const S_IFCHR: u16 = 0o20000;
> +    const S_IFIFO: u16 = 0o10000;
> +    const S_ISUID: u16 = 0o4000;
> +    const S_ISGID: u16 = 0o2000;
> +    const S_ISVTX: u16 = 0o1000;
> +    pub(crate) fn ino(&self) -> u32 {
> +        match self {
> +            Self::Extended(extended) => extended.i_ino,
> +            Self::Compact(compact) => compact.i_ino,
> +        }
> +    }
> +
> +    pub(crate) fn format(&self) -> Format {
> +        match self {
> +            Self::Extended(extended) => extended.i_format,
> +            Self::Compact(compact) => compact.i_format,
> +        }
> +    }
> +
> +    pub(crate) fn file_size(&self) -> Off {
> +        match self {
> +            Self::Extended(extended) => extended.i_size,
> +            Self::Compact(compact) => compact.i_size as u64,
> +        }
> +    }
> +
> +    pub(crate) fn inode_size(&self) -> Off {
> +        match self {
> +            Self::Extended(_) => 64,
> +            Self::Compact(_) => 32,

Self::Extended(_) => size_of::<ExtendedInodeInfo>() as Off,
Self::Compact(_) => size_of::<CompactInodeInfo>() as Off,
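Applied to a self-contained copy of the two structs, the suggestion could look like the sketch below. The `const` assertions are an addition of this sketch, not part of the review: they pin the sizes to the 32 and 64 bytes the patch hard-coded, so a layout mistake fails the build instead of silently changing `inode_size()`. `Off` is assumed to be `u64`.

```rust
use core::mem::size_of;

/// Assumed to match the patch's `Off` alias (file_size() returns a u64 through it).
type Off = u64;

#[repr(transparent)]
#[derive(Clone, Copy, Default)]
struct Format(u16);

/// On-disk compact inode, field layout copied from the patch.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct CompactInodeInfo {
    i_format: Format,
    i_xattr_icount: u16,
    i_mode: u16,
    i_nlink: u16,
    i_size: u32,
    i_reserved: [u8; 4],
    i_u: [u8; 4],
    i_ino: u32,
    i_uid: u16,
    i_gid: u16,
    i_reserved2: [u8; 4],
}

/// On-disk extended inode, field layout copied from the patch.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct ExtendedInodeInfo {
    i_format: Format,
    i_xattr_icount: u16,
    i_mode: u16,
    i_reserved: [u8; 2],
    i_size: u64,
    i_u: [u8; 4],
    i_ino: u32,
    i_uid: u32,
    i_gid: u32,
    i_mtime: u64,
    i_mtime_nsec: u32,
    i_nlink: u32,
    i_reserved2: [u8; 16],
}

// Break the build if the Rust layout ever drifts from the on-disk sizes.
const _: () = assert!(size_of::<CompactInodeInfo>() == 32);
const _: () = assert!(size_of::<ExtendedInodeInfo>() == 64);

#[allow(dead_code)]
enum InodeInfo {
    Extended(ExtendedInodeInfo),
    Compact(CompactInodeInfo),
}

impl InodeInfo {
    /// The inode size follows from the struct definitions, so there are
    /// no magic numbers to keep in sync.
    fn inode_size(&self) -> Off {
        match self {
            Self::Extended(_) => size_of::<ExtendedInodeInfo>() as Off,
            Self::Compact(_) => size_of::<CompactInodeInfo>() as Off,
        }
    }
}
```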

> +        }
> +    }
> +
> +    pub(crate) fn spec(&self) -> Spec {
> +        let mode = match self {
> +            Self::Extended(extended) => extended.i_mode,
> +            Self::Compact(compact) => compact.i_mode,
> +        };
> +
> +        let u = match self {
> +            Self::Extended(extended) => &extended.i_u,
> +            Self::Compact(compact) => &compact.i_u,
> +        };
> +
> +        match mode & 0o170000 {
> +            0o40000 | 0o100000 | 0o120000 => Spec::from((u, self.format().layout())),

match mode & Self::S_IFMT {
     Self::S_IFDIR | Self::S_IFREG | Self::S_IFLNK => Spec::from((u, self.format().layout())),
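The suggestion relies on associated constants being usable as match patterns, which works for plain integer constants like these. Here is a trimmed, hypothetical version of `spec()` that only classifies the mode (the real one goes on to decode `i_u`):

```rust
/// Stand-in for the patch's InodeInfo; only the mode constants matter here.
struct InodeInfo;

impl InodeInfo {
    const S_IFMT: u16 = 0o170000;
    const S_IFLNK: u16 = 0o120000;
    const S_IFREG: u16 = 0o100000;
    const S_IFDIR: u16 = 0o40000;

    /// Returns true for inode types whose `i_u` field carries data-layout
    /// information (directories, regular files and symlinks); device and
    /// other special inodes fall through, as in the patch.
    fn has_data_spec(mode: u16) -> bool {
        match mode & Self::S_IFMT {
            Self::S_IFDIR | Self::S_IFREG | Self::S_IFLNK => true,
            _ => false,
        }
    }
}
```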

> +            // We don't support device inodes currently.
> +            _ => Spec::Unknown,
> +        }
> +    }
> +
> +    pub(crate) fn inode_type(&self) -> Type {
> +        let mode = match self {
> +            Self::Extended(extended) => extended.i_mode,
> +            Self::Compact(compact) => compact.i_mode,
> +        };
> +        match mode & Self::S_IFMT {
> +            Self::S_IFDIR => Type::Directory, // Directory
> +            Self::S_IFREG => Type::Regular,   // Regular File
> +            Self::S_IFLNK => Type::Link,      // Symbolic Link
> +            Self::S_IFIFO => Type::Fifo,      // FIFO
> +            Self::S_IFSOCK => Type::Socket,   // Socket
> +            Self::S_IFBLK => Type::Block,     // Block
> +            Self::S_IFCHR => Type::Character, // Character
> +            _ => Type::Unknown,
> +        }
> +    }
> +
> +    pub(crate) fn xattr_size(&self) -> Off {
> +        match self {
> +            Self::Extended(extended) => {

if extended.i_xattr_icount == 0 {
     return 0;
}

to avoid an overflowing subtraction when i_xattr_icount is 0.
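A sketch of the guarded computation as a free function. It assumes that the patch's `XAttrSharedEntrySummary` is the 12-byte inline xattr body header and that each further slot is 4 bytes, as the `c_int` term in the patch implies. Without the guard, `icount - 1` on a zero count panics in debug builds and wraps in release builds.

```rust
/// Assumed to match the patch's `Off` alias.
type Off = u64;

/// Assumed size of the inline xattr body header (12 bytes on disk).
const XATTR_IBODY_HEADER_SIZE: Off = 12;
/// Each additional xattr slot is one 32-bit word.
const XATTR_SLOT_SIZE: Off = 4;

fn xattr_ibody_size(icount: u16) -> Off {
    // No inline xattrs at all: never compute `icount - 1` on zero.
    if icount == 0 {
        return 0;
    }
    XATTR_IBODY_HEADER_SIZE + XATTR_SLOT_SIZE * (Off::from(icount) - 1)
}
```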

Thanks,
Jianan

> +                size_of::<XAttrSharedEntrySummary>() as Off
> +                    + (size_of::<c_int>() as Off) * (extended.i_xattr_icount as Off - 1)
> +            }
> +            Self::Compact(_) => 0,
> +        }
> +    }
> +
> +    pub(crate) fn xattr_count(&self) -> u16 {
> +        match self {
> +            Self::Extended(extended) => extended.i_xattr_icount,
> +            Self::Compact(compact) => compact.i_xattr_icount,
> +        }
> +    }
> +}
> +
> +pub(crate) type CompactInodeInfoBuf = [u8; size_of::<CompactInodeInfo>()];
> +pub(crate) type ExtendedInodeInfoBuf = [u8; size_of::<ExtendedInodeInfo>()];
> +pub(crate) const DEFAULT_INODE_BUF: ExtendedInodeInfoBuf = [0; size_of::<ExtendedInodeInfo>()];
> --
> 2.46.0
> 



* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-16 23:58     ` Gao Xiang
@ 2024-09-19 13:45       ` Benno Lossin
  2024-09-19 15:13         ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Benno Lossin @ 2024-09-19 13:45 UTC (permalink / raw)
  To: Gao Xiang, Gary Guo, Yiyang Wu
  Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML, Al Viro

Hi,

Thanks for the patch series. I think it's great that you want to use
Rust for this filesystem.

On 17.09.24 01:58, Gao Xiang wrote:
> On 2024/9/17 04:01, Gary Guo wrote:
>> Also, it seems that you're building abstractions into EROFS directly
>> without building a generic abstraction. We have been avoiding that. If
>> there's an abstraction that you need and missing, please add that
>> abstraction. In fact, there're a bunch of people trying to add FS
> 
> No, I'd like to try to replace some EROFS C logic first to Rust (by
> using EROFS C API interfaces) and try if Rust is really useful for
> a real in-tree filesystem.  If Rust can improve EROFS security or
> performance (although I'm sceptical on performance), As an EROFS
> maintainer, I'm totally fine to accept EROFS Rust logic landed to
> help the whole filesystem better.

As Gary already said, we have been using a different approach and it has
served us well. Your approach of calling directly into C from the driver
can be used to create a proof of concept, but in our opinion it is not
something that should be put into mainline. That is because calling C
from Rust is rather complicated due to the many nuanced features that
Rust provides (for example the safety requirements of references).
Therefore moving the dangerous parts into a central location is crucial
for making use of all of Rust's advantages inside of your code.
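To make the "central location" argument concrete: an `unsafe` call into C is wrapped once, its preconditions are written down in a `// SAFETY:` comment, and driver code only ever sees the safe wrapper. Below is a userspace sketch of that shape; `c_fill_block` is a hypothetical stand-in for a C function taking a raw buffer, not a real kernel or EROFS API.

```rust
/// Hypothetical stand-in for a C function `void fill_block(u8 *buf, size_t len)`.
///
/// # Safety
/// `buf` must be valid for writes of `len` bytes.
unsafe fn c_fill_block(buf: *mut u8, len: usize) {
    for i in 0..len {
        // SAFETY: the caller guarantees `buf..buf + len` is writable.
        unsafe { *buf.add(i) = (i % 256) as u8 };
    }
}

/// Safe abstraction: the only place that has to reason about the C contract.
fn fill_block(buf: &mut [u8]) {
    // SAFETY: `buf` is a live, uniquely borrowed slice, so its pointer is
    // valid for writes of exactly `buf.len()` bytes for the whole call.
    unsafe { c_fill_block(buf.as_mut_ptr(), buf.len()) }
}
```

Callers of `fill_block()` cannot pass a bad pointer or length, so the safety argument is made once, in one place, rather than at every call site.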

> For the Rust VFS abstraction, that is a different and independent story;
> Yiyang doesn't have any bandwidth for this due to his limited time.

This seems a bit weird, you have the bandwidth to write your own
abstractions, but not use the stuff that has already been developed?

I have quickly glanced over the patchset and the abstractions seem
rather immature, not general enough for other filesystems to also take
advantage of them. They also miss safety documentation and are in
general poorly documented.

Additionally, all of the code that I saw is put into the `fs/erofs` and
`rust/erofs_sys` directories. That way people can't directly benefit
from your code; put your general abstractions into the kernel crate.
We will soon split the kernel crate, and I could imagine that we end up
with an `fs` crate; when that happens, we would put those abstractions
there.

As I don't have the bandwidth to review two different sets of filesystem
abstractions, I can only provide you with feedback if you use the
existing abstractions.

> And I _also_ don't think an incomplete ROFS VFS Rust abstraction
> is useful to Linux community

IIRC Wedson created ROFS VFS abstractions before going for the full
filesystem. So it would definitely be useful for other read-only
filesystems (as well as filesystems that also allow writing, since last
time I checked, they often also support reading).

> (because IMO for generic interface
> design, we need a global vision for all filesystems instead of
> just ROFSes.  No existing user is not an excuse for an incomplete
> abstraction.)

Yes we need a global vision, but if you would use the existing
abstractions, then you would participate in this global vision.

Sorry for repeating this point so many times, but it is *really*
important that we don't have multiple abstractions for the same thing.

> If a reasonble Rust VFS abstraction landed, I think we will switch
> to use that, but as I said, they are completely two stories.

For them to land, there has to be some kind of user, for example a Rust
reference driver or a new filesystem. For example, this one.

---
Cheers,
Benno



* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-19 13:45       ` Benno Lossin
@ 2024-09-19 15:13         ` Gao Xiang
  2024-09-19 19:36           ` Benno Lossin
  0 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-19 15:13 UTC (permalink / raw)
  To: Benno Lossin, Gary Guo, Yiyang Wu
  Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML, Al Viro,
	Greg Kroah-Hartman

Hi Benno,

On 2024/9/19 21:45, Benno Lossin wrote:
> Hi,
> 
> Thanks for the patch series. I think it's great that you want to use
> Rust for this filesystem.
> 
> On 17.09.24 01:58, Gao Xiang wrote:
>> On 2024/9/17 04:01, Gary Guo wrote:
>>> Also, it seems that you're building abstractions into EROFS directly
>>> without building a generic abstraction. We have been avoiding that. If
>>> there's an abstraction that you need and missing, please add that
>>> abstraction. In fact, there're a bunch of people trying to add FS
>>
>> No, I'd like to try to replace some EROFS C logic first to Rust (by
>> using EROFS C API interfaces) and try if Rust is really useful for
>> a real in-tree filesystem.  If Rust can improve EROFS security or
>> performance (although I'm sceptical on performance), As an EROFS
>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
>> help the whole filesystem better.
> 
> As Gary already said, we have been using a different approach and it has
> served us well. Your approach of calling directly into C from the driver
> can be used to create a proof of concept, but in our opinion it is not
> something that should be put into mainline. That is because calling C
> from Rust is rather complicated due to the many nuanced features that
> Rust provides (for example the safety requirements of references).
> Therefore moving the dangerous parts into a central location is crucial
> for making use of all of Rust's advantages inside of your code.

I'm not quite sure about your point honestly.  In my opinion, there
is nothing different to use Rust _within a filesystem_ or _within a
driver_ or _within a Linux subsystem_ as long as all negotiated APIs
are audited.

Otherwise, it means Rust will never be used to write Linux core parts
such as MM, VFS or block layer. Does this point make sense? At least,
Rust needs to get along with the existing C code (in an audited way)
rather than refuse C code.

My personal idea about Rust: I think Rust is just another _language
tool_ for the Linux kernel which could save us time and make the
kernel development better.

Or I wonder why not writing a complete new Rust stuff instead rather
than living in the C world?

> 
>> For Rust VFS abstraction, that is a different and independent story,
>> Yiyang doesn't have any bandwidth on this due to his limited time.
> 
> This seems a bit weird, you have the bandwidth to write your own
> abstractions, but not use the stuff that has already been developed?

It's not written by me; Yiyang is still an undergraduate student.
It's his research project and I don't think it's his responsibility
to make an upstreamable VFS abstraction.

> 
> I have quickly glanced over the patchset and the abstractions seem
> rather immature, not general enough for other filesystems to also take

I don't have enough time to take a full look of this patchset too
due to other ongoing work for now (Rust EROFS is not quite a high
priority stuff for me).

And that's why it's called "RFC PATCH".

> advantage of them. They also miss safety documentation and are in

I don't think it needs to be general enough, since we'd like to use
the new Rust language tool within a subsystem.

So why it needs to take care of other filesystems? Again, I'm not
working on a full VFS abstraction.

Yes, this patchset is not perfect.  But I've asked Yiyang to isolate
all VFS structures as much as possible, but it seems that it still
touches something.

> general poorly documented.

Okay, I think it can be improved then if you give more detailed hints.

> 
> Additionally, all of the code that I saw is put into the `fs/erofs` and
> `rust/erofs_sys` directories. That way people can't directly benefit
> from your code, put your general abstractions into the kernel crate.
> Soon we will split the kernel crate; I could imagine that we end up
> with an `fs` crate, when that happens, we would put those abstractions
> there.
> 
> As I don't have the bandwidth to review two different sets of filesystem
> abstractions, I can only provide you with feedback if you use the
> existing abstractions.

I think Rust is just a tool, if you could have extra time to review
our work, that would be wonderful!  Many thanks then.

However, if you don't have time to review, IMHO, Rust is just a tool,
I think each subsystem can choose to use Rust in their codebase, or
I'm not sure what's your real point is?

> 
>> And I _also_ don't think an incomplete ROFS VFS Rust abstraction
>> is useful to Linux community
> 
> IIRC Wedson created ROFS VFS abstractions before going for the full
> filesystem. So it would definitely be useful for other read-only
> filesystems (as well as filesystems that also allow writing, since last
> time I checked, they often also support reading).

Leaving aside everything else, an incomplete Rust read-only VFS
abstraction itself is just an unsafe stuff.

> 
>> (because IMO for generic interface
>> design, we need a global vision for all filesystems instead of
>> just ROFSes.  No existing user is not an excuse for an incomplete
>> abstraction.)
> 
> Yes we need a global vision, but if you would use the existing
> abstractions, then you would participate in this global vision.
> 
> Sorry for repeating this point so many times, but it is *really*
> important that we don't have multiple abstractions for the same thing.

I've expressed my viewpoint.

> 
>> If a reasonable Rust VFS abstraction landed, I think we will switch
>> to use that, but as I said, they are completely two stories.
> 
> For them to land, there has to be some kind of user. For example, a rust
> reference driver, or a new filesystem. For example this one.

Without a full proper VFS abstraction, it's just broken and
needs to be refactored.  And that will be painful to all
users then.

=======
In the end,

Other thoughts, comments are helpful here since I wonder how "Rust
-for-Linux" works in the long term, and decide whether I will work
on Kernel Rust or not at least in the short term.

Thanks,
Gao Xiang

> 
> ---
> Cheers,
> Benno
> 



* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-19 15:13         ` Gao Xiang
@ 2024-09-19 19:36           ` Benno Lossin
  2024-09-20  0:49             ` Gao Xiang
  2024-09-25 15:48             ` Ariel Miculas
  0 siblings, 2 replies; 69+ messages in thread
From: Benno Lossin @ 2024-09-19 19:36 UTC (permalink / raw)
  To: Gao Xiang, Gary Guo, Yiyang Wu
  Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML, Al Viro,
	Greg Kroah-Hartman

On 19.09.24 17:13, Gao Xiang wrote:
> Hi Benno,
> 
> On 2024/9/19 21:45, Benno Lossin wrote:
>> Hi,
>>
>> Thanks for the patch series. I think it's great that you want to use
>> Rust for this filesystem.
>>
>> On 17.09.24 01:58, Gao Xiang wrote:
>>> On 2024/9/17 04:01, Gary Guo wrote:
>>>> Also, it seems that you're building abstractions into EROFS directly
>>>> without building a generic abstraction. We have been avoiding that. If
>>>> there's an abstraction that you need and missing, please add that
>>>> abstraction. In fact, there're a bunch of people trying to add FS
>>>
>>> No, I'd like to try to replace some EROFS C logic first to Rust (by
>>> using EROFS C API interfaces) and try if Rust is really useful for
>>> a real in-tree filesystem.  If Rust can improve EROFS security or
>>> performance (although I'm sceptical on performance), As an EROFS
>>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
>>> help the whole filesystem better.
>>
>> As Gary already said, we have been using a different approach and it has
>> served us well. Your approach of calling directly into C from the driver
>> can be used to create a proof of concept, but in our opinion it is not
>> something that should be put into mainline. That is because calling C
>> from Rust is rather complicated due to the many nuanced features that
>> Rust provides (for example the safety requirements of references).
>> Therefore moving the dangerous parts into a central location is crucial
>> for making use of all of Rust's advantages inside of your code.
> 
> I'm not quite sure about your point honestly.  In my opinion, there
> is nothing different to use Rust _within a filesystem_ or _within a
> driver_ or _within a Linux subsystem_ as long as all negotiated APIs
> are audited.

To us there is a big difference: if a lot of functions in an API are
`unsafe` without that being inherent to the problem it solves, then
it's a bad API.
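To illustrate the distinction, here is a minimal sketch with hypothetical helpers (not code from this patchset or from the kernel crate). The raw-pointer version is `unsafe` only because of its signature. The slice-based version performs its own bounds check, so callers need no `unsafe` at all.

```rust
/// Unsafe only because of its signature: the caller must guarantee that
/// `ptr` is valid for reads of `offset + 4` bytes. Hypothetical helper.
pub unsafe fn read_le32_raw(ptr: *const u8, offset: usize) -> u32 {
    let mut buf = [0u8; 4];
    for (i, b) in buf.iter_mut().enumerate() {
        // SAFETY: the caller promises `ptr..ptr + offset + 4` is readable.
        *b = unsafe { *ptr.add(offset + i) };
    }
    u32::from_le_bytes(buf)
}

/// Safe alternative: the slice carries its own length, so the bounds
/// check happens here and out-of-range reads return `None`.
pub fn read_le32(data: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let s = data.get(offset..end)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}
```

Both functions do the same work. Only the first pushes a safety obligation onto every caller, and nothing in the task itself requires that.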

> Otherwise, it means Rust will never be used to write Linux core parts
> such as MM, VFS or block layer. Does this point make sense? At least,
> Rust needs to get along with the existing C code (in an audited way)
> rather than refuse C code.

I am neither requiring you to write solely safe code, nor am I banning
interacting with the C side. What we mean when we talk about
abstractions is that we want to minimize the Rust code that directly
interfaces with C. Rust-to-Rust interfaces can be a lot safer and are
easier to implement correctly.
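Here is a minimal sketch of that layering. It uses a stand-in `bindings::fill_name` in place of a real C function, and all names are hypothetical. The `unsafe` call and its SAFETY justification live in one abstraction module, and the "leaf" code built on top needs no `unsafe`.

```rust
// Stand-in for a raw C binding; hypothetical, not a real kernel symbol.
mod bindings {
    /// # Safety
    /// `buf` must be valid for writes of `len` bytes.
    pub unsafe fn fill_name(buf: *mut u8, len: usize) -> usize {
        let name = b"erofs";
        let n = len.min(name.len());
        for (i, &c) in name[..n].iter().enumerate() {
            // SAFETY: `i < n <= len`, and the caller guarantees `len` writable bytes.
            unsafe { *buf.add(i) = c };
        }
        n
    }
}

// The abstraction: the only module that calls into `bindings`.
mod abstraction {
    /// Safe wrapper: copies the name into `buf` and returns the filled part.
    pub fn name(buf: &mut [u8]) -> &[u8] {
        // SAFETY: a `&mut [u8]` is valid for writes of `buf.len()` bytes.
        let n = unsafe { super::bindings::fill_name(buf.as_mut_ptr(), buf.len()) };
        &buf[..n]
    }
}

// "Leaf" code built on the abstraction: no `unsafe` needed.
pub fn name_len() -> usize {
    let mut buf = [0u8; 16];
    abstraction::name(&mut buf).len()
}
```

If `fill_name`'s contract changes, only `abstraction::name` has to be re-audited, not every caller.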

> My personal idea about Rust: I think Rust is just another _language
> tool_ for the Linux kernel which could save us time and make the
> kernel development better.

Yes, but we do have conventions, rules and guidelines for writing such
code. C code also has them. If you want/need to break them, there should
be a good reason to do so. I don't see one in this instance.

> Or I wonder why not writing a complete new Rust stuff instead rather
> than living in the C world?

There are projects that do that yes. But Rust-for-Linux is about
bringing Rust to the kernel and part of that is coming up with good
conventions and rules.

>>> For Rust VFS abstraction, that is a different and independent story,
>>> Yiyang doesn't have any bandwidth on this due to his limited time.
>>
>> This seems a bit weird, you have the bandwidth to write your own
>> abstractions, but not use the stuff that has already been developed?
> 
> It's not written by me; Yiyang is still an undergraduate student.
> It's his research project and I don't think it's his responsibility
> to make an upstreamable VFS abstraction.

That is fair, but he wouldn't have to start from scratch: Wedson's
abstractions were good enough for him to write a Rust version of ext2.
In addition, tarfs and puzzlefs also use those bindings.
To me it sounds as if you have not taken the time to try to make it work
with the existing abstractions. Have you tried reaching out to Ariel? He
is working on puzzlefs and might have some insight to give you. Sadly
Wedson has left the project, so someone will have to pick up his work.

I hope that you understand that we can't have two abstractions for the
same C API. It confuses people which to use, some features might only be
available in one version and others only in the other. It would be a
total mess. It's just like the rule for no duplicated drivers that you
have on the C side.

People (mostly Wedson) also have put in a lot of work into making the
VFS abstractions good. Why ignore all of that?

>> I have quickly glanced over the patchset and the abstractions seem
>> rather immature, not general enough for other filesystems to also take
> 
> I don't have enough time to take a full look of this patchset too
> due to other ongoing work for now (Rust EROFS is not quite a high
> priority stuff for me).
> 
> And that's why it's called "RFC PATCH".

Yeah I saw the RFC title. I just wanted to communicate early that I
would not review it if it were a normal patch. In fact, I would advise
against taking the patch, due to the reasons I outlined.

>> advantage of them. They also miss safety documentation and are in
> 
> I don't think it needs to be general enough, since we'd like to use
> the new Rust language tool within a subsystem.
> 
> So why it needs to take care of other filesystems? Again, I'm not
> working on a full VFS abstraction.

And that's OK, feel free to just pick the parts of the existing VFS that
you need and extend as you (or your student) see fit. What you said
yourself is that we need a global vision for VFS abstractions. If you
only use a subset of them, then you only care about that subset, other
people can extend it if they need. If everyone would roll their own
abstractions without communicating, then how would we create a global
vision?

> Yes, this patchset is not perfect.  But I've asked Yiyang to isolate
> all VFS structures as much as possible, but it seems that it still
> touches something.

It would already be a big improvement to put the VFS structures into the
kernel crate. Because then everyone can benefit from your work.

>> general poorly documented.
> 
> Okay, I think it can be improved then if you give more detailed hints.
> 
>>
>> Additionally, all of the code that I saw is put into the `fs/erofs` and
>> `rust/erofs_sys` directories. That way people can't directly benefit
>> from your code, put your general abstractions into the kernel crate.
>> Soon we will split the kernel crate; I could imagine that we end up
>> with an `fs` crate, when that happens, we would put those abstractions
>> there.
>>
>> As I don't have the bandwidth to review two different sets of filesystem
>> abstractions, I can only provide you with feedback if you use the
>> existing abstractions.
> 
> I think Rust is just a tool, if you could have extra time to review
> our work, that would be wonderful!  Many thanks then.
> 
> However, if you don't have time to review, IMHO, Rust is just a tool,
> I think each subsystem can choose to use Rust in their codebase, or
> I'm not sure what's your real point is?

I don't want to prevent or discourage you from using Rust in the kernel.
In fact, I can't prevent you from putting this in, since after all you
are the maintainer.
What I can do, is advise against not using abstractions. That has been
our philosophy since very early on. They are the reason that you can
write PHY drivers without any `unsafe` code whatsoever *today*. I think
that is an impressive feat and our recipe for success.

We even have this in our documentation:
https://docs.kernel.org/rust/general-information.html#abstractions-vs-bindings

My real point is that I want Rust to succeed in the kernel. I strongly
believe that good abstractions (in the sense that you can do as much as
possible using only safe Rust) are a crucial factor.
I and others from the RfL team can help you if you (or your student)
have any Rust related questions for the abstractions. Feel free to reach
out.


Maybe Miguel can say more on this matter, since he was at the
maintainers summit, but our takeaways essentially are that we want
maintainers to experiment with Rust. And if you don't have any real
users, then breaking the Rust code is fine.
Though I think that with breaking we mean that changes to the C side
prevent the Rust side from working, not shipping Rust code without
abstractions.

We might be able to make an exception to the "your driver can only use
abstractions" rule, but only with the promise that the subsystem is
working towards creating suitable abstractions and replacing the direct
C accesses with that.

I personally think that we should not make that the norm, instead try to
create the minimal abstraction and minimal driver (without directly
calling C) that you need to start. Of course this might not work, the
"minimal driver" might need to be rather complex for you to start, but I
don't know your subsystem to make that judgement.

>>> And I _also_ don't think an incomplete ROFS VFS Rust abstraction
>>> is useful to Linux community
>>
>> IIRC Wedson created ROFS VFS abstractions before going for the full
>> filesystem. So it would definitely be useful for other read-only
>> filesystems (as well as filesystems that also allow writing, since last
>> time I checked, they often also support reading).
> 
> Leaving aside everything else, an incomplete Rust read-only VFS
> abstraction itself is just an unsafe stuff.

I don't understand what you want to say.

>>> (because IMO for generic interface
>>> design, we need a global vision for all filesystems instead of
>>> just ROFSes.  No existing user is not an excuse for an incomplete
>>> abstraction.)
>>
>> Yes we need a global vision, but if you would use the existing
>> abstractions, then you would participate in this global vision.
>>
>> Sorry for repeating this point so many times, but it is *really*
>> important that we don't have multiple abstractions for the same thing.
> 
> I've expressed my viewpoint.
> 
>>
>>> If a reasonable Rust VFS abstraction landed, I think we will switch
>>> to use that, but as I said, they are completely two stories.
>>
>> For them to land, there has to be some kind of user. For example, a rust
>> reference driver, or a new filesystem. For example this one.
> 
> Without a full proper VFS abstraction, it's just broken and
> needs to be refactored.  And that will be painful to all
> users then.

I also don't understand your point here. What is broken, this EROFS
implementation? Why will it be painful to refactor?

> =======
> In the end,
> 
> Other thoughts, comments are helpful here since I wonder how "Rust
> -for-Linux" works in the long term, and decide whether I will work
> on Kernel Rust or not at least in the short term.

The longterm goal is to make everything that is possible in C, possible
in Rust. For more info, please take a look at the kernel summit talk by
Miguel Ojeda.
However, we can only reach that longterm goal if maintainers are willing
and ready to put Rust into their subsystems (either by knowing/learning
Rust themselves or by having a co-maintainer that does just the Rust
part). So you wanting to experiment is great. I appreciate that you also
have a student working on this. Still, I think we should follow our
guidelines and create abstractions in order to require as little
`unsafe` code as possible.

---
Cheers,
Benno



* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-19 19:36           ` Benno Lossin
@ 2024-09-20  0:49             ` Gao Xiang
  2024-09-21  8:37               ` Greg Kroah-Hartman
  2024-09-25 15:48             ` Ariel Miculas
  1 sibling, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-20  0:49 UTC (permalink / raw)
  To: Benno Lossin, Gary Guo, Yiyang Wu
  Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML, Al Viro,
	Greg Kroah-Hartman



On 2024/9/20 03:36, Benno Lossin wrote:
> On 19.09.24 17:13, Gao Xiang wrote:
>> Hi Benno,
>>
>> On 2024/9/19 21:45, Benno Lossin wrote:
>>> Hi,
>>>
>>> Thanks for the patch series. I think it's great that you want to use
>>> Rust for this filesystem.
>>>
>>> On 17.09.24 01:58, Gao Xiang wrote:
>>>> On 2024/9/17 04:01, Gary Guo wrote:
>>>>> Also, it seems that you're building abstractions into EROFS directly
>>>>> without building a generic abstraction. We have been avoiding that. If
>>>>> there's an abstraction that you need and missing, please add that
>>>>> abstraction. In fact, there're a bunch of people trying to add FS
>>>>
>>>> No, I'd like to try to replace some EROFS C logic first to Rust (by
>>>> using EROFS C API interfaces) and try if Rust is really useful for
>>>> a real in-tree filesystem.  If Rust can improve EROFS security or
>>>> performance (although I'm sceptical on performance), As an EROFS
>>>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
>>>> help the whole filesystem better.
>>>
>>> As Gary already said, we have been using a different approach and it has
>>> served us well. Your approach of calling directly into C from the driver
>>> can be used to create a proof of concept, but in our opinion it is not
>>> something that should be put into mainline. That is because calling C
>>> from Rust is rather complicated due to the many nuanced features that
>>> Rust provides (for example the safety requirements of references).
>>> Therefore moving the dangerous parts into a central location is crucial
>>> for making use of all of Rust's advantages inside of your code.
>>
>> I'm not quite sure about your point honestly.  In my opinion, there
>> is nothing different to use Rust _within a filesystem_ or _within a
>> driver_ or _within a Linux subsystem_ as long as all negotiated APIs
>> are audited.
> 
> To us there is a big difference: If a lot of functions in an API are
> `unsafe` without being inherent from the problem that it solves, then
> it's a bad API.

Which one? If you point it out, we will update the EROFS kernel
APIs then.

> 
>> Otherwise, it means Rust will never be used to write Linux core parts
>> such as MM, VFS or block layer. Does this point make sense? At least,
>> Rust needs to get along with the existing C code (in an audited way)
>> rather than refuse C code.
> 
> I am neither requiring you to write solely safe code, nor am I banning
> interacting with the C side. What we mean when we talk about
> abstractions is that we want to minimize the Rust code that directly
> interfaces with C. Rust-to-Rust interfaces can be a lot safer and are

We will definitely minimize the API interface between Rust and C in
EROFS.

And it can be done incrementally, why not?  I assume your world is
not pure C and pure Rust as for the Rust for Linux project, no?

> easier to implement correctly.
> 
>> My personal idea about Rust: I think Rust is just another _language
>> tool_ for the Linux kernel which could save us time and make the
>> kernel development better.
> 
> Yes, but we do have conventions, rules and guidelines for writing such
> code. C code also has them. If you want/need to break them, there should
> be a good reason to do so. I don't see one in this instance.
>
>> Or I wonder why not writing a complete new Rust stuff instead rather
>> than living in the C world?
> 
> There are projects that do that yes. But Rust-for-Linux is about
> bringing Rust to the kernel and part of that is coming up with good
> conventions and rules.

Which rule is broken?  Were they discussed widely across the
Linux world?

> 
>>>> For Rust VFS abstraction, that is a different and independent story,
>>>> Yiyang doesn't have any bandwidth on this due to his limited time.
>>>
>>> This seems a bit weird, you have the bandwidth to write your own
>>> abstractions, but not use the stuff that has already been developed?
>>
>> It's not written by me; Yiyang is still an undergraduate student.
>> It's his research project and I don't think it's his responsibility
>> to make an upstreamable VFS abstraction.
> 
> That is fair, but he wouldn't have to start from scratch: Wedson's
> abstractions were good enough for him to write a Rust version of ext2.

The Wedson one is just broken, I assume that you've read
https://lwn.net/Articles/978738/ ?

The initial Linux VFS C version is already for generic
read-write use.

> In addition, tarfs and puzzlefs also use those bindings.

These are both toy fses, I don't know who will use these two
fses for their customers.

> To me it sounds as if you have not taken the time to try to make it work
> with the existing abstractions. Have you tried reaching out to Ariel? He
> is working on puzzlefs and might have some insight to give you. Sadly

IMHO, puzzlefs is another Rust incomplete clone of EROFS, I
could tell him what EROFS currently do.

I'm very happy to collaborate with him to work on his use
cases (and tell him why EROFS can already be used for his
use cases), just like the previous ComposeFS discussion.

There are enough FS projects which reinvent in-tree fses without
enough good reasons (for example, performance or design): ZUFS,
FamFS, ComposeFS.

Tarfs (here tar is not the real tar format) and Puzzlefs are
two special ones just because they are written in Rust.  But
other than that they are just incomplete approaches to EROFS.

I do hope Ariel could attend LSF/MM/BPF to discuss his use
cases with filesystem developers.  And I'm very happy to
collaborate with him.

> Wedson has left the project, so someone will have to pick up his work.

It's not necessary to be Yiyang, since he's interested in
EROFS only.

> 
> I hope that you understand that we can't have two abstractions for the
> same C API. It confuses people which to use, some features might only be
> available in one version and others only in the other. It would be a
> total mess. It's just like the rule for no duplicated drivers that you
> have on the C side.
> 
> People (mostly Wedson) also have put in a lot of work into making the
> VFS abstractions good. Why ignore all of that?

How good? TBH, I think there could be something left eventually,
but the current proposed Rust VFS abstraction is just broken.

I don't think the current abstraction is of any use to be
upstreamed, at least, it should be driven with a generic
read-write filesystem, and resolve lifetime issues during
development (for example, just like Al mentioned d_name and
d_parent, etc.)

Because the initial Linux VFS C version is completely out of
minix fs, rather than an incomplete broken one just for some
toy (I don't know how to express more accurately, since each
upstream filesystem should have strong use cases and users,
but tarfs and puzzlefs are both not.)

Otherwise, all users of the broken Rust VFS abstraction will suffer
from bugs and endless API refactoring.

> 
>>> I have quickly glanced over the patchset and the abstractions seem
>>> rather immature, not general enough for other filesystems to also take
>>
>> I don't have enough time to take a full look of this patchset too
>> due to other ongoing work for now (Rust EROFS is not quite a high
>> priority stuff for me).
>>
>> And that's why it's called "RFC PATCH".
> 
> Yeah I saw the RFC title. I just wanted to communicate early that I
> would not review it if it were a normal patch. In fact, I would advise
> against taking the patch, due to the reasons I outlined.

Your reasoning is still not valid.  IMO, again, Rust
is just a tool, you cannot forbid a real Linux subsystem to
use Rust as an experiment.

Or what's your real point?  A Rust gatekeeper?

> 
>>> advantage of them. They also miss safety documentation and are in
>>
>> I don't think it needs to be general enough, since we'd like to use
>> the new Rust language tool within a subsystem.
>>
>> So why it needs to take care of other filesystems? Again, I'm not
>> working on a full VFS abstraction.
> 
> And that's OK, feel free to just pick the parts of the existing VFS that
> you need and extend as you (or your student) see fit. What you said
> yourself is that we need a global vision for VFS abstractions. If you
> only use a subset of them, then you only care about that subset, other
> people can extend it if they need. If everyone would roll their own
> abstractions without communicating, then how would we create a global
> vision?

No.  We don't roll our own abstraction, instead we define a
clear C <-> Rust boundary of EROFS APIs, just like
"fs/xfs/libxfs" if you could take a look.
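For comparison, here is a minimal sketch of what one function on such a C <-> Rust boundary might look like. `erofs_rust_nblocks` is a hypothetical name, not an export from this patchset. The pure logic stays in safe Rust. Only the thin C-ABI wrapper touches a raw pointer, and it reports errors as a negative errno.

```rust
/// Linux errno value for "Invalid argument".
const EINVAL: i32 = 22;

/// Pure Rust logic: number of blocks of size `1 << blkszbits` needed to
/// hold `size` bytes, rounding a partial trailing block up.
fn nblocks(size: u64, blkszbits: u8) -> Option<u64> {
    if blkszbits >= 64 {
        return None;
    }
    let blksz = 1u64 << blkszbits;
    Some(size / blksz + u64::from(size % blksz != 0))
}

/// Hypothetical C-ABI entry point (illustrative name, not from the patchset).
/// A real export would also need `#[no_mangle]` so C can link to it by
/// name; it is left out of this sketch.
/// Returns 0 and stores the result in `*out`, or `-EINVAL` on bad input.
///
/// # Safety
/// `out` must be null or point to a writable, properly aligned `u64`.
pub unsafe extern "C" fn erofs_rust_nblocks(size: u64, blkszbits: u8, out: *mut u64) -> i32 {
    if out.is_null() {
        return -EINVAL;
    }
    match nblocks(size, blkszbits) {
        Some(n) => {
            // SAFETY: `out` is non-null and the caller guarantees it is valid.
            unsafe { *out = n };
            0
        }
        None => -EINVAL,
    }
}
```

Keeping `nblocks` free of raw pointers means it can be tested and reused from Rust directly. The boundary function stays small enough to audit by eye.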

> 
>> Yes, this patchset is not perfect.  But I've asked Yiyang to isolate
>> all VFS structures as much as possible, but it seems that it still
>> touches something.
> 
> It would already be a big improvement to put the VFS structures into the
> kernel crate. Because then everyone can benefit from your work.

Again, that is not Yiyang's interest.  It's like being sold
something you don't want; I don't think it's reasonable.

> 
>>> general poorly documented.
>>
>> Okay, I think it can be improved then if you give more detailed hints.
>>
>>>
>>> Additionally, all of the code that I saw is put into the `fs/erofs` and
>>> `rust/erofs_sys` directories. That way people can't directly benefit
>>> from your code, put your general abstractions into the kernel crate.
>>> Soon we will split the kernel crate; I could imagine that we end up
>>> with an `fs` crate, when that happens, we would put those abstractions
>>> there.
>>>
>>> As I don't have the bandwidth to review two different sets of filesystem
>>> abstractions, I can only provide you with feedback if you use the
>>> existing abstractions.
>>
>> I think Rust is just a tool, if you could have extra time to review
>> our work, that would be wonderful!  Many thanks then.
>>
>> However, if you don't have time to review, IMHO, Rust is just a tool,
>> I think each subsystem can choose to use Rust in their codebase, or
>> I'm not sure what's your real point is?
> 
> I don't want to prevent or discourage you from using Rust in the kernel.
> In fact, I can't prevent you from putting this in, since after all you
> are the maintainer.

I do think you're discouraging anyone from using Rust in their codebase,
because I've said we _will_ form a good abstraction in our codebase.

But you're just forcefully selling something else.

> What I can do, is advise against not using abstractions. That has been
> our philosophy since very early on. They are the reason that you can
> write PHY drivers without any `unsafe` code whatsoever *today*. I think

I don't think filesystems are comparable to some PHY drivers.  If you
take ext4, it's more than 65000 lines, and XFS is almost 78000.

I think filesystems can have a way to be reimplemented in Rust
incrementally, rather than in a purely black-and-white world.

> that is an impressive feat and our recipe for success.
> 
> We even have this in our documentation:
> https://docs.kernel.org/rust/general-information.html#abstractions-vs-bindings
> 
> My real point is that I want Rust to succeed in the kernel. I strongly
> believe that good abstractions (in the sense that you can do as much as
> possible using only safe Rust) are a crucial factor.

On my side, you are just isolating any useful subsystem to try
to use Rust and

“Leaf” modules (e.g. drivers) should not use the C bindings directly

is unreasonable for filesystems because it cannot be done in one
shot.

In addition, there are a lot ongoing features on both C and Rust
side, you need at least a fallback to make end users happy with a
unique feature view rather than just return "it's broken".
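Here is a minimal sketch of the fallback idea, in the spirit of the cover letter's note that compressed inodes still fall back to the C implementation. The `Layout` variants and the `c_read` stand-in are hypothetical. The Rust path serves what it supports and hands everything else to the existing C code, so users see one feature set.

```rust
/// Hypothetical inode data layouts; only the flat ones are handled in Rust here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Layout {
    FlatPlain,
    FlatInline,
    Compressed,
}

/// Which implementation ended up serving the request.
#[derive(Debug, PartialEq)]
pub enum Served {
    Rust,
    C,
}

/// Stand-in for the existing C implementation (an FFI call in a real build).
pub fn c_read(_layout: Layout) -> Served {
    Served::C
}

/// Try the Rust path first and defer anything it does not support yet to C,
/// so users see the same features whichever side serves the request.
pub fn read_inode_data(layout: Layout, c_fallback: fn(Layout) -> Served) -> Served {
    match layout {
        Layout::FlatPlain | Layout::FlatInline => Served::Rust,
        other => c_fallback(other),
    }
}
```

As more layouts gain Rust implementations, arms move from the fallback into the Rust side without any user-visible change.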

Users don't care C or Rust (they only care full functionality)
but only developers care.  And mixing up two different things
(VFS abstraction and use Rust in the codebase) is not good to
RFL success.

IMHO, this is "Rust for Linux" not "Linux for Rust".

> I and others from the RfL team can help you if you (or your student)
> have any Rust related questions for the abstractions. Feel free to reach
> out.
> 
> 
> Maybe Miguel can say more on this matter, since he was at the
> maintainers summit, but our takeaways essentially are that we want
> maintainers to experiment with Rust. And if you don't have any real
> users, then breaking the Rust code is fine.
> Though I think that with breaking we mean that changes to the C side
> prevent the Rust side from working, not shipping Rust code without
> abstractions.

I think you're still mixing them up.

> 
> We might be able to make an exception to the "your driver can only use
> abstractions" rule, but only with the promise that the subsystem is
> working towards creating suitable abstractions and replacing the direct
> C accesses with that.

I don't think those rules are reasonable for RFL success, honestly.

You are artificially isolating the Linux C and Rust world, not from
Linux users or Linux ecosystem perspective, but only from some
developer perspective.

Good luck, anyway.

> 
> I personally think that we should not make that the norm, instead try to
> create the minimal abstraction and minimal driver (without directly
> calling C) that you need to start. Of course this might not work, the
> "minimal driver" might need to be rather complex for you to start, but I
> don't know your subsystem to make that judgement.

...

>>
>> Without a full proper VFS abstraction, it's just broken and
>> needs to be refactored.  And that will be painful to all
>> users then.
> 
> I also don't understand your point here. What is broken, this EROFS
> implementation? Why will it be painful to refactor?

I've said earlier.

> 
>> =======
>> In the end,
>>
>> Other thoughts, comments are helpful here since I wonder how "Rust
>> -for-Linux" works in the long term, and decide whether I will work
>> on Kernel Rust or not at least in the short term.
> 
> The longterm goal is to make everything that is possible in C, possible
> in Rust. For more info, please take a look at the kernel summit talk by

But you're disallowing Rust in the codebase.

> Miguel Ojeda.
> However, we can only reach that longterm goal if maintainers are willing
> and ready to put Rust into their subsystems (either by knowing/learning
> Rust themselves or by having a co-maintainer that does just the Rust
> part). So you wanting to experiment is great. I appreciate that you also
> have a student working on this. Still, I think we should follow our
> guidelines and create abstractions in order to require as little
> `unsafe` code as possible.

I've expressed my point.  I don't think some `guideline`
could bring success to RFL, since many subsystems need
an incremental way, not just a black-or-white thing.

Thanks,
Gao Xiang

> 
> ---
> Cheers,
> Benno



* [PATCH RESEND 0/1] rust: introduce declare_err! autogeneration
  2024-09-16 17:51   ` Greg KH
  2024-09-16 23:45     ` Gao Xiang
@ 2024-09-20  2:49     ` Yiyang Wu
  2024-09-20  2:49       ` [PATCH RESEND 1/1] rust: error: auto-generate error declarations Yiyang Wu
  2024-09-20  2:57     ` [RFC PATCH 03/24] erofs: add Errno in Rust Yiyang Wu
  2 siblings, 1 reply; 69+ messages in thread
From: Yiyang Wu @ 2024-09-20  2:49 UTC (permalink / raw)
  To: rust-for-linux; +Cc: gregkh, xiang, gary, linux-erofs, linux-fsdevel, LKML

Currently, the error.rs's errno import is done manually by copying the
comments from include/linux/errno.h and uses the declare_err! to wrap
the constant errno.

However, this reduces readability and makes error.rs harder to
maintain as the errno list grows longer, or whenever errno.h is
updated with new semantics.

This patchset solves these issues for good by introducing a rule
that generates errno_generated.rs by preprocessing errno.h and
filtering it through sed, and then including the generated file
in error.rs.
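For illustration only, the `sed`/`grep` stages of the rule can be mirrored as a small Rust function (a hypothetical sketch, not part of the patch): it turns a `#define NAME <number> /* comment */` line into a `declare_err!(NAME, "comment.");` line and drops every other line, such as aliases like `EWOULDBLOCK EAGAIN`. One deliberate difference: the sketch also accepts `_` in names. The character class `[A-Z0-9]+` in the rule does not match `_`, so names such as `EPROBE_DEFER` and `ERESTART_RESTARTBLOCK` from the removed hand-written list would not be matched by it.

```rust
/// Hypothetical Rust counterpart of the `cmd_errno` sed/grep stages:
/// `#define NAME <number> /* comment */` -> `declare_err!(NAME, "comment.");`
/// Every other line (aliases, include guards, code) yields `None`.
fn errno_line_to_declare(line: &str) -> Option<String> {
    // The line must start with `#define`.
    let rest = line.strip_prefix("#define")?.trim_start();

    // Macro name: uppercase letters, digits and underscores.
    let name_len = rest
        .find(|c: char| !(c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_'))
        .unwrap_or(rest.len());
    if name_len == 0 {
        return None;
    }
    let (name, rest) = rest.split_at(name_len);

    // Whitespace, then a decimal value; aliases such as `EWOULDBLOCK EAGAIN` fail here.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let rest = rest.trim_start();
    let value_len = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    if value_len == 0 {
        return None;
    }

    // The trailing C comment becomes the doc string.
    let comment = rest[value_len..]
        .trim()
        .strip_prefix("/*")?
        .strip_suffix("*/")?
        .trim();
    Some(format!("declare_err!({}, \"{}.\");", name, comment))
}

fn main() {
    let header = "#define EPERM\t\t 1\t/* Operation not permitted */\n\
                  #define EWOULDBLOCK\tEAGAIN\t/* Operation would block */\n\
                  #define EPROBE_DEFER\t517\t/* Driver requests probe retry */\n";
    for decl in header.lines().filter_map(errno_line_to_declare) {
        println!("{}", decl);
    }
}
```

With the three sample lines, only `EPERM` and `EPROBE_DEFER` produce a `declare_err!` line; the alias is skipped, matching the intent of the `grep -E '^declare_err.*$$'` filter.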

This patch is based on the rust-next branch.

Yiyang Wu (1):
  rust: error: auto-generate error declarations

 rust/.gitignore      |  1 +
 rust/Makefile        | 14 ++++++++++-
 rust/kernel/error.rs | 58 +++-----------------------------------------
 3 files changed, 18 insertions(+), 55 deletions(-)

-- 
2.46.0



* [PATCH RESEND 1/1] rust: error: auto-generate error declarations
  2024-09-20  2:49     ` [PATCH RESEND 0/1] rust: introduce declare_err! autogeneration Yiyang Wu
@ 2024-09-20  2:49       ` Yiyang Wu
  0 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-20  2:49 UTC (permalink / raw)
  To: rust-for-linux; +Cc: gregkh, xiang, gary, linux-erofs, linux-fsdevel, LKML

This patch adds a new cmd_errno to convert the include/linux/errno.h
content into declare_err! macros for better maintainability and readability.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 rust/.gitignore      |  1 +
 rust/Makefile        | 14 ++++++++++-
 rust/kernel/error.rs | 58 +++-----------------------------------------
 3 files changed, 18 insertions(+), 55 deletions(-)

diff --git a/rust/.gitignore b/rust/.gitignore
index d3829ffab80b..ba71ef4a9239 100644
--- a/rust/.gitignore
+++ b/rust/.gitignore
@@ -5,6 +5,7 @@ bindings_helpers_generated.rs
 doctests_kernel_generated.rs
 doctests_kernel_generated_kunit.c
 uapi_generated.rs
+errno_generated.rs
 exports_*_generated.h
 doc/
 test/
diff --git a/rust/Makefile b/rust/Makefile
index dd76dc27d666..f5a1680fe59c 100644
--- a/rust/Makefile
+++ b/rust/Makefile
@@ -22,6 +22,8 @@ always-$(CONFIG_RUST) += exports_alloc_generated.h exports_helpers_generated.h \
 always-$(CONFIG_RUST) += uapi/uapi_generated.rs
 obj-$(CONFIG_RUST) += uapi.o
 
+always-$(CONFIG_RUST) += kernel/errno_generated.rs
+
 ifdef CONFIG_RUST_BUILD_ASSERT_ALLOW
 obj-$(CONFIG_RUST) += build_error.o
 else
@@ -289,6 +291,15 @@ $(obj)/uapi/uapi_generated.rs: $(src)/uapi/uapi_helper.h \
     $(src)/bindgen_parameters FORCE
 	$(call if_changed_dep,bindgen)
 
+quiet_cmd_errno = EXPORTS $@
+      cmd_errno = \
+	$(CC) $(c_flags) -E -CC -dD $< \
+	| sed -E 's/\#define\s*([A-Z0-9]+)\s+([0-9]+)\s+\/\*\s*(.*)\s\*\//declare_err!(\1, "\3.");/' \
+	| grep -E '^declare_err.*$$' > $@
+
+$(obj)/kernel/errno_generated.rs: $(srctree)/include/linux/errno.h FORCE
+	$(call if_changed,errno)
+
 # See `CFLAGS_REMOVE_helpers.o` above. In addition, Clang on C does not warn
 # with `-Wmissing-declarations` (unlike GCC), so it is not strictly needed here
 # given it is `libclang`; but for consistency, future Clang changes and/or
@@ -420,7 +431,8 @@ $(obj)/uapi.o: $(src)/uapi/lib.rs \
 
 $(obj)/kernel.o: private rustc_target_flags = --extern alloc \
     --extern build_error --extern macros --extern bindings --extern uapi
-$(obj)/kernel.o: $(src)/kernel/lib.rs $(obj)/alloc.o $(obj)/build_error.o \
+$(obj)/kernel.o: $(src)/kernel/lib.rs $(obj)/kernel/errno_generated.rs \
+    $(obj)/alloc.o $(obj)/build_error.o \
     $(obj)/libmacros.so $(obj)/bindings.o $(obj)/uapi.o FORCE
 	+$(call if_changed_rule,rustc_library)
 
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 6f1587a2524e..bb16b40a8d19 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -23,60 +23,10 @@ macro_rules! declare_err {
             pub const $err: super::Error = super::Error(-(crate::bindings::$err as i32));
         };
     }
-
-    declare_err!(EPERM, "Operation not permitted.");
-    declare_err!(ENOENT, "No such file or directory.");
-    declare_err!(ESRCH, "No such process.");
-    declare_err!(EINTR, "Interrupted system call.");
-    declare_err!(EIO, "I/O error.");
-    declare_err!(ENXIO, "No such device or address.");
-    declare_err!(E2BIG, "Argument list too long.");
-    declare_err!(ENOEXEC, "Exec format error.");
-    declare_err!(EBADF, "Bad file number.");
-    declare_err!(ECHILD, "No child processes.");
-    declare_err!(EAGAIN, "Try again.");
-    declare_err!(ENOMEM, "Out of memory.");
-    declare_err!(EACCES, "Permission denied.");
-    declare_err!(EFAULT, "Bad address.");
-    declare_err!(ENOTBLK, "Block device required.");
-    declare_err!(EBUSY, "Device or resource busy.");
-    declare_err!(EEXIST, "File exists.");
-    declare_err!(EXDEV, "Cross-device link.");
-    declare_err!(ENODEV, "No such device.");
-    declare_err!(ENOTDIR, "Not a directory.");
-    declare_err!(EISDIR, "Is a directory.");
-    declare_err!(EINVAL, "Invalid argument.");
-    declare_err!(ENFILE, "File table overflow.");
-    declare_err!(EMFILE, "Too many open files.");
-    declare_err!(ENOTTY, "Not a typewriter.");
-    declare_err!(ETXTBSY, "Text file busy.");
-    declare_err!(EFBIG, "File too large.");
-    declare_err!(ENOSPC, "No space left on device.");
-    declare_err!(ESPIPE, "Illegal seek.");
-    declare_err!(EROFS, "Read-only file system.");
-    declare_err!(EMLINK, "Too many links.");
-    declare_err!(EPIPE, "Broken pipe.");
-    declare_err!(EDOM, "Math argument out of domain of func.");
-    declare_err!(ERANGE, "Math result not representable.");
-    declare_err!(ERESTARTSYS, "Restart the system call.");
-    declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
-    declare_err!(ERESTARTNOHAND, "Restart if no handler.");
-    declare_err!(ENOIOCTLCMD, "No ioctl command.");
-    declare_err!(ERESTART_RESTARTBLOCK, "Restart by calling sys_restart_syscall.");
-    declare_err!(EPROBE_DEFER, "Driver requests probe retry.");
-    declare_err!(EOPENSTALE, "Open found a stale dentry.");
-    declare_err!(ENOPARAM, "Parameter not supported.");
-    declare_err!(EBADHANDLE, "Illegal NFS file handle.");
-    declare_err!(ENOTSYNC, "Update synchronization mismatch.");
-    declare_err!(EBADCOOKIE, "Cookie is stale.");
-    declare_err!(ENOTSUPP, "Operation is not supported.");
-    declare_err!(ETOOSMALL, "Buffer or request is too small.");
-    declare_err!(ESERVERFAULT, "An untranslatable error occurred.");
-    declare_err!(EBADTYPE, "Type not supported by server.");
-    declare_err!(EJUKEBOX, "Request initiated, but will not complete before timeout.");
-    declare_err!(EIOCBQUEUED, "iocb queued, will get completion event.");
-    declare_err!(ERECALLCONFLICT, "Conflict with recalled state.");
-    declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
+    include!(concat!(
+        env!("OBJTREE"),
+        "/rust/kernel/errno_generated.rs"
+    ));
 }
 
 /// Generic integer kernel error.
-- 
2.46.0



* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-16 17:51   ` Greg KH
  2024-09-16 23:45     ` Gao Xiang
  2024-09-20  2:49     ` [PATCH RESEND 0/1] rust: introduce declare_err! autogeneration Yiyang Wu
@ 2024-09-20  2:57     ` Yiyang Wu
  2 siblings, 0 replies; 69+ messages in thread
From: Yiyang Wu @ 2024-09-20  2:57 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-erofs, rust-for-linux, linux-fsdevel, LKML

On Mon, Sep 16, 2024 at 07:51:40PM GMT, Greg KH wrote:
> On Mon, Sep 16, 2024 at 09:56:13PM +0800, Yiyang Wu wrote:
> > Introduce Errno to Rust side code. Note that in current Rust For Linux,
> > Errnos are implemented as core::ffi::c_uint unit structs.
> > However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
> > 
> > Since errno_base hasn't changed in over 13 years,
> > this patch merely serves as a temporary workaround for the missing
> > errno in Rust for Linux.
> 
> Why not just add the missing errno to the core rust code instead?  No
> need to define a whole new one for this.
> 
> thanks,
> 
> greg k-h

I have added all the missing errnos by autogenerating declare_err!
in the preceding patches. Please check :)

Best Regards,

Yiyang Wu
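A minimal, self-contained sketch of how generated `declare_err!` lines turn into constants, and how an `EFSCORRUPTED` alias for `EUCLEAN` (the C alias mentioned in the original patch description) could then sit next to them. `Error` and `to_errno` here are simplified stand-ins for the kernel's types, and this macro takes the numeric value directly instead of reading `crate::bindings::$err`; the values shown are the generic Linux ones and may differ on some architectures.

```rust
/// Simplified stand-in for the kernel's `Error`: wraps a negative errno.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Error(i32);

impl Error {
    /// Returns the negative errno value, e.g. -117 for EUCLEAN.
    pub fn to_errno(self) -> i32 {
        self.0
    }
}

pub mod code {
    // Same shape as `declare_err!` in rust/kernel/error.rs, except that the
    // positive errno value is passed in instead of `crate::bindings::$err`.
    macro_rules! declare_err {
        ($err:ident, $value:expr, $($doc:expr),+) => {
            $( #[doc = $doc] )*
            pub const $err: super::Error = super::Error(-($value as i32));
        };
    }

    // What generated lines could look like (generic Linux errno values).
    declare_err!(EPERM, 1, "Operation not permitted.");
    declare_err!(EUCLEAN, 117, "Structure needs cleaning.");
}

/// Filesystem-side alias, mirroring the C alias of EFSCORRUPTED to EUCLEAN.
pub const EFSCORRUPTED: Error = code::EUCLEAN;

fn main() {
    assert_eq!(code::EPERM.to_errno(), -1);
    assert_eq!(EFSCORRUPTED.to_errno(), -117);
    println!("{:?}", EFSCORRUPTED);
}
```

Once `EUCLEAN` is generated in the shared `code` module, a filesystem only needs the one-line alias rather than its own errno type.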


* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-20  0:49             ` Gao Xiang
@ 2024-09-21  8:37               ` Greg Kroah-Hartman
  2024-09-21  9:29                 ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Greg Kroah-Hartman @ 2024-09-21  8:37 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Benno Lossin, Gary Guo, Yiyang Wu, linux-erofs, rust-for-linux,
	linux-fsdevel, LKML, Al Viro

On Fri, Sep 20, 2024 at 08:49:26AM +0800, Gao Xiang wrote:
> 
> 
> On 2024/9/20 03:36, Benno Lossin wrote:
> > On 19.09.24 17:13, Gao Xiang wrote:
> > > Hi Benno,
> > > 
> > > On 2024/9/19 21:45, Benno Lossin wrote:
> > > > Hi,
> > > > 
> > > > Thanks for the patch series. I think it's great that you want to use
> > > > Rust for this filesystem.
> > > > 
> > > > On 17.09.24 01:58, Gao Xiang wrote:
> > > > > On 2024/9/17 04:01, Gary Guo wrote:
> > > > > > Also, it seems that you're building abstractions into EROFS directly
> > > > > > without building a generic abstraction. We have been avoiding that. If
> > > > > > there's an abstraction that you need and missing, please add that
> > > > > > abstraction. In fact, there're a bunch of people trying to add FS
> > > > > 
> > > > > No, I'd like to try to replace some EROFS C logic first to Rust (by
> > > > > using EROFS C API interfaces) and try if Rust is really useful for
> > > > > a real in-tree filesystem.  If Rust can improve EROFS security or
> > > > > performance (although I'm sceptical about performance), as an EROFS
> > > > > maintainer, I'm totally fine to accept EROFS Rust logic landed to
> > > > > help the whole filesystem better.
> > > > 
> > > > As Gary already said, we have been using a different approach and it has
> > > > served us well. Your approach of calling directly into C from the driver
> > > > can be used to create a proof of concept, but in our opinion it is not
> > > > something that should be put into mainline. That is because calling C
> > > > from Rust is rather complicated due to the many nuanced features that
> > > > Rust provides (for example the safety requirements of references).
> > > > Therefore moving the dangerous parts into a central location is crucial
> > > > for making use of all of Rust's advantages inside of your code.
> > > 
> > > I'm not quite sure about your point honestly.  In my opinion, there
> > > is nothing different to use Rust _within a filesystem_ or _within a
> > > driver_ or _within a Linux subsystem_ as long as all negotiated APIs
> > > are audited.
> > 
> > To us there is a big difference: If a lot of functions in an API are
> > `unsafe` without being inherent from the problem that it solves, then
> > it's a bad API.
> 
> Which one? If you point it out, we will update the EROFS kernel
> APIs then.
> 
> > 
> > > Otherwise, it means Rust will never be used to write Linux core parts
> > > such as MM, VFS or block layer. Does this point make sense? At least,
> > > Rust needs to get along with the existing C code (in an audited way)
> > > rather than refuse C code.
> > 
> > I am neither requiring you to write solely safe code, nor am I banning
> > interacting with the C side. What we mean when we talk about
> > abstractions is that we want to minimize the Rust code that directly
> > interfaces with C. Rust-to-Rust interfaces can be a lot safer and are
> 
> We will definitely minimize the API interface between Rust and C in
> EROFS.
> 
> And it can be done incrementally, why not?  I assume your world is
> not pure C and pure Rust as for the Rust for Linux project, no?
> 
> > easier to implement correctly.
> > 
> > > My personal idea about Rust: I think Rust is just another _language
> > > tool_ for the Linux kernel which could save us time and make the
> > > kernel development better.
> > 
> > Yes, but we do have conventions, rules and guidelines for writing such
> > code. C code also has them. If you want/need to break them, there should
> > be a good reason to do so. I don't see one in this instance.
> > 
> > > Or I wonder why not writing a complete new Rust stuff instead rather
> > > than living in the C world?
> > 
> > There are projects that do that yes. But Rust-for-Linux is about
> > bringing Rust to the kernel and part of that is coming up with good
> > conventions and rules.
> 
> Which rule is broken?  Were they discussed widely in the
> Linux world?
> 
> > 
> > > > > For the Rust VFS abstraction, that is a different and independent story;
> > > > > Yiyang doesn't have any bandwidth for it due to his limited time.
> > > > 
> > > > This seems a bit weird, you have the bandwidth to write your own
> > > > abstractions, but not use the stuff that has already been developed?
> > > 
> > > It's not written by me; Yiyang is still an undergraduate student.
> > > It's his research project and I don't think it's his responsibility
> > > to make an upstreamable VFS abstraction.
> > 
> > That is fair, but he wouldn't have to start from scratch: Wedson's
> > abstractions were good enough for him to write a Rust version of ext2.
> 
> The Wedson one is just broken, I assume that you've read
> https://lwn.net/Articles/978738/ ?

Yes, and if you see the patches on linux-fsdevel, people are working to
get these vfs bindings correct for any filesystem to use.  Please review
them and see if they will work for you for erofs, as "burying" the
binding in just one filesystem is not a good idea.

> > In addition, tarfs and puzzlefs also use those bindings.
> 
> These are both toy fses, I don't know who will use these two
> fses for their customers.

tarfs is being used by real users as it solves a need they have today.
And it's a good example of how the vfs bindings would work, although
in a very simple way.  You have to start somewhere :)

> > Miguel Ojeda.
> > However, we can only reach that longterm goal if maintainers are willing
> > and ready to put Rust into their subsystems (either by knowing/learning
> > Rust themselves or by having a co-maintainer that does just the Rust
> > part). So you wanting to experiment is great. I appreciate that you also
> > have a student working on this. Still, I think we should follow our
> > guidelines and create abstractions in order to require as little
> > `unsafe` code as possible.
> 
> I've expressed my point.  I don't think some `guideline`
> could bring success to RFL, since many subsystems need
> an incremental approach, not just a black-or-white one.

Incremental is good, and if you want to use a .rs file or two in the
middle of your module, that's fine.  But please don't try to implement
bindings to common kernel data structures like inodes and dentries and
superblocks if at all possible and ignore the work that others are doing
in this area as that's just duplicated work and will cause more
confusion over time.

It's the same for drivers, I will object strongly if someone attempted
to create a USB binding for 'struct usb_interface' in the middle of a
USB driver and instead insist they work on a generic binding that can be
used by all USB drivers.  I imagine the VFS maintainers have the same
opinion on their apis as well for valid reasons.

thanks,

greg k-h


* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-21  8:37               ` Greg Kroah-Hartman
@ 2024-09-21  9:29                 ` Gao Xiang
  0 siblings, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-21  9:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Benno Lossin, Gary Guo, Yiyang Wu, linux-erofs, rust-for-linux,
	linux-fsdevel, LKML, Al Viro

Hi Greg,

On 2024/9/21 16:37, Greg Kroah-Hartman wrote:
> On Fri, Sep 20, 2024 at 08:49:26AM +0800, Gao Xiang wrote:
>>
>>

...

>>
>>>
>>>>>> For the Rust VFS abstraction, that is a different and independent story;
>>>>>> Yiyang doesn't have any bandwidth for it due to his limited time.
>>>>>
>>>>> This seems a bit weird, you have the bandwidth to write your own
>>>>> abstractions, but not use the stuff that has already been developed?
>>>>
>>>> It's not written by me; Yiyang is still an undergraduate student.
>>>> It's his research project and I don't think it's his responsibility
>>>> to make an upstreamable VFS abstraction.
>>>
>>> That is fair, but he wouldn't have to start from scratch: Wedson's
>>> abstractions were good enough for him to write a Rust version of ext2.
>>
>> The Wedson one is just broken, I assume that you've read
>> https://lwn.net/Articles/978738/ ?
> 
> Yes, and if you see the patches on linux-fsdevel, people are working to
> get these vfs bindings correct for any filesystem to use.  Please review
> them and see if they will work for you for erofs, as "burying" the
> binding in just one filesystem is not a good idea.

Thanks for the reply!

I do think the first Rust filesystem should be ext2 or another
simple read-write filesystem, due to the many VFS member lifetime
concerns that other filesystem developers raised before [1];
otherwise, honestly, the VFS abstraction will be refactored again
and again just due to limited vision and broken functionality.
That is also not how new C Linux kernel APIs are currently
proposed (there, all potential use cases are carefully
reviewed).

[1] https://lore.kernel.org/linux-fsdevel/ZZ3GeehAw%2F78gZJk@dread.disaster.area/

> 
>>> In addition, tarfs and puzzlefs also use those bindings.
>>
>> These are both toy fses, I don't know who will use these two
>> fses for their customers.
> 
> tarfs is being used by real users as it solves a need they have today.
> And it's a good example of how the vfs bindings would work, although
> in a very simple way.  You have to start somewhere :)

EROFS already provided the same functionality upstream in
2023; see [2]:

```
Replacing tar or cpio archives with a filesystem is a
potential use case for EROFS. There has been a proposal
from the confidential-computing community for a kernel
tarfs filesystem, which would allow guest VMs to
efficiently mount a tar file directly. But EROFS would
be a better choice, he said. There is a proof-of-concept
patch set that allows directly mounting a downloaded tar
file using EROFS that performs better than unpacking the
tarball to ext4, then mounting it in the guest using
overlayfs.
```

Honestly, I've kept up very friendly public/private
communication with Wedson in the confidential-computing
community to see if there could be a collaboration on
our tar direct-mount use cases, but he just ignored my
suggestion [3] and kept on with his "tarfs" development (even
though this "tarfs" has no relationship with the standard
POSIX tar/pax format: you cannot mount a real POSIX
tar/pax archive with his implementation).

So my concerns are as follows:
  1) EROFS can already work well for his "tarfs" use
     cases, so there is already in-tree code that works
     without any issue;

  2) From the perspective of both its current on-disk
     format and its on-disk extensibility, I think his
     "tarfs" will only be used for a very narrow use case.
     So in the long term, it could vanish or be forgotten,
     since there are more powerful alternatives in the
     kernel tree for wider use cases.

I think there could be some example filesystems to show the Rust VFS
abstraction (such as ext2, or even minix fs).  Those
filesystems shouldn't be too hard to implement in Rust
(e.g. minix fs from pre-Linux 1.0).  But honestly I don't
think it's a good idea to upstream code for a narrow use case,
even if it's written in Rust, also considering that Wedson
has quit, so the code may not be maintained
anymore.

In short, I do _hope_ a nice Rust VFS abstraction can
land upstream.  But it should be driven by a simple
no-journal read-write filesystem that exercises all Rust VFS
components with a global vision, instead of
unsustainable, random upstream work done just for
corporate pride or the like.

And if some other approach compared itself with EROFS as known
prior art (as I once fully compared EROFS with SquashFS in the
paper) and gave good reasons, I would be very happy and would
respect it (I could also learn the limitations of EROFS or how
to improve EROFS).  But if there is no such reason and the
existence of EROFS is simply ignored, I think that is not the
proper way to propose a new kernel feature / filesystem.

[2] https://lwn.net/Articles/934047
[3] https://github.com/kata-containers/kata-containers/pull/7106#issuecomment-1592192981

> 
>>> Miguel Ojeda.
>>> However, we can only reach that longterm goal if maintainers are willing
>>> and ready to put Rust into their subsystems (either by knowing/learning
>>> Rust themselves or by having a co-maintainer that does just the Rust
>>> part). So you wanting to experiment is great. I appreciate that you also
>>> have a student working on this. Still, I think we should follow our
>>> guidelines and create abstractions in order to require as little
>>> `unsafe` code as possible.
>>
>> I've expressed my point.  I don't think some `guideline`
>> could bring success to RFL, since many subsystems need
>> an incremental approach, not just a black-or-white one.
> 
> Incremental is good, and if you want to use a .rs file or two in the
> middle of your module, that's fine.  But please don't try to implement
> bindings to common kernel data structures like inodes and dentries and
> superblocks if at all possible and ignore the work that others are doing
> in this area as that's just duplicated work and will cause more
> confusion over time.

Yeah, agreed. That is what I'd like to say.

Honestly, Yiyang doesn't have enough time to implement a
VFS abstraction due to his studies, and my time is
limited for now too.

So I also told him "don't touch common kernel data
structures like inodes and dentries and superblocks
if possible" and asked him to just convert the EROFS core logic.

But it seems his RFC patch still leaves some of that in place;
I think he will address it in the next version.

> 
> It's the same for drivers, I will object strongly if someone attempted
> to create a USB binding for 'struct usb_interface' in the middle of a
> USB driver and instead insist they work on a generic binding that can be
> used by all USB drivers.  I imagine the VFS maintainers have the same
> opinion on their apis as well for valid reasons.

Agreed.

Thanks,
Gao Xiang

> 
> thanks,
> 
> greg k-h



* Re: [RFC PATCH 10/24] erofs: add device_infos implementation in Rust
  2024-09-16 13:56 ` [RFC PATCH 10/24] erofs: add device_infos implementation " Yiyang Wu
@ 2024-09-21  9:44   ` Jianan Huang
  0 siblings, 0 replies; 69+ messages in thread
From: Jianan Huang @ 2024-09-21  9:44 UTC (permalink / raw)
  To: Yiyang Wu; +Cc: linux-erofs, linux-fsdevel, LKML, rust-for-linux

Yiyang Wu via Linux-erofs <linux-erofs@lists.ozlabs.org> 于2024年9月16日周一 21:57写道:
>
> Add the device_infos implementation in Rust. It will later be
> put inside the SuperblockInfo. This mask and spec can later
> be used for chunk-based image file block mapping.
>
> Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
> ---
>  fs/erofs/rust/erofs_sys/devices.rs | 47 ++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/fs/erofs/rust/erofs_sys/devices.rs b/fs/erofs/rust/erofs_sys/devices.rs
> index 097676ee8720..7495164c7bd0 100644
> --- a/fs/erofs/rust/erofs_sys/devices.rs
> +++ b/fs/erofs/rust/erofs_sys/devices.rs
> @@ -1,6 +1,10 @@
>  // Copyright 2024 Yiyang Wu
>  // SPDX-License-Identifier: MIT or GPL-2.0-or-later
>
> +use super::alloc_helper::*;
> +use super::data::raw_iters::*;
> +use super::data::*;
> +use super::*;
>  use alloc::vec::Vec;
>
>  /// Device specification.
> @@ -21,8 +25,51 @@ pub(crate) struct DeviceSlot {
>      reserved: [u8; 56],
>  }
>
> +impl From<[u8; 128]> for DeviceSlot {
> +    fn from(data: [u8; 128]) -> Self {
> +        Self {
> +            tags: data[0..64].try_into().unwrap(),
> +            blocks: u32::from_le_bytes([data[64], data[65], data[66], data[67]]),
> +            mapped_blocks: u32::from_le_bytes([data[68], data[69], data[70], data[71]]),
> +            reserved: data[72..128].try_into().unwrap(),
> +        }
> +    }
> +}
> +
>  /// Device information.
>  pub(crate) struct DeviceInfo {
>      pub(crate) mask: u16,
>      pub(crate) specs: Vec<DeviceSpec>,
>  }
> +
> +pub(crate) fn get_device_infos<'a>(
> +    iter: &mut (dyn ContinuousBufferIter<'a> + 'a),
> +) -> PosixResult<DeviceInfo> {
> +    let mut specs = Vec::new();
> +    for data in iter {
> +        let buffer = data?;
> +        let mut cur: usize = 0;
> +        let len = buffer.content().len();
> +        while cur + 128 <= len {


It is better to use a named constant instead of hardcoding 128, like:
const EROFS_DEVT_SLOT_SIZE: usize = size_of::<DeviceSlot>();
The same applies to the other similar usages in this patch set.
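The suggestion above could look roughly like the sketch below. The field offsets follow the quoted `From<[u8; 128]>` impl; `#[repr(C)]`, the compile-time size check, and the use of `chunks_exact` in place of the manual `cur + 128 <= len` loop are assumptions of this sketch, not part of the patch.

```rust
use core::convert::TryInto;
use core::mem::size_of;

/// On-disk device slot, laid out as in the quoted `From<[u8; 128]>` impl.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
pub struct DeviceSlot {
    tags: [u8; 64],
    blocks: u32,        // stored little-endian at offset 64
    mapped_blocks: u32, // stored little-endian at offset 68
    reserved: [u8; 56],
}

/// Slot size as a named constant instead of a bare `128`.
pub const EROFS_DEVT_SLOT_SIZE: usize = size_of::<DeviceSlot>();

// Compile-time check that the struct still matches the on-disk slot size.
const _: () = assert!(EROFS_DEVT_SLOT_SIZE == 128);

impl DeviceSlot {
    fn from_bytes(data: &[u8; EROFS_DEVT_SLOT_SIZE]) -> Self {
        // Each slice length matches its target array size, so these conversions cannot fail.
        let le32 = |off: usize| u32::from_le_bytes(data[off..off + 4].try_into().unwrap());
        Self {
            tags: data[..64].try_into().unwrap(),
            blocks: le32(64),
            mapped_blocks: le32(68),
            reserved: data[72..].try_into().unwrap(),
        }
    }
}

/// Parses every complete slot in `buf`; a trailing partial slot is ignored,
/// as with the `cur + 128 <= len` loop in the patch.
fn parse_slots(buf: &[u8]) -> Vec<DeviceSlot> {
    buf.chunks_exact(EROFS_DEVT_SLOT_SIZE)
        .map(|chunk| DeviceSlot::from_bytes(chunk.try_into().unwrap()))
        .collect()
}

fn main() {
    let mut buf = vec![0u8; 2 * EROFS_DEVT_SLOT_SIZE + 10];
    buf[64..68].copy_from_slice(&1024u32.to_le_bytes());
    let slots = parse_slots(&buf);
    assert_eq!(slots.len(), 2);
    assert_eq!(slots[0].blocks, 1024);
}
```

Deriving the constant from `size_of` and asserting it at compile time means a layout change breaks the build instead of silently misparsing slots.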

Thanks,
Jianan

>
> +            let slot_data: [u8; 128] = buffer.content()[cur..cur + 128].try_into().unwrap();
> +            let slot = DeviceSlot::from(slot_data);
> +            cur += 128;
> +            push_vec(
> +                &mut specs,
> +                DeviceSpec {
> +                    tags: slot.tags,
> +                    blocks: slot.blocks,
> +                    mapped_blocks: slot.mapped_blocks,
> +                },
> +            )?;
> +        }
> +    }
> +
> +    let mask = if specs.is_empty() {
> +        0
> +    } else {
> +        (1 << (specs.len().ilog2() + 1)) - 1
> +    };
> +
> +    Ok(DeviceInfo { mask, specs })
> +}
> --
> 2.46.0
>
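A side note on the quoted `get_device_infos`: the mask it computes is the smallest all-ones value that covers the number of device specs, so the `is_empty()` branch can be folded into a single expression. A minimal sketch (both function names are hypothetical) that checks the two forms agree:

```rust
use std::convert::TryFrom;

/// The mask expression from the quoted `get_device_infos`, for comparison.
fn mask_from_patch(n: usize) -> u16 {
    if n == 0 {
        0
    } else {
        (1 << (n.ilog2() + 1)) - 1
    }
}

/// Branch-free alternative: the smallest all-ones value >= n.
/// Returns `None` if the mask would not fit in the `u16` field.
fn device_id_mask(n: usize) -> Option<u16> {
    let pow = n.checked_add(1)?.checked_next_power_of_two()?;
    u16::try_from(pow - 1).ok()
}

fn main() {
    // Both forms agree wherever the patch's expression does not overflow.
    for n in 0..=1024 {
        assert_eq!(device_id_mask(n), Some(mask_from_patch(n)));
    }
    assert_eq!(device_id_mask(70_000), None);
}
```

Returning `Option` makes the out-of-range case explicit instead of relying on the shift staying within `u16`.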


* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-19 19:36           ` Benno Lossin
  2024-09-20  0:49             ` Gao Xiang
@ 2024-09-25 15:48             ` Ariel Miculas
  2024-09-25 16:35               ` Gao Xiang
  1 sibling, 1 reply; 69+ messages in thread
From: Ariel Miculas @ 2024-09-25 15:48 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Gao Xiang, Gary Guo, Yiyang Wu, rust-for-linux,
	Greg Kroah-Hartman, LKML, Al Viro, linux-fsdevel, linux-erofs

On 24/09/19 07:36, Benno Lossin via Linux-erofs wrote:
> On 19.09.24 17:13, Gao Xiang wrote:
> > Hi Benno,
> > 
> > On 2024/9/19 21:45, Benno Lossin wrote:
> >> Hi,
> >>
> >> Thanks for the patch series. I think it's great that you want to use
> >> Rust for this filesystem.
> >>
> >> On 17.09.24 01:58, Gao Xiang wrote:
> >>> On 2024/9/17 04:01, Gary Guo wrote:
> >>>> Also, it seems that you're building abstractions into EROFS directly
> >>>> without building a generic abstraction. We have been avoiding that. If
> >>>> there's an abstraction that you need and missing, please add that
> >>>> abstraction. In fact, there're a bunch of people trying to add FS
> >>>
> >>> No, I'd like to try to replace some EROFS C logic first to Rust (by
> >>> using EROFS C API interfaces) and try if Rust is really useful for
> >>> a real in-tree filesystem.  If Rust can improve EROFS security or
> >>> performance (although I'm sceptical about performance), as an EROFS
> >>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
> >>> help the whole filesystem better.
> >>
> >> As Gary already said, we have been using a different approach and it has
> >> served us well. Your approach of calling directly into C from the driver
> >> can be used to create a proof of concept, but in our opinion it is not
> >> something that should be put into mainline. That is because calling C
> >> from Rust is rather complicated due to the many nuanced features that
> >> Rust provides (for example the safety requirements of references).
> >> Therefore moving the dangerous parts into a central location is crucial
> >> for making use of all of Rust's advantages inside of your code.
> > 
> > I'm not quite sure about your point honestly.  In my opinion, there
> > is nothing different to use Rust _within a filesystem_ or _within a
> > driver_ or _within a Linux subsystem_ as long as all negotiated APIs
> > are audited.
> 
> To us there is a big difference: If a lot of functions in an API are
> `unsafe` without being inherent from the problem that it solves, then
> it's a bad API.
> 
> > Otherwise, it means Rust will never be used to write Linux core parts
> > such as MM, VFS or block layer. Does this point make sense? At least,
> > Rust needs to get along with the existing C code (in an audited way)
> > rather than refuse C code.
> 
> I am neither requiring you to write solely safe code, nor am I banning
> interacting with the C side. What we mean when we talk about
> abstractions is that we want to minimize the Rust code that directly
> interfaces with C. Rust-to-Rust interfaces can be a lot safer and are
> easier to implement correctly.
> 
> > My personal idea about Rust: I think Rust is just another _language
> > tool_ for the Linux kernel which could save us time and make the
> > kernel development better.
> 
> Yes, but we do have conventions, rules and guidelines for writing such
> code. C code also has them. If you want/need to break them, there should
> be a good reason to do so. I don't see one in this instance.
> 
> > Or I wonder why not writing a complete new Rust stuff instead rather
> > than living in the C world?
> 
> There are projects that do that yes. But Rust-for-Linux is about
> bringing Rust to the kernel and part of that is coming up with good
> conventions and rules.
> 
> >>> For the Rust VFS abstraction, that is a different and independent story;
> >>> Yiyang doesn't have any bandwidth for it due to his limited time.
> >>
> >> This seems a bit weird, you have the bandwidth to write your own
> >> abstractions, but not use the stuff that has already been developed?
> > 
> > It's not written by me; Yiyang is still an undergraduate student.
> > It's his research project and I don't think it's his responsibility
> > to make an upstreamable VFS abstraction.
> 
> That is fair, but he wouldn't have to start from scratch: Wedson's
> abstractions were good enough for him to write a Rust version of ext2.
> In addition, tarfs and puzzlefs also use those bindings.
> To me it sounds as if you have not taken the time to try to make it work
> with the existing abstractions. Have you tried reaching out to Ariel? He
> is working on puzzlefs and might have some insight to give you. Sadly
> Wedson has left the project, so someone will have to pick up his work.

I share the same opinions as Benno that we should try to use the
existing filesystem abstractions, even if they are not yet upstream.
Since erofs is a read-only filesystem and the Rust filesystem
abstractions are also used by two other read-only filesystems (TarFS and
PuzzleFS), it shouldn't be too difficult to adapt the erofs Rust code so
that it also uses the existing filesystem abstractions. And if there is
anything lacking, we can improve the existing generic APIs. This would
also increase the chances of upstreaming them.

I'm happy to help you if you decide to go down this route.

Cheers,
Ariel
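Benno's point quoted above (keep the code that calls into C in one audited place and give the rest of the Rust code a safe Rust-to-Rust interface) can be sketched as follows. `raw_read_block` stands in for a hypothetical C helper and is written in Rust only so the example is self-contained; neither it nor `BlockReader` is an existing kernel API.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for a C helper such as
/// `int raw_read_block(u64 blk, u8 *buf, size_t len)`.
/// It is written in Rust only so that this example runs on its own.
///
/// # Safety
/// `buf` must be valid for writes of `len` bytes.
unsafe extern "C" fn raw_read_block(blk: u64, buf: *mut u8, len: usize) -> c_int {
    if buf.is_null() {
        return -22; // -EINVAL
    }
    // SAFETY: guaranteed by this function's own safety contract.
    let out = unsafe { std::slice::from_raw_parts_mut(buf, len) };
    out.fill(blk as u8); // pretend every byte of block `blk` equals `blk`
    0
}

/// Safe Rust-to-Rust interface: callers never see raw pointers.
pub struct BlockReader;

impl BlockReader {
    pub fn read_block(&self, blk: u64, buf: &mut [u8]) -> Result<(), c_int> {
        // SAFETY: `buf` is an exclusive borrow, so its pointer is valid for
        // writes of `buf.len()` bytes for the whole call.
        let ret = unsafe { raw_read_block(blk, buf.as_mut_ptr(), buf.len()) };
        if ret < 0 { Err(ret) } else { Ok(()) }
    }
}

fn main() {
    let mut buf = [0u8; 8];
    BlockReader.read_block(3, &mut buf).unwrap();
    assert_eq!(buf, [3u8; 8]);
}
```

Only `read_block` needs a `SAFETY` argument; every caller of it is ordinary safe Rust, which is the property a shared abstraction is meant to provide for all users instead of one driver.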


* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-25 15:48             ` Ariel Miculas
@ 2024-09-25 16:35               ` Gao Xiang
  2024-09-25 21:45                 ` Ariel Miculas
  0 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-25 16:35 UTC (permalink / raw)
  To: Ariel Miculas, Benno Lossin
  Cc: Gary Guo, Yiyang Wu, rust-for-linux, Greg Kroah-Hartman, LKML,
	Al Viro, linux-fsdevel, linux-erofs, Linus Torvalds

Hi Ariel,

On 2024/9/25 23:48, Ariel Miculas wrote:

...

> I share the same opinions as Benno that we should try to use the
> existing filesystem abstractions, even if they are not yet upstream.
> Since erofs is a read-only filesystem and the Rust filesystem
> abstractions are also used by two other read-only filesystems (TarFS and
> PuzzleFS), it shouldn't be too difficult to adapt the erofs Rust code so
> that it also uses the existing filesystem abstractions. And if there is
> anything lacking, we can improve the existing generic APIs. This would
> also increase the chances of upstreaming them.

I've already expressed my ideas about "TarFS" [1] and PuzzleFS: since
I'm one of the EROFS authors, I should be responsible for this
long-term project, as my own promise to the Linux community, and make
it serve more Linux users (it has not been interrupted since 2017,
even though I have sacrificed almost all my leisure time, because the
EROFS project isn't entirely my paid job; I need to maintain our
internal kernel storage stack too).

[1] https://lore.kernel.org/r/3a6314fc-7956-47f3-8727-9dc026f3f50e@linux.alibaba.com

Basically, there should be good reasons to upstream new stuff into the
Linux kernel, and I believe Rust is no exception, even if it's
somewhat premature: please help compare against the prior art in detail.

And here are all my thoughts for reference [2][3][4][5]:
[2] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369872133
[3] https://github.com/opencontainers/image-spec/issues/1190#issuecomment-2138572683
[4] https://lore.kernel.org/linux-fsdevel/b9358e7c-8615-1b12-e35d-aae59bf6a467@linux.alibaba.com/
[5] https://lore.kernel.org/linux-fsdevel/20230609-nachrangig-handwagen-375405d3b9f1@brauner/

Still, I really do want to collaborate with you on your
reasonable use cases.  But if you really want to make your upstream
attempt without any comparison at all, please go ahead: I can only
express my own opinion, and I am not the one who decides whether
your work is acceptable for the kernel.

> 
> I'm happy to help you if you decide to go down this route.

Again, the current VFS abstraction is totally incomplete and broken
[6].

I believe it should be driven by a full-featured read-write fs [7]
(even something like a simple minix fs from the pre-Linux 1.0 era) and EROFS will
use Rust in "fs/erofs" as the experiment, but I will definitely
polish the Rust version until it looks good before upstreaming.

I really don't want to repeat myself again.

[6] https://lwn.net/SubscriberLink/991062/9de8e9a466a3faf5
[7] https://lore.kernel.org/linux-fsdevel/ZZ3GeehAw%2F78gZJk@dread.disaster.area

Thanks,
Gao Xiang

> 
> Cheers,
> Ariel


* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-25 16:35               ` Gao Xiang
@ 2024-09-25 21:45                 ` Ariel Miculas
  2024-09-26  0:40                   ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Ariel Miculas @ 2024-09-25 21:45 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Benno Lossin, Gary Guo, Yiyang Wu, rust-for-linux,
	Greg Kroah-Hartman, LKML, Al Viro, linux-fsdevel, linux-erofs,
	Linus Torvalds

On 24/09/26 12:35, Gao Xiang wrote:
> Hi Ariel,
> 
> On 2024/9/25 23:48, Ariel Miculas wrote:
> 
> ...
> 
> > I share the same opinions as Benno that we should try to use the
> > existing filesystem abstractions, even if they are not yet upstream.
> > Since erofs is a read-only filesystem and the Rust filesystem
> > abstractions are also used by two other read-only filesystems (TarFS and
> > PuzzleFS), it shouldn't be too difficult to adapt the erofs Rust code so
> > that it also uses the existing filesystem abstractions. And if there is
> > anything lacking, we can improve the existing generic APIs. This would
> > also increase the chances of upstreaming them.
> 
> I've already expressed my ideas about "TarFS" [1] and PuzzleFS: since
> I'm one of the EROFS authors, I should be responsible for this
> long-term project, as my own promise to the Linux community, and make
> it serve more Linux users (it has not been interrupted since 2017,
> even though I have sacrificed almost all my leisure time, because the
> EROFS project isn't entirely my paid job; I need to maintain our
> internal kernel storage stack too).
> 
> [1] https://lore.kernel.org/r/3a6314fc-7956-47f3-8727-9dc026f3f50e@linux.alibaba.com
> 
> Basically, there should be good reasons to upstream new stuff into the
> Linux kernel, and I believe Rust is no exception, even if it's
> somewhat premature: please help compare against the prior art in detail.
> 
> And here are all my thoughts for reference [2][3][4][5]:
> [2] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369872133
> [3] https://github.com/opencontainers/image-spec/issues/1190#issuecomment-2138572683
> [4] https://lore.kernel.org/linux-fsdevel/b9358e7c-8615-1b12-e35d-aae59bf6a467@linux.alibaba.com/
> [5] https://lore.kernel.org/linux-fsdevel/20230609-nachrangig-handwagen-375405d3b9f1@brauner/
> 
> Still, I really do want to collaborate with you on your
> reasonable use cases.  But if you really want to make your upstream
> attempt without any comparison at all, please go ahead: I can only
> express my own opinion, and I am not the one who decides whether
> your work is acceptable for the kernel.
> 

Thanks for your thoughts on PuzzleFS. I would really like it if we could
centralize the discussions on the latest patch series I sent to the
mailing lists back in May [1]. The reason I say this is because looking
at that thread, it seems there is no feedback for PuzzleFS. The feedback
exists, it's just scattered throughout different mediums. On top of
this, I would also like to engage in the discussions with Dave Chinner,
so I can better understand the limitations of PuzzleFS and the reasons
for which it might be rejected in the Linux Kernel. I do appreciate your
feedback and I need to take my time to respond to the technical issues
that you brought up in the github issue.

However, even if it's not upstream, PuzzleFS does use the latest Rust
filesystem abstractions and thus it stands as an example of how to use
them. And this thread is not about PuzzleFS, but about the Rust
filesystem abstractions and how one might start to use them. That's
where I offered to help, since I already went through the process of
having to use them.

[1] https://lore.kernel.org/all/20240516190345.957477-1-amiculas@cisco.com/

> > 
> > I'm happy to help you if you decide to go down this route.
> 
> Again, the current VFS abstraction is totally incomplete and broken
> [6].

If they're incomplete, we can work together to implement the missing
functionalities. Furthermore, we can work to fix the broken stuff. I
don't think these are good reasons to completely ignore the work that's
already been done on this topic.

By the way, what is it that's actually broken? You've linked to an LWN
article [2] (or at least I think your 6th link was supposed to link to
"Rust for filesystems" instead of the "Committing to Rust in the kernel"
one), but I'm interested in the specifics. What exactly doesn't work as
expected from the filesystem abstractions?

[2] https://lwn.net/Articles/978738/

> 
> I believe it should be driven by a full-featured read-write fs [7]
> (even something like a simple minix fs from the pre-Linux 1.0 era) and EROFS will

I do find it weird that you want a full-featured read-write fs
implemented in Rust, when erofs is a read-only filesystem.

> use Rust in "fs/erofs" as the experiment, but I will definitely
> polish the Rust version until it looks good before upstreaming.

I honestly don't see how it would look good if they're not using the
existing filesystem abstractions. And I'm not convinced that Rust in the
kernel would be useful in any way without the many subsystem
abstractions which were implemented by the Rust for Linux team for the
past few years.

Cheers,
Ariel

> 
> I really don't want to repeat myself again.
> 
> [6] https://lwn.net/SubscriberLink/991062/9de8e9a466a3faf5
> [7] https://lore.kernel.org/linux-fsdevel/ZZ3GeehAw%2F78gZJk@dread.disaster.area
> 
> Thanks,
> Gao Xiang
> 
> > 
> > Cheers,
> > Ariel
> 

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-25 21:45                 ` Ariel Miculas
@ 2024-09-26  0:40                   ` Gao Xiang
  2024-09-26  1:04                     ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-26  0:40 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Benno Lossin, Gary Guo, Yiyang Wu, rust-for-linux,
	Greg Kroah-Hartman, LKML, Al Viro, linux-fsdevel, linux-erofs,
	Linus Torvalds



On 2024/9/26 05:45, Ariel Miculas wrote:
> On 24/09/26 12:35, Gao Xiang wrote:
>> Hi Ariel,
>>
>> On 2024/9/25 23:48, Ariel Miculas wrote:
>>

...

>>
>> And here are all my thoughts for reference [2][3][4][5]:
>> [2] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369872133
>> [3] https://github.com/opencontainers/image-spec/issues/1190#issuecomment-2138572683
>> [4] https://lore.kernel.org/linux-fsdevel/b9358e7c-8615-1b12-e35d-aae59bf6a467@linux.alibaba.com/
>> [5] https://lore.kernel.org/linux-fsdevel/20230609-nachrangig-handwagen-375405d3b9f1@brauner/
>>
>> Still, I really do want to collaborate with you on your
>> reasonable use cases.  But if you really want to make your upstream
>> attempt without any comparison at all, please go ahead: I can only
>> express my own opinion, and I am not the one who decides whether
>> your work is acceptable for the kernel.
>>
> 
> Thanks for your thoughts on PuzzleFS. I would really like it if we could
> centralize the discussions on the latest patch series I sent to the
> mailing lists back in May [1]. The reason I say this is because looking
> at that thread, it seems there is no feedback for PuzzleFS. The feedback
> exists, it's just scattered throughout different mediums. On top of
> this, I would also like to engage in the discussions with Dave Chinner,
> so I can better understand the limitations of PuzzleFS and the reasons
> for which it might be rejected in the Linux Kernel. I do appreciate your
> feedback and I need to take my time to respond to the technical issues
> that you brought up in the github issue.

In short, I really want to avoid opening an arbitrary number of files
in the page fault path, regardless of the performance concerns: even
though there are many cases where mmap_lock is dropped, IMHO there
are still cases where mmap_lock will be taken.

IOWs, I think it's controversial for a kernel fs to open random files
in the page fault context under mmap_lock to begin with.
Otherwise, it's pretty straightforward to add a similar feature
to EROFS.

> 
> However, even if it's not upstream, PuzzleFS does use the latest Rust
> filesystem abstractions and thus it stands as an example of how to use
> them. And this thread is not about PuzzleFS, but about the Rust
> filesystem abstractions and how one might start to use them. That's
> where I offered to help, since I already went through the process of
> having to use them.
> 
> [1] https://lore.kernel.org/all/20240516190345.957477-1-amiculas@cisco.com/
> 
>>>
>>> I'm happy to help you if you decide to go down this route.
>>
>> Again, the current VFS abstraction is totally incomplete and broken
>> [6].
> 
> If they're incomplete, we can work together to implement the missing
> functionalities. Furthermore, we can work to fix the broken stuff. I
> don't think these are good reasons to completely ignore the work that's
> already been done on this topic.

As I've said, we are not ignoring any Rust VFS abstraction work: as soon
as some of it lands in the Linux kernel, we will switch to using it.

The reasons we don't do that now are, again:
  - I don't have time to work on this, because my time for RFL is
    still limited, at least this year; I don't know if Yiyang has
    time to work on a complete ext2 and a Rust VFS abstraction.

  - We would just like to _use Rust_ for the core EROFS logic, instead
    of touching any VFS stuff.  I'm not sure why this is called "completely
    ignore the VFS abstraction", because there is absolutely no
    relationship between these two things.  Why do we need to mix them up?
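As a rough illustration of that split (Rust for the core on-disk logic, C for the VFS glue), here is a minimal sketch, assuming a hypothetical exported helper named `erofs_rust_check_magic()`; neither the name nor the signature comes from the patchset. The Rust core takes plain bytes and no VFS types, and the C-ABI wrapper returns `0` or a negative errno, the usual kernel convention. The offset and magic follow the EROFS on-disk format (`EROFS_SUPER_OFFSET`, `EROFS_SUPER_MAGIC_V1`); everything else is ordinary userspace Rust, not kernel code.

```rust
use std::slice;

/// Linux errno value used below.
const EINVAL: i32 = 22;

/// EROFS on-disk layout: the superblock starts 1024 bytes into the image
/// and begins with a little-endian 32-bit magic.
const EROFS_SUPER_OFFSET: usize = 1024;
const EROFS_SUPER_MAGIC_V1: u32 = 0xE0F5_E1E2;

/// Pure Rust core: bytes in, `Result` out, no VFS types involved.
fn check_magic(image: &[u8]) -> Result<(), i32> {
    let raw = image
        .get(EROFS_SUPER_OFFSET..EROFS_SUPER_OFFSET + 4)
        .ok_or(EINVAL)?;
    let magic = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    if magic == EROFS_SUPER_MAGIC_V1 {
        Ok(())
    } else {
        Err(EINVAL)
    }
}

/// Hypothetical C-ABI wrapper: returns 0 on success or a negative errno.
///
/// # Safety
/// `buf` must be null or point to `len` readable bytes.
#[no_mangle]
pub unsafe extern "C" fn erofs_rust_check_magic(buf: *const u8, len: usize) -> i32 {
    if buf.is_null() {
        return -EINVAL;
    }
    // SAFETY: the caller guarantees `buf` points to `len` readable bytes.
    let image = unsafe { slice::from_raw_parts(buf, len) };
    match check_magic(image) {
        Ok(()) => 0,
        Err(errno) => -errno,
    }
}

fn main() {
    let mut image = vec![0u8; 2048];
    image[EROFS_SUPER_OFFSET..EROFS_SUPER_OFFSET + 4]
        .copy_from_slice(&EROFS_SUPER_MAGIC_V1.to_le_bytes());
    // Call through the C-ABI entry point, as C code would.
    let ok = unsafe { erofs_rust_check_magic(image.as_ptr(), image.len()) };
    let short = unsafe { erofs_rust_check_magic(image.as_ptr(), 512) };
    println!("valid image: {ok}, truncated image: {short}");
}
```

The point of the design is that `check_magic()` can be unit-tested and reused without any kernel or VFS dependency; only the thin wrapper knows about C pointers and errno signs.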

> 
> By the way, what is it that's actually broken? You've linked to an LWN
> article [2] (or at least I think your 6th link was supposed to link to
> "Rust for filesystems" instead of the "Committing to Rust in the kernel"
> one), but I'm interested in the specifics. What exactly doesn't work as
> expected from the filesystem abstractions?

For example, with my current Rust skills, I'm not sure why
fill_super for "T::SUPER_TYPE, sb::Type::BlockDev" must use
"new_sb.bdev().inode().mapper()".

It's unnecessary for a bdev-based fs to use the bdev inode's page
cache to read metadata.

Also, it's unnecessary for the fs type to fix the superblock type as a
const, either sb::Type::BlockDev or sb::Type::Independent, as in

/// Determines how superblocks for this file system type are keyed.
+    const SUPER_TYPE: sb::Type = sb::Type::Independent;

because, at least for the current EROFS use cases, we
decide whether to use get_tree_bdev() or get_tree_nodev() according
to the mount source or mount options.
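To make that point concrete, here is a minimal userspace sketch, assuming hypothetical names (`SuperKey`, `MountParams`, `choose_super_key`) that model, rather than reproduce, the `sb::Type` API of the out-of-tree abstractions. The keying decision is made per mount from the mount parameters instead of being fixed by an associated const per file system type. The `fsid` field is only a stand-in for an option that selects a non-block-device backend; the real EROFS rules are simplified here.

```rust
/// How superblocks are keyed; the variant names mirror `sb::Type` above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SuperKey {
    /// Would be set up through get_tree_bdev().
    BlockDev,
    /// Would be set up through get_tree_nodev().
    Independent,
}

/// Hypothetical, simplified view of what is known at mount time.
struct MountParams<'a> {
    /// Whether the mount source resolved to a block device.
    source_is_bdev: bool,
    /// Stand-in for an option that selects a non-bdev backend.
    fsid: Option<&'a str>,
}

/// Decide per mount instead of via a per-type `const SUPER_TYPE`.
fn choose_super_key(params: &MountParams<'_>) -> SuperKey {
    if params.fsid.is_none() && params.source_is_bdev {
        SuperKey::BlockDev
    } else {
        SuperKey::Independent
    }
}

fn main() {
    let bdev_mount = MountParams { source_is_bdev: true, fsid: None };
    let nodev_mount = MountParams { source_is_bdev: false, fsid: Some("image0") };
    println!("{:?}", choose_super_key(&bdev_mount));
    println!("{:?}", choose_super_key(&nodev_mount));
}
```

A const keeps the choice at compile time, which is simpler for the abstraction but cannot express a single fs type that is mounted both ways.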

> 
> [2] https://lwn.net/Articles/978738/
> 
>>
>> I believe it should be driven by a full-featured read-write fs [7]
>> (even something like a simple minix fs from the pre-Linux 1.0 era) and EROFS will
> 
> I do find it weird that you want a full-featured read-write fs
> implemented in Rust, when erofs is a read-only filesystem.

I'm not sure why it's weird from the sane Rust VFS abstraction
perspective.

> 
>> use Rust in "fs/erofs" as the experiment, but I will definitely
>> polish the Rust version until it looks good before upstreaming.
> 
> I honestly don't see how it would look good if they're not using the
> existing filesystem abstractions. And I'm not convinced that Rust in the
> kernel would be useful in any way without the many subsystem
> abstractions which were implemented by the Rust for Linux team for the
> past few years.

So let's see the next version.

Thanks,
Gao Xiang

> 
> Cheers,
> Ariel
> 

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26  0:40                   ` Gao Xiang
@ 2024-09-26  1:04                     ` Gao Xiang
  2024-09-26  8:10                       ` Ariel Miculas
  0 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-26  1:04 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Benno Lossin, Gary Guo, Yiyang Wu, rust-for-linux,
	Greg Kroah-Hartman, LKML, Al Viro, linux-fsdevel, linux-erofs,
	Linus Torvalds



On 2024/9/26 08:40, Gao Xiang wrote:
> 
> 
> On 2024/9/26 05:45, Ariel Miculas wrote:

...

>>
>> I honestly don't see how it would look good if they're not using the
>> existing filesystem abstractions. And I'm not convinced that Rust in the
>> kernel would be useful in any way without the many subsystem
>> abstractions which were implemented by the Rust for Linux team for the
>> past few years.
> 
> So let's see the next version.

A few more words: setting aside the in-tree "fs/xfs/libxfs", you
also claimed "Another goal is to share the same code between user
space and kernel space in order to provide one secure implementation."
for example in [1].

I wonder whether the Rust kernel VFS abstraction is forcibly used in
your userspace implementation; otherwise your argument is (somewhat)
broken here.

[1] https://lore.kernel.org/r/20230609-feldversuch-fixieren-fa141a2d9694@brauner

Thanks,
Gao Xiang

> 
> Thanks,
> Gao Xiang
> 
>>
>> Cheers,
>> Ariel
>>


* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26  1:04                     ` Gao Xiang
@ 2024-09-26  8:10                       ` Ariel Miculas
  2024-09-26  8:25                         ` Gao Xiang
  2024-09-26  8:48                         ` Gao Xiang
  0 siblings, 2 replies; 69+ messages in thread
From: Ariel Miculas @ 2024-09-26  8:10 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Benno Lossin, Gary Guo, Yiyang Wu, rust-for-linux,
	Greg Kroah-Hartman, LKML, Al Viro, linux-fsdevel, linux-erofs,
	Linus Torvalds

On 24/09/26 09:04, Gao Xiang wrote:
> 
> 
> On 2024/9/26 08:40, Gao Xiang wrote:
> > 
> > 
> > On 2024/9/26 05:45, Ariel Miculas wrote:
> 
> ...
> 
> > > 
> > > I honestly don't see how it would look good if they're not using the
> > > existing filesystem abstractions. And I'm not convinced that Rust in the
> > > kernel would be useful in any way without the many subsystem
> > > abstractions which were implemented by the Rust for Linux team for the
> > > past few years.
> > 
> > So let's see the next version.
> 
> A few more words: setting aside the in-tree "fs/xfs/libxfs", you
> also claimed "Another goal is to share the same code between user
> space and kernel space in order to provide one secure implementation."
> for example in [1].
> 
> I wonder whether the Rust kernel VFS abstraction is forcibly used in
> your userspace implementation; otherwise your argument is (somewhat)
> broken here.

Of course the implementations cannot be identical, but there is a lot of
shared code between the user space and kernel space PuzzleFS
implementations. The user space implementation uses the fuser [1] crate
and implicitly its API for implementing the read/seek/list_xattrs etc.
operations, while the kernel implementation uses the Rust filesystem
abstractions.

While it's currently not possible to use external crates in the Linux
kernel (and maybe it never will be), one area for improvement would be to
keep the shared code between these PuzzleFS implementations in the
kernel and publish releases to crates.io from there. In this way it will
be obvious which parts of the code are shared and they will actually be
shared (right now the code is duplicated).
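One common way to get that kind of sharing is to write the core logic against a small I/O trait and let each front-end (a FUSE daemon, or a kernel wrapper) implement it. The sketch below assumes hypothetical names (`ReadAt`, `Paged`, `read_u32_le`, and a local `Errno` type) and is not PuzzleFS's actual code; both backends are plain userspace Rust, with `Paged` only imitating page-by-page reads.

```rust
/// Minimal errno wrapper (hypothetical; not the kernel crate's error type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Errno(i32);

impl Errno {
    const EIO: Errno = Errno(5);
}

/// The only thing the shared core needs from its environment.
trait ReadAt {
    fn read_at(&self, off: u64, buf: &mut [u8]) -> Result<(), Errno>;
}

/// Backend 1: a flat byte slice (e.g. a file read into memory in user space).
impl ReadAt for [u8] {
    fn read_at(&self, off: u64, buf: &mut [u8]) -> Result<(), Errno> {
        let start = usize::try_from(off).map_err(|_| Errno::EIO)?;
        let end = start.checked_add(buf.len()).ok_or(Errno::EIO)?;
        buf.copy_from_slice(self.get(start..end).ok_or(Errno::EIO)?);
        Ok(())
    }
}

/// Backend 2: data split into fixed-size pages, read one page at a time.
struct Paged {
    pages: Vec<Vec<u8>>,
    page_size: usize,
}

impl ReadAt for Paged {
    fn read_at(&self, off: u64, buf: &mut [u8]) -> Result<(), Errno> {
        let mut pos = usize::try_from(off).map_err(|_| Errno::EIO)?;
        let mut done = 0;
        while done < buf.len() {
            let page = self.pages.get(pos / self.page_size).ok_or(Errno::EIO)?;
            let in_page = pos % self.page_size;
            let n = (self.page_size - in_page).min(buf.len() - done);
            let src = page.get(in_page..in_page + n).ok_or(Errno::EIO)?;
            buf[done..done + n].copy_from_slice(src);
            done += n;
            pos += n;
        }
        Ok(())
    }
}

/// Shared core logic: identical for every backend.
fn read_u32_le<R: ReadAt + ?Sized>(src: &R, off: u64) -> Result<u32, Errno> {
    let mut raw = [0u8; 4];
    src.read_at(off, &mut raw)?;
    Ok(u32::from_le_bytes(raw))
}

fn main() {
    let data: Vec<u8> = (0u8..16).collect();
    let paged = Paged {
        pages: data.chunks(4).map(|c| c.to_vec()).collect(),
        page_size: 4,
    };
    // Offset 2 spans two pages in the paged backend.
    println!("{:?}", read_u32_le(&data[..], 2));
    println!("{:?}", read_u32_le(&paged, 2));
}
```

Only the `ReadAt` implementations differ between environments; `read_u32_le()` and anything built on it stay in one place.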

I've actually touched on these points [2] during my last year's
presentation of PuzzleFS at Open Source Summit Europe [3].

And here [4] you can see the space savings achieved by PuzzleFS. In
short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
is before applying any compression, the space savings are only due to
the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
seekable compression), which is a fairer comparison (considering that
the OCI image uses gzip compression), then we get down to 53 MB for
storing all 10 Ubuntu Jammy versions using PuzzleFS.

Here's a summary:
# Steps

* I’ve downloaded 10 versions of Jammy from hub.docker.com
* These images only have one layer which is in tar.gz format
* I’ve built 10 equivalent puzzlefs images
* Compute the tarball_total_size by summing the sizes of every Jammy
  tarball (uncompressed) => 766 MB (use this as baseline)
* Sum the sizes of every oci/puzzlefs image => total_size
* Compute the total size as if all the versions were stored in a single
  oci/puzzlefs repository => total_unified_size
* Saved space = tarball_total_size - total_unified_size

# Results
(See [5] if you prefer the video format)

| Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
| --- | --- | --- | --- | --- |
| Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
| PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
| Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
| PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
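The "Saved" column follows from the steps listed above (saved = tarball_total_size - total_unified_size, relative to the 766 MB baseline). A small sketch of that arithmetic, using the rounded "Unified size" figures from the table as input, is below. With rounded inputs the uncompressed PuzzleFS row comes out as 636 MB rather than 635 MB, presumably because the original computation used unrounded sizes; the percentages match.

```rust
/// Baseline from the steps above: total size of the 10 uncompressed tarballs (MB).
const BASELINE_MB: f64 = 766.0;

/// Returns (saved MB, saved percent rounded to a whole number).
fn saved(unified_mb: f64) -> (f64, u32) {
    let saved_mb = BASELINE_MB - unified_mb;
    let pct = (saved_mb / BASELINE_MB * 100.0).round() as u32;
    (saved_mb, pct)
}

fn main() {
    // "Unified size (MB)" column from the table.
    let rows = [
        ("OCI (uncompressed)", 766.0),
        ("PuzzleFS (uncompressed)", 130.0),
        ("OCI (compressed)", 282.0),
        ("PuzzleFS (compressed)", 53.0),
    ];
    for &(name, unified_mb) in rows.iter() {
        let (saved_mb, pct) = saved(unified_mb);
        println!("{:<24} saved {:>4.0} MB ({}%)", name, saved_mb, pct);
    }
}
```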

Here's the script I used to download the Ubuntu Jammy versions and
generate the PuzzleFS images [6] to get an idea about how I got to these
results.

Can we achieve these results with the current erofs features?  I'm
referring specifically to this comment: "EROFS already supports
variable-sized chunks + CDC" [7].

[1] https://docs.rs/fuser/latest/fuser/
[2] https://youtu.be/OhMtoLrjiBY?si=iuk7PstznEUgnr4g&t=1150
[3] https://osseu2023.sched.com/event/1OGjk/puzzlefs-the-next-generation-container-filesystem-armand-ariel-miculas-cisco-systems
[4] https://youtu.be/OhMtoLrjiBY?si=jwReE1qjs1wXLUCr&t=1732
[5] https://youtu.be/OhMtoLrjiBY?si=Nhlz8FJ9CGnwgOlS&t=1862
[6] https://gist.github.com/ariel-miculas/956056f213db2d3027905c61264d160b
[7] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2367039464

Regards,
Ariel

> 
> [1] https://lore.kernel.org/r/20230609-feldversuch-fixieren-fa141a2d9694@brauner
> 
> Thanks,
> Gao Xiang
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> > > 
> > > Cheers,
> > > Ariel
> > > 
> 

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26  8:10                       ` Ariel Miculas
@ 2024-09-26  8:25                         ` Gao Xiang
  2024-09-26  9:51                           ` Ariel Miculas
  2024-09-26  8:48                         ` Gao Xiang
  1 sibling, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-26  8:25 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Benno Lossin, Gary Guo, Yiyang Wu, rust-for-linux,
	Greg Kroah-Hartman, LKML, Al Viro, linux-fsdevel, linux-erofs,
	Linus Torvalds



On 2024/9/26 16:10, Ariel Miculas wrote:
> On 24/09/26 09:04, Gao Xiang wrote:
>>


...

> 
> And here [4] you can see the space savings achieved by PuzzleFS. In
> short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
> up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
> is before applying any compression, the space savings are only due to
> the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
> seekable compression), which is a fairer comparison (considering that
> the OCI image uses gzip compression), then we get down to 53 MB for
> storing all 10 Ubuntu Jammy versions using PuzzleFS.
> 
> Here's a summary:
> # Steps
> 
> * I’ve downloaded 10 versions of Jammy from hub.docker.com
> * These images only have one layer which is in tar.gz format
> * I’ve built 10 equivalent puzzlefs images
> * Compute the tarball_total_size by summing the sizes of every Jammy
>    tarball (uncompressed) => 766 MB (use this as baseline)
> * Sum the sizes of every oci/puzzlefs image => total_size
> * Compute the total size as if all the versions were stored in a single
>    oci/puzzlefs repository => total_unified_size
> * Saved space = tarball_total_size - total_unified_size
> 
> # Results
> (See [5] if you prefer the video format)
> 
> | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
> | --- | --- | --- | --- | --- |
> | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
> | PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
> | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
> | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
> 
> Here's the script I used to download the Ubuntu Jammy versions and
> generate the PuzzleFS images [6] to get an idea about how I got to these
> results.
> 
> Can we achieve these results with the current erofs features?  I'm
> referring specifically to this comment: "EROFS already supports
> variable-sized chunks + CDC" [7].

Please see
https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html

| Type | Total size (MiB) | Average layer size (MiB) | Saved / 766.1 MiB |
| --- | --- | --- | --- |
| Compressed OCI (tar.gz) | 282.5 | 28.3 | 63% |
| Uncompressed OCI (tar) | 766.1 | 76.6 | 0% |
| Uncompressed EROFS | 109.5 | 11.0 | 86% |
| EROFS (DEFLATE,9,32k) | 46.4 | 4.6 | 94% |
| EROFS (LZ4HC,12,64k) | 54.2 | 5.4 | 93% |

I don't know which compression algorithm you are using (maybe Zstd?),
but the results are:
   EROFS (LZ4HC,12,64k)  54.2
   PuzzleFS compressed   53?
   EROFS (DEFLATE,9,32k) 46.4

I could rerun it with EROFS + Zstd, but the result should be smaller. This feature
has been supported since Linux 6.1, thanks.

Thanks,
Gao Xiang

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26  8:10                       ` Ariel Miculas
  2024-09-26  8:25                         ` Gao Xiang
@ 2024-09-26  8:48                         ` Gao Xiang
  1 sibling, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-26  8:48 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Benno Lossin, rust-for-linux, Greg Kroah-Hartman, LKML,
	Linus Torvalds, Al Viro, Gary Guo, linux-fsdevel, linux-erofs



On 2024/9/26 16:10, Ariel Miculas via Linux-erofs wrote:
> On 24/09/26 09:04, Gao Xiang wrote:
>>
>>
>> On 2024/9/26 08:40, Gao Xiang wrote:
>>>
>>>
>>> On 2024/9/26 05:45, Ariel Miculas wrote:
>>
>> ...
>>
>>>>
>>>> I honestly don't see how it would look good if they're not using the
>>>> existing filesystem abstractions. And I'm not convinced that Rust in the
>>>> kernel would be useful in any way without the many subsystem
>>>> abstractions which were implemented by the Rust for Linux team for the
>>>> past few years.
>>>
>>> So let's see the next version.
>>
>> A few more words: setting aside the in-tree "fs/xfs/libxfs", you
>> also claimed "Another goal is to share the same code between user
>> space and kernel space in order to provide one secure implementation."
>> for example in [1].
>>
>> I wonder whether the Rust kernel VFS abstraction is forcibly used in
>> your userspace implementation; otherwise your argument is (somewhat)
>> broken here.
> 
> Of course the implementations cannot be identical, but there is a lot of
> shared code between the user space and kernel space PuzzleFS
> implementations. The user space implementation uses the fuser [1] crate

If you know what you're doing, you may know what Yiyang is doing
here: he will just build the core EROFS logic in Rust and upstream it later.

Thanks,
Gao Xiang


* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26  8:25                         ` Gao Xiang
@ 2024-09-26  9:51                           ` Ariel Miculas
  2024-09-26 10:46                             ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Ariel Miculas @ 2024-09-26  9:51 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Benno Lossin, rust-for-linux, Greg Kroah-Hartman, LKML,
	Linus Torvalds, Al Viro, Gary Guo, linux-fsdevel, linux-erofs

On 24/09/26 04:25, Gao Xiang wrote:
> 
> 
> On 2024/9/26 16:10, Ariel Miculas wrote:
> > On 24/09/26 09:04, Gao Xiang wrote:
> > > 
> 
> 
> ...
> 
> > 
> > And here [4] you can see the space savings achieved by PuzzleFS. In
> > short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
> > up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
> > is before applying any compression, the space savings are only due to
> > the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
> > seekable compression), which is a fairer comparison (considering that
> > the OCI image uses gzip compression), then we get down to 53 MB for
> > storing all 10 Ubuntu Jammy versions using PuzzleFS.
> > 
> > Here's a summary:
> > # Steps
> > 
> > * I’ve downloaded 10 versions of Jammy from hub.docker.com
> > * These images only have one layer which is in tar.gz format
> > * I’ve built 10 equivalent puzzlefs images
> > * Compute the tarball_total_size by summing the sizes of every Jammy
> >    tarball (uncompressed) => 766 MB (use this as baseline)
> > * Sum the sizes of every oci/puzzlefs image => total_size
> > * Compute the total size as if all the versions were stored in a single
> >    oci/puzzlefs repository => total_unified_size
> > * Saved space = tarball_total_size - total_unified_size
> > 
> > # Results
> > (See [5] if you prefer the video format)
> > 
> > | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
> > | --- | --- | --- | --- | --- |
> > | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
> > | PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
> > | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
> > | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
> > 
> > Here's the script I used to download the Ubuntu Jammy versions and
> > generate the PuzzleFS images [6] to get an idea about how I got to these
> > results.
> > 
> > Can we achieve these results with the current erofs features?  I'm
> > referring specifically to this comment: "EROFS already supports
> > variable-sized chunks + CDC" [7].
> 
> Please see
> https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html

Great, I see you've used the same example as I did. Though I must admit
I'm a little surprised there's no mention of PuzzleFS in your document.

> 
> | Type | Total size (MiB) | Average layer size (MiB) | Saved / 766.1 MiB |
> | --- | --- | --- | --- |
> | Compressed OCI (tar.gz) | 282.5 | 28.3 | 63% |
> | Uncompressed OCI (tar) | 766.1 | 76.6 | 0% |
> | Uncompressed EROFS | 109.5 | 11.0 | 86% |
> | EROFS (DEFLATE,9,32k) | 46.4 | 4.6 | 94% |
> | EROFS (LZ4HC,12,64k) | 54.2 | 5.4 | 93% |
> 
> I don't know which compression algorithm you are using (maybe Zstd?),
> but the results are:
>   EROFS (LZ4HC,12,64k)  54.2
>   PuzzleFS compressed   53?
>   EROFS (DEFLATE,9,32k) 46.4
> 
> I could rerun it with EROFS + Zstd, but the result should be smaller. This feature
> has been supported since Linux 6.1, thanks.

The average layer size is very impressive for EROFS, great work.
However, if we multiply the average layer size by 10, we get the total
size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
the average layer size is 30 MiB (for the compressed case), the unified
size is only 53 MiB. So this tells me there's blob sharing between the
different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
with EROFS (what I'm talking about is deduplication across the multiple
versions of Ubuntu Jammy and not within one single version).
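The distinction being drawn is between per-image storage, where the total is roughly the average size times the number of images, and a single shared chunk store, where each distinct chunk is stored once across all images. A toy sketch of the two measures follows. It assumes fixed-size chunks for simplicity, whereas PuzzleFS and EROFS CDC use content-defined chunk boundaries, and the sample "versions" are made-up byte arrays, not real images.

```rust
use std::collections::HashSet;

/// Size when every image is stored on its own (sum of per-image sizes).
fn total_size(images: &[Vec<u8>]) -> usize {
    images.iter().map(|img| img.len()).sum()
}

/// Size when all images share one chunk store: each distinct chunk is kept
/// once. Fixed-size chunking is a simplification; CDC would pick boundaries
/// from the content so that insertions don't shift every later chunk.
/// Panics if `chunk_size` is 0 (as `slice::chunks` does).
fn unified_size(images: &[Vec<u8>], chunk_size: usize) -> usize {
    let mut seen: HashSet<&[u8]> = HashSet::new();
    let mut bytes = 0;
    for img in images {
        for chunk in img.chunks(chunk_size) {
            if seen.insert(chunk) {
                bytes += chunk.len();
            }
        }
    }
    bytes
}

/// Three made-up "versions": v2 changes the tail of v1, v3 appends to v2.
fn sample_versions() -> Vec<Vec<u8>> {
    let v1: Vec<u8> = (0u8..16).collect();
    let mut v2 = v1[..12].to_vec();
    v2.extend_from_slice(&[100, 101, 102, 103]);
    let mut v3 = v2.clone();
    v3.extend_from_slice(&[200, 201, 202, 203]);
    vec![v1, v2, v3]
}

fn main() {
    let images = sample_versions();
    println!("stored separately: {} bytes", total_size(&images));
    println!("shared chunk store: {} bytes", unified_size(&images, 4));
}
```

In this toy case the separately stored total grows with every version, while the shared store only grows by the chunks each version actually changes or adds.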

Of course, with only 10 images, the space savings don't seem that
impressive for PuzzleFS compared to EROFS, but imagine we are storing
hundreds/thousands of Ubuntu versions. Then we're also building OCI
images on top of these versions. So if the user already has all the
blobs for an Ubuntu version, then we only need to ship the chunks that
have changed / have been added as a result of the specific application
that we've built on top of an existing Ubuntu version.

One more thing: the "Unified size" column is the key for understanding
the space savings offered by PuzzleFS and I see that you've left this
column out of your table.

Regards,
Ariel

> 
> Thanks,
> Gao Xiang

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26  9:51                           ` Ariel Miculas
@ 2024-09-26 10:46                             ` Gao Xiang
  2024-09-26 11:01                               ` Ariel Miculas
  0 siblings, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-26 10:46 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Benno Lossin, rust-for-linux, Greg Kroah-Hartman, LKML,
	Linus Torvalds, Al Viro, Gary Guo, linux-fsdevel, linux-erofs



On 2024/9/26 17:51, Ariel Miculas wrote:
> On 24/09/26 04:25, Gao Xiang wrote:
>>
>>
>> On 2024/9/26 16:10, Ariel Miculas wrote:
>>> On 24/09/26 09:04, Gao Xiang wrote:
>>>>
>>
>>
>> ...
>>
>>>
>>> And here [4] you can see the space savings achieved by PuzzleFS. In
>>> short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
>>> up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
>>> is before applying any compression, the space savings are only due to
>>> the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
>>> seekable compression), which is a fairer comparison (considering that
>>> the OCI image uses gzip compression), then we get down to 53 MB for
>>> storing all 10 Ubuntu Jammy versions using PuzzleFS.
>>>
>>> Here's a summary:
>>> # Steps
>>>
>>> * I’ve downloaded 10 versions of Jammy from hub.docker.com
>>> * These images only have one layer which is in tar.gz format
>>> * I’ve built 10 equivalent puzzlefs images
>>> * Compute the tarball_total_size by summing the sizes of every Jammy
>>>     tarball (uncompressed) => 766 MB (use this as baseline)
>>> * Sum the sizes of every oci/puzzlefs image => total_size
>>> * Compute the total size as if all the versions were stored in a single
>>>     oci/puzzlefs repository => total_unified_size
>>> * Saved space = tarball_total_size - total_unified_size
>>>
>>> # Results
>>> (See [5] if you prefer the video format)
>>>
>>> | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
>>> | --- | --- | --- | --- | --- |
>>> | OCI (uncompressed) | 766 | 77 | 766 | 0 (0%) |
>>> | PuzzleFS (uncompressed) | 748 | 74 | 130 | 635 (83%) |
>>> | OCI (compressed) | 282 | 28 | 282 | 484 (63%) |
>>> | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
>>>
>>> Here's the script I used to download the Ubuntu Jammy versions and
>>> generate the PuzzleFS images [6] to get an idea about how I got to these
>>> results.
>>>
>>> Can we achieve these results with the current erofs features?  I'm
>>> referring specifically to this comment: "EROFS already supports
>>> variable-sized chunks + CDC" [7].
>>
>> Please see
>> https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html
> 
> Great, I see you've used the same example as I did. Though I must admit
> I'm a little surprised there's no mention of PuzzleFS in your document.

Why would I need to mention, or even try, PuzzleFS here? There are
too many such attempts; why would I need to try them all? The
document just compares against prior EROFS work.

> 
>>
>>                           Total Size (MiB)  Average layer size (MiB)  Saved / 766.1 MiB
>> Compressed OCI (tar.gz)   282.5             28.3                      63%
>> Uncompressed OCI (tar)    766.1             76.6                      0%
>> Uncompressed EROFS        109.5             11.0                      86%
>> EROFS (DEFLATE,9,32k)     46.4              4.6                       94%
>> EROFS (LZ4HC,12,64k)      54.2              5.4                       93%
>>
>> I don't know which compression algorithm you are using (maybe Zstd?),
>> but the results are:
>>    EROFS (LZ4HC,12,64k)  54.2
>>    PuzzleFS compressed   53?
>>    EROFS (DEFLATE,9,32k) 46.4
>>
>> I could rerun it with EROFS + Zstd, but the result should be smaller. This
>> feature has been supported since Linux 6.1, thanks.
> 
> The average layer size is very impressive for EROFS, great work.
> However, if we multiply the average layer size by 10, we get the total
> size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
> the average layer size is 30 MiB (for the compressed case), the unified
> size is only 53 MiB. So this tells me there's blob sharing between the
> different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
> with EROFS (what I'm talking about is deduplication across the multiple
> versions of Ubuntu Jammy and not within one single version).

Don't get me wrong, but I don't think you got the point.

First, what you asked was `I'm referring specifically to this
comment: "EROFS already supports variable-sized chunks + CDC"`,
so I clearly answered with the result of compressed data global
deduplication with CDC.

Here both EROFS and SquashFS compress the 10 Ubuntu images into
one image for a fair comparison that shows the benefit of CDC, so
I believe these numbers are basically equal to your `Unified size`s,
and the result is:

			Your unified size
	EROFS (LZ4HC,12,64k)  54.2
	PuzzleFS compressed   53?
	EROFS (DEFLATE,9,32k) 46.4

That is why I used your 53 unified size to show EROFS is much
smaller than PuzzleFS.
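The "Saved" figures in both tables follow Ariel's definition: the uncompressed baseline (all ten Jammy tarballs) minus the unified size, as a share of the baseline. The small Rust sketch below reproduces that arithmetic from the figures quoted in this thread; `saved_percent` and `sharing_factor` are illustrative names, and the sketch does not reconcile Ariel's MB with Gao's MiB. The sharing factor (sum of per-image sizes divided by the unified size) is the quantity behind Ariel's point that 298 MB of per-image PuzzleFS data shrinks to a 53 MB unified size.

```rust
/// Percentage of `baseline` saved when the data occupies `unified` instead.
fn saved_percent(baseline: f64, unified: f64) -> f64 {
    (baseline - unified) / baseline * 100.0
}

/// Sum of per-image sizes divided by the unified size; a value near 1.0
/// means the images share little or no data with each other.
fn sharing_factor(total: f64, unified: f64) -> f64 {
    total / unified
}

fn main() {
    // (name, baseline, total size, unified size), figures from the tables above.
    let rows = [
        ("OCI (compressed)", 766.0, 282.0, 282.0),
        ("PuzzleFS (compressed)", 766.0, 298.0, 53.0),
    ];
    for (name, baseline, total, unified) in rows {
        println!(
            "{name}: saved {:.0}%, sharing factor {:.1}",
            saved_percent(baseline, unified),
            sharing_factor(total, unified)
        );
    }
    // The EROFS numbers come from one image holding all 10 versions, so only
    // the saving applies (MiB baseline from Gao's table).
    println!("EROFS (LZ4HC,12,64k): saved {:.0}%", saved_percent(766.1, 54.2));
}
```

With these inputs the OCI row has a sharing factor of 1.0 (nothing shared across images) and the PuzzleFS row about 5.6; EROFS reports no per-image total here because its numbers come from a single combined image.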

The reason EROFS and SquashFS don't have `Total Size`s is simply
that we cannot store every individual chunk in a separate
file.

Currently, I see no reason to open arbitrary kernel files
(maybe hundreds at once, due to the large folio feature) in the
page fault context. If I modified the `mkfs.erofs` tool, I could
produce similar numbers, but I don't want to waste time on it now
because it would require `opening arbitrary kernel files in the
page fault context`.

As I said, if PuzzleFS eventually upstreams some work to open
kernel files in the page fault context, I will definitely implement
the same feature for EROFS soon after, but I'm not doing it now
simply because it's very controversial and no in-tree kernel
filesystem does that.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26 10:46                             ` Gao Xiang
@ 2024-09-26 11:01                               ` Ariel Miculas
  2024-09-26 11:05                                 ` Gao Xiang
  2024-09-26 11:23                                 ` Gao Xiang
  0 siblings, 2 replies; 69+ messages in thread
From: Ariel Miculas @ 2024-09-26 11:01 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Gary Guo, rust-for-linux, Greg Kroah-Hartman, linux-erofs, LKML,
	Al Viro, Benno Lossin, linux-fsdevel, Linus Torvalds

On 24/09/26 06:46, Gao Xiang wrote:
> 
> 
> On 2024/9/26 17:51, Ariel Miculas wrote:
> > On 24/09/26 04:25, Gao Xiang wrote:
> > > 
> > > 
> > > On 2024/9/26 16:10, Ariel Miculas wrote:
> > > > On 24/09/26 09:04, Gao Xiang wrote:
> > > > > 
> > > 
> > > 
> > > ...
> > > 
> > > > 
> > > > And here [4] you can see the space savings achieved by PuzzleFS. In
> > > > short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
> > > > up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
> > > > is before applying any compression, the space savings are only due to
> > > > the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
> > > > seekable compression), which is a fairer comparison (considering that
> > > > the OCI image uses gzip compression), then we get down to 53 MB for
> > > > storing all 10 Ubuntu Jammy versions using PuzzleFS.
> > > > 
> > > > Here's a summary:
> > > > # Steps
> > > > 
> > > > * I’ve downloaded 10 versions of Jammy from hub.docker.com
> > > > * These images only have one layer which is in tar.gz format
> > > > * I’ve built 10 equivalent puzzlefs images
> > > > * Compute the tarball_total_size by summing the sizes of every Jammy
> > > >     tarball (uncompressed) => 766 MB (use this as baseline)
> > > > * Sum the sizes of every oci/puzzlefs image => total_size
> > > > * Compute the total size as if all the versions were stored in a single
> > > >     oci/puzzlefs repository => total_unified_size
> > > > * Saved space = tarball_total_size - total_unified_size
> > > > 
> > > > # Results
> > > > (See [5] if you prefer the video format)
> > > > 
> > > > | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
> > > > | --- | --- | --- | --- | --- |
> > > > | OCI (uncompressed) | 766 | 77 | 766 | 0 (0%) |
> > > > | PuzzleFS (uncompressed) | 748 | 74 | 130 | 635 (83%) |
> > > > | OCI (compressed) | 282 | 28 | 282 | 484 (63%) |
> > > > | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
> > > > 
> > > > Here's the script I used to download the Ubuntu Jammy versions and
> > > > generate the PuzzleFS images [6] to get an idea about how I got to these
> > > > results.
> > > > 
> > > > Can we achieve these results with the current erofs features?  I'm
> > > > referring specifically to this comment: "EROFS already supports
> > > > variable-sized chunks + CDC" [7].
> > > 
> > > Please see
> > > https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html
> > 
> > Great, I see you've used the same example as I did. Though I must admit
> > I'm a little surprised there's no mention of PuzzleFS in your document.
> 
> Why would I need to mention, or even try, PuzzleFS here? There are
> too many such attempts; why would I need to try them all? The
> document just compares against prior EROFS work.
> 
> > 
> > > 
> > >                           Total Size (MiB)  Average layer size (MiB)  Saved / 766.1 MiB
> > > Compressed OCI (tar.gz)   282.5             28.3                      63%
> > > Uncompressed OCI (tar)    766.1             76.6                      0%
> > > Uncompressed EROFS        109.5             11.0                      86%
> > > EROFS (DEFLATE,9,32k)     46.4              4.6                       94%
> > > EROFS (LZ4HC,12,64k)      54.2              5.4                       93%
> > > 
> > > I don't know which compression algorithm you are using (maybe Zstd?),
> > > but the results are:
> > >    EROFS (LZ4HC,12,64k)  54.2
> > >    PuzzleFS compressed   53?
> > >    EROFS (DEFLATE,9,32k) 46.4
> > > 
> > > I could rerun it with EROFS + Zstd, but the result should be smaller. This
> > > feature has been supported since Linux 6.1, thanks.
> > 
> > The average layer size is very impressive for EROFS, great work.
> > However, if we multiply the average layer size by 10, we get the total
> > size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
> > the average layer size is 30 MiB (for the compressed case), the unified
> > size is only 53 MiB. So this tells me there's blob sharing between the
> > different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
> > with EROFS (what I'm talking about is deduplication across the multiple
> > versions of Ubuntu Jammy and not within one single version).
> 
> Don't get me wrong, but I don't think you got the point.
> 
> First, what you asked was `I'm referring specifically to this
> comment: "EROFS already supports variable-sized chunks + CDC"`,
> so I clearly answered with the result of compressed data global
> deduplication with CDC.
> 
> Here both EROFS and SquashFS compress the 10 Ubuntu images into
> one image for a fair comparison that shows the benefit of CDC, so

It might be a fair comparison, but that's not how container images are
distributed. You're trying to argue that I should just use EROFS and I'm
showing you that EROFS doesn't currently support the functionality
provided by PuzzleFS: the deduplication across multiple images.

> I believe these numbers are basically equal to your `Unified size`s,
> and the result is:
> 
> 			Your unified size
> 	EROFS (LZ4HC,12,64k)  54.2
> 	PuzzleFS compressed   53?
> 	EROFS (DEFLATE,9,32k) 46.4
> 
> That is why I used your 53 unified size to show EROFS is much
> smaller than PuzzleFS.
> 
> The reason EROFS and SquashFS don't have `Total Size`s is simply
> that we cannot store every individual chunk in a separate
> file.

Well storing individual chunks into separate files is the entire point
of PuzzleFS.

> 
> Currently, I see no reason to open arbitrary kernel files
> (maybe hundreds at once, due to the large folio feature) in the
> page fault context. If I modified the `mkfs.erofs` tool, I could
> produce similar numbers, but I don't want to waste time on it now
> because it would require `opening arbitrary kernel files in the
> page fault context`.
> 
> As I said, if PuzzleFS eventually upstreams some work to open
> kernel files in the page fault context, I will definitely implement
> the same feature for EROFS soon after, but I'm not doing it now
> simply because it's very controversial and no in-tree kernel
> filesystem does that.

The PuzzleFS kernel filesystem driver is still in an early POC stage, so
there's still a lot more work to be done.

Regards,
Ariel

> 
> Thanks,
> Gao Xiang

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26 11:01                               ` Ariel Miculas
@ 2024-09-26 11:05                                 ` Gao Xiang
  2024-09-26 11:23                                 ` Gao Xiang
  1 sibling, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-26 11:05 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Gary Guo, rust-for-linux, Greg Kroah-Hartman, linux-erofs, LKML,
	Al Viro, Benno Lossin, linux-fsdevel, Linus Torvalds



On 2024/9/26 19:01, Ariel Miculas wrote:

..

> 
>> I believe they basically equal to your `Unified size`s, so
>> the result is
>>
>> 			Your unified size
>> 	EROFS (LZ4HC,12,64k)  54.2
>> 	PuzzleFS compressed   53?
>> 	EROFS (DEFLATE,9,32k) 46.4
>>
>> That is why I used your 53 unified size to show EROFS is much
>> smaller than PuzzleFS.
>>
>> The reason EROFS and SquashFS don't have `Total Size`s is simply
>> that we cannot store every individual chunk in a separate
>> file.
> 
> Well storing individual chunks into separate files is the entire point
> of PuzzleFS.
> 
>>
>> Currently, I see no reason to open arbitrary kernel files
>> (maybe hundreds at once, due to the large folio feature) in the
>> page fault context. If I modified the `mkfs.erofs` tool, I could
>> produce similar numbers, but I don't want to waste time on it now
>> because it would require `opening arbitrary kernel files in the
>> page fault context`.
>>
>> As I said, if PuzzleFS eventually upstreams some work to open
>> kernel files in the page fault context, I will definitely implement
>> the same feature for EROFS soon after, but I'm not doing it now
>> simply because it's very controversial and no in-tree kernel
>> filesystem does that.
> 
> The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> there's still a lot more work to be done.

I suggest you first just ask the FS/MM folks about this ("opening
kernel files when reading in the page fault context").

If they say "no", please don't waste any more time on this.

Thanks,
Gao Xiang

> 
> Regards,
> Ariel
> 
>>
>> Thanks,
>> Gao Xiang


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26 11:01                               ` Ariel Miculas
  2024-09-26 11:05                                 ` Gao Xiang
@ 2024-09-26 11:23                                 ` Gao Xiang
  2024-09-26 12:50                                   ` Ariel Miculas
  1 sibling, 1 reply; 69+ messages in thread
From: Gao Xiang @ 2024-09-26 11:23 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Benno Lossin, rust-for-linux, Greg Kroah-Hartman, LKML,
	Linus Torvalds, Al Viro, Gary Guo, linux-fsdevel, linux-erofs



On 2024/9/26 19:01, Ariel Miculas via Linux-erofs wrote:
> On 24/09/26 06:46, Gao Xiang wrote:

...

>>
>>>
>>>>
>>>>                           Total Size (MiB)  Average layer size (MiB)  Saved / 766.1 MiB
>>>> Compressed OCI (tar.gz)   282.5             28.3                      63%
>>>> Uncompressed OCI (tar)    766.1             76.6                      0%
>>>> Uncompressed EROFS        109.5             11.0                      86%
>>>> EROFS (DEFLATE,9,32k)     46.4              4.6                       94%
>>>> EROFS (LZ4HC,12,64k)      54.2              5.4                       93%
>>>>
>>>> I don't know which compression algorithm you are using (maybe Zstd?),
>>>> but the results are:
>>>>     EROFS (LZ4HC,12,64k)  54.2
>>>>     PuzzleFS compressed   53?
>>>>     EROFS (DEFLATE,9,32k) 46.4
>>>>
>>>> I could rerun it with EROFS + Zstd, but the result should be smaller. This
>>>> feature has been supported since Linux 6.1, thanks.
>>>
>>> The average layer size is very impressive for EROFS, great work.
>>> However, if we multiply the average layer size by 10, we get the total
>>> size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
>>> the average layer size is 30 MiB (for the compressed case), the unified
>>> size is only 53 MiB. So this tells me there's blob sharing between the
>>> different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
>>> with EROFS (what I'm talking about is deduplication across the multiple
>>> versions of Ubuntu Jammy and not within one single version).
>>
>> Don't get me wrong, but I don't think you got the point.
>>
>> First, what you asked was `I'm referring specifically to this
>> comment: "EROFS already supports variable-sized chunks + CDC"`,
>> so I clearly answered with the result of compressed data global
>> deduplication with CDC.
>>
>> Here both EROFS and SquashFS compress the 10 Ubuntu images into
>> one image for a fair comparison that shows the benefit of CDC, so
> 
> It might be a fair comparison, but that's not how container images are
> distributed. You're trying to argue that I should just use EROFS and I'm

First, OCI layers are distributed just as I described.

For example, I could introduce some common blobs that keep
chunks as a chunk dictionary. Each image would then be just
an index, and all data would be deduplicated. That is also
how Nydus works.
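The chunk-dictionary scheme described here can be pictured with a toy in-memory model: shared blobs hold each distinct chunk once, and each image reduces to an ordered index of chunk ids. A minimal Rust sketch, assuming fixed-size chunks and keying chunks by their bytes (real systems would key by a cryptographic digest); `ChunkDict` and its methods are hypothetical names for illustration, not EROFS or Nydus interfaces.

```rust
use std::collections::HashMap;

/// Hypothetical shared chunk dictionary: every distinct chunk is stored once
/// and identified by a small integer id.
#[derive(Default)]
struct ChunkDict {
    ids: HashMap<Vec<u8>, usize>,
    chunks: Vec<Vec<u8>>,
}

impl ChunkDict {
    /// Split `image` into fixed-size chunks, store any chunk not seen before,
    /// and return the image's index: the ordered list of chunk ids.
    fn add_image(&mut self, image: &[u8], chunk_size: usize) -> Vec<usize> {
        image
            .chunks(chunk_size)
            .map(|chunk| {
                if let Some(&id) = self.ids.get(chunk) {
                    return id; // already in the dictionary: reference it
                }
                let id = self.chunks.len();
                self.chunks.push(chunk.to_vec());
                self.ids.insert(chunk.to_vec(), id);
                id
            })
            .collect()
    }

    /// Rebuild an image's bytes from its index.
    fn read_image(&self, index: &[usize]) -> Vec<u8> {
        index.iter().flat_map(|&id| self.chunks[id].iter().copied()).collect()
    }

    /// Total bytes held by the dictionary (each distinct chunk counted once).
    fn stored_bytes(&self) -> usize {
        self.chunks.iter().map(Vec::len).sum()
    }
}

fn main() {
    let mut dict = ChunkDict::default();
    // Two hypothetical image versions that differ only in their last chunk.
    let v1 = b"AAAABBBBCCCCDDDD";
    let v2 = b"AAAABBBBCCCCEEEE";
    let i1 = dict.add_image(v1, 4);
    let i2 = dict.add_image(v2, 4);
    assert_eq!(dict.read_image(&i1), v1.to_vec());
    assert_eq!(dict.read_image(&i2), v2.to_vec());
    println!("indexes: {:?} {:?}", i1, i2);
    println!("raw: {} bytes, stored: {} bytes", v1.len() + v2.len(), dict.stored_bytes());
}
```

The two indexes come out as `[0, 1, 2, 3]` and `[0, 1, 2, 4]`: the second version only adds one new chunk, so 32 raw bytes need 20 bytes of chunk data.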

> showing you that EROFS doesn't currently support the functionality
> provided by PuzzleFS: the deduplication across multiple images.

No, EROFS also supports external devices/blobs that keep many
chunks (as a dictionary to share data among images), but
clearly there is an upper limit.

PuzzleFS, however, treats each individual chunk as a separate
file, which unavoidably means "opening an arbitrary number of
files on reads, even in the page fault context".

> 
>> I believe these numbers are basically equal to your `Unified size`s,
>> and the result is:
>>
>> 			Your unified size
>> 	EROFS (LZ4HC,12,64k)  54.2
>> 	PuzzleFS compressed   53?
>> 	EROFS (DEFLATE,9,32k) 46.4
>>
>> That is why I used your 53 unified size to show EROFS is much
>> smaller than PuzzleFS.
>>
>> The reason EROFS and SquashFS don't have `Total Size`s is simply
>> that we cannot store every individual chunk in a separate
>> file.
> 
> Well storing individual chunks into separate files is the entire point
> of PuzzleFS.
> 
>>
>> Currently, I see no reason to open arbitrary kernel files
>> (maybe hundreds at once, due to the large folio feature) in the
>> page fault context. If I modified the `mkfs.erofs` tool, I could
>> produce similar numbers, but I don't want to waste time on it now
>> because it would require `opening arbitrary kernel files in the
>> page fault context`.
>>
>> As I said, if PuzzleFS eventually upstreams some work to open
>> kernel files in the page fault context, I will definitely implement
>> the same feature for EROFS soon after, but I'm not doing it now
>> simply because it's very controversial and no in-tree kernel
>> filesystem does that.
> 
> The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> there's still a lot more work to be done.

I suggest you first just ask the FS/MM folks about this ("opening
kernel files when reading in the page fault context").

If they say "no", please don't waste any more time on this.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26 11:23                                 ` Gao Xiang
@ 2024-09-26 12:50                                   ` Ariel Miculas
  2024-09-27  2:18                                     ` Gao Xiang
  0 siblings, 1 reply; 69+ messages in thread
From: Ariel Miculas @ 2024-09-26 12:50 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Benno Lossin, rust-for-linux, Greg Kroah-Hartman, LKML,
	Linus Torvalds, Al Viro, Gary Guo, linux-fsdevel, linux-erofs

On 24/09/26 07:23, Gao Xiang wrote:
> 
> 
> On 2024/9/26 19:01, Ariel Miculas via Linux-erofs wrote:
> > On 24/09/26 06:46, Gao Xiang wrote:
> 
> ...
> 
> > > 
> > > > 
> > > > > 
> > > > >                           Total Size (MiB)  Average layer size (MiB)  Saved / 766.1 MiB
> > > > > Compressed OCI (tar.gz)   282.5             28.3                      63%
> > > > > Uncompressed OCI (tar)    766.1             76.6                      0%
> > > > > Uncompressed EROFS        109.5             11.0                      86%
> > > > > EROFS (DEFLATE,9,32k)     46.4              4.6                       94%
> > > > > EROFS (LZ4HC,12,64k)      54.2              5.4                       93%
> > > > > 
> > > > > I don't know which compression algorithm you are using (maybe Zstd?),
> > > > > but the results are:
> > > > >     EROFS (LZ4HC,12,64k)  54.2
> > > > >     PuzzleFS compressed   53?
> > > > >     EROFS (DEFLATE,9,32k) 46.4
> > > > > 
> > > > > I could rerun it with EROFS + Zstd, but the result should be smaller. This
> > > > > feature has been supported since Linux 6.1, thanks.
> > > > 
> > > > The average layer size is very impressive for EROFS, great work.
> > > > However, if we multiply the average layer size by 10, we get the total
> > > > size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
> > > > the average layer size is 30 MiB (for the compressed case), the unified
> > > > size is only 53 MiB. So this tells me there's blob sharing between the
> > > > different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
> > > > with EROFS (what I'm talking about is deduplication across the multiple
> > > > versions of Ubuntu Jammy and not within one single version).
> > > 
> > > Don't get me wrong, but I don't think you got the point.
> > > 
> > > First, what you asked was `I'm referring specifically to this
> > > comment: "EROFS already supports variable-sized chunks + CDC"`,
> > > so I clearly answered with the result of compressed data global
> > > deduplication with CDC.
> > > 
> > > Here both EROFS and SquashFS compress the 10 Ubuntu images into
> > > one image for a fair comparison that shows the benefit of CDC, so
> > 
> > It might be a fair comparison, but that's not how container images are
> > distributed. You're trying to argue that I should just use EROFS and I'm
> 
> First, OCI layers are distributed just as I described.
> 
> For example, I could introduce some common blobs that keep
> chunks as a chunk dictionary. Each image would then be just
> an index, and all data would be deduplicated. That is also
> how Nydus works.

I don't really follow what Nydus does. Here [1] it says they're using
fixed size chunks of 1 MB. Where is the CDC step exactly?

[1] https://github.com/dragonflyoss/nydus/blob/master/docs/nydus-design.md#2-rafs

> 
> > showing you that EROFS doesn't currently support the functionality
> > provided by PuzzleFS: the deduplication across multiple images.
> 
> No, EROFS also supports external devices/blobs that keep many
> chunks (as a dictionary to share data among images), but
> clearly there is an upper limit.
> 
> PuzzleFS, however, treats each individual chunk as a separate
> file, which unavoidably means "opening an arbitrary number of
> files on reads, even in the page fault context".
> 
> > 
> > > I believe these numbers are basically equal to your `Unified size`s,
> > > and the result is:
> > > 
> > > 			Your unified size
> > > 	EROFS (LZ4HC,12,64k)  54.2
> > > 	PuzzleFS compressed   53?
> > > 	EROFS (DEFLATE,9,32k) 46.4
> > > 
> > > That is why I used your 53 unified size to show EROFS is much
> > > smaller than PuzzleFS.
> > > 
> > > The reason EROFS and SquashFS don't have `Total Size`s is simply
> > > that we cannot store every individual chunk in a separate
> > > file.
> > 
> > Well storing individual chunks into separate files is the entire point
> > of PuzzleFS.
> > 
> > > 
> > > Currently, I see no reason to open arbitrary kernel files
> > > (maybe hundreds at once, due to the large folio feature) in the
> > > page fault context. If I modified the `mkfs.erofs` tool, I could
> > > produce similar numbers, but I don't want to waste time on it now
> > > because it would require `opening arbitrary kernel files in the
> > > page fault context`.
> > > 
> > > As I said, if PuzzleFS eventually upstreams some work to open
> > > kernel files in the page fault context, I will definitely implement
> > > the same feature for EROFS soon after, but I'm not doing it now
> > > simply because it's very controversial and no in-tree kernel
> > > filesystem does that.
> > 
> > The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> > there's still a lot more work to be done.
> 
> I suggest you first just ask the FS/MM folks about this ("opening
> kernel files when reading in the page fault context").
> 
> If they say "no", please don't waste any more time on this.
> 
> Thanks,
> Gao Xiang

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [RFC PATCH 03/24] erofs: add Errno in Rust
  2024-09-26 12:50                                   ` Ariel Miculas
@ 2024-09-27  2:18                                     ` Gao Xiang
  0 siblings, 0 replies; 69+ messages in thread
From: Gao Xiang @ 2024-09-27  2:18 UTC (permalink / raw)
  To: Ariel Miculas
  Cc: Benno Lossin, rust-for-linux, Greg Kroah-Hartman, LKML,
	Linus Torvalds, Al Viro, Gary Guo, linux-fsdevel, linux-erofs



On 2024/9/26 20:50, Ariel Miculas wrote:
> On 24/09/26 07:23, Gao Xiang wrote:

...

>>>
>>> It might be a fair comparison, but that's not how container images are
>>> distributed. You're trying to argue that I should just use EROFS and I'm
>>
>> First, OCI layers are distributed just as I described.
>>
>> For example, I could introduce some common blobs that keep
>> chunks as a chunk dictionary. Each image would then be just
>> an index, and all data would be deduplicated. That is also
>> how Nydus works.
> 
> I don't really follow what Nydus does. Here [1] it says they're using
> fixed size chunks of 1 MB. Where is the CDC step exactly?

Dragonfly Nydus uses fixed-size 1MiB chunks by default, with a
limited number of external blobs as chunk dictionaries, while
ComposeFS uses per-file blobs.

Currently, both are EROFS users relying on different EROFS
features. EROFS itself supports fixed-size chunks (unencoded),
variable-sized extents (encoded, CDC optional) and a limited
number of external blobs.
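The distinction between fixed-size chunks and content-defined chunking (CDC) is what Ariel's question turns on. With fixed-size chunks, inserting a single byte shifts every later boundary, so nothing after the edit deduplicates; CDC derives boundaries from the content itself, so they realign shortly after the edit. The Rust sketch below illustrates this with a Gear-style rolling hash (the hash family FastCDC builds on); the table generation, the 10-bit mask, and the 256/4096-byte chunk limits are illustrative assumptions, not parameters used by EROFS, Nydus, or PuzzleFS.

```rust
use std::collections::HashSet;

/// Top 10 bits: a cut fires with probability ~1/1024 per byte once past `min`.
const MASK: u64 = 0xFFC0_0000_0000_0000;

/// 256 pseudo-random u64s from splitmix64, so the sketch is self-contained.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Gear rolling hash `h = (h << 1) + G[byte]`: each byte's contribution is
/// shifted out after 64 steps, so once a chunk is at least `min` (>= 64)
/// bytes long the cut decision depends only on the last 64 bytes of content.
/// `max` forces a cut when no content-defined boundary is found.
fn cdc_chunks<'a>(data: &'a [u8], gear: &[u64; 256], mask: u64, min: usize, max: usize) -> Vec<&'a [u8]> {
    let mut chunks = Vec::new();
    let (mut start, mut h) = (0usize, 0u64);
    for (i, &byte) in data.iter().enumerate() {
        h = (h << 1).wrapping_add(gear[byte as usize]);
        let len = i + 1 - start;
        if (len >= min && (h & mask) == 0) || len >= max {
            chunks.push(&data[start..=i]);
            start = i + 1;
            h = 0;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]); // trailing partial chunk
    }
    chunks
}

/// Number of chunks in `b` whose exact bytes also appear as a chunk in `a`.
fn shared(a: &[&[u8]], b: &[&[u8]]) -> usize {
    let seen: HashSet<&[u8]> = a.iter().copied().collect();
    b.iter().filter(|c| seen.contains(*c)).count()
}

/// Deterministic xorshift64 bytes standing in for file contents.
fn pseudo_random_bytes(n: usize, mut s: u64) -> Vec<u8> {
    (0..n)
        .map(|_| {
            s ^= s << 13;
            s ^= s >> 7;
            s ^= s << 17;
            (s >> 56) as u8
        })
        .collect()
}

fn main() {
    let gear = gear_table();
    let original = pseudo_random_bytes(1 << 16, 0x1234_5678);
    let mut edited = original.clone();
    edited.insert(100, 0xAB); // a one-byte insertion near the start

    let fixed_a: Vec<&[u8]> = original.chunks(1024).collect();
    let fixed_b: Vec<&[u8]> = edited.chunks(1024).collect();
    let cdc_a = cdc_chunks(&original, &gear, MASK, 256, 4096);
    let cdc_b = cdc_chunks(&edited, &gear, MASK, 256, 4096);

    println!("fixed-size: {}/{} chunks shared", shared(&fixed_a, &fixed_b), fixed_b.len());
    println!("CDC:        {}/{} chunks shared", shared(&cdc_a, &cdc_b), cdc_b.len());
}
```

In this toy setting only the chunk containing the inserted byte changes under CDC, while every fixed-size chunk after the insertion point shifts and stops matching.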

Honestly, for your test workload (10 versions of ubuntu:jammy), I
don't think CDC makes a significant difference in the final
result compared to per-file blobs. Most of the files in these
images are identical; I think the only binary differences come
from CVE fixes or similar issues. Delta compression might help
more, but I have never tried it. So, as I asked in [1], does
ComposeFS already meet your requirements?
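The argument that per-file blobs capture most of the savings for this workload can be illustrated with whole-file deduplication: identical file bodies across releases are stored once, and only changed files cost extra. A minimal Rust sketch with two made-up releases (paths and contents are hypothetical, and file bodies are strings for brevity); it is not ComposeFS code and does not reproduce the Jammy measurements.

```rust
use std::collections::HashSet;

/// One image: (path, file contents).
type Image<'a> = Vec<(&'a str, &'a str)>;

/// Bytes needed if every image stores all of its files independently.
fn raw_bytes(images: &[Image<'_>]) -> usize {
    images.iter().flatten().map(|(_, body)| body.len()).sum()
}

/// Bytes needed if identical file bodies are stored once as shared blobs
/// (whole-file deduplication, no chunking).
fn per_file_stored_bytes(images: &[Image<'_>]) -> usize {
    let unique: HashSet<&str> = images.iter().flatten().map(|&(_, body)| body).collect();
    unique.iter().map(|body| body.len()).sum()
}

fn main() {
    // Two hypothetical releases: only libssl changed (e.g. a CVE fix).
    let v1: Image<'_> = vec![("/bin/sh", "dash binary"), ("/usr/lib/libssl.so", "libssl 3.0.2-0ubuntu1")];
    let v2: Image<'_> = vec![("/bin/sh", "dash binary"), ("/usr/lib/libssl.so", "libssl 3.0.2-0ubuntu2")];
    let images = [v1, v2];
    println!(
        "raw: {} bytes, per-file dedup: {} bytes",
        raw_bytes(&images),
        per_file_stored_bytes(&images)
    );
}
```

Here only the changed libssl body is stored twice (53 bytes instead of 64). Chunk-level CDC could additionally share the unchanged part of that one file, which is the residual gain expected to be small when releases differ in only a few files.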

Again, EROFS could keep every distinct extent (or chunk,
whatever) as a separate file with a minor update; that's trivial
for a userspace archive system, but IMO it's controversial for
an in-tree kernel filesystem.

[1] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369971291

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2024-09-27  2:18 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-16 13:56 [RFC PATCH 00/24] erofs: introduce Rust implementation Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 01/24] erofs: lift up erofs_fill_inode to global Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 02/24] erofs: add superblock data structure in Rust Yiyang Wu
2024-09-16 17:55   ` Greg KH
2024-09-17  0:18     ` Gao Xiang
2024-09-17  5:34       ` Greg KH
2024-09-17  5:45         ` Gao Xiang
2024-09-17  5:27     ` Yiyang Wu
2024-09-17  5:39     ` Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 03/24] erofs: add Errno " Yiyang Wu
2024-09-16 17:51   ` Greg KH
2024-09-16 23:45     ` Gao Xiang
2024-09-20  2:49     ` [PATCH RESEND 0/1] rust: introduce declare_err! autogeneration Yiyang Wu
2024-09-20  2:49       ` [PATCH RESEND 1/1] rust: error: auto-generate error declarations Yiyang Wu
2024-09-20  2:57     ` [RFC PATCH 03/24] erofs: add Errno in Rust Yiyang Wu
2024-09-16 20:01   ` Gary Guo
2024-09-16 23:58     ` Gao Xiang
2024-09-19 13:45       ` Benno Lossin
2024-09-19 15:13         ` Gao Xiang
2024-09-19 19:36           ` Benno Lossin
2024-09-20  0:49             ` Gao Xiang
2024-09-21  8:37               ` Greg Kroah-Hartman
2024-09-21  9:29                 ` Gao Xiang
2024-09-25 15:48             ` Ariel Miculas
2024-09-25 16:35               ` Gao Xiang
2024-09-25 21:45                 ` Ariel Miculas
2024-09-26  0:40                   ` Gao Xiang
2024-09-26  1:04                     ` Gao Xiang
2024-09-26  8:10                       ` Ariel Miculas
2024-09-26  8:25                         ` Gao Xiang
2024-09-26  9:51                           ` Ariel Miculas
2024-09-26 10:46                             ` Gao Xiang
2024-09-26 11:01                               ` Ariel Miculas
2024-09-26 11:05                                 ` Gao Xiang
2024-09-26 11:23                                 ` Gao Xiang
2024-09-26 12:50                                   ` Ariel Miculas
2024-09-27  2:18                                     ` Gao Xiang
2024-09-26  8:48                         ` Gao Xiang
2024-09-16 13:56 ` [RFC PATCH 04/24] erofs: add xattrs data structure " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 05/24] erofs: add inode " Yiyang Wu
2024-09-18 13:04   ` [External Mail][RFC " Huang Jianan
2024-09-16 13:56 ` [RFC PATCH 06/24] erofs: add alloc_helper " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 07/24] erofs: add data abstraction " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 08/24] erofs: add device data structure " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 09/24] erofs: add continuous iterators " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 10/24] erofs: add device_infos implementation " Yiyang Wu
2024-09-21  9:44   ` Jianan Huang
2024-09-16 13:56 ` [RFC PATCH 11/24] erofs: add map data structure " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 12/24] erofs: add directory entry " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 13/24] erofs: add runtime filesystem and inode " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 14/24] erofs: add block mapping capability " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 15/24] erofs: add iter methods in filesystem " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 16/24] erofs: implement dir and inode operations " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 17/24] erofs: introduce Rust SBI to C Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 18/24] erofs: introduce iget alternative " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 19/24] erofs: introduce namei " Yiyang Wu
2024-09-16 17:08   ` Al Viro
2024-09-17  6:48     ` Yiyang Wu
2024-09-17  7:14       ` Gao Xiang
2024-09-17  7:31         ` Al Viro
2024-09-17  7:44           ` Al Viro
2024-09-17  8:08             ` Gao Xiang
2024-09-17 22:22             ` Al Viro
2024-09-17  8:06           ` Gao Xiang
2024-09-16 13:56 ` [RFC PATCH 20/24] erofs: introduce readdir " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 21/24] erofs: introduce erofs_map_blocks " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 22/24] erofs: add skippable iters in Rust Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 23/24] erofs: implement xattrs operations " Yiyang Wu
2024-09-16 13:56 ` [RFC PATCH 24/24] erofs: introduce xattrs replacement to C Yiyang Wu
