* [PATCH 1/3] ocfs2: reject dinodes with non-canonical i_mode type or stray bits
2026-05-17 11:10 [PATCH 0/3] ocfs2: harden inode validators against forged metadata Michael Bommarito
@ 2026-05-17 11:10 ` Michael Bommarito
2026-05-18 1:36 ` Joseph Qi
2026-05-17 11:10 ` [PATCH 2/3] ocfs2: reject dinodes whose i_rdev disagrees with the file type Michael Bommarito
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Michael Bommarito @ 2026-05-17 11:10 UTC (permalink / raw)
To: Joseph Qi, Mark Fasheh, Joel Becker
Cc: ZhengYuan Huang, ocfs2-devel, linux-fsdevel, linux-kernel
ocfs2_validate_inode_block() currently accepts any 16-bit i_mode
value as long as i_mode is non-zero. ocfs2_populate_inode() then
copies that mode verbatim into inode->i_mode and dispatches on
i_mode & S_IFMT to the file/dir/symlink/special_file iops; any
unrecognised type falls through to ocfs2_special_file_iops and
init_special_inode(), which interprets id1.dev1.i_rdev as a
device number.
The result is that anything able to forge or corrupt an inode
block (a hostile cluster peer with raw write access to the
shared LUN, a privileged user mounting an attacker-supplied
image, on-disk corruption) can publish an in-core inode whose
type bits do not name a POSIX file type, or whose permission
bits carry bytes outside S_IFMT|07777. Both shapes propagate
into VFS-visible state that downstream code paths assume is
well-formed.
Reject early in the validator:
- mode bits outside S_IFMT|07777
- S_IFMT values that are not one of S_IFREG, S_IFDIR, S_IFLNK,
S_IFCHR, S_IFBLK, S_IFIFO, S_IFSOCK
mkfs.ocfs2 and the kernel only ever produce these seven types
plus the standard permission, setuid/setgid/sticky bits; an
on-disk i_mode outside this envelope is structurally malformed
regardless of how it got there.
Validated against the existing inline_data, refcount, and
chain-list checks: this hardening fires before any of them and
does not perturb their behaviour for well-formed inodes.
Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
fs/ocfs2/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index a510a0eb1adcc..fb592bf3e5f31 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1494,6 +1494,45 @@ int ocfs2_validate_inode_block(struct super_block *sb,
goto bail;
}
+ /*
+ * Reject dinodes whose i_mode does not name one of the seven
+ * canonical POSIX file types, or whose mode carries bits outside
+ * S_IFMT | 07777. ocfs2_populate_inode() copies i_mode verbatim
+ * into inode->i_mode and then dispatches via switch (mode & S_IFMT)
+ * to file/dir/symlink/special_file iops; an unrecognised type
+ * falls into ocfs2_special_file_iops with init_special_inode(),
+ * which interprets i_rdev. Constrain the type byte here so the
+ * dispatch only ever sees a value mkfs.ocfs2 / VFS can produce.
+ */
+ {
+ u16 mode = le16_to_cpu(di->i_mode);
+
+ if (mode & ~(S_IFMT | 07777)) {
+ rc = ocfs2_error(sb,
+ "Invalid dinode #%llu: mode 0%o has bits outside S_IFMT|07777\n",
+ (unsigned long long)bh->b_blocknr,
+ mode);
+ goto bail;
+ }
+
+ switch (mode & S_IFMT) {
+ case S_IFREG:
+ case S_IFDIR:
+ case S_IFLNK:
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFIFO:
+ case S_IFSOCK:
+ break;
+ default:
+ rc = ocfs2_error(sb,
+ "Invalid dinode #%llu: mode 0%o has unknown file type\n",
+ (unsigned long long)bh->b_blocknr,
+ mode);
+ goto bail;
+ }
+ }
+
if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
struct ocfs2_inline_data *data = &di->id2.i_data;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 1/3] ocfs2: reject dinodes with non-canonical i_mode type or stray bits
2026-05-17 11:10 ` [PATCH 1/3] ocfs2: reject dinodes with non-canonical i_mode type or stray bits Michael Bommarito
@ 2026-05-18 1:36 ` Joseph Qi
0 siblings, 0 replies; 10+ messages in thread
From: Joseph Qi @ 2026-05-18 1:36 UTC (permalink / raw)
To: Michael Bommarito, akpm
Cc: ZhengYuan Huang, ocfs2-devel, linux-fsdevel, linux-kernel,
Mark Fasheh, Joel Becker
On 5/17/26 7:10 PM, Michael Bommarito wrote:
> ocfs2_validate_inode_block() currently accepts any 16-bit i_mode
> value as long as i_mode is non-zero. ocfs2_populate_inode() then
> copies that mode verbatim into inode->i_mode and dispatches on
> i_mode & S_IFMT to the file/dir/symlink/special_file iops; any
> unrecognised type falls through to ocfs2_special_file_iops and
> init_special_inode(), which interprets id1.dev1.i_rdev as a
> device number.
>
> The result is that anything able to forge or corrupt an inode
> block (a hostile cluster peer with raw write access to the
> shared LUN, a privileged user mounting an attacker-supplied
> image, on-disk corruption) can publish an in-core inode whose
> type bits do not name a POSIX file type, or whose permission
> bits carry bytes outside S_IFMT|07777. Both shapes propagate
> into VFS-visible state that downstream code paths assume is
> well-formed.
>
> Reject early in the validator:
>
> - mode bits outside S_IFMT|07777
> - S_IFMT values that are not one of S_IFREG, S_IFDIR, S_IFLNK,
> S_IFCHR, S_IFBLK, S_IFIFO, S_IFSOCK
>
> mkfs.ocfs2 and the kernel only ever produce these seven types
> plus the standard permission, setuid/setgid/sticky bits; an
> on-disk i_mode outside this envelope is structurally malformed
> regardless of how it got there.
>
> Validated against the existing inline_data, refcount, and
> chain-list checks: this hardening fires before any of them and
> does not perturb their behaviour for well-formed inodes.
>
> Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
Looks fine.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> ---
> fs/ocfs2/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index a510a0eb1adcc..fb592bf3e5f31 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1494,6 +1494,45 @@ int ocfs2_validate_inode_block(struct super_block *sb,
> goto bail;
> }
>
> + /*
> + * Reject dinodes whose i_mode does not name one of the seven
> + * canonical POSIX file types, or whose mode carries bits outside
> + * S_IFMT | 07777. ocfs2_populate_inode() copies i_mode verbatim
> + * into inode->i_mode and then dispatches via switch (mode & S_IFMT)
> + * to file/dir/symlink/special_file iops; an unrecognised type
> + * falls into ocfs2_special_file_iops with init_special_inode(),
> + * which interprets i_rdev. Constrain the type byte here so the
> + * dispatch only ever sees a value mkfs.ocfs2 / VFS can produce.
> + */
> + {
> + u16 mode = le16_to_cpu(di->i_mode);
> +
> + if (mode & ~(S_IFMT | 07777)) {
> + rc = ocfs2_error(sb,
> + "Invalid dinode #%llu: mode 0%o has bits outside S_IFMT|07777\n",
> + (unsigned long long)bh->b_blocknr,
> + mode);
> + goto bail;
> + }
> +
> + switch (mode & S_IFMT) {
> + case S_IFREG:
> + case S_IFDIR:
> + case S_IFLNK:
> + case S_IFCHR:
> + case S_IFBLK:
> + case S_IFIFO:
> + case S_IFSOCK:
> + break;
> + default:
> + rc = ocfs2_error(sb,
> + "Invalid dinode #%llu: mode 0%o has unknown file type\n",
> + (unsigned long long)bh->b_blocknr,
> + mode);
> + goto bail;
> + }
> + }
> +
> if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
> struct ocfs2_inline_data *data = &di->id2.i_data;
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] ocfs2: reject dinodes whose i_rdev disagrees with the file type
2026-05-17 11:10 [PATCH 0/3] ocfs2: harden inode validators against forged metadata Michael Bommarito
2026-05-17 11:10 ` [PATCH 1/3] ocfs2: reject dinodes with non-canonical i_mode type or stray bits Michael Bommarito
@ 2026-05-17 11:10 ` Michael Bommarito
2026-05-18 1:37 ` Joseph Qi
2026-05-17 11:10 ` [PATCH 3/3] ocfs2: reject regular files with non-zero i_size and zero i_clusters Michael Bommarito
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Michael Bommarito @ 2026-05-17 11:10 UTC (permalink / raw)
To: Joseph Qi, Mark Fasheh, Joel Becker
Cc: ZhengYuan Huang, ocfs2-devel, linux-fsdevel, linux-kernel
id1.dev1.i_rdev is the device-number arm of the ocfs2_dinode id1
union and is only meaningful for character and block device
inodes. For any other user-visible file type the on-disk value
must be zero.
ocfs2_populate_inode() currently runs
inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
unconditionally, before the S_IFMT switch decides whether the
inode is a special file. As a result, an i_rdev value present on
a non-device inode is silently published into the in-core inode.
A subsequent forced re-read or in-core mode mutation (cluster
peer with raw write access to the shared LUN, on-disk corruption,
or a separately forged dinode) can then expose the attacker-
controlled device number to init_special_inode() without ever
showing an unusual i_mode at validation time.
System inodes (OCFS2_SYSTEM_FL) legitimately use the bitmap1 and
journal1 arms of the same union: allocator inodes encode i_used
/ i_total in the bitmap1 arm and the journal encodes ij_flags /
ij_recovery_generation in the journal1 arm. Those byte
sequences are not an i_rdev and a non-zero pattern there is the
on-disk norm, not an integrity violation. Restrict the cross-
check to non-system inodes; that is the full surface where
i_rdev semantics apply and is also the full surface an
unprivileged consumer of the volume can see.
Following the i_mode canonicalisation in patch 1, S_ISCHR /
S_ISBLK covers the whole device-inode space; this check operates
correctly on its own, but the canonicalised i_mode makes the
predicate exhaustive.
Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
fs/ocfs2/inode.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index fb592bf3e5f31..305e22cc9b1d9 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1533,6 +1533,44 @@ int ocfs2_validate_inode_block(struct super_block *sb,
}
}
+ /*
+ * id1.dev1.i_rdev is the device-number arm of the id1 union and
+ * is only meaningful for character and block device inodes. For
+ * any other regular user-visible file type the on-disk value
+ * must be zero. ocfs2_populate_inode() currently runs
+ *
+ * inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
+ *
+ * unconditionally, before the S_IFMT switch decides whether the
+ * inode is a special file. As a result, an i_rdev value present
+ * on a non-device inode is silently published into the in-core
+ * inode; a subsequent forced re-read or in-core mode mutation
+ * (cluster peer with raw write access to the shared LUN,
+ * on-disk corruption, or a separately forged dinode) can then
+ * expose the attacker-controlled device number to
+ * init_special_inode() without ever showing an unusual i_mode
+ * at validation time.
+ *
+ * System inodes (OCFS2_SYSTEM_FL) legitimately use the bitmap1
+ * and journal1 arms of the same union (allocator i_used /
+ * i_total counters and the journal ij_flags /
+ * ij_recovery_generation pair); those bytes are not an i_rdev
+ * and must not be checked here. Restrict the cross-check to
+ * non-system inodes, which is the full attacker-controllable
+ * surface.
+ */
+ if (!(le32_to_cpu(di->i_flags) & OCFS2_SYSTEM_FL) &&
+ !S_ISCHR(le16_to_cpu(di->i_mode)) &&
+ !S_ISBLK(le16_to_cpu(di->i_mode)) &&
+ di->id1.dev1.i_rdev != 0) {
+ rc = ocfs2_error(sb,
+ "Invalid dinode #%llu: non-device mode 0%o with i_rdev %llu\n",
+ (unsigned long long)bh->b_blocknr,
+ le16_to_cpu(di->i_mode),
+ (unsigned long long)le64_to_cpu(di->id1.dev1.i_rdev));
+ goto bail;
+ }
+
if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
struct ocfs2_inline_data *data = &di->id2.i_data;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 2/3] ocfs2: reject dinodes whose i_rdev disagrees with the file type
2026-05-17 11:10 ` [PATCH 2/3] ocfs2: reject dinodes whose i_rdev disagrees with the file type Michael Bommarito
@ 2026-05-18 1:37 ` Joseph Qi
0 siblings, 0 replies; 10+ messages in thread
From: Joseph Qi @ 2026-05-18 1:37 UTC (permalink / raw)
To: Michael Bommarito, akpm
Cc: ZhengYuan Huang, ocfs2-devel, linux-fsdevel, linux-kernel,
Mark Fasheh, Joel Becker
On 5/17/26 7:10 PM, Michael Bommarito wrote:
> id1.dev1.i_rdev is the device-number arm of the ocfs2_dinode id1
> union and is only meaningful for character and block device
> inodes. For any other user-visible file type the on-disk value
> must be zero.
>
> ocfs2_populate_inode() currently runs
>
> inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
>
> unconditionally, before the S_IFMT switch decides whether the
> inode is a special file. As a result, an i_rdev value present on
> a non-device inode is silently published into the in-core inode.
> A subsequent forced re-read or in-core mode mutation (cluster
> peer with raw write access to the shared LUN, on-disk corruption,
> or a separately forged dinode) can then expose the attacker-
> controlled device number to init_special_inode() without ever
> showing an unusual i_mode at validation time.
>
> System inodes (OCFS2_SYSTEM_FL) legitimately use the bitmap1 and
> journal1 arms of the same union: allocator inodes encode i_used
> / i_total in the bitmap1 arm and the journal encodes ij_flags /
> ij_recovery_generation in the journal1 arm. Those byte
> sequences are not an i_rdev and a non-zero pattern there is the
> on-disk norm, not an integrity violation. Restrict the cross-
> check to non-system inodes; that is the full surface where
> i_rdev semantics apply and is also the full surface an
> unprivileged consumer of the volume can see.
>
> Following the i_mode canonicalisation in patch 1, S_ISCHR /
> S_ISBLK covers the whole device-inode space; this check operates
> correctly on its own, but the canonicalised i_mode makes the
> predicate exhaustive.
>
> Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
Looks fine.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> ---
> fs/ocfs2/inode.c | 38 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index fb592bf3e5f31..305e22cc9b1d9 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1533,6 +1533,44 @@ int ocfs2_validate_inode_block(struct super_block *sb,
> }
> }
>
> + /*
> + * id1.dev1.i_rdev is the device-number arm of the id1 union and
> + * is only meaningful for character and block device inodes. For
> + * any other regular user-visible file type the on-disk value
> + * must be zero. ocfs2_populate_inode() currently runs
> + *
> + * inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
> + *
> + * unconditionally, before the S_IFMT switch decides whether the
> + * inode is a special file. As a result, an i_rdev value present
> + * on a non-device inode is silently published into the in-core
> + * inode; a subsequent forced re-read or in-core mode mutation
> + * (cluster peer with raw write access to the shared LUN,
> + * on-disk corruption, or a separately forged dinode) can then
> + * expose the attacker-controlled device number to
> + * init_special_inode() without ever showing an unusual i_mode
> + * at validation time.
> + *
> + * System inodes (OCFS2_SYSTEM_FL) legitimately use the bitmap1
> + * and journal1 arms of the same union (allocator i_used /
> + * i_total counters and the journal ij_flags /
> + * ij_recovery_generation pair); those bytes are not an i_rdev
> + * and must not be checked here. Restrict the cross-check to
> + * non-system inodes, which is the full attacker-controllable
> + * surface.
> + */
> + if (!(le32_to_cpu(di->i_flags) & OCFS2_SYSTEM_FL) &&
> + !S_ISCHR(le16_to_cpu(di->i_mode)) &&
> + !S_ISBLK(le16_to_cpu(di->i_mode)) &&
> + di->id1.dev1.i_rdev != 0) {
> + rc = ocfs2_error(sb,
> + "Invalid dinode #%llu: non-device mode 0%o with i_rdev %llu\n",
> + (unsigned long long)bh->b_blocknr,
> + le16_to_cpu(di->i_mode),
> + (unsigned long long)le64_to_cpu(di->id1.dev1.i_rdev));
> + goto bail;
> + }
> +
> if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
> struct ocfs2_inline_data *data = &di->id2.i_data;
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/3] ocfs2: reject regular files with non-zero i_size and zero i_clusters
2026-05-17 11:10 [PATCH 0/3] ocfs2: harden inode validators against forged metadata Michael Bommarito
2026-05-17 11:10 ` [PATCH 1/3] ocfs2: reject dinodes with non-canonical i_mode type or stray bits Michael Bommarito
2026-05-17 11:10 ` [PATCH 2/3] ocfs2: reject dinodes whose i_rdev disagrees with the file type Michael Bommarito
@ 2026-05-17 11:10 ` Michael Bommarito
2026-05-18 1:38 ` Joseph Qi
2026-05-18 21:40 ` [PATCH 0/3] ocfs2: harden inode validators against forged metadata Andrew Morton
2026-06-01 17:39 ` Joel Becker
4 siblings, 1 reply; 10+ messages in thread
From: Michael Bommarito @ 2026-05-17 11:10 UTC (permalink / raw)
To: Joseph Qi, Mark Fasheh, Joel Becker
Cc: ZhengYuan Huang, ocfs2-devel, linux-fsdevel, linux-kernel
On a volume mounted WITHOUT OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC, a
regular file with non-zero i_size, zero i_clusters, and no
OCFS2_INLINE_DATA_FL flag is structurally malformed: the extent
map declares no allocated clusters yet the size header claims
the file has content. ocfs2_populate_inode() copies i_size into
the in-core inode and dispatches to ocfs2_aops; subsequent reads
or truncates then operate on an inconsistent extent state.
This is the shape an attacker who keeps the rest of the extent
list intact (to satisfy the inline-data, refcount, chain-list,
and per-field validators already in this function) would produce
when forging only the inode header to publish a synthetic file
size on a victim node. It is also the shape on-disk corruption
of the i_clusters field produces. Reject early in the
validator.
The check is restricted to non-sparse volumes
(ocfs2_sparse_alloc() returns false). On non-sparse mounts the
allocator path always grows clusters before i_size:
ocfs2_extend_file() takes the !sparse branch into
ocfs2_extend_no_holes(), which calls ocfs2_extend_allocation()
to journal new clusters first, and only then
ocfs2_simple_size_update() journals the larger i_size. The
truncate path likewise lowers i_size in ocfs2_orphan_for_truncate()
and then frees clusters in ocfs2_commit_truncate(), which uses
ocfs2_clusters_for_bytes(new_i_size) as its new_highest_cpos:
when new_i_size > 0 the floor is at least one cluster, so the
on-disk dinode never legitimately exposes a non-inline regular
file with i_size > 0 and i_clusters == 0 on a non-sparse volume.
On sparse-alloc volumes the same shape is legitimate: an
ocfs2_extend_file() call goes through ocfs2_zero_extend() +
ocfs2_simple_size_update(), which grows i_size on its own
without changing i_clusters; a freshly truncate -s 1M of a
sparse regular file is therefore on-disk
(i_size = 1048576, i_clusters = 0). The check therefore opts
out via ocfs2_sparse_alloc(OCFS2_SB(sb)).
System inodes (OCFS2_SYSTEM_FL) carry their own size and
cluster invariants validated by the allocator, journal, quota,
and truncate-log subsystems; skip them here. The inline-data
fast path is filtered separately by its own dedicated branch
below: its well-formed case is exactly i_clusters == 0 with
i_size <= id_count. Symlinks legitimately keep i_clusters ==
0 with non-zero i_size (fast symlinks), so this check is
restricted to S_IFREG.
Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
fs/ocfs2/inode.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 305e22cc9b1d9..c63d2ced6b338 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1571,6 +1571,47 @@ int ocfs2_validate_inode_block(struct super_block *sb,
goto bail;
}
+ /*
+ * On a non-sparse volume, a regular file with non-zero i_size
+ * and zero i_clusters that is not marked as inline data is
+ * structurally malformed: the extent map declares no allocated
+ * clusters yet the size header claims the file has content.
+ * ocfs2_populate_inode() would still publish i_size to VFS and
+ * leave the extent state inconsistent for any later read or
+ * truncate. This is the shape an attacker who keeps the rest
+ * of the extent list intact (to satisfy the inline-data,
+ * refcount, chain-list, and per-field validators above) would
+ * produce when forging only the inode header to publish a
+ * synthetic file size on a victim node. It is also the shape
+ * on-disk corruption of the i_clusters field produces.
+ *
+ * The check opts out on sparse-alloc volumes, where the
+ * extend path (ocfs2_extend_file -> ocfs2_zero_extend ->
+ * ocfs2_simple_size_update) legitimately grows i_size without
+ * allocating clusters. On non-sparse volumes the equivalent
+ * path (ocfs2_extend_no_holes) journals clusters first and
+ * i_size second, and truncate-down floors i_clusters at
+ * ocfs2_clusters_for_bytes(new_i_size) which is >= 1 whenever
+ * new_i_size > 0, so the rejected shape never appears on disk.
+ *
+ * Skip system inodes (OCFS2_SYSTEM_FL) and the inline-data
+ * fast path (handled below). Symlinks legitimately keep
+ * i_clusters == 0 with non-zero i_size (fast symlinks), so
+ * restrict to S_IFREG.
+ */
+ if (!ocfs2_sparse_alloc(OCFS2_SB(sb)) &&
+ S_ISREG(le16_to_cpu(di->i_mode)) &&
+ !(le32_to_cpu(di->i_flags) & OCFS2_SYSTEM_FL) &&
+ !(le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) &&
+ le64_to_cpu(di->i_size) != 0 &&
+ le32_to_cpu(di->i_clusters) == 0) {
+ rc = ocfs2_error(sb,
+ "Invalid dinode #%llu: regular file i_size %llu with i_clusters 0 and no inline-data flag on non-sparse volume\n",
+ (unsigned long long)bh->b_blocknr,
+ (unsigned long long)le64_to_cpu(di->i_size));
+ goto bail;
+ }
+
if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
struct ocfs2_inline_data *data = &di->id2.i_data;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 3/3] ocfs2: reject regular files with non-zero i_size and zero i_clusters
2026-05-17 11:10 ` [PATCH 3/3] ocfs2: reject regular files with non-zero i_size and zero i_clusters Michael Bommarito
@ 2026-05-18 1:38 ` Joseph Qi
0 siblings, 0 replies; 10+ messages in thread
From: Joseph Qi @ 2026-05-18 1:38 UTC (permalink / raw)
To: Michael Bommarito, akpm
Cc: ZhengYuan Huang, ocfs2-devel, linux-fsdevel, linux-kernel,
Mark Fasheh, Joel Becker
On 5/17/26 7:10 PM, Michael Bommarito wrote:
> On a volume mounted WITHOUT OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC, a
> regular file with non-zero i_size, zero i_clusters, and no
> OCFS2_INLINE_DATA_FL flag is structurally malformed: the extent
> map declares no allocated clusters yet the size header claims
> the file has content. ocfs2_populate_inode() copies i_size into
> the in-core inode and dispatches to ocfs2_aops; subsequent reads
> or truncates then operate on an inconsistent extent state.
>
> This is the shape an attacker who keeps the rest of the extent
> list intact (to satisfy the inline-data, refcount, chain-list,
> and per-field validators already in this function) would produce
> when forging only the inode header to publish a synthetic file
> size on a victim node. It is also the shape on-disk corruption
> of the i_clusters field produces. Reject early in the
> validator.
>
> The check is restricted to non-sparse volumes
> (ocfs2_sparse_alloc() returns false). On non-sparse mounts the
> allocator path always grows clusters before i_size:
> ocfs2_extend_file() takes the !sparse branch into
> ocfs2_extend_no_holes(), which calls ocfs2_extend_allocation()
> to journal new clusters first, and only then
> ocfs2_simple_size_update() journals the larger i_size. The
> truncate path likewise lowers i_size in ocfs2_orphan_for_truncate()
> and then frees clusters in ocfs2_commit_truncate(), which uses
> ocfs2_clusters_for_bytes(new_i_size) as its new_highest_cpos:
> when new_i_size > 0 the floor is at least one cluster, so the
> on-disk dinode never legitimately exposes a non-inline regular
> file with i_size > 0 and i_clusters == 0 on a non-sparse volume.
>
> On sparse-alloc volumes the same shape is legitimate: an
> ocfs2_extend_file() call goes through ocfs2_zero_extend() +
> ocfs2_simple_size_update(), which grows i_size on its own
> without changing i_clusters; a freshly truncate -s 1M of a
> sparse regular file is therefore on-disk
> (i_size = 1048576, i_clusters = 0). The check therefore opts
> out via ocfs2_sparse_alloc(OCFS2_SB(sb)).
>
> System inodes (OCFS2_SYSTEM_FL) carry their own size and
> cluster invariants validated by the allocator, journal, quota,
> and truncate-log subsystems; skip them here. The inline-data
> fast path is filtered separately by its own dedicated branch
> below: its well-formed case is exactly i_clusters == 0 with
> i_size <= id_count. Symlinks legitimately keep i_clusters ==
> 0 with non-zero i_size (fast symlinks), so this check is
> restricted to S_IFREG.
>
> Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
Looks fine.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> ---
> fs/ocfs2/inode.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 305e22cc9b1d9..c63d2ced6b338 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1571,6 +1571,47 @@ int ocfs2_validate_inode_block(struct super_block *sb,
> goto bail;
> }
>
> + /*
> + * On a non-sparse volume, a regular file with non-zero i_size
> + * and zero i_clusters that is not marked as inline data is
> + * structurally malformed: the extent map declares no allocated
> + * clusters yet the size header claims the file has content.
> + * ocfs2_populate_inode() would still publish i_size to VFS and
> + * leave the extent state inconsistent for any later read or
> + * truncate. This is the shape an attacker who keeps the rest
> + * of the extent list intact (to satisfy the inline-data,
> + * refcount, chain-list, and per-field validators above) would
> + * produce when forging only the inode header to publish a
> + * synthetic file size on a victim node. It is also the shape
> + * on-disk corruption of the i_clusters field produces.
> + *
> + * The check opts out on sparse-alloc volumes, where the
> + * extend path (ocfs2_extend_file -> ocfs2_zero_extend ->
> + * ocfs2_simple_size_update) legitimately grows i_size without
> + * allocating clusters. On non-sparse volumes the equivalent
> + * path (ocfs2_extend_no_holes) journals clusters first and
> + * i_size second, and truncate-down floors i_clusters at
> + * ocfs2_clusters_for_bytes(new_i_size) which is >= 1 whenever
> + * new_i_size > 0, so the rejected shape never appears on disk.
> + *
> + * Skip system inodes (OCFS2_SYSTEM_FL) and the inline-data
> + * fast path (handled below). Symlinks legitimately keep
> + * i_clusters == 0 with non-zero i_size (fast symlinks), so
> + * restrict to S_IFREG.
> + */
> + if (!ocfs2_sparse_alloc(OCFS2_SB(sb)) &&
> + S_ISREG(le16_to_cpu(di->i_mode)) &&
> + !(le32_to_cpu(di->i_flags) & OCFS2_SYSTEM_FL) &&
> + !(le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) &&
> + le64_to_cpu(di->i_size) != 0 &&
> + le32_to_cpu(di->i_clusters) == 0) {
> + rc = ocfs2_error(sb,
> + "Invalid dinode #%llu: regular file i_size %llu with i_clusters 0 and no inline-data flag on non-sparse volume\n",
> + (unsigned long long)bh->b_blocknr,
> + (unsigned long long)le64_to_cpu(di->i_size));
> + goto bail;
> + }
> +
> if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
> struct ocfs2_inline_data *data = &di->id2.i_data;
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] ocfs2: harden inode validators against forged metadata
2026-05-17 11:10 [PATCH 0/3] ocfs2: harden inode validators against forged metadata Michael Bommarito
` (2 preceding siblings ...)
2026-05-17 11:10 ` [PATCH 3/3] ocfs2: reject regular files with non-zero i_size and zero i_clusters Michael Bommarito
@ 2026-05-18 21:40 ` Andrew Morton
2026-05-19 0:57 ` Michael Bommarito
2026-06-01 17:39 ` Joel Becker
4 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2026-05-18 21:40 UTC (permalink / raw)
To: Michael Bommarito
Cc: Joseph Qi, Mark Fasheh, Joel Becker, ZhengYuan Huang, ocfs2-devel,
linux-fsdevel, linux-kernel
On Sun, 17 May 2026 07:10:11 -0400 Michael Bommarito <michael.bommarito@gmail.com> wrote:
>
> This series adds three structural checks to
> ocfs2_validate_inode_block() that catch attacker-controlled bytes
> in a freshly read dinode before ocfs2_populate_inode() copies them
> verbatim into the in-core inode. All three checks fire on the
> mount, lookup, and read-after-cache-invalidation paths and reject
> the block with ocfs2_error(), the same error-propagation
> mechanism the existing suballoc-slot, inline-data, chain-list,
> and refcount checks use.
Thanks. Sashiko review might have found a few issues:
https://sashiko.dev/#/patchset/20260517111015.3187935-1-michael.bommarito@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 0/3] ocfs2: harden inode validators against forged metadata
2026-05-18 21:40 ` [PATCH 0/3] ocfs2: harden inode validators against forged metadata Andrew Morton
@ 2026-05-19 0:57 ` Michael Bommarito
0 siblings, 0 replies; 10+ messages in thread
From: Michael Bommarito @ 2026-05-19 0:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Joseph Qi, Mark Fasheh, Joel Becker, ZhengYuan Huang, ocfs2-devel,
linux-fsdevel, linux-kernel
On Mon, May 18, 2026 at 5:40 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> Thanks. Sashiko review might have found a few issues:
> https://sashiko.dev/#/patchset/20260517111015.3187935-1-michael.bommarito@gmail.com
Yes, I think patches 1 and 3 need a v2. Sashiko found a sibling issue
in patch 3, which is the most important part.
I'll send revisions later tonight.
Thanks,
Mike
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] ocfs2: harden inode validators against forged metadata
2026-05-17 11:10 [PATCH 0/3] ocfs2: harden inode validators against forged metadata Michael Bommarito
` (3 preceding siblings ...)
2026-05-18 21:40 ` [PATCH 0/3] ocfs2: harden inode validators against forged metadata Andrew Morton
@ 2026-06-01 17:39 ` Joel Becker
4 siblings, 0 replies; 10+ messages in thread
From: Joel Becker @ 2026-06-01 17:39 UTC (permalink / raw)
To: Michael Bommarito
Cc: Joseph Qi, Mark Fasheh, ZhengYuan Huang, ocfs2-devel,
linux-fsdevel, linux-kernel
On Sun, May 17, 2026 at 07:10:11AM -0400, Michael Bommarito wrote:
> This series adds three structural checks to
> ocfs2_validate_inode_block() that catch attacker-controlled bytes
> in a freshly read dinode before ocfs2_populate_inode() copies them
> verbatim into the in-core inode. All three checks fire on the
>
> ...
>
> Threat model
> ============
>
> The validator is the chokepoint that protects
> ocfs2_populate_inode() from a malformed dinode whether the
> malformation got there via:
>
> (1) An attacker-supplied disk image mounted by a privileged
> user. The mount path runs every dinode through this
> validator before any unprivileged user opens a file on
> the volume. This is the same threat model the existing
> inline-data, refcount, and chain-list checks in this
> function were written for.
>
> (2) A compromised cluster peer with raw write access to the
> shared block device. OCFS2 is a clustered filesystem;
> the on-disk blocks behind bh->b_data live on shared
> storage that other cluster nodes can write. The local
> node's cache-eviction re-read runs the newly fetched
> block through this validator before ocfs2_populate_inode()
> runs again. Oracle's BlockErrorDetection design document
> scopes the existing CRC32 + Hamming integrity primitive
> explicitly as defense against memory and wire corruption,
> not as authentication of peer writes; the field-level
> validators are therefore the kernel-side defense
> whichever path produced the forged block.
Thank you for the excellent description of the threat model, and for
devising the comprehensive model in the first place. It really helps
consider these patches in the right context.
Can you make sure this info carries on to later versions of this patch
series, so it isn't lost?
Thanks,
Joel
--
"Every day I get up and look through the Forbes list of the richest
people in America. If I'm not there, I go to work."
- Robert Orben
http://www.jlbec.org/
jlbec@evilplan.org
^ permalink raw reply [flat|nested] 10+ messages in thread