* [to-be-updated] ocfs2-reject-dinodes-with-non-canonical-i_mode-type-or-stray-bits.patch removed from -mm tree
@ 2026-05-19 17:41 Andrew Morton
From: Andrew Morton @ 2026-05-19 17:41 UTC
  To: mm-commits, michael.bommarito, akpm


The quilt patch titled
     Subject: ocfs2: reject dinodes with non-canonical i_mode type or stray bits
has been removed from the -mm tree.  Its filename was
     ocfs2-reject-dinodes-with-non-canonical-i_mode-type-or-stray-bits.patch

This patch was dropped because an updated version will be issued

------------------------------------------------------
From: Michael Bommarito <michael.bommarito@gmail.com>
Subject: ocfs2: reject dinodes with non-canonical i_mode type or stray bits
Date: Sun, 17 May 2026 07:10:12 -0400

Patch series "ocfs2: harden inode validators against forged metadata".

This series adds three structural checks to ocfs2_validate_inode_block()
that catch attacker-controlled bytes in a freshly read dinode before
ocfs2_populate_inode() copies them verbatim into the in-core inode.  All
three checks fire on the mount, lookup, and read-after-cache-invalidation
paths and reject the block with ocfs2_error(), the same error-propagation
mechanism the existing suballoc-slot, inline-data, chain-list, and
refcount checks use.

Direction
=========

This continues the validator-hardening direction visible in the
recent in-flight ocfs2_validate_inode_block hardening series,
e.g. ZhengYuan Huang's "ocfs2: revalidate the journal dinode
before toggling dirty"
(<20260512024115.4036371-1-gality369@gmail.com>), "ocfs2: add
extent tree depth validation"
(<20260416110229.3283682-1-gality369@gmail.com>), and "ocfs2:
add extent list validation v2"
(<20260423094116.876696-1-gality369@gmail.com>).  Each of those
adds a per-field invariant check against bytes that downstream
code paths trust unconditionally.

The three checks in this series cover three more fields whose
attacker-controlled values currently propagate into VFS-visible state
without question: i_mode (type bits and reserved bits), i_rdev
(cross-checked against the file type), and the i_size / i_clusters pairing
for regular files on non-sparse volumes (patch 3 is gated on
!ocfs2_sparse_alloc(); sparse-allocation mounts legitimately commit
i_size > 0 with i_clusters == 0 via ocfs2_zero_extend()).
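
The gating described for patch 3 can be sketched in userspace C as
follows.  This is a minimal illustration under stated assumptions, not
the code from patch 3 (which is not shown in this message): the SK_
constants mirror the Linux S_IFMT / S_IFREG values, and
sk_size_clusters_ok() with its parameters is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SK_ names carrying the Linux S_IFMT / S_IFREG values. */
#define SK_IFMT  0170000
#define SK_IFREG 0100000

/*
 * Hypothetical helper: returns false only for the shape the series
 * describes, i.e. a non-inline regular file on a non-sparse volume
 * that claims a size but owns no clusters.
 */
static bool sk_size_clusters_ok(uint16_t mode, uint64_t size,
				uint32_t clusters, bool sparse_alloc,
				bool inline_data)
{
	if ((mode & SK_IFMT) != SK_IFREG)
		return true;	/* only regular files are checked */
	if (sparse_alloc)
		return true;	/* sparse volumes may have size without clusters */
	if (inline_data)
		return true;	/* inline data lives in the dinode itself */
	return !(size > 0 && clusters == 0);
}
```

Under these assumptions a 4096-byte regular file with zero clusters is
rejected only when both sparse_alloc and inline_data are false.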

Threat model
============

The validator is the chokepoint that protects ocfs2_populate_inode() from
a malformed dinode whether the malformation got there via:

  (1) An attacker-supplied disk image mounted by a privileged
      user.  The mount path runs every dinode through this
      validator before any unprivileged user opens a file on
      the volume.  This is the same threat model the existing
      inline-data, refcount, and chain-list checks in this
      function were written for.

  (2) A compromised cluster peer with raw write access to the
      shared block device.  OCFS2 is a clustered filesystem;
      the on-disk blocks behind bh->b_data live on shared
      storage that other cluster nodes can write.  The local
      node's cache-eviction re-read runs the newly fetched
      block through this validator before ocfs2_populate_inode()
      runs again.  Oracle's BlockErrorDetection design document
      scopes the existing CRC32 + Hamming integrity primitive
      explicitly as defense against memory and wire corruption,
      not as authentication of peer writes; the field-level
      validators are therefore the kernel-side defense
      whichever path produced the forged block.

The three checks in this series are deliberately structural: they each
express an invariant mkfs.ocfs2 and the kernel maintain unconditionally,
and they reject any dinode whose header violates that invariant before its
bytes propagate to the in-core inode.

Scope note: these checks block forge patterns that touch i_mode (outside
the canonical envelope), i_rdev (on a non-device file), or the i_size /
i_clusters pair (regular file with size but no extents, on non-sparse
volumes only).  A forger who keeps the dinode within these structural
envelopes (for example, flipping only the permission bits and uid/gid on a
regular file that already has clusters allocated) can still produce a
dinode that satisfies the field-level invariants; closing that residual
class is outside the scope of this hardening series.

Validation
==========

Each patch builds on top of the previous one against mainline; the series
as a whole builds cleanly against v7.1-rc1 with zero new warnings.
checkpatch --strict reports 0 errors, 0 warnings, 0 checks for each patch.

The series was exercised on a two-node QEMU cluster (virtio-blk
shared LUN with share-rw=on, both nodes joining the same o2cb
cluster, mounted ocfs2 with metaecc):

  - Pre-series baseline: a peer-node raw-write forge that adds
    S_ISUID and flips uid/gid to 0 on a regular file is accepted
    by the existing validator chain; the unprivileged user on
    the victim node exec()s the file and gains euid=0.  This
    confirms the cluster-peer write primitive is reachable in
    today's mainline.  Per the Scope note above, this particular
    forge stays within the structural envelope these patches
    enforce and is NOT blocked by them; closing it requires the
    out-of-scope keyed-integrity work.
  - Post-series, structural-variant forge: a peer-node forge
    that, in addition to the setuid + uid/gid changes, stores
    i_rdev = MKDEV(1,1) on the same regular-file dinode (the
    cleanest cluster-context attacker primitive patch 2
    catches) is rejected by ocfs2_validate_inode_block() with
    ocfs2_error "non-device mode 0104755 with i_rdev N".  The
    buffer-head propagates -EFSCORRUPTED to
    ocfs2_read_locked_inode and the user-visible result is
    Permission denied on subsequent stat / open / exec of the
    forged file.  Analogous post-series forge variants that
    flip i_mode outside the canonical envelope, or that set
    i_size > 0 with i_clusters == 0 on a non-inline regular
    file mounted from a non-sparse volume, are rejected by
    patches 1 and 3 respectively.
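
The difference between the two forges above can be sketched in
userspace C.  This is a simplified illustration, not the code from
patch 2 (which is not shown in this message): it assumes the rule is
"a non-device file must carry i_rdev == 0", it ignores any other use of
the on-disk union that holds i_rdev, and sk_rdev_matches_type() and the
SK_ constants (Linux S_IFMT values) are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SK_ names carrying the Linux S_IF* values. */
#define SK_IFMT  0170000
#define SK_IFBLK 0060000
#define SK_IFCHR 0020000

/*
 * Hypothetical helper: device nodes may carry any device number;
 * every other file type is assumed to require rdev == 0.
 */
static bool sk_rdev_matches_type(uint16_t mode, uint64_t rdev)
{
	uint16_t type = mode & SK_IFMT;

	if (type == SK_IFCHR || type == SK_IFBLK)
		return true;
	return rdev == 0;
}
```

With this rule, mode 0104755 with a non-zero device number fails the
check, while the same mode with i_rdev == 0 passes, which matches the
observation that the plain setuid + uid/gid forge stays inside the
structural envelope.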

A separate cluster regression (mount, peer-write a regular file,
drop_caches on the second node, read it back) runs clean post-series, so
the checks do not regress normal operation.

In-tree selftests under tools/testing/selftests/ that reference
fs/ocfs2/inode.c or any changed symbol were checked; no matching selftests
exist for ocfs2_validate_inode_block(), which is consistent with OCFS2
having no in-tree selftest coverage.  The subsystem's standard regression
coverage is xfstests (the generic fs group) plus ocfs2-test, both out of
tree.  Those were not run as part of this series; a full xfstests pass
before merge is recommended and I am happy to run a representative subset
and report results if reviewers would find it useful.


This patch (of 3):

ocfs2_validate_inode_block() currently accepts any 16-bit i_mode value as
long as i_mode is non-zero.  ocfs2_populate_inode() then copies that mode
verbatim into inode->i_mode and dispatches on i_mode & S_IFMT to the
file/dir/symlink/special_file iops; any unrecognised type falls through to
ocfs2_special_file_iops and init_special_inode(), which interprets
id1.dev1.i_rdev as a device number.
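
The fall-through described above can be sketched in userspace C.  This
is an illustration of the dispatch shape only, not the kernel code:
sk_pick_iops(), the sk_iops enum, and the SK_ constants (Linux S_IF*
values) are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical SK_ names carrying the Linux S_IF* values. */
#define SK_IFMT  0170000
#define SK_IFLNK 0120000
#define SK_IFREG 0100000
#define SK_IFDIR 0040000

enum sk_iops {
	SK_FILE_IOPS,
	SK_DIR_IOPS,
	SK_SYMLINK_IOPS,
	SK_SPECIAL_IOPS,
};

/*
 * Hypothetical model of the dispatch: three types are named, and
 * everything else, including type values that are not POSIX file
 * types at all, lands in the special-file branch.
 */
static enum sk_iops sk_pick_iops(uint16_t mode)
{
	switch (mode & SK_IFMT) {
	case SK_IFREG:
		return SK_FILE_IOPS;
	case SK_IFDIR:
		return SK_DIR_IOPS;
	case SK_IFLNK:
		return SK_SYMLINK_IOPS;
	default:
		return SK_SPECIAL_IOPS;
	}
}
```

In this model a mode such as 0150000, which names no POSIX file type,
is handled exactly like a character or block device.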

The result is that anything able to forge or corrupt an inode block (a
hostile cluster peer with raw write access to the shared LUN, a privileged
user mounting an attacker-supplied image, on-disk corruption) can publish
an in-core inode whose type bits do not name a POSIX file type, or whose
mode carries bits outside S_IFMT|07777.  Both shapes propagate
into VFS-visible state that downstream code paths assume is well-formed.

Reject early in the validator:

  - mode bits outside S_IFMT|07777
  - S_IFMT values that are not one of S_IFREG, S_IFDIR, S_IFLNK,
    S_IFCHR, S_IFBLK, S_IFIFO, S_IFSOCK

mkfs.ocfs2 and the kernel only ever produce these seven types plus the
standard permission, setuid/setgid/sticky bits; an on-disk i_mode outside
this envelope is structurally malformed regardless of how it got there.
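
The two rejections listed above can be sketched as a standalone
userspace predicate.  This is a minimal illustration, not the kernel
code: sk_mode_is_canonical() and the SK_ constants are hypothetical
names, and the constants assume the Linux values of S_IFMT and the
seven S_IF* types.  Under that assumption S_IFMT | 07777 equals 0xFFFF,
so for a 16-bit mode the stray-bit test cannot trigger and the type
switch does the rejecting; the sketch keeps both tests to mirror the
description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SK_ names carrying the Linux S_IF* values. */
#define SK_IFMT   0170000
#define SK_IFSOCK 0140000
#define SK_IFLNK  0120000
#define SK_IFREG  0100000
#define SK_IFBLK  0060000
#define SK_IFDIR  0040000
#define SK_IFCHR  0020000
#define SK_IFIFO  0010000

/* With these values the allowed mask spans every bit of a 16-bit mode. */
_Static_assert((SK_IFMT | 07777) == 0xFFFF,
	       "type mask plus 07777 covers all 16 bits");

/* Hypothetical helper: true if the mode is inside the envelope. */
static bool sk_mode_is_canonical(uint16_t mode)
{
	/* Bits outside the type mask and the 07777 permission bits. */
	if (mode & ~(SK_IFMT | 07777))
		return false;

	/* The type field must name one of the seven file types. */
	switch (mode & SK_IFMT) {
	case SK_IFREG:
	case SK_IFDIR:
	case SK_IFLNK:
	case SK_IFCHR:
	case SK_IFBLK:
	case SK_IFIFO:
	case SK_IFSOCK:
		return true;
	default:
		return false;
	}
}
```

For example, 0100644 and 0104755 are accepted, while 0150000 and a mode
with a zero type field such as 0000644 are rejected by the switch.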

Validated against the existing inline_data, refcount, and chain-list
checks: this hardening fires before any of them and does not perturb their
behaviour for well-formed inodes.

Link: https://lore.kernel.org/20260517111015.3187935-1-michael.bommarito@gmail.com
Link: https://lore.kernel.org/20260517111015.3187935-2-michael.bommarito@gmail.com
Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/inode.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

--- a/fs/ocfs2/inode.c~ocfs2-reject-dinodes-with-non-canonical-i_mode-type-or-stray-bits
+++ a/fs/ocfs2/inode.c
@@ -1494,6 +1494,45 @@ int ocfs2_validate_inode_block(struct su
 		goto bail;
 	}
 
+	/*
+	 * Reject dinodes whose i_mode does not name one of the seven
+	 * canonical POSIX file types, or whose mode carries bits outside
+	 * S_IFMT | 07777.  ocfs2_populate_inode() copies i_mode verbatim
+	 * into inode->i_mode and then dispatches via switch (mode & S_IFMT)
+	 * to file/dir/symlink/special_file iops; an unrecognised type
+	 * falls into ocfs2_special_file_iops with init_special_inode(),
+	 * which interprets i_rdev.  Constrain the type byte here so the
+	 * dispatch only ever sees a value mkfs.ocfs2 / VFS can produce.
+	 */
+	{
+		u16 mode = le16_to_cpu(di->i_mode);
+
+		if (mode & ~(S_IFMT | 07777)) {
+			rc = ocfs2_error(sb,
+					 "Invalid dinode #%llu: mode 0%o has bits outside S_IFMT|07777\n",
+					 (unsigned long long)bh->b_blocknr,
+					 mode);
+			goto bail;
+		}
+
+		switch (mode & S_IFMT) {
+		case S_IFREG:
+		case S_IFDIR:
+		case S_IFLNK:
+		case S_IFCHR:
+		case S_IFBLK:
+		case S_IFIFO:
+		case S_IFSOCK:
+			break;
+		default:
+			rc = ocfs2_error(sb,
+					 "Invalid dinode #%llu: mode 0%o has unknown file type\n",
+					 (unsigned long long)bh->b_blocknr,
+					 mode);
+			goto bail;
+		}
+	}
+
 	if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
 		struct ocfs2_inline_data *data = &di->id2.i_data;
 
_

Patches currently in -mm which might be from michael.bommarito@gmail.com are

ocfs2-reject-dinodes-whose-i_rdev-disagrees-with-the-file-type.patch
ocfs2-reject-regular-files-with-non-zero-i_size-and-zero-i_clusters.patch

