From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,michael.bommarito@gmail.com,akpm@linux-foundation.org
Subject: [to-be-updated] ocfs2-reject-dinodes-with-non-canonical-i_mode-type-or-stray-bits.patch removed from -mm tree
Date: Tue, 19 May 2026 10:41:47 -0700
Message-ID: <20260519174148.20882C2BCB3@smtp.kernel.org>
The quilt patch titled
Subject: ocfs2: reject dinodes with non-canonical i_mode type or stray bits
has been removed from the -mm tree. Its filename was
ocfs2-reject-dinodes-with-non-canonical-i_mode-type-or-stray-bits.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Michael Bommarito <michael.bommarito@gmail.com>
Subject: ocfs2: reject dinodes with non-canonical i_mode type or stray bits
Date: Sun, 17 May 2026 07:10:12 -0400
Patch series "ocfs2: harden inode validators against forged metadata".
This series adds three structural checks to ocfs2_validate_inode_block()
that catch attacker-controlled bytes in a freshly read dinode before
ocfs2_populate_inode() copies them verbatim into the in-core inode. All
three checks fire on the mount, lookup, and read-after-cache-invalidation
paths and reject the block with ocfs2_error(), the same error-propagation
mechanism the existing suballoc-slot, inline-data, chain-list, and
refcount checks use.
Direction
=========
This continues the validator-hardening direction visible in the
recent in-flight ocfs2_validate_inode_block hardening series,
e.g. ZhengYuan Huang's "ocfs2: revalidate the journal dinode
before toggling dirty"
(<20260512024115.4036371-1-gality369@gmail.com>), "ocfs2: add
extent tree depth validation"
(<20260416110229.3283682-1-gality369@gmail.com>), and "ocfs2:
add extent list validation v2"
(<20260423094116.876696-1-gality369@gmail.com>). Each of those
adds a per-field invariant check against bytes that downstream
code paths trust unconditionally.
The three checks in this series cover three more fields whose
attacker-controlled values currently propagate into VFS-visible state
without question: i_mode (type bits and reserved bits), i_rdev
(cross-checked against the file type), and the i_size / i_clusters pairing
for regular files on non-sparse volumes (patch 3 is gated on
!ocfs2_sparse_alloc(); sparse-allocation mounts legitimately commit
i_size > 0 with i_clusters == 0 via ocfs2_zero_extend()).
Threat model
============
The validator is the chokepoint that protects ocfs2_populate_inode() from
a malformed dinode whether the malformation got there via:
(1) An attacker-supplied disk image mounted by a privileged
user. The mount path runs every dinode through this
validator before any unprivileged user opens a file on
the volume. This is the same threat model the existing
inline-data, refcount, and chain-list checks in this
function were written for.
(2) A compromised cluster peer with raw write access to the
shared block device. OCFS2 is a clustered filesystem;
the on-disk blocks behind bh->b_data live on shared
storage that other cluster nodes can write. The local
node's cache-eviction re-read runs the newly fetched
block through this validator before ocfs2_populate_inode()
runs again. Oracle's BlockErrorDetection design document
scopes the existing CRC32 + Hamming integrity primitive
explicitly as defense against memory and wire corruption,
not as authentication of peer writes; the field-level
validators are therefore the kernel-side defense
whichever path produced the forged block.
The three checks in this series are deliberately structural: they each
express an invariant mkfs.ocfs2 and the kernel maintain unconditionally,
and they reject any dinode whose header violates that invariant before its
bytes propagate to the in-core inode.
Scope note: these checks block forge patterns that touch i_mode (outside
the canonical envelope), i_rdev (on a non-device file), or the i_size /
i_clusters pair (regular file with size but no extents, on non-sparse
volumes only). A forger who keeps the dinode within these structural
envelopes (for example, flipping only the permission bits and uid/gid on a
regular file that already has clusters allocated) can still produce a
dinode that satisfies the field-level invariants; closing that residual
class is outside the scope of this hardening series.
Validation
==========
Each patch builds on top of the previous one against mainline; the series
as a whole builds clean against v7.1-rc1 with zero new warnings.
checkpatch --strict reports 0 errors, 0 warnings, 0 checks for each patch.
The series was exercised on a two-node QEMU cluster (virtio-blk
shared LUN with share-rw=on, both nodes joining the same o2cb
cluster, mounted ocfs2 with metaecc):
- Pre-series baseline: a peer-node raw-write forge that adds
S_ISUID and flips uid/gid to 0 on a regular file is accepted
by the existing validator chain; the unprivileged user on
the victim node exec()s the file and gains euid=0. This
confirms the cluster-peer write primitive is reachable in
today's mainline. Per the Scope note above, this particular
forge stays within the structural envelope these patches
enforce and is NOT blocked by them; closing it requires the
out-of-scope keyed-integrity work.
- Post-series, structural-variant forge: a peer-node forge
that, in addition to the setuid + uid/gid changes, stores
i_rdev = MKDEV(1,1) on the same regular-file dinode (the
cleanest cluster-context attacker primitive patch 2
catches) is rejected by ocfs2_validate_inode_block() with
ocfs2_error "non-device mode 0104755 with i_rdev N". The
buffer-head propagates -EFSCORRUPTED to
ocfs2_read_locked_inode and the user-visible result is
Permission denied on subsequent stat / open / exec of the
forged file. Analogous post-series forge variants that
flip i_mode outside the canonical envelope, or that set
i_size > 0 with i_clusters == 0 on a non-inline regular
file mounted from a non-sparse volume, are rejected by
patches 1 and 3 respectively.
A separate cluster regression (mount, peer-write a regular file,
drop_caches on the second node, read it back) runs clean post-series, so
the checks do not regress normal operation.
In-tree selftests under tools/testing/selftests/ that reference
fs/ocfs2/inode.c or any changed symbol were checked; no matching selftests
exist for ocfs2_validate_inode_block(), which is consistent with OCFS2
having no in-tree selftest coverage. The subsystem's standard regression
coverage is xfstests (the generic fs group) plus ocfs2-test, both out of
tree. Those were not run as part of this series; a full xfstests pass
before merge is recommended and I am happy to run a representative subset
and report results if reviewers would find it useful.
This patch (of 3):
ocfs2_validate_inode_block() currently accepts any 16-bit i_mode value as
long as i_mode is non-zero. ocfs2_populate_inode() then copies that mode
verbatim into inode->i_mode and dispatches on i_mode & S_IFMT to the
file/dir/symlink/special_file iops; any unrecognised type falls through to
ocfs2_special_file_iops and init_special_inode(), which interprets
id1.dev1.i_rdev as a device number.
The result is that anything able to forge or corrupt an inode block (a
hostile cluster peer with raw write access to the shared LUN, a privileged
user mounting an attacker-supplied image, on-disk corruption) can publish
an in-core inode whose type bits do not name a POSIX file type, or whose
permission bits carry bytes outside S_IFMT|07777. Both shapes propagate
into VFS-visible state that downstream code paths assume is well-formed.
Reject early in the validator:
- mode bits outside S_IFMT|07777
- S_IFMT values that are not one of S_IFREG, S_IFDIR, S_IFLNK,
S_IFCHR, S_IFBLK, S_IFIFO, S_IFSOCK
mkfs.ocfs2 and the kernel only ever produce these seven types plus the
standard permission, setuid/setgid/sticky bits; an on-disk i_mode outside
this envelope is structurally malformed regardless of how it got there.
Validated against the existing inline_data, refcount, and chain-list
checks: this hardening fires before any of them and does not perturb their
behaviour for well-formed inodes.
Link: https://lore.kernel.org/20260517111015.3187935-1-michael.bommarito@gmail.com
Link: https://lore.kernel.org/20260517111015.3187935-2-michael.bommarito@gmail.com
Fixes: b657c95c1108 ("ocfs2: Wrap inode block reads in a dedicated function.")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
--- a/fs/ocfs2/inode.c~ocfs2-reject-dinodes-with-non-canonical-i_mode-type-or-stray-bits
+++ a/fs/ocfs2/inode.c
@@ -1494,6 +1494,45 @@ int ocfs2_validate_inode_block(struct su
 		goto bail;
 	}
+	/*
+	 * Reject dinodes whose i_mode does not name one of the seven
+	 * canonical POSIX file types, or whose mode carries bits outside
+	 * S_IFMT | 07777. ocfs2_populate_inode() copies i_mode verbatim
+	 * into inode->i_mode and then dispatches via switch (mode & S_IFMT)
+	 * to file/dir/symlink/special_file iops; an unrecognised type
+	 * falls into ocfs2_special_file_iops with init_special_inode(),
+	 * which interprets i_rdev. Constrain the type byte here so the
+	 * dispatch only ever sees a value mkfs.ocfs2 / VFS can produce.
+	 */
+	{
+		u16 mode = le16_to_cpu(di->i_mode);
+
+		if (mode & ~(S_IFMT | 07777)) {
+			rc = ocfs2_error(sb,
+					 "Invalid dinode #%llu: mode 0%o has bits outside S_IFMT|07777\n",
+					 (unsigned long long)bh->b_blocknr,
+					 mode);
+			goto bail;
+		}
+
+		switch (mode & S_IFMT) {
+		case S_IFREG:
+		case S_IFDIR:
+		case S_IFLNK:
+		case S_IFCHR:
+		case S_IFBLK:
+		case S_IFIFO:
+		case S_IFSOCK:
+			break;
+		default:
+			rc = ocfs2_error(sb,
+					 "Invalid dinode #%llu: mode 0%o has unknown file type\n",
+					 (unsigned long long)bh->b_blocknr,
+					 mode);
+			goto bail;
+		}
+	}
+
 	if (le16_to_cpu(di->i_dyn_features) & OCFS2_INLINE_DATA_FL) {
 		struct ocfs2_inline_data *data = &di->id2.i_data;
_
Patches currently in -mm which might be from michael.bommarito@gmail.com are
ocfs2-reject-dinodes-whose-i_rdev-disagrees-with-the-file-type.patch
ocfs2-reject-regular-files-with-non-zero-i_size-and-zero-i_clusters.patch