The Linux Kernel Mailing List
* [PATCH 0/3] ocfs2: harden inode validators against forged metadata
@ 2026-05-17 11:10 Michael Bommarito
  2026-05-17 11:10 ` [PATCH 1/3] ocfs2: reject dinodes with non-canonical i_mode type or stray bits Michael Bommarito
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Michael Bommarito @ 2026-05-17 11:10 UTC
  To: Joseph Qi, Mark Fasheh, Joel Becker
  Cc: ZhengYuan Huang, ocfs2-devel, linux-fsdevel, linux-kernel

This series adds three structural checks to
ocfs2_validate_inode_block() that catch attacker-controlled bytes
in a freshly read dinode before ocfs2_populate_inode() copies them
verbatim into the in-core inode.  All three checks fire on the
mount, lookup, and read-after-cache-invalidation paths and reject
the block with ocfs2_error(), the same error-propagation
mechanism the existing suballoc-slot, inline-data, chain-list,
and refcount checks use.

Direction
=========

This continues the direction of the recent in-flight
ocfs2_validate_inode_block() hardening series,
e.g. ZhengYuan Huang's "ocfs2: revalidate the journal dinode
before toggling dirty"
(<20260512024115.4036371-1-gality369@gmail.com>), "ocfs2: add
extent tree depth validation"
(<20260416110229.3283682-1-gality369@gmail.com>), and "ocfs2:
add extent list validation v2"
(<20260423094116.876696-1-gality369@gmail.com>).  Each of those
adds a per-field invariant check against bytes that downstream
code paths trust unconditionally.

The three checks in this series cover three more fields whose
attacker-controlled values currently propagate into VFS-visible
state without question: i_mode (type bits and reserved bits),
i_rdev (cross-checked against the file type), and the
i_size / i_clusters pairing for regular files on non-sparse
volumes (patch 3 is gated on !ocfs2_sparse_alloc(); sparse-
allocation mounts legitimately commit i_size > 0 with
i_clusters == 0 via ocfs2_zero_extend()).
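The three invariants can be sketched in user-space C as below.  This
is a minimal illustration, not the code in the patches: struct
sk_dinode, the SK_* constants, and sk_check_dinode() are hypothetical
names, the mode field is widened to 32 bits so that "stray bits" has
something to catch, and the sparse_alloc flag stands in for
ocfs2_sparse_alloc().  Only the non-device-with-i_rdev direction of
the rdev cross-check is modelled, since that is the case the cover
letter describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linux file-type encodings, spelled out so the sketch needs no headers. */
enum {
	SK_IFMT   = 0170000,
	SK_IFSOCK = 0140000,
	SK_IFLNK  = 0120000,
	SK_IFREG  = 0100000,
	SK_IFBLK  = 0060000,
	SK_IFDIR  = 0040000,
	SK_IFCHR  = 0020000,
	SK_IFIFO  = 0010000,
	SK_PERMS  = 0007777,
};

enum sk_verdict { SK_OK, SK_BAD_MODE, SK_BAD_RDEV, SK_BAD_SIZE };

/* Hypothetical, simplified stand-in for the dinode fields discussed. */
struct sk_dinode {
	uint32_t mode;        /* file type plus permission bits */
	uint64_t rdev;        /* device number, meaningful for chr/blk only */
	uint64_t size;        /* i_size in bytes */
	uint32_t clusters;    /* i_clusters */
	bool     inline_data; /* data lives inside the inode block */
};

/* Invariant 1: one of the seven canonical types and no bits outside them. */
static bool sk_mode_is_canonical(uint32_t mode)
{
	if (mode & ~(uint32_t)(SK_IFMT | SK_PERMS))
		return false;

	switch (mode & SK_IFMT) {
	case SK_IFREG: case SK_IFDIR: case SK_IFLNK: case SK_IFCHR:
	case SK_IFBLK: case SK_IFIFO: case SK_IFSOCK:
		return true;
	default:
		return false;
	}
}

static enum sk_verdict sk_check_dinode(const struct sk_dinode *di,
				       bool sparse_alloc)
{
	uint32_t type;

	assert(di != NULL);

	if (!sk_mode_is_canonical(di->mode))
		return SK_BAD_MODE;
	type = di->mode & SK_IFMT;

	/* Invariant 2: only device nodes may carry a device number. */
	if (type != SK_IFCHR && type != SK_IFBLK && di->rdev != 0)
		return SK_BAD_RDEV;

	/* Invariant 3: skipped on sparse volumes, where size > 0 with
	 * zero clusters is a legitimate state. */
	if (!sparse_alloc && type == SK_IFREG && !di->inline_data &&
	    di->size > 0 && di->clusters == 0)
		return SK_BAD_SIZE;

	return SK_OK;
}
```

In this sketch a regular file with mode 0104755, i_rdev 0, and one
allocated cluster passes all three predicates, while the same dinode
with a non-zero i_rdev yields SK_BAD_RDEV.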

Threat model
============

The validator is the chokepoint that protects
ocfs2_populate_inode() from a malformed dinode, regardless of
how the malformation reached the disk:

  (1) An attacker-supplied disk image mounted by a privileged
      user.  The mount path runs every dinode through this
      validator before any unprivileged user opens a file on
      the volume.  This is the same threat model the existing
      inline-data, refcount, and chain-list checks in this
      function were written for.

  (2) A compromised cluster peer with raw write access to the
      shared block device.  OCFS2 is a clustered filesystem;
      the on-disk blocks behind bh->b_data live on shared
      storage that other cluster nodes can write.  The local
      node's cache-eviction re-read runs the newly fetched
      block through this validator before ocfs2_populate_inode()
      runs again.  Oracle's BlockErrorDetection design document
      scopes the existing CRC32 + Hamming integrity primitive
      explicitly as defense against memory and wire corruption,
      not as authentication of peer writes; the field-level
      validators are therefore the kernel-side defense
      regardless of which path produced the forged block.
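
      Why an unkeyed checksum does not authenticate a writer can be
      sketched as follows.  This is an illustration only: it uses a
      generic bitwise CRC-32 over a hypothetical 60-byte payload, not
      OCFS2's actual block-check layout, and it leaves out the Hamming
      code entirely.  The names sk_block, sk_seal(), and sk_verify()
      are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAYLOAD_LEN 60

/* Hypothetical block: payload bytes followed by an unkeyed checksum. */
struct sk_block {
	uint8_t  payload[SK_PAYLOAD_LEN];
	uint32_t crc;
};

/* Generic bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t sk_crc32(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Anyone who can write the block can also recompute the checksum. */
static void sk_seal(struct sk_block *b)
{
	b->crc = sk_crc32(b->payload, SK_PAYLOAD_LEN);
}

/* Verification needs no secret, so it cannot tell writers apart. */
static bool sk_verify(const struct sk_block *b)
{
	return b->crc == sk_crc32(b->payload, SK_PAYLOAD_LEN);
}
```

      An accidental bit flip in the payload makes sk_verify() fail,
      which is the corruption case the checksum is scoped for.  A
      writer who changes the payload and then calls sk_seal() again
      produces a block that verifies, so only a check on the field
      values themselves can reject it.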

The three checks in this series are deliberately structural:
they each express an invariant mkfs.ocfs2 and the kernel
maintain unconditionally, and they reject any dinode whose
header violates that invariant before its bytes propagate to
the in-core inode.

Scope note: these checks block forge patterns that touch i_mode
(outside the canonical envelope), i_rdev (on a non-device file),
or the i_size / i_clusters pair (regular file with size but no
extents, on non-sparse volumes only).  A forger who keeps the
dinode within these structural envelopes (for example, flipping
only the permission bits and uid/gid on a regular file that
already has clusters allocated) can still produce a dinode that
satisfies the field-level invariants; closing that residual
class is outside the scope of this hardening series.

Validation
==========

Each patch builds on top of the previous one against mainline;
the series as a whole builds cleanly against v7.1-rc1 with zero
new warnings.  checkpatch --strict reports 0 errors, 0 warnings,
0 checks for each patch.

The series was exercised on a two-node QEMU cluster (virtio-blk
shared LUN with share-rw=on, both nodes joining the same o2cb
cluster, mounted ocfs2 with metaecc):

  - Pre-series baseline: a peer-node raw-write forge that adds
    S_ISUID and flips uid/gid to 0 on a regular file is accepted
    by the existing validator chain; the unprivileged user on
    the victim node exec()s the file and gains euid=0.  This
    confirms the cluster-peer write primitive is reachable in
    today's mainline.  Per the Scope note above, this particular
    forge stays within the structural envelope these patches
    enforce and is NOT blocked by them; closing it requires the
    out-of-scope keyed-integrity work.
  - Post-series, structural-variant forge: a peer-node forge
    that, in addition to the setuid + uid/gid changes, stores
    i_rdev = MKDEV(1,1) on the same regular-file dinode (the
    cleanest cluster-context attacker primitive patch 2
    catches) is rejected by ocfs2_validate_inode_block() with
    ocfs2_error "non-device mode 0104755 with i_rdev N".  The
    buffer-head read path propagates -EFSCORRUPTED to
    ocfs2_read_locked_inode() and the user-visible result is
    Permission denied on subsequent stat / open / exec of the
    forged file.  Analogous post-series forge variants that
    flip i_mode outside the canonical envelope, or that set
    i_size > 0 with i_clusters == 0 on a non-inline regular
    file mounted from a non-sparse volume, are rejected by
    patches 1 and 3 respectively.
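
The mode quoted in that error message, 0104755, can be decoded by
hand: 0100000 is the regular-file type, 04000 is the setuid bit, and
0755 are the permission bits.  A small sketch of that arithmetic,
using the standard Linux octal encodings and the hypothetical names
struct mode_parts and decode_mode():

```c
#include <stdbool.h>
#include <stdint.h>

struct mode_parts {
	uint32_t type;      /* mode & 0170000 */
	bool     setuid;    /* 04000 set */
	bool     setgid;    /* 02000 set */
	uint32_t perms;     /* rwx bits for user, group, other */
	bool     is_device; /* character or block special file */
};

/* Split a mode word into the pieces the forge description refers to. */
static struct mode_parts decode_mode(uint32_t mode)
{
	struct mode_parts p;

	p.type = mode & 0170000;
	p.setuid = (mode & 04000) != 0;
	p.setgid = (mode & 02000) != 0;
	p.perms = mode & 0777;
	p.is_device = (p.type == 0020000 || p.type == 0060000);
	return p;
}
```

For 0104755 this gives a regular-file type, setuid set, permissions
0755, and is_device false, which is why a non-zero i_rdev on that
dinode is inconsistent with its file type.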

A separate cluster regression (mount, peer-write a regular
file, drop_caches on the second node, read it back) runs
clean post-series, so the checks do not regress normal
operation.

I searched tools/testing/selftests/ for in-tree selftests that
reference fs/ocfs2/inode.c or any changed symbol; none exist for
ocfs2_validate_inode_block(), which is consistent with OCFS2
having no in-tree selftest coverage.  The subsystem's standard
regression coverage is
xfstests (the generic fs group) plus ocfs2-test, both out of
tree.  Those were not run as part of this series; a full
xfstests pass before merge is recommended and I am happy to
run a representative subset and report results if reviewers
would find it useful.

Patches
=======

Michael Bommarito (3):
  ocfs2: reject dinodes with non-canonical i_mode type or stray bits
  ocfs2: reject dinodes whose i_rdev disagrees with the file type
  ocfs2: reject regular files with non-zero i_size and zero i_clusters

 fs/ocfs2/inode.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

--
2.53.0



end of thread, other threads:[~2026-06-01 17:39 UTC | newest]

Thread overview: 10+ messages
2026-05-17 11:10 [PATCH 0/3] ocfs2: harden inode validators against forged metadata Michael Bommarito
2026-05-17 11:10 ` [PATCH 1/3] ocfs2: reject dinodes with non-canonical i_mode type or stray bits Michael Bommarito
2026-05-18  1:36   ` Joseph Qi
2026-05-17 11:10 ` [PATCH 2/3] ocfs2: reject dinodes whose i_rdev disagrees with the file type Michael Bommarito
2026-05-18  1:37   ` Joseph Qi
2026-05-17 11:10 ` [PATCH 3/3] ocfs2: reject regular files with non-zero i_size and zero i_clusters Michael Bommarito
2026-05-18  1:38   ` Joseph Qi
2026-05-18 21:40 ` [PATCH 0/3] ocfs2: harden inode validators against forged metadata Andrew Morton
2026-05-19  0:57   ` Michael Bommarito
2026-06-01 17:39 ` Joel Becker
