* [PATCH AUTOSEL 6.19-6.18] hfsplus: fix volume corruption issue for generic/480
[not found] <20260210233123.2905307-1-sashal@kernel.org>
@ 2026-02-10 23:30 ` Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.1] fs/buffer: add alert in try_to_free_buffers() for folios without buffers Sasha Levin
` (5 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-02-10 23:30 UTC (permalink / raw)
To: patches, stable
Cc: Viacheslav Dubeyko, John Paul Adrian Glaubitz, Yangtao Li,
linux-fsdevel, Sasha Levin
From: Viacheslav Dubeyko <slava@dubeyko.com>
[ Upstream commit bea4429eb30190c59b5ac7c8ff6c90176c7c110f ]
The xfstests test case generic/480 leaves an HFS+ volume
in a corrupted state:
sudo ./check generic/480
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC Wed Oct 1 15:02:44 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/480 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/480.full for details)
Ran: generic/480
Failures: generic/480
Failed 1 of 1 tests
sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
Incorrect number of file hard links
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
invalid VHB nextCatalogID
Volume header needs minor repair
(2, 0)
Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x0000 CatStat = 0x00000002
** Repairing volume.
Incorrect flags for file hard link (id = 19)
(It should be 0x22 instead of 0x2)
Incorrect flags for file inode (id = 18)
(It should be 0x22 instead of 0x2)
first link ID=0 is < 16 for fileinode=18
Error getting first link ID for inode = 18 (result=2)
Invalid first link in hard link chain (id = 18)
(It should be 19 instead of 0)
Indirect node 18 needs link count adjustment
(It should be 1 instead of 2)
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.
The generic/480 test executes the following steps in its final phase:
"Now remove of the links of our file and create
a new file with the same name and in the same
parent directory, and finally fsync this new file."
unlink $SCRATCH_MNT/testdir/bar
touch $SCRATCH_MNT/testdir/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar
"Simulate a power failure and mount the filesystem
to check that replaying the fsync log/journal
succeeds, that is the mount operation does not fail."
_flakey_drop_and_remount
The key issue in the HFS+ logic is that the hfsplus_link(),
hfsplus_unlink(), hfsplus_rmdir(), hfsplus_symlink(),
and hfsplus_mknod() methods don't call
hfsplus_cat_write_inode() for modified inode objects.
As a result, even though hfsplus_file_fsync() tries to
flush the dirty Catalog File, not all modified inodes
have saved their new state into the Catalog File's records,
because hfsplus_cat_write_inode() was never called.
Finally, simulating a power failure leaves the Catalog File
in an inconsistent state, and the FSCK tool reports
volume corruption.
This patch adds calls to the hfsplus_cat_write_inode()
method for modified inodes in hfsplus_link(),
hfsplus_unlink(), hfsplus_rmdir(), hfsplus_symlink(),
and hfsplus_mknod(). It also adds debug output
in several methods.
sudo ./check generic/480
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ #18 SMP PREEMPT_DYNAMIC Thu Dec 4 12:24:45 PST 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/480 16s ... 16s
Ran: generic/480
Passed all 1 tests
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20251205000054.3670326-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
---
## Commit Analysis: "hfsplus: fix volume corruption issue for
generic/480"
### 1. COMMIT MESSAGE ANALYSIS
The subject line explicitly says "fix volume corruption" - this is a
high-severity filesystem integrity bug. The commit message is
exceptionally detailed:
- Provides full xfstests output showing the corruption
- Includes `fsck.hfsplus` output documenting the specific corruption
found: "Incorrect number of file hard links", "invalid VHB
nextCatalogID", "Incorrect flags for file hard link"
- Describes the exact test scenario: unlink a hard link, create new file
with same name, fsync, simulate power failure
- Explains the root cause clearly: directory operations modify inodes
in-memory but don't persist changes to the HFS+ Catalog File B-tree
- Shows "after" test results proving the fix works
- Author is Viacheslav Dubeyko, the HFS+ subsystem maintainer
### 2. CODE CHANGE ANALYSIS
**The root cause**: HFS+ has a two-level write system:
1. VFS `mark_inode_dirty()` -> deferred writeback ->
`hfsplus_write_inode()` -> `hfsplus_cat_write_inode()` (asynchronous)
2. Direct `hfsplus_cat_write_inode()` call (synchronous, writes
immediately to catalog B-tree)
Five directory operations — `hfsplus_link()`, `hfsplus_unlink()`,
`hfsplus_rmdir()`, `hfsplus_symlink()`, `hfsplus_mknod()` — modify
multiple inodes (directory entries, link counts, timestamps) but only
called `mark_inode_dirty()` without directly writing to the Catalog
File. If a power failure occurs between the directory operation and the
deferred writeback, the on-disk catalog tree becomes inconsistent.
**The fix adds explicit `hfsplus_cat_write_inode()` calls**:
- `hfsplus_link()`: Writes `src_dir`, `dst_dir`, `sbi->hidden_dir`, and
`inode` to catalog
- `hfsplus_unlink()`: Writes `dir`, `sbi->hidden_dir`, and `inode` to
catalog (only on success)
- `hfsplus_rmdir()`: Writes `dir` to catalog
- `hfsplus_symlink()`: Writes `dir` and `inode` to catalog
- `hfsplus_mknod()`: Writes `dir` and `inode` to catalog
**This pattern already existed in `hfsplus_rename()`** (added by commit
24e17a29cf753 which fixed the same class of bug for generic/073):
```532:563:fs/hfsplus/dir.c
static int hfsplus_rename(struct mnt_idmap *idmap,
			  struct inode *old_dir, struct dentry *old_dentry,
			  struct inode *new_dir, struct dentry *new_dentry,
			  unsigned int flags)
{
	// ...
	if (!res) {
		new_dentry->d_fsdata = old_dentry->d_fsdata;
		res = hfsplus_cat_write_inode(old_dir);
		if (!res)
			res = hfsplus_cat_write_inode(new_dir);
	}
	return res;
}
```
The current commit extends this same proven pattern to the five other
directory operations that were missing it.
**Additional changes**: The commit adds `hfs_dbg()` debug output in
multiple functions. The `hfs_dbg` macro maps to `pr_debug()` (compiled
out by default), so this is harmless at runtime. One minor whitespace
alignment change in `hfsplus_link()`.
### 3. CLASSIFICATION
- **Bug type**: Filesystem data corruption / inconsistency on power
failure
- **Category**: Missing write-through of inode metadata to on-disk
B-tree
- **Severity**: HIGH - filesystem corruption detected by fsck, incorrect
hard link counts, flags, catalog IDs
- **Reproducibility**: Reliably reproducible with xfstests generic/480
### 4. SCOPE AND RISK ASSESSMENT
- **Files changed**: 2 (fs/hfsplus/dir.c, fs/hfsplus/inode.c)
- **Functional change**: ~33 lines of `hfsplus_cat_write_inode()` calls
+ error handling
- **Debug noise**: ~10 lines of `hfs_dbg()` additions
- **Risk**: LOW - `hfsplus_cat_write_inode()` is already called from the
VFS writeback path (`hfsplus_write_inode()`) and from
`hfsplus_rename()`, so it's thoroughly tested. The new calls add it at
the same logical points in other directory operations.
- **Subsystem**: HFS+ filesystem - relatively narrow impact (only
affects HFS+ users)
### 5. USER IMPACT
HFS+ is used for:
- Mac OS X / macOS compatibility
- External storage devices formatted as HFS+
- Dual-boot Linux/Mac systems
Without this fix, operations involving hard links, file deletion,
symlinks, or device node creation can lead to volume corruption after
power failure. The corruption manifests as incorrect link counts, wrong
catalog IDs, and incorrect file flags - requiring fsck repair.
### 6. DEPENDENCY CHECK
**The core fix is self-contained.** The `hfsplus_cat_write_inode()`
function exists in all kernel versions with HFS+ support. The additions
to `hfsplus_link()`, `hfsplus_unlink()`, `hfsplus_rmdir()`,
`hfsplus_symlink()`, and `hfsplus_mknod()` are independent of the
earlier `hfsplus_rename()` fix (24e17a29cf753).
**Backport concern**: The `hfs_dbg()` macro signature changed in v6.18
(commit f32a26fab3672). Before 6.18, `hfs_dbg` required a flag argument
as the first parameter (e.g., `hfs_dbg(INODE, "...")`). The current
commit uses the new format (no flag argument). For stable trees before
6.18, the `hfs_dbg()` additions would need to be stripped from the
backport, but the core fix applies cleanly without them.
### 7. STABILITY INDICATORS
- Author is the HFS+ subsystem maintainer
- This is the third in a series of similar corruption fixes
(generic/073, generic/101, generic/480), demonstrating the author has
deep understanding of the issue class
- Verified with xfstests before and after
- The pattern used (calling `hfsplus_cat_write_inode()` from directory
operations) is already proven in `hfsplus_rename()`
### CONCLUSION
This commit fixes a real, reproducible filesystem corruption bug in HFS+
that manifests during power failure scenarios. The fix is conceptually
simple - ensuring that all modified inodes are persisted to the catalog
B-tree during directory operations, matching the pattern already
established in `hfsplus_rename()`. The risk is low because the called
function is well-tested and the change pattern is proven. The debug
output additions are harmless noise that may need stripping for older
stable trees but don't affect the core fix.
Filesystem corruption is one of the highest-priority categories for
stable backporting, as it directly impacts data integrity.
**YES**
fs/hfsplus/dir.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
fs/hfsplus/inode.c | 5 +++++
2 files changed, 50 insertions(+), 1 deletion(-)
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index cadf0b5f93422..ca5f74a140ec1 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -313,6 +313,9 @@ static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
if (!S_ISREG(inode->i_mode))
return -EPERM;
+ hfs_dbg("src_dir->i_ino %lu, dst_dir->i_ino %lu, inode->i_ino %lu\n",
+ src_dir->i_ino, dst_dir->i_ino, inode->i_ino);
+
mutex_lock(&sbi->vh_mutex);
if (inode->i_ino == (u32)(unsigned long)src_dentry->d_fsdata) {
for (;;) {
@@ -332,7 +335,7 @@ static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
cnid = sbi->next_cnid++;
src_dentry->d_fsdata = (void *)(unsigned long)cnid;
res = hfsplus_create_cat(cnid, src_dir,
- &src_dentry->d_name, inode);
+ &src_dentry->d_name, inode);
if (res)
/* panic? */
goto out;
@@ -350,6 +353,21 @@ static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
mark_inode_dirty(inode);
sbi->file_count++;
hfsplus_mark_mdb_dirty(dst_dir->i_sb);
+
+ res = hfsplus_cat_write_inode(src_dir);
+ if (res)
+ goto out;
+
+ res = hfsplus_cat_write_inode(dst_dir);
+ if (res)
+ goto out;
+
+ res = hfsplus_cat_write_inode(sbi->hidden_dir);
+ if (res)
+ goto out;
+
+ res = hfsplus_cat_write_inode(inode);
+
out:
mutex_unlock(&sbi->vh_mutex);
return res;
@@ -367,6 +385,9 @@ static int hfsplus_unlink(struct inode *dir, struct dentry *dentry)
if (HFSPLUS_IS_RSRC(inode))
return -EPERM;
+ hfs_dbg("dir->i_ino %lu, inode->i_ino %lu\n",
+ dir->i_ino, inode->i_ino);
+
mutex_lock(&sbi->vh_mutex);
cnid = (u32)(unsigned long)dentry->d_fsdata;
if (inode->i_ino == cnid &&
@@ -408,6 +429,15 @@ static int hfsplus_unlink(struct inode *dir, struct dentry *dentry)
inode_set_ctime_current(inode);
mark_inode_dirty(inode);
out:
+ if (!res) {
+ res = hfsplus_cat_write_inode(dir);
+ if (!res) {
+ res = hfsplus_cat_write_inode(sbi->hidden_dir);
+ if (!res)
+ res = hfsplus_cat_write_inode(inode);
+ }
+ }
+
mutex_unlock(&sbi->vh_mutex);
return res;
}
@@ -429,6 +459,8 @@ static int hfsplus_rmdir(struct inode *dir, struct dentry *dentry)
inode_set_ctime_current(inode);
hfsplus_delete_inode(inode);
mark_inode_dirty(inode);
+
+ res = hfsplus_cat_write_inode(dir);
out:
mutex_unlock(&sbi->vh_mutex);
return res;
@@ -465,6 +497,12 @@ static int hfsplus_symlink(struct mnt_idmap *idmap, struct inode *dir,
hfsplus_instantiate(dentry, inode, inode->i_ino);
mark_inode_dirty(inode);
+
+ res = hfsplus_cat_write_inode(dir);
+ if (res)
+ goto out;
+
+ res = hfsplus_cat_write_inode(inode);
goto out;
out_err:
@@ -506,6 +544,12 @@ static int hfsplus_mknod(struct mnt_idmap *idmap, struct inode *dir,
hfsplus_instantiate(dentry, inode, inode->i_ino);
mark_inode_dirty(inode);
+
+ res = hfsplus_cat_write_inode(dir);
+ if (res)
+ goto out;
+
+ res = hfsplus_cat_write_inode(inode);
goto out;
failed_mknod:
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 7ae6745ca7ae1..c762bf909d1aa 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -328,6 +328,9 @@ int hfsplus_file_fsync(struct file *file, loff_t start, loff_t end,
struct hfsplus_vh *vhdr = sbi->s_vhdr;
int error = 0, error2;
+ hfs_dbg("inode->i_ino %lu, start %llu, end %llu\n",
+ inode->i_ino, start, end);
+
error = file_write_and_wait_range(file, start, end);
if (error)
return error;
@@ -616,6 +619,8 @@ int hfsplus_cat_write_inode(struct inode *inode)
hfsplus_cat_entry entry;
int res = 0;
+ hfs_dbg("inode->i_ino %lu\n", inode->i_ino);
+
if (HFSPLUS_IS_RSRC(inode))
main_inode = HFSPLUS_I(inode)->rsrc_inode;
--
2.51.0
* [PATCH AUTOSEL 6.19-6.1] fs/buffer: add alert in try_to_free_buffers() for folios without buffers
[not found] <20260210233123.2905307-1-sashal@kernel.org>
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.18] hfsplus: fix volume corruption issue for generic/480 Sasha Levin
@ 2026-02-10 23:30 ` Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.12] statmount: permission check should return EPERM Sasha Levin
` (4 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-02-10 23:30 UTC (permalink / raw)
To: patches, stable
Cc: Deepakkumar Karn, Jan Kara, Christian Brauner, Sasha Levin, viro,
linux-fsdevel
From: Deepakkumar Karn <dkarn@redhat.com>
[ Upstream commit b68f91ef3b3fe82ad78c417de71b675699a8467c ]
try_to_free_buffers() can be called on folios with no buffers attached
when filemap_release_folio() is invoked on a folio belonging to a mapping
with AS_RELEASE_ALWAYS set but no release_folio operation defined.
In such cases, folio_needs_release() returns true because of the
AS_RELEASE_ALWAYS flag, but the folio has no private buffer data. This
causes try_to_free_buffers() to call drop_buffers() on a folio with no
buffers, leading to a null pointer dereference.
Add a check in try_to_free_buffers() to return early if the folio has no
buffers attached, with a WARN_ON_ONCE() to alert about the misconfiguration.
This provides defensive hardening.
Signed-off-by: Deepakkumar Karn <dkarn@redhat.com>
Link: https://patch.msgid.link/20251211131211.308021-1-dkarn@redhat.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
`folio_buffers()` exists in all relevant stable trees. The patch uses
only `WARN_ON_ONCE` and `folio_buffers`, both of which are available
everywhere.
## Comprehensive Analysis
### 1. COMMIT MESSAGE ANALYSIS
The commit describes a clear bug: `try_to_free_buffers()` can be called
on folios with no buffer_heads attached when `filemap_release_folio()`
is invoked on a folio belonging to a mapping with `AS_RELEASE_ALWAYS`
set but no `release_folio` operation defined. This causes a **NULL
pointer dereference** in `drop_buffers()`.
The commit is authored by Deepakkumar Karn (Red Hat), reviewed by Jan
Kara (SUSE, a very well-respected VFS/filesystem developer), and signed
off by Christian Brauner (the VFS subsystem maintainer).
### 2. CODE CHANGE ANALYSIS
The fix adds exactly 3 lines (plus a comment) to
`try_to_free_buffers()`:
```c
/* Misconfigured folio check */
if (WARN_ON_ONCE(!folio_buffers(folio)))
return true;
```
This is inserted after the writeback check and before the `mapping ==
NULL` check. The logic is:
- If the folio has no buffers (`folio_buffers()` returns NULL), emit a
warning (once) and return `true` (success — there's nothing to free)
- This prevents the NULL deref in `drop_buffers()` where `head =
folio_buffers(folio)` returns NULL and then `buffer_busy(bh)`
dereferences it
### 3. THE BUG MECHANISM
The call chain is:
1. Memory reclaim (`shrink_folio_list`) or truncation calls
`filemap_release_folio()`
2. `folio_needs_release()` returns `true` because `AS_RELEASE_ALWAYS` is
set
3. `mapping->a_ops->release_folio` is NULL (e.g., `afs_dir_aops` doesn't
define it in v6.14+)
4. Falls through to `try_to_free_buffers(folio)`
5. `drop_buffers()` dereferences `folio_buffers(folio)` which is NULL
6. **NULL pointer dereference crash**
### 4. AFFECTED VERSIONS
The specific trigger (AFS symlinks/directories with `afs_dir_aops`
lacking `release_folio`) only exists from **v6.14-rc1** onward,
introduced by commit `6dd80936618c` ("afs: Use netfslib for
directories"). Before v6.14, all filesystems that set
`AS_RELEASE_ALWAYS` also define `release_folio`.
This means:
- **v6.6.y**: NOT affected (afs_dir_aops has release_folio)
- **v6.12.y**: NOT affected (afs_dir_aops has release_folio)
- **v6.13.y**: NOT affected (afs_dir_aops has release_folio)
- **v6.14.y**: AFFECTED (afs_dir_aops stripped down)
- **v6.15.y+**: AFFECTED
### 5. RISK ASSESSMENT
**Risk: Very Low**
- The patch adds 3 lines - a NULL check with WARN_ON_ONCE and early
return
- It's completely self-contained, no dependencies
- `folio_buffers()` and `WARN_ON_ONCE` are available in all stable trees
- Returning `true` is correct: no buffers = nothing to free = success
- Reviewed by Jan Kara and Christian Brauner (top VFS experts)
- Could be considered pure defensive hardening in older kernels where
the trigger doesn't exist
**Benefit in affected trees (6.14+): High**
- Prevents a NULL pointer dereference crash reachable from normal memory
reclaim
- AFS symlinks with fscache enabled hitting memory pressure would crash
**Benefit in unaffected trees (6.6, 6.12): Low**
- Defensive hardening only - the specific trigger doesn't exist
- Protects against potential future similar misconfigurations if other
patches are backported
### 6. STABLE BACKPORT SUITABILITY
For trees where the bug is actually triggerable (6.14+), this is clearly
a YES - it fixes a NULL pointer dereference crash in a core kernel path
(memory reclaim). However, for older stable trees (6.6, 6.12), the bug
isn't reachable with the current code because all filesystems that set
`AS_RELEASE_ALWAYS` also properly define `release_folio`.
The fix itself is trivially small (3 lines), obviously correct, has zero
regression risk, and is well-reviewed. Even as defensive hardening, it's
the kind of patch that makes stable kernels more robust.
For the broader question of "should this be backported to stable trees":
The answer depends on which tree. For 6.14.y and later, it's an
unambiguous yes - it fixes a real crash. For 6.6.y and 6.12.y, it's
defensive hardening that prevents a crash scenario that currently isn't
reachable but could become reachable if other patches are backported.
Given the negligible risk and the fact that it hardens a core kernel
function against NULL dereference, even the older trees benefit.
**YES**
fs/buffer.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/buffer.c b/fs/buffer.c
index 838c0c5710229..28e4d53f17173 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2948,6 +2948,10 @@ bool try_to_free_buffers(struct folio *folio)
if (folio_test_writeback(folio))
return false;
+ /* Misconfigured folio check */
+ if (WARN_ON_ONCE(!folio_buffers(folio)))
+ return true;
+
if (mapping == NULL) { /* can this still happen? */
ret = drop_buffers(folio, &buffers_to_free);
goto out;
--
2.51.0
* [PATCH AUTOSEL 6.19-6.12] statmount: permission check should return EPERM
[not found] <20260210233123.2905307-1-sashal@kernel.org>
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.18] hfsplus: fix volume corruption issue for generic/480 Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.1] fs/buffer: add alert in try_to_free_buffers() for folios without buffers Sasha Levin
@ 2026-02-10 23:31 ` Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] hfsplus: pretend special inodes as regular files Sasha Levin
` (3 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-02-10 23:31 UTC (permalink / raw)
To: patches, stable
Cc: Bhavik Sachdev, Miklos Szeredi, Christian Brauner, Sasha Levin,
viro, linux-fsdevel
From: Bhavik Sachdev <b.sachdev1904@gmail.com>
[ Upstream commit fccbe38a5d06dbe44bcd89196fe1d2c2272a1f4a ]
Currently, statmount() returns ENOENT when the caller does not have
CAP_SYS_ADMIN in the user namespace that owns the target mount
namespace. This should be EPERM instead.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Link: https://patch.msgid.link/20251129091455.757724-2-b.sachdev1904@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
The selftests don't check for a specific errno on the permission
path, so the change won't break tests.
---
## Detailed Analysis
### 1. COMMIT MESSAGE ANALYSIS
The commit message is clear: `statmount()` returns the wrong errno code
(`ENOENT`) when a caller lacks `CAP_SYS_ADMIN` in the user namespace
owning the target mount namespace. The commit changes it to `EPERM`,
which is the standard POSIX errno for "operation not permitted."
The commit was suggested by Miklos Szeredi (a senior VFS maintainer),
authored by Bhavik Sachdev, and merged by Christian Brauner (the VFS
maintainer). This gives strong confidence in correctness.
### 2. CODE CHANGE ANALYSIS
This is a **one-line change**: `-ENOENT` to `-EPERM` on line 5783 of
`fs/namespace.c`.
The affected code path is in the `statmount()` syscall
(SYSCALL_DEFINE4):
```5781:5783:fs/namespace.c
	if (kreq.mnt_ns_id && (ns != current->nsproxy->mnt_ns) &&
	    !ns_capable_noaudit(ns->user_ns, CAP_SYS_ADMIN))
		return -ENOENT;
```
The condition checks: Is the caller requesting a specific mount
namespace ID? If so, is it different from the caller's own mount
namespace? And does the caller lack `CAP_SYS_ADMIN` in that namespace's
user namespace? If all three conditions are true, this is a **permission
denial**, and `-ENOENT` ("No such file or directory") is semantically
incorrect. `-EPERM` ("Operation not permitted") is the correct error
code.
### 3. BUG MECHANISM
This is a **wrong errno** bug introduced by commit `71aacb4c8c3d` ("fs:
Allow statmount() in foreign mount namespace") in v6.11-rc1. The
original author used `-ENOENT` for both "namespace doesn't exist" and
"you don't have permission", but these are semantically different
conditions that userspace needs to distinguish.
**Internal inconsistency**: The inner `do_statmount()` function (line
5572-5574) already correctly returns `-EPERM` for a different permission
check (`!is_path_reachable(...) && !ns_capable_noaudit(...)`). The outer
syscall returning `-ENOENT` for the same type of permission check is
inconsistent.
**Impact on userspace**: A userspace program calling `statmount()` on a
foreign mount namespace without sufficient privileges receives `ENOENT`,
which it would naturally interpret as "the mount namespace doesn't
exist." This misleads debugging and prevents proper error handling. A
container runtime or monitoring tool, for example, would think the
namespace is gone rather than that it lacks the right credentials —
leading to potentially wrong recovery actions.
### 4. SCOPE AND RISK
- **Scope**: Single-line change, single file (`fs/namespace.c`)
- **Risk**: Extremely low. This only changes an error code on a failure
path. No logic is altered. No new code paths are created.
- **Potential concern**: If any userspace program checked `errno ==
ENOENT` specifically after `statmount()` to handle permission
failures, it would need to be updated. However, no such code exists in
kernel selftests, and relying on ENOENT for permission denial would be
poor practice.
### 5. STABLE TREE APPLICABILITY
The buggy code was introduced in **v6.11-rc1** and exists in stable
branches **6.11.y** through **6.19.y**. I confirmed the exact same code
(with `-ENOENT`) exists at the same location in both
`stable/linux-6.11.y` and `stable/linux-6.12.y`. The patch applies
trivially with no dependencies.
**Note**: `listmount()` (line 5943-5945) has the same bug (`-ENOENT`
instead of `-EPERM` for the same permission check). The commit being
analyzed only fixes `statmount()`, not `listmount()`. The Link URL
contains `757724-2` suggesting this was patch 2 of a series; patch 1
likely fixes `listmount()`. However, each fix is independent - fixing
`statmount()` alone is valuable even without the companion `listmount()`
fix.
### 6. CLASSIFICATION
This is a **bug fix** — returning incorrect errno values from syscalls
is a well-understood class of bug that affects userspace programs' error
handling. It is:
- Small and surgical (one line)
- Obviously correct (EPERM is the right code for permission denial)
- Low risk (only changes a failure path's error code)
- Fixes a real userspace-visible issue (misleading errno)
### 7. CONCERNS
- **No companion fix for listmount()**: The same bug exists in
`listmount()` but this commit only addresses `statmount()`. Ideally
both should be backported together.
- **UAPI change**: This technically changes the behavior of a syscall
(different errno), which is visible to userspace. However, since the
previous value was incorrect and misleading, this is a correction, not
a regression.
### 8. VERDICT
This is a clear, minimal, correct bug fix to a syscall's error handling.
It returns the correct standard POSIX errno (`EPERM`) instead of a
misleading one (`ENOENT`) when a permission check fails. It's a one-line
change with zero risk of regression, authored with involvement from
senior VFS maintainers (Miklos Szeredi suggesting, Christian Brauner
merging). It applies cleanly to all affected stable trees (6.11+).
**YES**
fs/namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index c58674a20cad5..f6879f282daec 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5780,7 +5780,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
if (kreq.mnt_ns_id && (ns != current->nsproxy->mnt_ns) &&
!ns_capable_noaudit(ns->user_ns, CAP_SYS_ADMIN))
- return -ENOENT;
+ return -EPERM;
ks = kmalloc(sizeof(*ks), GFP_KERNEL_ACCOUNT);
if (!ks)
--
2.51.0
* [PATCH AUTOSEL 6.19-5.10] hfsplus: pretend special inodes as regular files
[not found] <20260210233123.2905307-1-sashal@kernel.org>
` (2 preceding siblings ...)
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.12] statmount: permission check should return EPERM Sasha Levin
@ 2026-02-10 23:31 ` Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] hfsplus: fix volume corruption issue for generic/498 Sasha Levin
` (2 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-02-10 23:31 UTC (permalink / raw)
To: patches, stable
Cc: Tetsuo Handa, syzbot, Viacheslav Dubeyko, Sasha Levin, frank.li,
linux-fsdevel
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[ Upstream commit ed8889ca21b6ab37bc1435c4009ce37a79acb9e6 ]
Since commit af153bb63a33 ("vfs: catch invalid modes in may_open()")
requires that any inode be one of the S_IFDIR/S_IFLNK/S_IFREG/S_IFCHR/
S_IFBLK/S_IFIFO/S_IFSOCK types, use S_IFREG for special inodes.
Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/d0a07b1b-8b73-4002-8e29-e2bd56871262@I-love.SAKURA.ne.jp
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
---
## Detailed Analysis
### 1. Commit Message Analysis
The commit message is clear: HFS+ special/system inodes (catalog, extent
overflow, allocation bitmap, startup, and attributes files) never have
`i_mode` set, leaving it at 0 after `iget_locked()`. This violates the
new VFS requirement (from `af153bb63a33`) that all inodes must have a
valid `S_IF*` type when opened via `may_open()`.
The commit is reported by syzbot with a concrete reproducer and a link
to the syzkaller bug tracker. It's reviewed and signed off by the HFS+
maintainer Viacheslav Dubeyko.
### 2. Code Change Analysis
The fix is a single statement added to `hfsplus_system_read_inode()` in
`fs/hfsplus/super.c`:
```c
inode->i_mode = S_IFREG;
```
This is placed after the switch statement that reads the fork data for
each special inode type (HFSPLUS_EXT_CNID, HFSPLUS_CAT_CNID,
HFSPLUS_ALLOC_CNID, HFSPLUS_START_CNID, HFSPLUS_ATTR_CNID).
**The bug mechanism**: When `hfsplus_iget()` is called for a system
inode (inode number < `HFSPLUS_FIRSTUSER_CNID` and !=
`HFSPLUS_ROOT_CNID`), it calls `hfsplus_system_read_inode()`. This
function sets up the fork data and address space operations but **never
sets `i_mode`**. The `i_mode` remains 0 (the default from
`iget_locked()` → `alloc_inode()`). Zero is not a valid file type - it
doesn't match any of S_IFDIR, S_IFLNK, S_IFREG, S_IFCHR, S_IFBLK,
S_IFIFO, or S_IFSOCK.
When this inode is later opened via a userspace `openat()` syscall
(triggered by syzbot with a crafted filesystem), `may_open()` is called:
```4175:4232:fs/namei.c
static int may_open(struct mnt_idmap *idmap, const struct path *path,
		    int acc_mode, int flag)
{
	// ...
	switch (inode->i_mode & S_IFMT) {
	case S_IFLNK:
		// ...
	case S_IFREG:
		// ...
	default:
		VFS_BUG_ON_INODE(!IS_ANON_FILE(inode), inode);
	}
	// ...
}
```
```
With `i_mode = 0`, the switch falls to the default case, and since HFS+
special inodes aren't anonymous files, `VFS_BUG_ON_INODE` triggers →
**kernel BUG** (crash). This is confirmed by the syzbot crash report
showing `kernel BUG at fs/namei.c:3474` with a full stack trace through
`may_open` → `do_open` → `path_openat` → `do_filp_open` →
`do_sys_openat2` → `__x64_sys_openat`.
### 3. Relationship to Other Fixes
This is **NOT a duplicate** of `005d4b0d33f6` ("hfsplus: Verify inode
mode when loading from disk"). That fix addresses a DIFFERENT code path:
- `005d4b0d33f6` fixes `hfsplus_get_perms()` in `fs/hfsplus/inode.c` -
validates mode for **user/root inodes** loaded from disk (corrupted
on-disk mode)
- The commit we're analyzing fixes `hfsplus_system_read_inode()` in
`fs/hfsplus/super.c` - sets mode for **special/system inodes** that
never go through `hfsplus_get_perms()` at all
In `hfsplus_iget()`, the two paths are clearly separated:
- `inode->i_ino >= HFSPLUS_FIRSTUSER_CNID || inode->i_ino ==
HFSPLUS_ROOT_CNID` → `hfsplus_cat_read_inode()` →
`hfsplus_get_perms()` (fixed by `005d4b0d33f6`)
- Otherwise → `hfsplus_system_read_inode()` (fixed by THIS commit)
Confirmed via grep: before this fix there is no `i_mode` reference
anywhere in `fs/hfsplus/super.c`.
### 4. CONFIG_DEBUG_VFS Dependency
The `VFS_BUG_ON_INODE` macro is only active when `CONFIG_DEBUG_VFS` is
enabled. Without it, it's compiled to `BUILD_BUG_ON_INVALID(cond)` which
is a no-op at runtime. `CONFIG_DEBUG_VFS` was introduced alongside
`af153bb63a33` in v6.15 (commit `8b17e540969a`).
So the **crash** only occurs on kernels 6.15+ with `CONFIG_DEBUG_VFS`
enabled. However, having `i_mode = 0` is technically incorrect on ALL
kernel versions - any code checking inode type would get wrong results
for these inodes.
### 5. Scope and Risk Assessment
- **Lines changed**: 6 (including comment) - effectively 1 functional
line
- **Files touched**: 1 (`fs/hfsplus/super.c`)
- **Risk**: Essentially zero. Setting `S_IFREG` on internal filesystem
inodes that previously had no type is strictly an improvement. These
inodes are filesystem metadata (btrees, allocation bitmap) - they are
file-like data structures, so `S_IFREG` is appropriate.
- **No behavioral regression**: The fix doesn't change how the
filesystem operates; it only ensures the inode has a valid VFS-level
type.
### 6. User Impact
- **Who's affected**: Anyone mounting HFS+ filesystems on kernels 6.15+
with `CONFIG_DEBUG_VFS`
- **Severity**: Kernel BUG/crash - system goes down
- **Trigger**: syzbot demonstrated this with crafted filesystem images,
but it could potentially also be triggered with corrupted real HFS+
volumes
- **syzbot confirmed**: 17,716 crashes recorded, reproducible with C
reproducer
### 7. Stability Indicators
- Reviewed-by: Viacheslav Dubeyko (HFS+ maintainer)
- Signed-off-by: Viacheslav Dubeyko (maintainer)
- syzbot reported with concrete reproducer
- Part of a series of similar fixes across multiple filesystems (bfs,
cramfs, isofs, jfs, minixfs, nilfs2, ntfs3, squashfs) - the ntfs3
equivalent `4e8011ffec79` is already merged
### 8. Dependencies
The prerequisite VFS commit `af153bb63a33` exists in 6.15+. For older
stable trees (6.14 and below), the `VFS_BUG_ON_INODE` doesn't exist, so
the crash won't happen. However, the fix itself applies cleanly
regardless of version since `hfsplus_system_read_inode()` has been
stable for many years. Setting a proper i_mode is correct behavior
independently of the VFS debug check.
### Conclusion
This is a clear **YES** for stable backporting:
- Fixes a syzbot-reported kernel BUG (crash) - 17,716 crash instances
recorded
- The fix is one functional line: `inode->i_mode = S_IFREG;`
- Zero risk of regression
- Obviously correct - internal inodes should have a valid file type
- Reviewed and accepted by the subsystem maintainer
- Complements the already-merged `005d4b0d33f6` by covering a different
code path
- Part of a broader class of fixes across multiple filesystems for the
same VFS requirement
**YES**
fs/hfsplus/super.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index aaffa9e060a0a..7f327b777ece8 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -53,6 +53,12 @@ static int hfsplus_system_read_inode(struct inode *inode)
return -EIO;
}
+ /*
+ * Assign a dummy file type, for may_open() requires that
+ * an inode has a valid file type.
+ */
+ inode->i_mode = S_IFREG;
+
return 0;
}
--
2.51.0
* [PATCH AUTOSEL 6.19-5.10] hfsplus: fix volume corruption issue for generic/498
[not found] <20260210233123.2905307-1-sashal@kernel.org>
` (3 preceding siblings ...)
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] hfsplus: pretend special inodes as regular files Sasha Levin
@ 2026-02-10 23:31 ` Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.18] netfs: when subreq is marked for retry, do not check if it faced an error Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19] hfs: Replace BUG_ON with error handling for CNID count checks Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-02-10 23:31 UTC (permalink / raw)
To: patches, stable
Cc: Viacheslav Dubeyko, John Paul Adrian Glaubitz, Yangtao Li,
linux-fsdevel, Sasha Levin
From: Viacheslav Dubeyko <slava@dubeyko.com>
[ Upstream commit 9a8c4ad44721da4c48e1ff240ac76286c82837fe ]
The xfstests' test-case generic/498 leaves HFS+ volume
in corrupted state:
sudo ./check generic/498
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ #18 SMP PREEMPT_DYNAMIC Thu Dec 4 12:24:45 PST 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/498 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/498.full for details)
Ran: generic/498
Failures: generic/498
Failed 1 of 1 tests
sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
Invalid leaf record count
(It should be 16 instead of 2)
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x8000 CatStat = 0x00000000
** Repairing volume.
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.
The generic/498 test executes the following steps in its final phase:
mkdir $SCRATCH_MNT/A
mkdir $SCRATCH_MNT/B
mkdir $SCRATCH_MNT/A/C
touch $SCRATCH_MNT/B/foo
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/B/foo
ln $SCRATCH_MNT/B/foo $SCRATCH_MNT/A/C/foo
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/A
"Simulate a power failure and mount the filesystem
to check that what we explicitly fsync'ed exists."
_flakey_drop_and_remount
The FSCK tool complains about "Invalid leaf record count".
The HFS+ b-tree header contains a leaf_count field that is
updated by hfs_brec_insert() and hfs_brec_remove().
hfs_brec_insert() is involved in the hard link creation
process. However, the modified in-core leaf_count field only
reaches the on-disk HFS+ b-tree header via the
hfs_btree_write() method, and, unfortunately, hfs_btree_write()
was not being called by hfsplus_cat_write_inode(), so
hfsplus_file_fsync() stored a not fully consistent state of
the Catalog File's b-tree.
This patch adds a call to the hfs_btree_write() method in
hfsplus_cat_write_inode() with the goal of storing a
consistent state of the Catalog File's b-tree. Finally, it
makes the FSCK tool happy.
sudo ./check generic/498
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ #22 SMP PREEMPT_DYNAMIC Sat Dec 6 17:01:31 PST 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/498 33s ... 31s
Ran: generic/498
Passed all 1 tests
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20251207035821.3863657-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
---
## Commit Analysis: hfsplus: fix volume corruption issue for generic/498
### 1. The Bug: Filesystem Corruption After Power Failure
This commit fixes a **real volume corruption bug** in the HFS+
filesystem. The corruption occurs in a specific but realistic scenario:
creating a hard link, performing an fsync, then experiencing a power
failure. After recovery, `fsck.hfsplus` reports "Invalid leaf record
count" in the catalog b-tree header.
### 2. Root Cause Analysis
The bug involves a disconnect between in-memory btree metadata updates
and their persistence to disk:
**Step 1: `hfs_brec_insert()` updates `leaf_count` in memory only.**
```105:108:fs/hfsplus/brec.c
if (node->type == HFS_NODE_LEAF) {
tree->leaf_count++;
mark_inode_dirty(tree->inode);
}
```
When a hard link is created (e.g., `ln $SCRATCH_MNT/B/foo
$SCRATCH_MNT/A/C/foo`), `hfs_brec_insert()` is called on the catalog
b-tree. It increments `tree->leaf_count` in the in-memory `struct
hfs_btree` and calls `mark_inode_dirty(tree->inode)` to dirty the tree
inode — but it does NOT write the b-tree header record to disk.
**Step 2: `hfs_btree_write()` is what persists the header to disk.**
```283:311:fs/hfsplus/btree.c
int hfs_btree_write(struct hfs_btree *tree)
{
// ... finds node 0 (header node), maps page ...
head->leaf_count = cpu_to_be32(tree->leaf_count);
// ... other header fields ...
set_page_dirty(page);
hfs_bnode_put(node);
return 0;
}
```
This function serializes the in-memory btree metadata (including
`leaf_count`) to the header node's page and marks it dirty. This is the
**only** way the btree header gets updated on disk.
**Step 3: `hfsplus_cat_write_inode()` was NOT calling
`hfs_btree_write()`.**
Before this fix, `hfsplus_cat_write_inode()` (lines 612-691 in
`inode.c`) writes the catalog entry for a regular inode (folder/file
data, permissions, timestamps, etc.) but never flushes the btree header.
It sets `HFSPLUS_I_CAT_DIRTY`, which tells `hfsplus_file_fsync()` to
call `filemap_write_and_wait()` on the catalog tree's page cache — but
the header page was never dirtied in the first place.
**Step 4: The fsync path misses the btree header.**
In `hfsplus_file_fsync()`:
```339:345:fs/hfsplus/inode.c
sync_inode_metadata(inode, 1);
// ↑ triggers hfsplus_write_inode() → hfsplus_cat_write_inode(),
//   which writes the catalog ENTRY but NOT the btree HEADER
if (test_and_clear_bit(HFSPLUS_I_CAT_DIRTY, &hip->flags))
	error = filemap_write_and_wait(sbi->cat_tree->inode->i_mapping);
// ↑ flushes dirty pages, but the header page isn't dirty!
```
**Step 5: Where it DID work — system inodes.**
For "system" inodes (the btree inodes themselves),
`hfsplus_system_write_inode()` in `super.c` DOES call
`hfs_btree_write()`:
```114:159:fs/hfsplus/super.c
// hfsplus_system_write_inode:
if (tree) {
int err = hfs_btree_write(tree);
// ...
}
```
But this path is only triggered during `hfsplus_sync_fs()` (full
filesystem sync), not during an individual file's fsync. So the btree
header only gets written on full sync, not on per-file fsync — creating
the corruption window.
### 3. The Fix
The fix adds `hfs_btree_write(tree)` to `hfsplus_cat_write_inode()`:
```diff
+ struct hfs_btree *tree = HFSPLUS_SB(inode->i_sb)->cat_tree;
...
- if (hfs_find_init(HFSPLUS_SB(main_inode->i_sb)->cat_tree, &fd))
+ if (hfs_find_init(tree, &fd))
...
hfs_find_exit(&fd);
+
+ if (!res) {
+ res = hfs_btree_write(tree);
+ if (res) {
+ pr_err("b-tree write err: %d, ino %lu\n",
+ res, inode->i_ino);
+ }
+ }
+
return res;
```
Now when `hfsplus_cat_write_inode()` runs (via `sync_inode_metadata`
during fsync), it also dirties the btree header page. When
`hfsplus_file_fsync()` then calls `filemap_write_and_wait()` on the
catalog tree's mapping, the header page (with the correct `leaf_count`)
gets flushed to disk. After power failure, the on-disk btree header
matches reality.
### 4. Scope and Risk Assessment
- **Lines changed**: ~10 lines added, 1 line modified (extracting `tree`
variable)
- **Files changed**: 1 file (`fs/hfsplus/inode.c`)
- **Function affected**: `hfsplus_cat_write_inode()` only
- **Pattern**: Identical to `hfsplus_system_write_inode()` which already
calls `hfs_btree_write()` for the same tree
- **Performance impact**: Minimal — `hfs_btree_write()` just writes to a
cached page and marks it dirty; the actual I/O happens asynchronously
or during the subsequent `filemap_write_and_wait()`
- **Risk of regression**: Very low — this makes the catalog inode write
path consistent with the system inode write path
### 5. Dependency Analysis
This commit is **completely standalone**:
- `hfs_btree_write()` has existed since the early HFS+ implementation
- The `HFSPLUS_SB()->cat_tree` structure hasn't changed
- The `hfsplus_cat_write_inode()` function's structure is stable across
kernel versions
- It does NOT depend on the companion fix `3f04ee216bc14` ("fix volume
corruption issue for generic/101") which modified
`hfsplus_file_fsync()` and `hfsplus_sync_fs()` — those are completely
different functions
- The patch should apply cleanly to any recent stable tree
### 6. User Impact
- **Severity**: HIGH — this is **filesystem/volume corruption** (data
integrity)
- **Trigger**: Any operation that modifies the catalog btree leaf count
(hard link creation/deletion) followed by fsync + power failure
- **Users affected**: Anyone using HFS+ filesystem (macOS interop,
embedded systems, dual-boot scenarios)
- **Reproducibility**: 100% reproducible via xfstests generic/498
### 7. Classification
This is a clear **data corruption fix**:
- Fixes metadata inconsistency after crash recovery
- The bug has been present since the HFS+ driver was written (the btree
header write was always missing from `hfsplus_cat_write_inode()`)
- Small, surgical fix using an established function
- No new features, APIs, or behavioral changes
- Zero risk to non-HFS+ users
The fix is small, surgical, and meets all stable kernel criteria.
**YES**
fs/hfsplus/inode.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index c762bf909d1aa..6153e5cc6eb65 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -615,6 +615,7 @@ int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd)
int hfsplus_cat_write_inode(struct inode *inode)
{
struct inode *main_inode = inode;
+ struct hfs_btree *tree = HFSPLUS_SB(inode->i_sb)->cat_tree;
struct hfs_find_data fd;
hfsplus_cat_entry entry;
int res = 0;
@@ -627,7 +628,7 @@ int hfsplus_cat_write_inode(struct inode *inode)
if (!main_inode->i_nlink)
return 0;
- if (hfs_find_init(HFSPLUS_SB(main_inode->i_sb)->cat_tree, &fd))
+ if (hfs_find_init(tree, &fd))
/* panic? */
return -EIO;
@@ -692,6 +693,15 @@ int hfsplus_cat_write_inode(struct inode *inode)
set_bit(HFSPLUS_I_CAT_DIRTY, &HFSPLUS_I(inode)->flags);
out:
hfs_find_exit(&fd);
+
+ if (!res) {
+ res = hfs_btree_write(tree);
+ if (res) {
+ pr_err("b-tree write err: %d, ino %lu\n",
+ res, inode->i_ino);
+ }
+ }
+
return res;
}
--
2.51.0
* [PATCH AUTOSEL 6.19-6.18] netfs: when subreq is marked for retry, do not check if it faced an error
[not found] <20260210233123.2905307-1-sashal@kernel.org>
` (4 preceding siblings ...)
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] hfsplus: fix volume corruption issue for generic/498 Sasha Levin
@ 2026-02-10 23:31 ` Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19] hfs: Replace BUG_ON with error handling for CNID count checks Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-02-10 23:31 UTC (permalink / raw)
To: patches, stable
Cc: Shyam Prasad N, David Howells, Steve French, Sasha Levin, pc,
netfs, linux-fsdevel
From: Shyam Prasad N <sprasad@microsoft.com>
[ Upstream commit 82e8885bd7633a36ee9050e6d7f348a4155eed5f ]
The *_subreq_terminated functions today only process the NEED_RETRY
flag when the subreq was successful or failed with EAGAIN error.
However, there could be other retriable errors for network filesystems.
Avoid this by processing the NEED_RETRY irrespective of the error
code faced by the subreq. If it was specifically marked for retry,
the error code must not matter.
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
---
## Comprehensive Analysis
### 1. Commit Message Analysis
The commit title "netfs: when subreq is marked for retry, do not check
if it faced an error" clearly describes a bug fix. The commit explains
that `*_subreq_terminated` functions only process the
`NETFS_SREQ_NEED_RETRY` flag when the subrequest was successful or had
an `-EAGAIN` error, but network filesystems may set `NEED_RETRY` for
other error codes.
The commit has:
- **Acked-by: David Howells** — the creator and maintainer of the netfs
subsystem
- **Signed-off-by: Shyam Prasad N** — a CIFS/SMB kernel developer at
Microsoft
- **Signed-off-by: Steve French** — the CIFS/SMB subsystem maintainer
### 2. Code Change Analysis — The Bug
**Read Path Bug** in `netfs_read_subreq_terminated()`
(`read_collect.c`):
When CIFS handles a `MID_REQUEST_SUBMITTED` or `MID_RETRY_NEEDED`
response (as seen in `smb2pdu.c` lines 4633-4640):
```4639:4640:fs/smb/client/smb2pdu.c
__set_bit(NETFS_SREQ_NEED_RETRY, &rdata->subreq.flags);
rdata->result = -EAGAIN;
```
Then at line 4707: `rdata->subreq.error = rdata->result;` (= -EAGAIN),
and line 4710 calls `netfs_read_subreq_terminated()`.
In the current code:
```532:547:fs/netfs/read_collect.c
if (!subreq->error && subreq->transferred < subreq->len) {
// ... this block is SKIPPED because error is -EAGAIN
(non-zero)
}
```
Then:
```549:560:fs/netfs/read_collect.c
if (unlikely(subreq->error < 0)) {
trace_netfs_failure(rreq, subreq, subreq->error,
netfs_fail_read);
if (subreq->source == NETFS_READ_FROM_CACHE) {
netfs_stat(&netfs_n_rh_read_failed);
__set_bit(NETFS_SREQ_NEED_RETRY,
&subreq->flags);
} else {
netfs_stat(&netfs_n_rh_download_failed);
__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
// BUG: overrides NEED_RETRY!
}
```
For server downloads, `NETFS_SREQ_FAILED` is set **despite**
`NETFS_SREQ_NEED_RETRY` already being set by CIFS. Since the collection
code at line 269 checks `FAILED` **before** `NEED_RETRY` at line 277,
and the retry code at line 94 abandons if `FAILED` is set, the
subrequest is **permanently failed** instead of being retried.
**Write Path Bug** in `netfs_write_subrequest_terminated()`
(`write_collect.c`):
```493:498:fs/netfs/write_collect.c
if (IS_ERR_VALUE(transferred_or_error)) {
subreq->error = transferred_or_error;
if (subreq->error == -EAGAIN)
set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
else
set_bit(NETFS_SREQ_FAILED, &subreq->flags);
```
AFS sets `NETFS_SREQ_NEED_RETRY` for auth errors (`-EACCES`, `-EPERM`,
`-ENOKEY`, etc.) at `fs/afs/write.c:165` before calling
`netfs_write_subrequest_terminated()` with those error codes. Since the
error is not `-EAGAIN`, the netfs layer overrides the filesystem's retry
request with `FAILED`.
### 3. The Fix
1. **`read_collect.c`**: Adds a check for `NETFS_SREQ_NEED_RETRY` before
the error handling block. If NEED_RETRY is set, it pauses the request
and jumps past error checks via `goto skip_error_checks`, preventing
FAILED from being set.
2. **`write_collect.c`**: Replaces the EAGAIN-only check with a check
for whether NEED_RETRY is already set. If set, FAILED is not applied.
This delegates retry decisions to the filesystem.
3. **`read_retry.c` / `write_issue.c`**: Adds `subreq->error = 0` in
`netfs_reissue_read()` and `netfs_reissue_write()` to clear stale
error codes before reissuing.
4. **`read_retry.c` abandon path**: Removes the `!subreq->error` check
since subrequests with `NEED_RETRY` can now legitimately carry error
codes.
### 4. User Impact
This bug directly affects:
- **CIFS/SMB users** experiencing network disruptions (laptop lid
close/open, WiFi drops, VPN changes): reads fail with errors instead
of transparently retrying
- **AFS users** needing key rotation during writes: writes fail instead
of retrying with a different key
- These are common real-world scenarios for enterprise and laptop users
### 5. Risk Assessment
**Positive:**
- Small patch (~30 lines across 4 files)
- ACKed by the netfs subsystem maintainer
- Logically sound: filesystems know better than the generic layer when
to retry
- Addresses concrete, user-facing failures
**Concerns:**
- The write path removes the EAGAIN→NEED_RETRY fallback. AFS's
`afs_alloc_operation()` failure (line 133) passes -EAGAIN without pre-
setting NEED_RETRY — this would now fail instead of retry. However,
David Howells (AFS author) ACKed this change, accepting the behavioral
shift.
- 9p similarly doesn't pre-set NEED_RETRY — -EAGAIN from 9p would no
longer trigger retry. This is likely intentional: the filesystem
should decide, not the generic layer.
- The affected code (with the new collector architecture) exists from
v6.14 onward. v6.14 has a slightly different function signature for
`netfs_write_subrequest_terminated` (with `was_async` parameter),
requiring minor backport adjustment.
### 6. Dependency Check
- No missing dependencies: `NETFS_RREQ_PAUSE`,
`netfs_rreq_trace_set_pause`, and all other referenced symbols exist
in the current stable trees.
- The patch is self-contained and doesn't appear to be part of a series.
- For v6.18 stable, the patch should apply cleanly as the code matches
exactly.
### 7. Conclusion
This commit fixes a real, user-visible bug where network filesystem I/O
operations marked for retry by CIFS and AFS are incorrectly treated as
permanent failures by the netfs generic layer. The bug causes spurious
I/O errors during common scenarios like network reconnections and
authentication retries. The fix is small, reviewed by the subsystem
maintainer, and addresses concrete failure paths. While it includes a
deliberate behavioral shift in the write path (removing EAGAIN special-
casing), this was reviewed and ACKed by the relevant maintainers. The
risk is low relative to the benefit of fixing CIFS reconnection failures
for a large user base.
**YES**
fs/netfs/read_collect.c | 10 ++++++++++
fs/netfs/read_retry.c | 4 ++--
fs/netfs/write_collect.c | 8 ++++----
fs/netfs/write_issue.c | 1 +
4 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 7a0ffa675fb17..137f0e28a44c5 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -546,6 +546,15 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
}
}
+ /* If need retry is set, error should not matter unless we hit too many
+ * retries. Pause the generation of new subreqs
+ */
+ if (test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+ trace_netfs_rreq(rreq, netfs_rreq_trace_set_pause);
+ set_bit(NETFS_RREQ_PAUSE, &rreq->flags);
+ goto skip_error_checks;
+ }
+
if (unlikely(subreq->error < 0)) {
trace_netfs_failure(rreq, subreq, subreq->error, netfs_fail_read);
if (subreq->source == NETFS_READ_FROM_CACHE) {
@@ -559,6 +568,7 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
set_bit(NETFS_RREQ_PAUSE, &rreq->flags);
}
+skip_error_checks:
trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
netfs_subreq_clear_in_progress(subreq);
netfs_put_subrequest(subreq, netfs_sreq_trace_put_terminated);
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index b99e84a8170af..7793ba5e3e8fc 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -12,6 +12,7 @@
static void netfs_reissue_read(struct netfs_io_request *rreq,
struct netfs_io_subrequest *subreq)
{
+ subreq->error = 0;
__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
netfs_stat(&netfs_n_rh_retry_read_subreq);
@@ -242,8 +243,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
subreq = list_next_entry(subreq, rreq_link);
abandon:
list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
- if (!subreq->error &&
- !test_bit(NETFS_SREQ_FAILED, &subreq->flags) &&
+ if (!test_bit(NETFS_SREQ_FAILED, &subreq->flags) &&
!test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags))
continue;
subreq->error = -ENOMEM;
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index cbf3d9194c7bf..61eab34ea67ef 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -492,11 +492,11 @@ void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error)
if (IS_ERR_VALUE(transferred_or_error)) {
subreq->error = transferred_or_error;
- if (subreq->error == -EAGAIN)
- set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
- else
+ /* if need retry is set, error should not matter */
+ if (!test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
set_bit(NETFS_SREQ_FAILED, &subreq->flags);
- trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
+ trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
+ }
switch (subreq->source) {
case NETFS_WRITE_TO_CACHE:
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index dd8743bc8d7fe..34894da5a23ec 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -250,6 +250,7 @@ void netfs_reissue_write(struct netfs_io_stream *stream,
iov_iter_truncate(&subreq->io_iter, size);
subreq->retry_count++;
+ subreq->error = 0;
__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
netfs_stat(&netfs_n_wh_retry_write_subreq);
--
2.51.0
* [PATCH AUTOSEL 6.19] hfs: Replace BUG_ON with error handling for CNID count checks
[not found] <20260210233123.2905307-1-sashal@kernel.org>
` (5 preceding siblings ...)
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.18] netfs: when subreq is marked for retry, do not check if it faced an error Sasha Levin
@ 2026-02-10 23:31 ` Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-02-10 23:31 UTC (permalink / raw)
To: patches, stable
Cc: Jori Koolstra, syzbot+17cc9bb6d8d69b4139f0, Viacheslav Dubeyko,
Sasha Levin, frank.li, linux-fsdevel
From: Jori Koolstra <jkoolstra@xs4all.nl>
[ Upstream commit b226804532a875c10276168dc55ce752944096bd ]
In a06ec283e125 next_id, folder_count, and file_count in the super block
info were expanded to 64 bits, and BUG_ONs were added to detect
overflow. This triggered an error reported by syzbot: if the MDB is
corrupted, the BUG_ON is triggered. This patch replaces this mechanism
with proper error handling and resolves the syzbot reported bug.
Singed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Reported-by: syzbot+17cc9bb6d8d69b4139f0@syzkaller.appspotmail.com
Closes: https://syzbot.org/bug?extid=17cc9bb6d8d69b4139f0
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20251220191006.2465256-1-jkoolstra@xs4all.nl
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
---
## Comprehensive Analysis
### 1. COMMIT MESSAGE ANALYSIS
The commit message clearly states:
- It fixes BUG_ON() calls introduced by commit `a06ec283e125` ("hfs: add
logic of correcting a next unused CNID")
- The BUG_ON triggers on corrupted MDB (Master Directory Block) data
- Reported by syzbot with a reproducible crash
- Reviewed by the HFS subsystem maintainer (Viacheslav Dubeyko)
### 2. CODE CHANGE ANALYSIS
The patch touches 5 files within `fs/hfs/`:
**a) `fs/hfs/inode.c` - `hfs_new_inode()`:** Replaces three
`BUG_ON(value > U32_MAX)` checks with proper error handling:
- If `next_id > U32_MAX`: reverses the `atomic64_inc`, prints error,
jumps to `out_discard` which calls `iput(inode)` then returns
`ERR_PTR(-ERANGE)`
- If `folder_count > U32_MAX`: same pattern with `atomic64_dec` rollback
- If `file_count > U32_MAX`: same pattern
- The function return type changes from `NULL` on error to `ERR_PTR()`
on error
**b) `fs/hfs/inode.c` - `hfs_delete_inode()`:** Removes two `BUG_ON()`
calls that checked counts before decrementing. These are safe to remove
because the increment path now prevents exceeding U32_MAX.
**c) `fs/hfs/dir.c` - `hfs_create()` and `hfs_mkdir()`:** Updated
callers to check `IS_ERR(inode)` / `PTR_ERR(inode)` instead of `!inode`,
matching the new return convention.
**d) `fs/hfs/dir.c` - `hfs_remove()`:** Adds a pre-check via
`is_hfs_cnid_counts_valid()` before removing files/directories to
prevent operations on a corrupted filesystem.
**e) `fs/hfs/mdb.c` - `hfs_mdb_commit()`:** Removes three `BUG_ON()`
calls in the MDB commit path. These were the most dangerous - they could
trigger during periodic writeback work, causing a panic even without
user interaction after mount.
**f) `fs/hfs/mdb.c` - `hfs_mdb_get()`:** Adds mount-time validation via
`is_hfs_cnid_counts_valid()`. If the on-disk values are corrupted (>
U32_MAX), the filesystem is forced to read-only mode, preventing the
BUG_ON paths from ever being reached.
**g) `fs/hfs/mdb.c` - `is_hfs_cnid_counts_valid()`:** New helper
function that validates all three CNID-related counters.
**h) `fs/hfs/super.c` - `hfs_sync_fs()` and `flush_mdb()`:** Adds
`is_hfs_cnid_counts_valid()` calls before `hfs_mdb_commit()` as an extra
safety check.
### 3. THE BUG MECHANISM
Commit `a06ec283e125` (merged in v6.18) expanded `next_id`,
`file_count`, and `folder_count` from `u32` to `atomic64_t` for
atomicity, and added `BUG_ON()` to detect overflow past `U32_MAX`.
However, these values are read from the on-disk MDB at mount time via
`be32_to_cpu()` into `atomic64_t`. If the MDB is corrupted such that
these 32-bit on-disk values, when interpreted, lead to increments past
`U32_MAX` (the on-disk `drNxtCNID` could be `0xFFFFFFFF` or similar),
then creating a new file triggers `BUG_ON(next_id > U32_MAX)` at line
222 of `inode.c`.
The syzbot crash report confirms this exact scenario:
- **Crash site:** `kernel BUG at fs/hfs/inode.c:222` in
`hfs_new_inode()`
- **Call path:** `openat() -> path_openat() -> lookup_open() ->
hfs_create() -> hfs_new_inode()`
- **Trigger:** Opening a file for creation on a corrupted HFS filesystem
image
- **Impact:** Instant kernel panic/oops (invalid opcode trap from
BUG_ON)
### 4. DEPENDENCY ANALYSIS
**Critical dependency:** This fix ONLY applies to kernels containing
`a06ec283e125`, which first appeared in v6.18-rc1. The BUG_ON code does
not exist in v6.17 or earlier kernels. In those older kernels,
`next_id`, `file_count`, and `folder_count` are plain `u32` with simple
`++`/`--` operations and no overflow checks.
- **v6.18.y stable:** Has the BUG_ON (confirmed in v6.18.5). NEEDS this
fix.
- **v6.12.y and older:** Does NOT have the BUG_ON. Does NOT need this
fix.
### 5. SEVERITY AND IMPACT
- **Severity:** HIGH. This is a kernel BUG/panic triggered by corrupted
filesystem data. Any user mounting a damaged or maliciously crafted
HFS image can crash the kernel.
- **Attack surface:** HFS images can come from USB sticks, CD-ROMs, disk
images, or network mounts. The crash is reachable from the `openat()`
syscall (creating a file on a mounted corrupted HFS).
- **Security implications:** A crafted HFS image can crash the kernel,
constituting a denial-of-service vulnerability. This is especially
concerning for systems that auto-mount removable media.
### 6. RISK ASSESSMENT
- **Size:** ~130 lines changed across 5 files, all within `fs/hfs/`
- **Scope:** Entirely contained to the HFS filesystem subsystem
- **Pattern:** Well-understood "replace BUG_ON with error handling"
pattern
- **Quality:** Reviewed by the HFS maintainer, tested by syzbot (patch
testing passed), and went through 4 patch revisions (v1 through v4)
- **Risk:** LOW. The changes are:
- Mount-time validation forces corrupted filesystems read-only (safe)
- BUG_ON replaced with `return ERR_PTR(-ERANGE)` (graceful failure)
- Callers updated to handle the new error convention
- Atomic rollbacks on error (correct)
### 7. MINOR CONCERNS
- There's a minor whitespace issue (spaces vs tabs in `out_discard:`
label indentation and `hfs_remove()`) - cosmetic only, no functional
impact
- The `is_hfs_cnid_counts_valid()` calls in `hfs_sync_fs()` and
`flush_mdb()` only print warnings but don't prevent `hfs_mdb_commit()`
from running - however, the BUG_ONs in `hfs_mdb_commit()` are already
removed, so this is a soft warning rather than crash prevention
- The commit has a typo "Singed-off-by" (should be "Signed-off-by") but
this doesn't affect the code
### 8. CONCLUSION
This is a textbook stable backport candidate:
- Fixes a syzbot-reported, reproducible kernel BUG/panic
- The bug is triggered from userspace via a common syscall path
(`openat()`)
- Corrupted filesystem images are a well-known attack vector
- The fix is contained, reviewed, and follows the standard "BUG_ON ->
error handling" pattern
- It applies specifically to v6.18.y stable which contains the buggy
BUG_ON code
The fix is small, surgical, and meets all stable kernel criteria.
**YES**
fs/hfs/dir.c | 15 +++++++++++----
fs/hfs/hfs_fs.h | 1 +
fs/hfs/inode.c | 30 ++++++++++++++++++++++++------
fs/hfs/mdb.c | 31 +++++++++++++++++++++++++++----
fs/hfs/super.c | 3 +++
5 files changed, 66 insertions(+), 14 deletions(-)
diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c
index 86a6b317b474a..0c615c078650c 100644
--- a/fs/hfs/dir.c
+++ b/fs/hfs/dir.c
@@ -196,8 +196,8 @@ static int hfs_create(struct mnt_idmap *idmap, struct inode *dir,
int res;
inode = hfs_new_inode(dir, &dentry->d_name, mode);
- if (!inode)
- return -ENOMEM;
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
res = hfs_cat_create(inode->i_ino, dir, &dentry->d_name, inode);
if (res) {
@@ -226,8 +226,8 @@ static struct dentry *hfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
int res;
inode = hfs_new_inode(dir, &dentry->d_name, S_IFDIR | mode);
- if (!inode)
- return ERR_PTR(-ENOMEM);
+ if (IS_ERR(inode))
+ return ERR_CAST(inode);
res = hfs_cat_create(inode->i_ino, dir, &dentry->d_name, inode);
if (res) {
@@ -254,11 +254,18 @@ static struct dentry *hfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
*/
static int hfs_remove(struct inode *dir, struct dentry *dentry)
{
+ struct super_block *sb = dir->i_sb;
struct inode *inode = d_inode(dentry);
int res;
if (S_ISDIR(inode->i_mode) && inode->i_size != 2)
return -ENOTEMPTY;
+
+ if (unlikely(!is_hfs_cnid_counts_valid(sb))) {
+ pr_err("cannot remove file/folder\n");
+ return -ERANGE;
+ }
+
res = hfs_cat_delete(inode->i_ino, dir, &dentry->d_name);
if (res)
return res;
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index e94dbc04a1e43..ac0e83f77a0f1 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -199,6 +199,7 @@ extern void hfs_delete_inode(struct inode *inode);
extern const struct xattr_handler * const hfs_xattr_handlers[];
/* mdb.c */
+extern bool is_hfs_cnid_counts_valid(struct super_block *sb);
extern int hfs_mdb_get(struct super_block *sb);
extern void hfs_mdb_commit(struct super_block *sb);
extern void hfs_mdb_close(struct super_block *sb);
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 524db1389737d..878535db64d67 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -187,16 +187,23 @@ struct inode *hfs_new_inode(struct inode *dir, const struct qstr *name, umode_t
s64 next_id;
s64 file_count;
s64 folder_count;
+ int err = -ENOMEM;
if (!inode)
- return NULL;
+ goto out_err;
+
+ err = -ERANGE;
mutex_init(&HFS_I(inode)->extents_lock);
INIT_LIST_HEAD(&HFS_I(inode)->open_dir_list);
spin_lock_init(&HFS_I(inode)->open_dir_lock);
hfs_cat_build_key(sb, (btree_key *)&HFS_I(inode)->cat_key, dir->i_ino, name);
next_id = atomic64_inc_return(&HFS_SB(sb)->next_id);
- BUG_ON(next_id > U32_MAX);
+ if (next_id > U32_MAX) {
+ atomic64_dec(&HFS_SB(sb)->next_id);
+ pr_err("cannot create new inode: next CNID exceeds limit\n");
+ goto out_discard;
+ }
inode->i_ino = (u32)next_id;
inode->i_mode = mode;
inode->i_uid = current_fsuid();
@@ -210,7 +217,11 @@ struct inode *hfs_new_inode(struct inode *dir, const struct qstr *name, umode_t
if (S_ISDIR(mode)) {
inode->i_size = 2;
folder_count = atomic64_inc_return(&HFS_SB(sb)->folder_count);
- BUG_ON(folder_count > U32_MAX);
+ if (folder_count > U32_MAX) {
+ atomic64_dec(&HFS_SB(sb)->folder_count);
+ pr_err("cannot create new inode: folder count exceeds limit\n");
+ goto out_discard;
+ }
if (dir->i_ino == HFS_ROOT_CNID)
HFS_SB(sb)->root_dirs++;
inode->i_op = &hfs_dir_inode_operations;
@@ -220,7 +231,11 @@ struct inode *hfs_new_inode(struct inode *dir, const struct qstr *name, umode_t
} else if (S_ISREG(mode)) {
HFS_I(inode)->clump_blocks = HFS_SB(sb)->clumpablks;
file_count = atomic64_inc_return(&HFS_SB(sb)->file_count);
- BUG_ON(file_count > U32_MAX);
+ if (file_count > U32_MAX) {
+ atomic64_dec(&HFS_SB(sb)->file_count);
+ pr_err("cannot create new inode: file count exceeds limit\n");
+ goto out_discard;
+ }
if (dir->i_ino == HFS_ROOT_CNID)
HFS_SB(sb)->root_files++;
inode->i_op = &hfs_file_inode_operations;
@@ -244,6 +259,11 @@ struct inode *hfs_new_inode(struct inode *dir, const struct qstr *name, umode_t
hfs_mark_mdb_dirty(sb);
return inode;
+
+ out_discard:
+ iput(inode);
+ out_err:
+ return ERR_PTR(err);
}
void hfs_delete_inode(struct inode *inode)
@@ -252,7 +272,6 @@ void hfs_delete_inode(struct inode *inode)
hfs_dbg("ino %lu\n", inode->i_ino);
if (S_ISDIR(inode->i_mode)) {
- BUG_ON(atomic64_read(&HFS_SB(sb)->folder_count) > U32_MAX);
atomic64_dec(&HFS_SB(sb)->folder_count);
if (HFS_I(inode)->cat_key.ParID == cpu_to_be32(HFS_ROOT_CNID))
HFS_SB(sb)->root_dirs--;
@@ -261,7 +280,6 @@ void hfs_delete_inode(struct inode *inode)
return;
}
- BUG_ON(atomic64_read(&HFS_SB(sb)->file_count) > U32_MAX);
atomic64_dec(&HFS_SB(sb)->file_count);
if (HFS_I(inode)->cat_key.ParID == cpu_to_be32(HFS_ROOT_CNID))
HFS_SB(sb)->root_files--;
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index 53f3fae602179..e0150945cf13b 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -64,6 +64,27 @@ static int hfs_get_last_session(struct super_block *sb,
return 0;
}
+bool is_hfs_cnid_counts_valid(struct super_block *sb)
+{
+ struct hfs_sb_info *sbi = HFS_SB(sb);
+ bool corrupted = false;
+
+ if (unlikely(atomic64_read(&sbi->next_id) > U32_MAX)) {
+ pr_warn("next CNID exceeds limit\n");
+ corrupted = true;
+ }
+ if (unlikely(atomic64_read(&sbi->file_count) > U32_MAX)) {
+ pr_warn("file count exceeds limit\n");
+ corrupted = true;
+ }
+ if (unlikely(atomic64_read(&sbi->folder_count) > U32_MAX)) {
+ pr_warn("folder count exceeds limit\n");
+ corrupted = true;
+ }
+
+ return !corrupted;
+}
+
/*
* hfs_mdb_get()
*
@@ -156,6 +177,11 @@ int hfs_mdb_get(struct super_block *sb)
atomic64_set(&HFS_SB(sb)->file_count, be32_to_cpu(mdb->drFilCnt));
atomic64_set(&HFS_SB(sb)->folder_count, be32_to_cpu(mdb->drDirCnt));
+ if (!is_hfs_cnid_counts_valid(sb)) {
+ pr_warn("filesystem possibly corrupted, running fsck.hfs is recommended. Mounting read-only.\n");
+ sb->s_flags |= SB_RDONLY;
+ }
+
/* TRY to get the alternate (backup) MDB. */
sect = part_start + part_size - 2;
bh = sb_bread512(sb, sect, mdb2);
@@ -209,7 +235,7 @@ int hfs_mdb_get(struct super_block *sb)
attrib = mdb->drAtrb;
if (!(attrib & cpu_to_be16(HFS_SB_ATTRIB_UNMNT))) {
- pr_warn("filesystem was not cleanly unmounted, running fsck.hfs is recommended. mounting read-only.\n");
+ pr_warn("filesystem was not cleanly unmounted, running fsck.hfs is recommended. Mounting read-only.\n");
sb->s_flags |= SB_RDONLY;
}
if ((attrib & cpu_to_be16(HFS_SB_ATTRIB_SLOCK))) {
@@ -273,15 +299,12 @@ void hfs_mdb_commit(struct super_block *sb)
/* These parameters may have been modified, so write them back */
mdb->drLsMod = hfs_mtime();
mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)->free_ablocks);
- BUG_ON(atomic64_read(&HFS_SB(sb)->next_id) > U32_MAX);
mdb->drNxtCNID =
cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->next_id));
mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)->root_dirs);
- BUG_ON(atomic64_read(&HFS_SB(sb)->file_count) > U32_MAX);
mdb->drFilCnt =
cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->file_count));
- BUG_ON(atomic64_read(&HFS_SB(sb)->folder_count) > U32_MAX);
mdb->drDirCnt =
cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->folder_count));
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 47f50fa555a45..70e118c27e200 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -34,6 +34,7 @@ MODULE_LICENSE("GPL");
static int hfs_sync_fs(struct super_block *sb, int wait)
{
+ is_hfs_cnid_counts_valid(sb);
hfs_mdb_commit(sb);
return 0;
}
@@ -65,6 +66,8 @@ static void flush_mdb(struct work_struct *work)
sbi->work_queued = 0;
spin_unlock(&sbi->work_lock);
+ is_hfs_cnid_counts_valid(sb);
+
hfs_mdb_commit(sb);
}
--
2.51.0