[PATCH AUTOSEL 6.18] btrfs: sync read disk super and set block size

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.18] btrfs: sync read disk super and set block size
       [not found] <20260128223332.2806589-1-sashal@kernel.org>
@ 2026-01-28 22:33 ` Sasha Levin
  0 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-01-28 22:33 UTC (permalink / raw)
  To: patches, stable
  Cc: Edward Adam Davis, syzbot+b4a2af3000eaa84d95d5, Filipe Manana,
	David Sterba, Sasha Levin, clm, linux-btrfs

From: Edward Adam Davis <eadavis@qq.com>

[ Upstream commit 3f29d661e5686f3aa14e6f11537ff5c49846f2e2 ]

When the user performs a btrfs mount, the block device is not set
correctly. The user sets the block size of the block device to 0x4000
by executing the BLKBSZSET command.
Since the block size change also changes the mapping->flags value, this
further affects the result of the mapping_min_folio_order() calculation.

Let's analyze the following two scenarios:

Scenario 1: Without executing the BLKBSZSET command, the block size is
0x1000, and mapping_min_folio_order() returns 0;

Scenario 2: After executing the BLKBSZSET command, the block size is
0x4000, and mapping_min_folio_order() returns 2.

do_read_cache_folio() allocates a folio before the BLKBSZSET command
is executed. This results in the allocated folio having an order value
of 0. Later, after BLKBSZSET is executed, the block size increases to
0x4000, and the mapping_min_folio_order() calculation result becomes 2.

This leads to two undesirable consequences:

1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) <
mapping_min_folio_order(mapping)) assertion.

2. The syzbot report [1] shows a null pointer dereference in
create_empty_buffers() due to a buffer head allocation failure.

Synchronization should be established based on the inode between the
BLKBSZSET command and read cache page to prevent inconsistencies in
block size or mapping flags before and after folio allocation.

[1]
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694
Call Trace:
 folio_create_buffers+0x109/0x150 fs/buffer.c:1802
 block_read_full_folio+0x14c/0x850 fs/buffer.c:2403
 filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496
 do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096
 do_read_cache_page mm/filemap.c:4162 [inline]
 read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195
 btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367

Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

So this commit is going into v6.19. Let me now analyze the complete
picture.

## Analysis Summary

### 1. COMMIT MESSAGE ANALYSIS

**Subject:** "btrfs: sync read disk super and set block size"

**Key indicators:**
- **Reported-by: syzbot**: Indicates a real bug found by fuzzing
- **Closes: syzkaller bug link**: Confirms this is fixing a reported
  issue
- **Reviewed-by: Filipe Manana**: Core btrfs maintainer reviewed and
  approved
- **Signed-off-by: David Sterba**: Btrfs maintainer signed off

The commit message describes a race condition that leads to:
1. `VM_BUG_ON_FOLIO` assertion failure
2. Null pointer dereference in `create_empty_buffers()`

### 2. CODE CHANGE ANALYSIS

The fix is **extremely simple** - just 2 lines added:
```c
+       filemap_invalidate_lock(mapping);
        page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT,
GFP_NOFS);
+       filemap_invalidate_unlock(mapping);
```

**Root cause:** A race between:
1. `btrfs_read_disk_super()` allocating a folio via
   `read_cache_page_gfp()`
2. User space calling `BLKBSZSET` ioctl to change block size

When the block size changes mid-operation, the folio order requirements
change, causing either a BUG_ON or null pointer dereference.

**Why the fix works:** The `filemap_invalidate_lock()` provides
synchronization between the page cache reader and the block size setter
(in `set_blocksize()`), preventing the race.

### 3. CLASSIFICATION

- **Bug type:** Race condition leading to kernel crash (null pointer
  deref) or BUG_ON
- **Severity:** High - causes kernel panic/crash
- **Category:** Synchronization fix

### 4. SCOPE AND RISK ASSESSMENT

- **Lines changed:** 2 additions only
- **Files touched:** 1 (fs/btrfs/volumes.c)
- **Complexity:** Very low - just lock/unlock pair around existing call
- **Risk:** LOW - the lock is well-understood and used elsewhere for
  exactly this purpose

### 5. USER IMPACT

- **Who is affected:** Any btrfs user where concurrent block device
  operations occur
- **Triggering scenario:** User space modifying block device size while
  btrfs is mounting/reading superblock
- **Severity of impact:** Kernel crash (HIGH)

### 6. STABILITY INDICATORS

- **Reviewed-by:** Filipe Manana (senior btrfs developer at SUSE)
- **Multiple sign-offs:** Through the btrfs maintainer chain
- **Syzbot reported:** Reproducible issue

### 7. CRITICAL DEPENDENCY CHECK

**THIS IS THE KEY ISSUE:** The fix relies on `set_blocksize()` also
taking `filemap_invalidate_lock()`. This was added by commit
`c0e473a0d2264` ("block: fix race between set_blocksize and read paths")
which is **only in kernel 6.15+**.

Without that block layer fix:
- On kernels < 6.15: `set_blocksize()` does NOT take
  `filemap_invalidate_lock`, so this btrfs fix would NOT provide proper
  synchronization
- The btrfs fix by itself would be ineffective in preventing the race

### Risk vs Benefit Analysis

**Benefits:**
- Fixes real crash reported by syzbot
- Very small, surgical fix (2 lines)
- Well-reviewed by btrfs maintainers
- Follows established synchronization pattern

**Risks/Concerns:**
- **CRITICAL DEPENDENCY**: Requires `c0e473a0d2264` to be present for
  the synchronization to work
- Without the block layer fix, this btrfs fix is essentially a no-op
  (won't actually synchronize)
- The block layer fix is only in 6.15+, so backporting this btrfs fix to
  6.14 or earlier would be pointless without also backporting the block
  layer changes

### Conclusion

This commit fixes a real kernel crash (null pointer dereference) in
btrfs that was reported by syzbot. The fix itself is minimal and correct
- it adds proper locking around the `read_cache_page_gfp()` call to
synchronize with block size changes.

However, there is a **critical dependency issue**: This fix only
provides effective synchronization when paired with commit
`c0e473a0d2264` ("block: fix race between set_blocksize and read paths")
which adds the corresponding lock acquisition in `set_blocksize()`. That
commit is only in kernel 6.15+.

For stable trees **6.15.y and later**: This is a clear YES - fixes a
real crash with minimal risk.

For stable trees **6.14 and earlier**: The fix would be ineffective
without also backporting the block layer changes, which are more
invasive.

Given that the question is about backporting in general and this commit
would only be effective in 6.15+ (where the block layer infrastructure
exists), and the fix is correct, small, and addresses a real crash, the
answer depends on the target kernel. For the current stable (6.15+),
this should be backported.

**YES**

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 48e717c105c35..8e7dcb12af4c4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1365,7 +1365,9 @@ struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
 				      (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
 	}

+	filemap_invalidate_lock(mapping);
 	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
+	filemap_invalidate_unlock(mapping);
 	if (IS_ERR(page))
 		return ERR_CAST(page);

-- 
2.51.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.18] btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer()
       [not found] <20260202214643.212290-1-sashal@kernel.org>
@ 2026-02-02 21:46 ` Sasha Levin
  2026-02-02 21:46 ` [PATCH AUTOSEL 6.18] btrfs: sync read disk super and set block size Sasha Levin
  2026-02-02 21:46 ` [PATCH AUTOSEL 6.18-6.12] btrfs: reject new transactions if the fs is fully read-only Sasha Levin
  2 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-02-02 21:46 UTC (permalink / raw)
  To: patches, stable; +Cc: Qiang Ma, David Sterba, Sasha Levin, clm, linux-btrfs

From: Qiang Ma <maqianga@uniontech.com>

[ Upstream commit 9c7e71c97c8cd086b148d0d3d1cd84a1deab023c ]

Warning was found when compiling using loongarch64-gcc 12.3.1:

  $ make CFLAGS_tree-log.o=-Wmaybe-uninitialized

  In file included from fs/btrfs/ctree.h:21,
		   from fs/btrfs/tree-log.c:12:
  fs/btrfs/accessors.h: In function 'replay_one_buffer':
  fs/btrfs/accessors.h:66:16: warning: 'inode_item' may be used uninitialized [-Wmaybe-uninitialized]
     66 |         return btrfs_get_##bits(eb, s, offsetof(type, member));         \
	|                ^~~~~~~~~~
  fs/btrfs/tree-log.c:2803:42: note: 'inode_item' declared here
   2803 |                 struct btrfs_inode_item *inode_item;
	|                                          ^~~~~~~~~~

Initialize the inode_item to NULL, the compiler does not seem to see the
relation between the first 'wc->log_key.type == BTRFS_INODE_ITEM_KEY'
check and the other one that also checks the replay phase.

Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a clear understanding of the code flow. Let me verify what
the commit changes and analyze it:

## Analysis Summary

### 1. COMMIT MESSAGE ANALYSIS

The commit addresses a compiler warning (`-Wmaybe-uninitialized`) for
the `inode_item` pointer in `replay_one_buffer()`. The warning is
triggered by certain compilers (loongarch64-gcc 12.3.1 in this case)
that don't recognize the relationship between:
- The first check: `wc->log_key.type == BTRFS_INODE_ITEM_KEY` (which
  assigns `inode_item`)
- The later use: `wc->log_key.type == BTRFS_INODE_ITEM_KEY && wc->stage
  == LOG_WALK_REPLAY_INODES` (which uses `inode_item`)

The compiler can't prove that `inode_item` is always initialized before
use due to the complex conditional logic.

### 2. CODE CHANGE ANALYSIS

The change is trivial:
```c
- struct btrfs_inode_item *inode_item;
+ struct btrfs_inode_item *inode_item = NULL;
```

Looking at the control flow:
- Line 2807: `if (wc->log_key.type == BTRFS_INODE_ITEM_KEY)` -
  inode_item is assigned via `btrfs_item_ptr()`
- Line 2838-2845: `if (wc->log_key.type == BTRFS_INODE_ITEM_KEY &&
  wc->stage == LOG_WALK_REPLAY_INODES)` - inode_item is used via
  `btrfs_inode_mode(eb, inode_item)`

Logically, whenever the second condition is true, the first condition
must also have been true (both check `wc->log_key.type ==
BTRFS_INODE_ITEM_KEY`), so `inode_item` will have been assigned. But
some compilers cannot deduce this relationship due to the second
condition also checking `wc->stage`.

### 3. CLASSIFICATION

This is a **build fix** - it silences a compiler warning. It falls under
the "BUILD FIXES" exception category:
- Fixes for compilation warnings
- Zero risk of runtime regression (initializing a pointer to NULL that
  will always be properly assigned before use)

### 4. SCOPE AND RISK ASSESSMENT

- **Lines changed**: 1 line
- **Files touched**: 1 file (fs/btrfs/tree-log.c)
- **Complexity**: Trivial - just adding `= NULL` initialization
- **Risk**: Extremely low - this is a defensive initialization that
  doesn't change runtime behavior

The only possible risk would be if the compiler warning was actually
detecting a real bug. However, analyzing the code:
- Line 2838 checks `wc->log_key.type == BTRFS_INODE_ITEM_KEY` AND
  `wc->stage == LOG_WALK_REPLAY_INODES`
- Line 2807 checks `wc->log_key.type == BTRFS_INODE_ITEM_KEY`
- If we reach line 2845 (where `inode_item` is used), we must have
  passed the check at 2838, which implies `wc->log_key.type ==
  BTRFS_INODE_ITEM_KEY` is true, which means we also entered the block
  at 2807 and assigned `inode_item`.

The warning is a false positive.

### 5. USER IMPACT

- This fixes a compiler warning that can be annoying for
  developers/users who enable `-Wmaybe-uninitialized`
- Some build configurations with stricter warning settings (e.g.,
  `-Werror`) could fail to compile without this fix
- The btrfs subsystem is widely used, so build issues affect many users

### 6. STABILITY INDICATORS

- **Reviewed-by: David Sterba <dsterba@suse.com>** - David Sterba is the
  btrfs maintainer
- **Signed-off-by: David Sterba <dsterba@suse.com>** - Merged by the
  maintainer
- This is a simple, obviously correct fix

### 7. DEPENDENCY CHECK

- No dependencies on other commits
- This is a self-contained one-line change
- The code exists in stable trees (tree-log.c is present in all kernel
  versions)

## Conclusion

This commit is an ideal candidate for stable backporting:

1. **Meets stable criteria**:
   - Obviously correct - trivial initialization
   - Fixes a real issue - compiler warning that could break builds with
     strict settings
   - Small and contained - single line change
   - No new features

2. **Falls under BUILD FIXES exception**: Compiler warning fixes are
   explicitly allowed in stable

3. **Zero risk**: Initializing a pointer to NULL that is guaranteed to
   be assigned before use cannot cause any runtime regression

4. **Maintainer approved**: Reviewed and merged by David Sterba, the
   btrfs maintainer

**YES**

 fs/btrfs/tree-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 1444857de9fe8..ae2e035d013e2 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2800,7 +2800,7 @@ static int replay_one_buffer(struct extent_buffer *eb,
 
 	nritems = btrfs_header_nritems(eb);
 	for (wc->log_slot = 0; wc->log_slot < nritems; wc->log_slot++) {
-		struct btrfs_inode_item *inode_item;
+		struct btrfs_inode_item *inode_item = NULL;
 
 		btrfs_item_key_to_cpu(eb, &wc->log_key, wc->log_slot);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.18] btrfs: sync read disk super and set block size
       [not found] <20260202214643.212290-1-sashal@kernel.org>
  2026-02-02 21:46 ` [PATCH AUTOSEL 6.18] btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer() Sasha Levin
@ 2026-02-02 21:46 ` Sasha Levin
  2026-02-02 21:46 ` [PATCH AUTOSEL 6.18-6.12] btrfs: reject new transactions if the fs is fully read-only Sasha Levin
  2 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-02-02 21:46 UTC (permalink / raw)
  To: patches, stable
  Cc: Edward Adam Davis, syzbot+b4a2af3000eaa84d95d5, Filipe Manana,
	David Sterba, Sasha Levin, clm, linux-btrfs

From: Edward Adam Davis <eadavis@qq.com>

[ Upstream commit 3f29d661e5686f3aa14e6f11537ff5c49846f2e2 ]

When the user performs a btrfs mount, the block device is not set
correctly. The user sets the block size of the block device to 0x4000
by executing the BLKBSZSET command.
Since the block size change also changes the mapping->flags value, this
further affects the result of the mapping_min_folio_order() calculation.

Let's analyze the following two scenarios:

Scenario 1: Without executing the BLKBSZSET command, the block size is
0x1000, and mapping_min_folio_order() returns 0;

Scenario 2: After executing the BLKBSZSET command, the block size is
0x4000, and mapping_min_folio_order() returns 2.

do_read_cache_folio() allocates a folio before the BLKBSZSET command
is executed. This results in the allocated folio having an order value
of 0. Later, after BLKBSZSET is executed, the block size increases to
0x4000, and the mapping_min_folio_order() calculation result becomes 2.

This leads to two undesirable consequences:

1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) <
mapping_min_folio_order(mapping)) assertion.

2. The syzbot report [1] shows a null pointer dereference in
create_empty_buffers() due to a buffer head allocation failure.

Synchronization should be established based on the inode between the
BLKBSZSET command and read cache page to prevent inconsistencies in
block size or mapping flags before and after folio allocation.

[1]
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694
Call Trace:
 folio_create_buffers+0x109/0x150 fs/buffer.c:1802
 block_read_full_folio+0x14c/0x850 fs/buffer.c:2403
 filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496
 do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096
 do_read_cache_page mm/filemap.c:4162 [inline]
 read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195
 btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367

Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

So this commit is going into v6.19. Let me now analyze the complete
picture.

## Analysis Summary

### 1. COMMIT MESSAGE ANALYSIS

**Subject:** "btrfs: sync read disk super and set block size"

**Key indicators:**
- **Reported-by: syzbot**: Indicates a real bug found by fuzzing
- **Closes: syzkaller bug link**: Confirms this is fixing a reported
  issue
- **Reviewed-by: Filipe Manana**: Core btrfs maintainer reviewed and
  approved
- **Signed-off-by: David Sterba**: Btrfs maintainer signed off

The commit message describes a race condition that leads to:
1. `VM_BUG_ON_FOLIO` assertion failure
2. Null pointer dereference in `create_empty_buffers()`

### 2. CODE CHANGE ANALYSIS

The fix is **extremely simple** - just 2 lines added:
```c
+       filemap_invalidate_lock(mapping);
        page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT,
GFP_NOFS);
+       filemap_invalidate_unlock(mapping);
```

**Root cause:** A race between:
1. `btrfs_read_disk_super()` allocating a folio via
   `read_cache_page_gfp()`
2. User space calling `BLKBSZSET` ioctl to change block size

When the block size changes mid-operation, the folio order requirements
change, causing either a BUG_ON or null pointer dereference.

**Why the fix works:** The `filemap_invalidate_lock()` provides
synchronization between the page cache reader and the block size setter
(in `set_blocksize()`), preventing the race.

### 3. CLASSIFICATION

- **Bug type:** Race condition leading to kernel crash (null pointer
  deref) or BUG_ON
- **Severity:** High - causes kernel panic/crash
- **Category:** Synchronization fix

### 4. SCOPE AND RISK ASSESSMENT

- **Lines changed:** 2 additions only
- **Files touched:** 1 (fs/btrfs/volumes.c)
- **Complexity:** Very low - just lock/unlock pair around existing call
- **Risk:** LOW - the lock is well-understood and used elsewhere for
  exactly this purpose

### 5. USER IMPACT

- **Who is affected:** Any btrfs user where concurrent block device
  operations occur
- **Triggering scenario:** User space modifying block device size while
  btrfs is mounting/reading superblock
- **Severity of impact:** Kernel crash (HIGH)

### 6. STABILITY INDICATORS

- **Reviewed-by:** Filipe Manana (senior btrfs developer at SUSE)
- **Multiple sign-offs:** Through the btrfs maintainer chain
- **Syzbot reported:** Reproducible issue

### 7. CRITICAL DEPENDENCY CHECK

**THIS IS THE KEY ISSUE:** The fix relies on `set_blocksize()` also
taking `filemap_invalidate_lock()`. This was added by commit
`c0e473a0d2264` ("block: fix race between set_blocksize and read paths")
which is **only in kernel 6.15+**.

Without that block layer fix:
- On kernels < 6.15: `set_blocksize()` does NOT take
  `filemap_invalidate_lock`, so this btrfs fix would NOT provide proper
  synchronization
- The btrfs fix by itself would be ineffective in preventing the race

### Risk vs Benefit Analysis

**Benefits:**
- Fixes real crash reported by syzbot
- Very small, surgical fix (2 lines)
- Well-reviewed by btrfs maintainers
- Follows established synchronization pattern

**Risks/Concerns:**
- **CRITICAL DEPENDENCY**: Requires `c0e473a0d2264` to be present for
  the synchronization to work
- Without the block layer fix, this btrfs fix is essentially a no-op
  (won't actually synchronize)
- The block layer fix is only in 6.15+, so backporting this btrfs fix to
  6.14 or earlier would be pointless without also backporting the block
  layer changes

### Conclusion

This commit fixes a real kernel crash (null pointer dereference) in
btrfs that was reported by syzbot. The fix itself is minimal and correct
- it adds proper locking around the `read_cache_page_gfp()` call to
synchronize with block size changes.

However, there is a **critical dependency issue**: This fix only
provides effective synchronization when paired with commit
`c0e473a0d2264` ("block: fix race between set_blocksize and read paths")
which adds the corresponding lock acquisition in `set_blocksize()`. That
commit is only in kernel 6.15+.

For stable trees **6.15.y and later**: This is a clear YES - fixes a
real crash with minimal risk.

For stable trees **6.14 and earlier**: The fix would be ineffective
without also backporting the block layer changes, which are more
invasive.

Given that the question is about backporting in general and this commit
would only be effective in 6.15+ (where the block layer infrastructure
exists), and the fix is correct, small, and addresses a real crash, the
answer depends on the target kernel. For the current stable (6.15+),
this should be backported.

**YES**

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 48e717c105c35..8e7dcb12af4c4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1365,7 +1365,9 @@ struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
 				      (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
 	}

+	filemap_invalidate_lock(mapping);
 	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
+	filemap_invalidate_unlock(mapping);
 	if (IS_ERR(page))
 		return ERR_CAST(page);

-- 
2.51.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.18-6.12] btrfs: reject new transactions if the fs is fully read-only
       [not found] <20260202214643.212290-1-sashal@kernel.org>
  2026-02-02 21:46 ` [PATCH AUTOSEL 6.18] btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer() Sasha Levin
  2026-02-02 21:46 ` [PATCH AUTOSEL 6.18] btrfs: sync read disk super and set block size Sasha Levin
@ 2026-02-02 21:46 ` Sasha Levin
  2 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-02-02 21:46 UTC (permalink / raw)
  To: patches, stable
  Cc: Qu Wenruo, Jiaming Zhang, Boris Burkov, Johannes Thumshirn,
	David Sterba, Sasha Levin, clm, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

[ Upstream commit 1972f44c189c8aacde308fa9284e474c1a5cbd9f ]

[BUG]
There is a bug report where a heavily fuzzed fs is mounted with all
rescue mount options, which leads to the following warnings during
unmount:

  BTRFS: Transaction aborted (error -22)
  Modules linked in:
  CPU: 0 UID: 0 PID: 9758 Comm: repro.out Not tainted
  6.19.0-rc5-00002-gb71e635feefc #7 PREEMPT(full)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:find_free_extent_update_loop fs/btrfs/extent-tree.c:4208 [inline]
  RIP: 0010:find_free_extent+0x52f0/0x5d20 fs/btrfs/extent-tree.c:4611
  Call Trace:
   <TASK>
   btrfs_reserve_extent+0x2cd/0x790 fs/btrfs/extent-tree.c:4705
   btrfs_alloc_tree_block+0x1e1/0x10e0 fs/btrfs/extent-tree.c:5157
   btrfs_force_cow_block+0x578/0x2410 fs/btrfs/ctree.c:517
   btrfs_cow_block+0x3c4/0xa80 fs/btrfs/ctree.c:708
   btrfs_search_slot+0xcad/0x2b50 fs/btrfs/ctree.c:2130
   btrfs_truncate_inode_items+0x45d/0x2350 fs/btrfs/inode-item.c:499
   btrfs_evict_inode+0x923/0xe70 fs/btrfs/inode.c:5628
   evict+0x5f4/0xae0 fs/inode.c:837
   __dentry_kill+0x209/0x660 fs/dcache.c:670
   finish_dput+0xc9/0x480 fs/dcache.c:879
   shrink_dcache_for_umount+0xa0/0x170 fs/dcache.c:1661
   generic_shutdown_super+0x67/0x2c0 fs/super.c:621
   kill_anon_super+0x3b/0x70 fs/super.c:1289
   btrfs_kill_super+0x41/0x50 fs/btrfs/super.c:2127
   deactivate_locked_super+0xbc/0x130 fs/super.c:474
   cleanup_mnt+0x425/0x4c0 fs/namespace.c:1318
   task_work_run+0x1d4/0x260 kernel/task_work.c:233
   exit_task_work include/linux/task_work.h:40 [inline]
   do_exit+0x694/0x22f0 kernel/exit.c:971
   do_group_exit+0x21c/0x2d0 kernel/exit.c:1112
   __do_sys_exit_group kernel/exit.c:1123 [inline]
   __se_sys_exit_group kernel/exit.c:1121 [inline]
   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1121
   x64_sys_call+0x2210/0x2210 arch/x86/include/generated/asm/syscalls_64.h:232
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0xe8/0xf80 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x44f639
  Code: Unable to access opcode bytes at 0x44f60f.
  RSP: 002b:00007ffc15c4e088 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
  RAX: ffffffffffffffda RBX: 00000000004c32f0 RCX: 000000000044f639
  RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
  RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004c32f0
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
   </TASK>

Since rescue mount options will mark the full fs read-only, there should
be no new transaction triggered.

But during unmount we will evict all inodes, which can trigger a new
transaction, and triggers warnings on a heavily corrupted fs.

[CAUSE]
Btrfs allows new transaction even on a read-only fs, this is to allow
log replay happen even on read-only mounts, just like what ext4/xfs do.

However with rescue mount options, the fs is fully read-only and cannot
be remounted read-write, thus in that case we should also reject any new
transactions.

[FIX]
If we find the fs has rescue mount options, we should treat the fs as
error, so that no new transaction can be started.

Reported-by: Jiaming Zhang <r772577952@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CANypQFYw8Nt8stgbhoycFojOoUmt+BoZ-z8WJOZVxcogDdwm=Q@mail.gmail.com/
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information I need to analyze this commit. Let me
provide a comprehensive analysis.

## Commit Analysis

### 1. COMMIT MESSAGE ANALYSIS

**Subject:** "btrfs: reject new transactions if the fs is fully read-
only"

The commit clearly describes a bug fix with:
- A detailed BUG section with a full kernel stack trace showing warnings
  during unmount with rescue mount options
- A CAUSE section explaining why this bug happens
- A FIX section describing the solution
- The commit is well-documented with `Reported-by:` tag and a link to
  the bug report
- Multiple reviews: `Reviewed-by: Boris Burkov`, `Reviewed-by: Johannes
  Thumshirn`, `Reviewed-by: David Sterba` (the btrfs maintainer)
- Signed off by the btrfs maintainer David Sterba

### 2. CODE CHANGE ANALYSIS

The fix consists of two parts:

**Part 1 - New helper function (fs/btrfs/disk-io.c):**
```c
static bool fs_is_full_ro(const struct btrfs_fs_info *fs_info)
{
    if (!sb_rdonly(fs_info->sb))
        return false;
    if (unlikely(fs_info->mount_opt & BTRFS_MOUNT_FULL_RO_MASK))
        return true;
    return false;
}
```
This helper checks if the filesystem is both read-only AND has any of
the rescue mount options set.

**Part 2 - Setting fs_error during mount (fs/btrfs/disk-io.c):**
```c
/* If the fs has any rescue options, no transaction is allowed. */
if (fs_is_full_ro(fs_info))
    WRITE_ONCE(fs_info->fs_error, -EROFS);
```
This sets the `fs_error` field during mount if rescue options are used.

**Part 3 - New macro (fs/btrfs/fs.h):**
```c
#define BTRFS_MOUNT_FULL_RO_MASK        \
    (BTRFS_MOUNT_NOLOGREPLAY |      \
     BTRFS_MOUNT_IGNOREBADROOTS |       \
     BTRFS_MOUNT_IGNOREDATACSUMS |      \
     BTRFS_MOUNT_IGNOREMETACSUMS |      \
     BTRFS_MOUNT_IGNORESUPERFLAGS)
```

**Why the fix works:**
The existing code in `start_transaction()` (transaction.c:612-613)
checks:
```c
if (BTRFS_FS_ERROR(fs_info))
    return ERR_PTR(-EROFS);
```

By setting `fs_error` to `-EROFS` during mount when rescue options are
present, any later attempt to start a transaction (including during
inode eviction at unmount time) will fail early and return `-EROFS`
instead of proceeding and hitting allocation failures.

### 3. CLASSIFICATION

This is a **bug fix**, not a feature addition. It fixes a real-world
issue where:
- Mounting a corrupted filesystem with rescue mount options (which are
  read-only only)
- Trying to unmount causes inode eviction
- Inode eviction triggers transaction creation
- Transaction starts but fails because the fs can't actually write (it's
  read-only with rescue options)
- Results in WARN_ON/oops and potential issues

The fix leverages the existing `fs_error` mechanism to reject
transactions early instead of letting them start and fail later with
confusing errors.

### 4. SCOPE AND RISK ASSESSMENT

**Lines changed:** ~20 lines added, no lines removed
**Files touched:** 2 files (fs/btrfs/disk-io.c, fs/btrfs/fs.h)
**Complexity:** Low - simple check and assignment

**Risk assessment:**
- **LOW RISK**: The change is surgical and leverages existing mechanisms
- The `fs_error` field and `BTRFS_FS_ERROR()` check are well-established
  in btrfs
- The rescue mount options already require read-only mode - this just
  ensures transaction rejection is enforced
- The fix is in the mount path, not a hot path
- Multiple reviews from btrfs maintainers including David Sterba

**Potential impact:**
- Could theoretically affect behavior if some code expected transactions
  to start (even if they'd fail later)
- But this is the correct behavior - rescue options should never allow
  writes

### 5. USER IMPACT

**Who is affected:**
- Users mounting heavily corrupted btrfs filesystems with rescue mount
  options (e.g., `rescue=all`, `rescue=ibadroots`, etc.)
- This is a common data recovery scenario

**Severity:**
- Without the fix: Users get confusing WARN_ON messages and potential
  issues during unmount
- With the fix: Clean rejection of transactions with clear `-EROFS`
  error

### 6. STABILITY INDICATORS

- The commit has been reviewed by 3 btrfs experts including the
  maintainer
- `Reviewed-by: Boris Burkov <boris@bur.io>`
- `Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>`
- `Reviewed-by: David Sterba <dsterba@suse.com>`
- Signed off by the maintainer

### 7. DEPENDENCY CHECK

- No dependencies on other commits
- The code uses existing infrastructure (`fs_error`, mount option flags)
- All the mount option flags (BTRFS_MOUNT_NOLOGREPLAY, etc.) exist in
  all stable kernels where btrfs rescue options exist
- The fix is self-contained

### CONCLUSION

**Recommendation: YES for backport**

This commit fixes a real bug that affects users mounting corrupted btrfs
filesystems with rescue mount options - a legitimate and important data
recovery scenario. The fix is:

1. **Small and surgical** - Only ~20 lines of simple, readable code
2. **Fixes a real bug** - Prevents warnings and potential issues during
   unmount with rescue options
3. **Low risk** - Uses existing `fs_error` mechanism, well-tested
   pattern
4. **Well reviewed** - Multiple reviews from btrfs maintainers
5. **No new features** - Just ensures correct rejection of transactions
   that can't succeed anyway
6. **Self-contained** - No dependencies on other patches

The fix ensures that when rescue mount options are used (which are
inherently read-only and cannot be remounted read-write), new
transactions are rejected cleanly at start rather than failing later
with confusing errors and warnings.

**YES**

 fs/btrfs/disk-io.c | 13 +++++++++++++
 fs/btrfs/fs.h      |  8 ++++++++
 2 files changed, 21 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a5336f530c8ed..932cd05043785 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3270,6 +3270,15 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
 	return 0;
 }

+static bool fs_is_full_ro(const struct btrfs_fs_info *fs_info)
+{
+	if (!sb_rdonly(fs_info->sb))
+		return false;
+	if (unlikely(fs_info->mount_opt & BTRFS_MOUNT_FULL_RO_MASK))
+		return true;
+	return false;
+}
+
 int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices)
 {
 	u32 sectorsize;
@@ -3378,6 +3387,10 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
 		WRITE_ONCE(fs_info->fs_error, -EUCLEAN);

+	/* If the fs has any rescue options, no transaction is allowed. */
+	if (fs_is_full_ro(fs_info))
+		WRITE_ONCE(fs_info->fs_error, -EROFS);
+
 	/* Set up fs_info before parsing mount options */
 	nodesize = btrfs_super_nodesize(disk_super);
 	sectorsize = btrfs_super_sectorsize(disk_super);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 814bbc9417d2a..37aa8d141a83d 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -250,6 +250,14 @@ enum {
 	BTRFS_MOUNT_REF_TRACKER			= (1ULL << 33),
 };

+/* These mount options require a full read-only fs, no new transaction is allowed. */
+#define BTRFS_MOUNT_FULL_RO_MASK		\
+	(BTRFS_MOUNT_NOLOGREPLAY |		\
+	 BTRFS_MOUNT_IGNOREBADROOTS |		\
+	 BTRFS_MOUNT_IGNOREDATACSUMS |		\
+	 BTRFS_MOUNT_IGNOREMETACSUMS |		\
+	 BTRFS_MOUNT_IGNORESUPERFLAGS)
+
 /*
  * Compat flags that we support.  If any incompat flags are set other than the
  * ones specified below then we will fail to mount
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-02-02 21:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260202214643.212290-1-sashal@kernel.org>
2026-02-02 21:46 ` [PATCH AUTOSEL 6.18] btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer() Sasha Levin
2026-02-02 21:46 ` [PATCH AUTOSEL 6.18] btrfs: sync read disk super and set block size Sasha Levin
2026-02-02 21:46 ` [PATCH AUTOSEL 6.18-6.12] btrfs: reject new transactions if the fs is fully read-only Sasha Levin
     [not found] <20260128223332.2806589-1-sashal@kernel.org>
2026-01-28 22:33 ` [PATCH AUTOSEL 6.18] btrfs: sync read disk super and set block size Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox