public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	Sasha Levin <sashal@kernel.org>,
	gfs2@lists.linux.dev
Subject: [PATCH AUTOSEL 6.19-5.15] gfs2: fiemap page fault fix
Date: Tue, 10 Feb 2026 18:30:47 -0500	[thread overview]
Message-ID: <20260210233123.2905307-2-sashal@kernel.org> (raw)
In-Reply-To: <20260210233123.2905307-1-sashal@kernel.org>

From: Andreas Gruenbacher <agruenba@redhat.com>

[ Upstream commit e411d74cc5ba290f85d0dd5e4d1df8f1d6d975d2 ]

In gfs2_fiemap(), we are calling iomap_fiemap() while holding the inode
glock.  This can lead to recursive glock taking if the fiemap buffer is
memory mapped to the same inode and accessing it triggers a page fault.

Fix by disabling page faults for iomap_fiemap() and faulting in the
buffer by hand if necessary.

Fixes xfstest generic/742.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have the complete picture. Let me write up the full analysis.

---

## Detailed Analysis

### 1. Commit Message Analysis

The commit subject is "gfs2: fiemap page fault fix" — a clear bug fix.
The body explains a concrete deadlock scenario: calling `iomap_fiemap()`
while holding the inode glock can lead to **recursive glock taking** if
the fiemap buffer is memory-mapped to the same inode. It also says it
"Fixes xfstest generic/742," which is a test specifically designed to
catch this class of bug.

The author is Andreas Gruenbacher, the GFS2 maintainer and the same
person who wrote the original mmap + page fault deadlock fixes for
GFS2's buffered I/O (commit `00bfe02f47968`) and direct I/O (commit
`b01b2d72da25c`) paths — both merged in v5.16.

### 2. Code Change Analysis — The Deadlock Mechanism

The deadlock path is clear and concrete:

1. **`gfs2_fiemap()`** acquires the inode glock `ip->i_gl` in
   `LM_ST_SHARED` mode via `gfs2_glock_nq_init()`
2. **`iomap_fiemap()`** iterates over file extents and calls
   **`iomap_to_fiemap()`** → **`fiemap_fill_next_extent()`**
3. `fiemap_fill_next_extent()` uses **`copy_to_user()`** to write extent
   data to the user-supplied buffer (line 144 of `fs/ioctl.c`)
4. If that user buffer is **mmap'd to the same file**, `copy_to_user()`
   triggers a page fault
5. The page fault enters **`gfs2_page_mkwrite()`** (for write faults) or
   **`gfs2_fault()`** (for read faults)
6. **`gfs2_page_mkwrite()`** tries to acquire `ip->i_gl` in
   `LM_ST_EXCLUSIVE` mode — **deadlock**: it conflicts with the already-
   held `LM_ST_SHARED`. Even `gfs2_fault()` would attempt another
   `LM_ST_SHARED`, which in GFS2's distributed locking model can also
   cause issues (the glock holder tracks are per-process, and in a
   clustered scenario this can cause distributed deadlocks).

### 3. The Fix — Well-Established Pattern

The fix follows the **exact same pattern** already used in 4+ locations
within `fs/gfs2/file.c`:

```2195:2212:fs/gfs2/inode.c
retry:
        ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
        if (ret)
                goto out;

        pagefault_disable();
        ret = iomap_fiemap(inode, fieinfo, start, len, &gfs2_iomap_ops);
        pagefault_enable();

        gfs2_glock_dq_uninit(&gh);

        if (ret == -EFAULT && fault_in_fiemap(fieinfo)) {
                fieinfo->fi_extents_mapped = 0;
                goto retry;
        }
```

The pattern is:
- **`pagefault_disable()`** before the operation that touches user
  memory while holding the glock
- **`pagefault_enable()`** after
- If `-EFAULT` is returned (page fault was needed but prevented), **drop
  the glock**, **manually fault in** the user pages, and **retry**

The helper `fault_in_fiemap()` calls `fault_in_safe_writeable()` on the
destination buffer, which is a non-destructive way to ensure pages are
resident without holding any filesystem locks.

### 4. Cross-Subsystem Validation

The **exact same bug class** was found and fixed in OCFS2
(`04100f775c2ea`), reported by syzbot, and explicitly tagged `Cc:
stable@vger.kernel.org`. That fix had a stack trace showing the deadlock
happening during `copy_to_user` in `fiemap_fill_next_extent` → recursive
semaphore acquisition in `ocfs2_page_mkwrite`. The GFS2 case is
structurally identical.

### 5. Scope and Risk Assessment

- **Lines changed**: +20 lines in a single file (`fs/gfs2/inode.c`)
- **New helper**: `fault_in_fiemap()` — 6 lines, trivial wrapper around
  `fault_in_safe_writeable()`
- **Dependencies**: `fault_in_safe_writeable()` exists in all current
  LTS trees (v6.1, v6.6, v6.12)
- **Clean apply**: The `gfs2_fiemap()` function is byte-for-byte
  identical in v6.1, v6.6, v6.12, and mainline. The patch applies
  cleanly.
- **Risk**: Very low. The fix uses a well-established pattern from the
  same subsystem. The only behavioral change is adding retry logic for
  the (previously impossible) case where the user buffer is mmap'd to
  the same file. Normal fiemap calls are unaffected (no page faults, no
  retry).

### 6. User Impact

- **Severity**: **Deadlock** — system hang for the process, and on
  clustered GFS2, potential cluster-wide impact
- **Trigger**: User-space can trigger this by mmap'ing a GFS2 file and
  passing the mmap'd buffer to `FS_IOC_FIEMAP`
- **Affected users**: All GFS2 users, especially those running xfstests
  or using fiemap on clustered filesystems
- **Test coverage**: xfstest generic/742 specifically tests this
  scenario

### 7. Stability and Quality Indicators

- Author is the GFS2 subsystem maintainer (Andreas Gruenbacher)
- Pattern is proven and battle-tested in the same codebase since v5.16
- The analogous OCFS2 fix was syzbot-reported and stable-tagged
- The fix has a dedicated xfstest (generic/742)

### 8. Dependency Check

No external dependencies. The fix is self-contained:
- `pagefault_disable/enable` — core kernel functions, always available
- `fault_in_safe_writeable` — available since v5.16, present in all
  current LTS trees
- The `gfs2_fiemap` function is unchanged across all LTS trees

### Conclusion

This commit fixes a **real deadlock** in GFS2's fiemap path that can be
triggered from user space. The fix is small (20 lines), self-contained,
follows a well-established pattern already used in the same subsystem,
applies cleanly to all current stable trees, and has a dedicated
xfstest. The same bug class in OCFS2 was considered important enough for
syzbot reporting and explicit stable tagging. This is a textbook stable
backport candidate.

**YES**

 fs/gfs2/inode.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 36618e3531996..ea03c23bd147b 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2182,6 +2182,14 @@ static int gfs2_getattr(struct mnt_idmap *idmap,
 	return 0;
 }
 
+static bool fault_in_fiemap(struct fiemap_extent_info *fi)
+{
+	struct fiemap_extent __user *dest = fi->fi_extents_start;
+	size_t size = sizeof(*dest) * fi->fi_extents_max;
+
+	return fault_in_safe_writeable((char __user *)dest, size) == 0;
+}
+
 static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		       u64 start, u64 len)
 {
@@ -2191,14 +2199,22 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 	inode_lock_shared(inode);
 
+retry:
 	ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
 	if (ret)
 		goto out;
 
+	pagefault_disable();
 	ret = iomap_fiemap(inode, fieinfo, start, len, &gfs2_iomap_ops);
+	pagefault_enable();
 
 	gfs2_glock_dq_uninit(&gh);
 
+	if (ret == -EFAULT && fault_in_fiemap(fieinfo)) {
+		fieinfo->fi_extents_mapped = 0;
+		goto retry;
+	}
+
 out:
 	inode_unlock_shared(inode);
 	return ret;
-- 
2.51.0


  reply	other threads:[~2026-02-10 23:31 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10 23:30 [PATCH AUTOSEL 6.19-6.12] i3c: mipi-i3c-hci: Reset RING_OPERATION1 fields during init Sasha Levin
2026-02-10 23:30 ` Sasha Levin [this message]
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.18] dlm: fix recovery pending middle conversion Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.6] smb: client: prevent races in ->query_interfaces() Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.12] i3c: mipi-i3c-hci: Ensure proper bus clean-up Sasha Levin
2026-02-11  7:56   ` Adrian Hunter
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-5.10] audit: add fchmodat2() to change attributes class Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.12] btrfs: fallback to buffered IO if the data profile has duplication Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19] btrfs: don't BUG() on unexpected delayed ref type in run_one_delayed_ref() Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.12] i3c: mipi-i3c-hci-pci: Add System Suspend support Sasha Levin
2026-02-11  7:57   ` Adrian Hunter
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.18] hfsplus: fix volume corruption issue for generic/480 Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.18] kselftest/kublk: include message in _Static_assert for C11 compatibility Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.12] dlm: validate length in dlm_search_rsb_tree Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.18] i3c: mipi-i3c-hci: Stop reading Extended Capabilities if capability ID is 0 Sasha Levin
2026-02-10 23:30 ` [PATCH AUTOSEL 6.19-6.1] fs/buffer: add alert in try_to_free_buffers() for folios without buffers Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.15] i3c: master: svc: Initialize 'dev' to NULL in svc_i3c_master_ibi_isr() Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.12] statmount: permission check should return EPERM Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] audit: add missing syscalls to read class Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] hfsplus: pretend special inodes as regular files Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] hfsplus: fix volume corruption issue for generic/498 Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.18] netfs: when subreq is marked for retry, do not check if it faced an error Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19] hfs: Replace BUG_ON with error handling for CNID count checks Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.1] smb: client: add proper locking around ses->iface_last_update Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-6.6] btrfs: handle user interrupt properly in btrfs_trim_fs() Sasha Levin
2026-02-10 23:31 ` [PATCH AUTOSEL 6.19-5.10] minix: Add required sanity checking to minix_check_superblock() Sasha Levin
2026-02-11  7:56 ` [PATCH AUTOSEL 6.19-6.12] i3c: mipi-i3c-hci: Reset RING_OPERATION1 fields during init Adrian Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260210233123.2905307-2-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=agruenba@redhat.com \
    --cc=gfs2@lists.linux.dev \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox