linux-fsdevel.vger.kernel.org archive mirror

* [PATCH AUTOSEL 6.17-5.4] allow finish_no_open(file, ERR_PTR(-E...))
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
@ 2025-10-25 15:54 ` Sasha Levin
  2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-6.6] fuse: zero initialize inode private data Sasha Levin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2025-10-25 15:54 UTC (permalink / raw)
  To: patches, stable; +Cc: Al Viro, NeilBrown, Sasha Levin, brauner, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

[ Upstream commit fe91e078b60d1beabf5cef4a37c848457a6d2dfb ]

... allowing any ->lookup() return value to be passed to it.

Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES – this commit closes a real VFS bug that lets `finish_no_open()`
report success after being handed an `ERR_PTR`, so it needs to go to
stable.

- `fs/open.c:1072-1077` now returns `PTR_ERR(dentry)` when `dentry`
  encodes an error, instead of stashing that poison pointer in
  `file->f_path.dentry` and reporting success. This is the core fix.
- Without it, any filesystem that forwards a `->lookup()` result
  straight into `finish_no_open()`, for example FUSE
  (`fs/fuse/dir.c:746-753`), CIFS/SMB (`fs/smb/client/dir.c:488-496`),
  NFS (`fs/nfs/dir.c:2174-2201`), 9p, Ceph, or vboxsf, will propagate
  an `ERR_PTR` as if the lookup had succeeded. `atomic_open()` then
  dereferences the bogus pointer in its success path
  (`fs/namei.c:3668-3673`), leading to immediate crashes on routine
  errors like `-EACCES`, `-EIO`, or allocation failures.
- The documentation update in `fs/open.c:1061-1070` captures the
  intended ABI: `finish_no_open()` must accept every `->lookup()` return
  value (valid, `NULL`, or `ERR_PTR`). The previous implementation
  violated that contract, so this is a bugfix, not a feature change.
- Risk is minimal: the change is self-contained, touches no callers, and
  simply short-circuits on the already-known error condition.
  Backporting does not require the later “simplify …atomic_open”
  cleanups; it just hardens the exported helper so existing stable code
  can’t corrupt `file->f_path`.

Natural follow-up: run the usual filesystem open/lookup regression tests
(especially on FUSE/CIFS/NFS) after picking the patch.
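
The error-pointer convention discussed above can be sketched in user
space. This is a minimal illustration, not the kernel code: the
`ERR_PTR`/`PTR_ERR`/`IS_ERR` helpers are simplified stand-ins for
`<linux/err.h>`, and `struct file`, `struct dentry` and
`finish_no_open_sketch()` are hypothetical, stripped-down names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, stripped-down versions of the kernel structures. */
struct dentry { int unused; };
struct file { struct dentry *dentry; };

/* Mirrors the patched logic: reject error pointers, accept dentry or NULL. */
static int finish_no_open_sketch(struct file *file, struct dentry *dentry)
{
	if (IS_ERR(dentry))
		return (int)PTR_ERR(dentry);
	file->dentry = dentry;
	return 0;
}
```

With this shape, a caller can pass a lookup result through unchanged:
a valid pointer or `NULL` is stored and 0 is returned, while
`ERR_PTR(-EACCES)` comes back as `-EACCES` and the file is left alone.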

 fs/open.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 9655158c38853..4890b13461c7b 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1059,18 +1059,20 @@ EXPORT_SYMBOL(finish_open);
  * finish_no_open - finish ->atomic_open() without opening the file
  *
  * @file: file pointer
- * @dentry: dentry or NULL (as returned from ->lookup())
+ * @dentry: dentry, ERR_PTR(-E...) or NULL (as returned from ->lookup())
  *
- * This can be used to set the result of a successful lookup in ->atomic_open().
+ * This can be used to set the result of a lookup in ->atomic_open().
  *
  * NB: unlike finish_open() this function does consume the dentry reference and
  * the caller need not dput() it.
  *
- * Returns "0" which must be the return value of ->atomic_open() after having
- * called this function.
+ * Returns 0 or -E..., which must be the return value of ->atomic_open() after
+ * having called this function.
  */
 int finish_no_open(struct file *file, struct dentry *dentry)
 {
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
 	file->f_path.dentry = dentry;
 	return 0;
 }
-- 
2.51.0



* [PATCH AUTOSEL 6.17-6.6] fuse: zero initialize inode private data
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
  2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-5.4] allow finish_no_open(file, ERR_PTR(-E...)) Sasha Levin
@ 2025-10-25 15:55 ` Sasha Levin
  2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] virtio_fs: fix the hash table using in virtio_fs_enqueue_req() Sasha Levin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2025-10-25 15:55 UTC (permalink / raw)
  To: patches, stable
  Cc: Miklos Szeredi, Chunsheng Luo, Sasha Levin, miklos, linux-fsdevel

From: Miklos Szeredi <mszeredi@redhat.com>

[ Upstream commit 3ca1b311181072415b6432a169de765ac2034e5a ]

This is slightly tricky, since the VFS uses non-zeroing allocation to
preserve some fields that are left in a consistent state.

Reported-by: Chunsheng Luo <luochunsheng@ustc.edu>
Closes: https://lore.kernel.org/all/20250818083224.229-1-luochunsheng@ustc.edu/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What changed
  - In `fs/fuse/inode.c:98`, `fuse_alloc_inode()` now zero-initializes
    the entire FUSE-private part of `struct fuse_inode` while explicitly
    preserving the embedded VFS `struct inode`:
    - `fs/fuse/inode.c:106` adds a layout check:
      `BUILD_BUG_ON(offsetof(struct fuse_inode, inode) != 0);`
    - `fs/fuse/inode.c:108` zeroes everything beyond `fi->inode`:
      `memset((void *)fi + sizeof(fi->inode), 0, sizeof(*fi) -
      sizeof(fi->inode));`
  - The manual piecemeal initialization of only a handful of fields
    (e.g., `i_time`, `nodeid`, `nlookup`, `attr_version`, `orig_ino`,
    `state`, `submount_lookup`) is removed and replaced by the blanket
    private-data zeroing.
  - The invariants and locks are still set after zeroing:
    - `fi->inval_mask = ~0;` (`fs/fuse/inode.c:110`)
    - `mutex_init(&fi->mutex);` (`fs/fuse/inode.c:111`)
    - `spin_lock_init(&fi->lock);` (`fs/fuse/inode.c:112`)
    - `fi->forget = fuse_alloc_forget();` (`fs/fuse/inode.c:113`)
    - DAX and passthrough helpers remain unchanged
      (`fs/fuse/inode.c:117`, `fs/fuse/inode.c:120`).

- Why this fixes a real bug
  - Inode objects are allocated via `alloc_inode_sb()`, which is a
    non-zeroing slab allocation (`include/linux/fs.h:3407` →
    `kmem_cache_alloc_lru`). This means previously freed memory content
    can persist in new `struct fuse_inode` instances unless explicitly
    cleared.
  - Before this change, FUSE only zeroed a subset of private fields,
    leaving many newly added or less obvious fields uninitialized/stale,
    which can lead to incorrect behavior. Examples:
    - `fi->cached_i_blkbits` is used by cached getattr to compute
      `stat->blksize` without a server roundtrip (`fs/fuse/dir.c:1373`).
      If not initialized, userspace can observe garbage or stale block
      sizes when using cached attributes.
    - `fi->i_time` controls attribute staleness and must start from a
      known baseline to force an initial refresh. It was already
      written explicitly before this change and is still zero after
      it; the difference is that related fields are now zeroed too.
    - Readdir cache state in `fi->rdc.*` (e.g. `cached`, `pos`, `size`,
      `version`) must start clean, and is explicitly initialized only in
      `fuse_init_dir()` (`fs/fuse/dir.c:2266`). Zeroing ensures no stale
      values leak in the interim.
    - File-io cache accounting (`fi->iocachectr`, waitqueues and lists)
      is initialized in `fuse_init_file_inode()`
      (`fs/fuse/file.c:3121`–`fs/fuse/file.c:3136`); zeroing up front
      prevents spurious non-zero counters or garbage pointers before
      that init runs.
    - Passthrough backing file pointer `fi->fb` (present with
      `CONFIG_FUSE_PASSTHROUGH`) is now guaranteed NULL initially; the
      code also explicitly sets it via `fuse_inode_backing_set(fi,
      NULL)` (`fs/fuse/inode.c:120`). Zeroing avoids any transient stale
      pointer exposure.
  - This change conforms to the VFS model of non-zeroing allocation: it
    deliberately preserves `struct inode` (the part the VFS expects to
    keep stable) and only clears the FUSE-private tail. The
    `BUILD_BUG_ON` enforces the assumption that `inode` is the first
    field.

- Scope and risk
  - The fix is small, localized to a single function in FUSE, and does
    not modify any public interfaces or core VFS behavior.
  - It reduces risk by eliminating uninitialized data usage and
    potential state inconsistencies from inode slab reuse.
  - It is defensive across existing and future FUSE private fields,
    avoiding the need to remember to add manual zeroing for every new
    field.

- Dependencies and backport considerations
  - The code relies on standard kernel primitives: `offsetof`,
    `BUILD_BUG_ON`, and existing FUSE helpers. No architectural changes.
  - `alloc_inode_sb()` non-zeroing semantics are already present in
    stable series (see `include/linux/fs.h:3407`), so the bug exists
    there too.
  - The patch does not depend on other new features; it should apply
    cleanly or be trivial to adapt in stable trees that have the nearby
    code structure.

- User impact
  - Prevents user-visible inconsistencies (e.g., wrong `blksize` values)
    and eliminates potential undefined behavior from stale per-inode
    private state across reuse.
  - Also improves robustness against uninitialized reads that could
    manifest as rare warnings or subtle regressions.

Given it fixes a correctness bug with minimal, contained changes and
clear safety benefits, this commit is a good candidate for backporting
to stable trees.
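
The "zero only the private tail" technique can be sketched in user
space as follows. The structures here are hypothetical stand-ins (the
real `struct inode` and `struct fuse_inode` are much larger), and
`_Static_assert` is used as a C11 counterpart of `BUILD_BUG_ON`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; field names are illustrative only. */
struct base_inode {
	unsigned long i_ino;
	int i_state;
};

struct priv_inode {
	struct base_inode inode;	/* must stay the first member */
	unsigned long nodeid;
	unsigned long nlookup;
	unsigned char cached_blkbits;
	void *backing;
};

/* Compile-time layout check, analogous to the kernel's BUILD_BUG_ON(). */
_Static_assert(offsetof(struct priv_inode, inode) == 0,
	       "embedded inode must be the first member");

/* Zero everything after the embedded inode, leaving the inode untouched. */
static void init_private(struct priv_inode *pi)
{
	memset((char *)pi + sizeof(pi->inode), 0,
	       sizeof(*pi) - sizeof(pi->inode));
}
```

Because the range is computed from `sizeof`, any private field added
later is covered without touching `init_private()`.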

 fs/fuse/inode.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 7ddfd2b3cc9c4..7c0403a002e75 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -101,14 +101,11 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	if (!fi)
 		return NULL;
 
-	fi->i_time = 0;
+	/* Initialize private data (i.e. everything except fi->inode) */
+	BUILD_BUG_ON(offsetof(struct fuse_inode, inode) != 0);
+	memset((void *) fi + sizeof(fi->inode), 0, sizeof(*fi) - sizeof(fi->inode));
+
 	fi->inval_mask = ~0;
-	fi->nodeid = 0;
-	fi->nlookup = 0;
-	fi->attr_version = 0;
-	fi->orig_ino = 0;
-	fi->state = 0;
-	fi->submount_lookup = NULL;
 	mutex_init(&fi->mutex);
 	spin_lock_init(&fi->lock);
 	fi->forget = fuse_alloc_forget();
-- 
2.51.0



* [PATCH AUTOSEL 6.17] virtio_fs: fix the hash table using in virtio_fs_enqueue_req()
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
  2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-5.4] allow finish_no_open(file, ERR_PTR(-E...)) Sasha Levin
  2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-6.6] fuse: zero initialize inode private data Sasha Levin
@ 2025-10-25 15:57 ` Sasha Levin
  2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.12] exfat: validate cluster allocation bits of the allocation bitmap Sasha Levin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2025-10-25 15:57 UTC (permalink / raw)
  To: patches, stable
  Cc: Li RongQing, Fushuai Wang, Stefan Hajnoczi, Miklos Szeredi,
	Sasha Levin, miklos, vgoyal, linux-fsdevel, virtualization

From: Li RongQing <lirongqing@baidu.com>

[ Upstream commit 7dbe6442487743ad492d9143f1f404c1f4a05e0e ]

The original commit be2ff42c5d6e ("fuse: Use hash table to link
processing request") converted fuse_pqueue->processing to a hash table,
but virtio_fs_enqueue_req() was not updated to use it correctly.
Use fuse_pqueue->processing as a hash table there as well, which makes
the code more coherent.

Co-developed-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

**Why This Is A Bugfix**
- The earlier change “fuse: Use hash table to link processing request”
  (be2ff42c5d6e) converted `fuse_pqueue->processing` from a single list
  to a hash table. You can see the hash table parameters and the data
  structure:
  - `FUSE_PQ_HASH_BITS` and `FUSE_PQ_HASH_SIZE`: fs/fuse/fuse_i.h:546
  - `struct fuse_pqueue { struct list_head *processing; }`:
    fs/fuse/fuse_i.h:556
  - Allocation as an array of `list_head` buckets: fs/fuse/inode.c:1622
- Responses are looked up by hashing the request ID and searching only
  that bucket:
  - `fuse_request_find()` iterates `&fpq->processing[hash]`:
    fs/fuse/dev.c:2131
- Before this fix, `virtio_fs_enqueue_req()` added every request to the
  list head pointer (effectively bucket 0) instead of the hashed bucket.
  That makes replies unfindable for non-zero buckets, leading to -ENOENT
  on reply processing and stuck/hung requests.

**What The Patch Changes**
- Export the hash function so virtio-fs can use it:
  - `fuse_req_hash()` now exported: fs/fuse/dev.c:321
- Ensure virtio-fs adds requests to the correct bucket:
  - Include FUSE device internals: fs/fuse/virtio_fs.c:23
  - Compute the bucket: `hash = fuse_req_hash(req->in.h.unique);`:
    fs/fuse/virtio_fs.c:1445
  - Enqueue into the correct bucket: `list_add_tail(&req->list,
    &fpq->processing[hash]);`: fs/fuse/virtio_fs.c:1447
  - Function definition location for context: fs/fuse/virtio_fs.c:1370

**Impact and Risk**
- User-visible bugfix: Without this, replies cannot be matched to
  requests (except those hashing to bucket 0), causing request
  completion failures and potential hangs in virtio-fs workloads.
- Small and contained: Two files touched; logic change is limited to
  correctly hashing and inserting into the right bucket, plus exporting
  a helper symbol.
- No architectural changes: Keeps the existing hash-table design; simply
  uses it correctly.
- Stable-friendly: Minimal risk of regression, no new features, fixes
  incorrect behavior.

**Backport Conditions**
- This should be backported to any stable series that already includes
  the conversion of `processing` to a hash table (be2ff42c5d6e). If a
  stable series predates that change (i.e., `processing` is still a
  single list), this patch is not applicable.
- The export `EXPORT_SYMBOL_GPL(fuse_req_hash)` (fs/fuse/dev.c:321) is
  required so `virtio_fs` can link against it. This is an internal,
  GPL-only symbol used by in-tree code and is appropriate for stable.
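
As a general illustration of why insertion and lookup must agree on
the bucket, here is a small user-space sketch. All names are
hypothetical, the hash is a simple multiplicative one rather than the
kernel's `hash_long()`, and singly linked lists with head insertion
replace `list_head` and `list_add_tail()`. It does not show whether
virtio-fs ever performs such a lookup; it only models the bucket
mismatch described in the notes above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PQ_HASH_BITS 8
#define PQ_HASH_SIZE (1u << PQ_HASH_BITS)

struct req {
	uint64_t unique;
	struct req *next;
};

struct pqueue {
	struct req *processing[PQ_HASH_SIZE];
};

/* Illustrative hash only; the kernel uses hash_long(), not this function. */
static unsigned int req_hash(uint64_t unique)
{
	return (unsigned int)((unique * 0x9E3779B97F4A7C15ull) >>
			      (64 - PQ_HASH_BITS));
}

/* Insert into the bucket selected by the request ID. */
static void enqueue_hashed(struct pqueue *pq, struct req *r)
{
	unsigned int h = req_hash(r->unique);

	r->next = pq->processing[h];
	pq->processing[h] = r;
}

/* Models treating the bucket array as one list: always bucket 0. */
static void enqueue_bucket0(struct pqueue *pq, struct req *r)
{
	r->next = pq->processing[0];
	pq->processing[0] = r;
}

/* Lookup searches only the bucket the ID hashes to. */
static struct req *find_req(struct pqueue *pq, uint64_t unique)
{
	struct req *r;

	for (r = pq->processing[req_hash(unique)]; r; r = r->next)
		if (r->unique == unique)
			return r;
	return NULL;
}
```

A request placed with `enqueue_bucket0()` is invisible to `find_req()`
unless its ID happens to hash to 0; `enqueue_hashed()` keeps the two
sides consistent.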

 fs/fuse/dev.c       | 1 +
 fs/fuse/virtio_fs.c | 6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index dbf53c7bc8535..612d4da6d7d91 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -322,6 +322,7 @@ unsigned int fuse_req_hash(u64 unique)
 {
 	return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
 }
+EXPORT_SYMBOL_GPL(fuse_req_hash);
 
 /*
  * A new request is available, wake fiq->waitq
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 76c8fd0bfc75d..1751cd6e3d42b 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -20,6 +20,7 @@
 #include <linux/cleanup.h>
 #include <linux/uio.h>
 #include "fuse_i.h"
+#include "fuse_dev_i.h"
 
 /* Used to help calculate the FUSE connection's max_pages limit for a request's
  * size. Parts of the struct fuse_req are sliced into scattergather lists in
@@ -1384,7 +1385,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	unsigned int out_sgs = 0;
 	unsigned int in_sgs = 0;
 	unsigned int total_sgs;
-	unsigned int i;
+	unsigned int i, hash;
 	int ret;
 	bool notify;
 	struct fuse_pqueue *fpq;
@@ -1444,8 +1445,9 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 
 	/* Request successfully sent. */
 	fpq = &fsvq->fud->pq;
+	hash = fuse_req_hash(req->in.h.unique);
 	spin_lock(&fpq->lock);
-	list_add_tail(&req->list, fpq->processing);
+	list_add_tail(&req->list, &fpq->processing[hash]);
 	spin_unlock(&fpq->lock);
 	set_bit(FR_SENT, &req->flags);
 	/* matches barrier in request_wait_answer() */
-- 
2.51.0



* [PATCH AUTOSEL 6.17-6.12] exfat: validate cluster allocation bits of the allocation bitmap
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] virtio_fs: fix the hash table using in virtio_fs_enqueue_req() Sasha Levin
@ 2025-10-25 15:58 ` Sasha Levin
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.10] exfat: limit log print for IO error Sasha Levin
  2025-10-25 16:01 ` [PATCH AUTOSEL 6.17] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2025-10-25 15:58 UTC (permalink / raw)
  To: patches, stable
  Cc: Namjae Jeon, syzbot+a725ab460fc1def9896f, Yuezhang Mo,
	Sungjong Seo, Sasha Levin, linux-fsdevel

From: Namjae Jeon <linkinjeon@kernel.org>

[ Upstream commit 79c1587b6cda74deb0c86fc7ba194b92958c793c ]

syzbot created an exfat image with cluster bits not set for the allocation
bitmap. exfat-fs reads and uses the allocation bitmap without checking
this. The problem is that if the start cluster of the allocation bitmap
is 6, cluster 6 can be allocated when creating a directory with mkdir.
exfat zeros out this cluster in exfat_mkdir, which can delete existing
entries. This can reallocate the allocated entries. In addition,
the allocation bitmap is also zeroed out, so cluster 6 can be reallocated.
This patch adds exfat_test_bitmap_range to validate that clusters used for
the allocation bitmap are correctly marked as in-use.

Reported-by: syzbot+a725ab460fc1def9896f@syzkaller.appspotmail.com
Tested-by: syzbot+a725ab460fc1def9896f@syzkaller.appspotmail.com
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES
- The commit adds `exfat_test_bitmap_range()` to verify that every
  cluster backing the allocation bitmap file has its allocation bit set
  before the filesystem accepts the bitmap (`fs/exfat/balloc.c:29`).
  Without this guard, the mount path would happily proceed even when the
  bitmap’s own clusters are marked free, exactly the corruption pattern
  syzbot reported.
- The new helper simply walks the existing bitmap pages
  (`sbi->vol_amap`) and checks the relevant bits with the existing
  macros; on any mismatch it rejects the volume with `-EIO`, preventing
  us from ever reaching the allocator that can hand the bitmap’s cluster
  to new directories (`fs/exfat/balloc.c:108`, `fs/exfat/balloc.c:114`).
  This is a small, self-contained mount-time validation step.
- The bug being fixed is high severity: when the bitmap cluster is
  falsely free, `exfat_alloc_cluster()` can select it and zero the data
  while creating a directory (`fs/exfat/fatent.c:381` onward),
  destroying the bitmap and any directory entries stored there. The
  patch blocks that corruption before it can happen.
- Risk of regression is minimal—the helper only reads data we already
  loaded, relies on longstanding helpers/macros, and touches no runtime
  paths once the bitmap validates. If the check fails we already have to
  bail out because the on-disk image is inconsistent; no new behavior
  appears for well-formed volumes.
- The change stands on its own (no functional dependencies on later
  commits), fixes a real user-visible corruption scenario, and adheres
  to stable-tree guidance (bug fix, limited scope, no architectural
  churn). Backporting will materially improve resilience of exFAT mounts
  against malformed media.
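
The word-at-a-time range check can be sketched outside the kernel as
below. This is a simplified model, assuming a flat in-memory bitmap of
host-endian `unsigned long` words; the real helper walks per-sector
buffer heads and converts little-endian words, which is omitted here.
The function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return true only if bits [start, start + count) are all set in map. */
static bool test_bitmap_range(const unsigned long *map, size_t start,
			      size_t count)
{
	size_t end = start + count;

	while (start < end) {
		size_t bit_offset = start % BITS_PER_LONG;
		size_t bits = BITS_PER_LONG - bit_offset;
		unsigned long mask, word;

		/* Do not look past the end of the requested range. */
		if (bits > end - start)
			bits = end - start;

		/* A full-word shift would be undefined, so special-case it. */
		if (bits >= BITS_PER_LONG)
			mask = ~0UL;
		else
			mask = ((1UL << bits) - 1) << bit_offset;

		word = map[start / BITS_PER_LONG];
		if ((word & mask) != mask)
			return false;

		start += bits;
	}

	return true;
}
```

A mount-time caller would pass the bitmap's own start cluster
(converted to a bit index) and cluster count, and reject the volume
when the function returns false.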

 fs/exfat/balloc.c | 72 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 60 insertions(+), 12 deletions(-)

diff --git a/fs/exfat/balloc.c b/fs/exfat/balloc.c
index cc01556c9d9b3..071448adbd5d9 100644
--- a/fs/exfat/balloc.c
+++ b/fs/exfat/balloc.c
@@ -26,12 +26,55 @@
 /*
  *  Allocation Bitmap Management Functions
  */
+static bool exfat_test_bitmap_range(struct super_block *sb, unsigned int clu,
+		unsigned int count)
+{
+	struct exfat_sb_info *sbi = EXFAT_SB(sb);
+	unsigned int start = clu;
+	unsigned int end = clu + count;
+	unsigned int ent_idx, i, b;
+	unsigned int bit_offset, bits_to_check;
+	__le_long *bitmap_le;
+	unsigned long mask, word;
+
+	if (!is_valid_cluster(sbi, start) || !is_valid_cluster(sbi, end - 1))
+		return false;
+
+	while (start < end) {
+		ent_idx = CLUSTER_TO_BITMAP_ENT(start);
+		i = BITMAP_OFFSET_SECTOR_INDEX(sb, ent_idx);
+		b = BITMAP_OFFSET_BIT_IN_SECTOR(sb, ent_idx);
+
+		bitmap_le = (__le_long *)sbi->vol_amap[i]->b_data;
+
+		/* Calculate how many bits we can check in the current word */
+		bit_offset = b % BITS_PER_LONG;
+		bits_to_check = min(end - start,
+				    (unsigned int)(BITS_PER_LONG - bit_offset));
+
+		/* Create a bitmask for the range of bits to check */
+		if (bits_to_check >= BITS_PER_LONG)
+			mask = ~0UL;
+		else
+			mask = ((1UL << bits_to_check) - 1) << bit_offset;
+		word = lel_to_cpu(bitmap_le[b / BITS_PER_LONG]);
+
+		/* Check if all bits in the mask are set */
+		if ((word & mask) != mask)
+			return false;
+
+		start += bits_to_check;
+	}
+
+	return true;
+}
+
 static int exfat_allocate_bitmap(struct super_block *sb,
 		struct exfat_dentry *ep)
 {
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
 	long long map_size;
-	unsigned int i, need_map_size;
+	unsigned int i, j, need_map_size;
 	sector_t sector;
 
 	sbi->map_clu = le32_to_cpu(ep->dentry.bitmap.start_clu);
@@ -58,20 +101,25 @@ static int exfat_allocate_bitmap(struct super_block *sb,
 	sector = exfat_cluster_to_sector(sbi, sbi->map_clu);
 	for (i = 0; i < sbi->map_sectors; i++) {
 		sbi->vol_amap[i] = sb_bread(sb, sector + i);
-		if (!sbi->vol_amap[i]) {
-			/* release all buffers and free vol_amap */
-			int j = 0;
-
-			while (j < i)
-				brelse(sbi->vol_amap[j++]);
-
-			kvfree(sbi->vol_amap);
-			sbi->vol_amap = NULL;
-			return -EIO;
-		}
+		if (!sbi->vol_amap[i])
+			goto err_out;
 	}
 
+	if (exfat_test_bitmap_range(sb, sbi->map_clu,
+		EXFAT_B_TO_CLU_ROUND_UP(map_size, sbi)) == false)
+		goto err_out;
+
 	return 0;
+
+err_out:
+	j = 0;
+	/* release all buffers and free vol_amap */
+	while (j < i)
+		brelse(sbi->vol_amap[j++]);
+
+	kvfree(sbi->vol_amap);
+	sbi->vol_amap = NULL;
+	return -EIO;
 }
 
 int exfat_load_bitmap(struct super_block *sb)
-- 
2.51.0



* [PATCH AUTOSEL 6.17-5.10] exfat: limit log print for IO error
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.12] exfat: validate cluster allocation bits of the allocation bitmap Sasha Levin
@ 2025-10-25 16:00 ` Sasha Levin
  2025-10-25 16:01 ` [PATCH AUTOSEL 6.17] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2025-10-25 16:00 UTC (permalink / raw)
  To: patches, stable
  Cc: Chi Zhiling, Namjae Jeon, Sasha Levin, sj1557.seo, linux-fsdevel

From: Chi Zhiling <chizhiling@kylinos.cn>

[ Upstream commit 6dfba108387bf4e71411b3da90b2d5cce48ba054 ]

On exFAT filesystems with a 4MB read_ahead_size, removing the storage
device while a read operation is in progress causes the last read syscall
to take 150s [1]. The main reason is that exFAT generates excessive log
messages [2].

After applying this patch, approximately 300,000 lines of log messages
were suppressed, and the delay of the last read() syscall was reduced
to about 4 seconds.

[1]:
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000120>
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000032>
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000119>
read(4, 0x7fccf28ae000, 131072)         = -1 EIO (Input/output error) <150.186215>

[2]:
[  333.696603] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
[  333.697378] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
[  333.698156] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES. The change switches the hot error sites in `exfat_ent_get()` to the
already-available rate-limited logger (`fs/exfat/fatent.c:96`, `:104`,
`:112`, `:120`, `:128`), so the driver still returns the same `-EIO`
codes but no longer emits hundreds of thousands of synchronous
`exfat_fs_error()` messages that were stretching the failing read path
to ~150 s. Because `exfat_fs_error_ratelimit()` simply gates the printk
while preserving the remount/panic handling in `__exfat_fs_error()`
(`fs/exfat/exfat_fs.h:555-561`, `fs/exfat/misc.c:26-47`), correctness
isn’t touched. The rate limiter is already initialized per superblock
(`fs/exfat/super.c:803-808`) and used in other exFAT paths
(`fs/exfat/inode.c:288`), so the patch aligns this high-frequency error
path with existing practice. This is a contained fix for a user-visible
regression (extreme latency when media disappears), introduces no
architectural changes, and applies cleanly to stable trees that already
ship the same infrastructure.
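
The idea of gating only the message while keeping the error return can
be sketched as follows. This is a hypothetical interval/burst limiter
written for illustration; it is not the kernel's ratelimit
implementation, time is passed in as a plain tick count, and a counter
stands in for the actual print.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter state: at most `burst` messages per `interval`. */
struct ratelimit {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if a message may be emitted at time `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* Window expired: start a new one. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}

/* Error path: the error code is always returned, only logging is gated. */
static int report_error(struct ratelimit *rl, long now, int err,
			int *lines_logged)
{
	if (ratelimit_ok(rl, now))
		(*lines_logged)++;	/* stands in for the log message */
	return err;
}
```

Callers see the same error for every failed access, so behavior is
unchanged apart from the volume of log output.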

 fs/exfat/fatent.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
index 232cc7f8ab92f..825083634ba2d 100644
--- a/fs/exfat/fatent.c
+++ b/fs/exfat/fatent.c
@@ -89,35 +89,36 @@ int exfat_ent_get(struct super_block *sb, unsigned int loc,
 	int err;

 	if (!is_valid_cluster(sbi, loc)) {
-		exfat_fs_error(sb, "invalid access to FAT (entry 0x%08x)",
+		exfat_fs_error_ratelimit(sb,
+			"invalid access to FAT (entry 0x%08x)",
 			loc);
 		return -EIO;
 	}

 	err = __exfat_ent_get(sb, loc, content);
 	if (err) {
-		exfat_fs_error(sb,
+		exfat_fs_error_ratelimit(sb,
 			"failed to access to FAT (entry 0x%08x, err:%d)",
 			loc, err);
 		return err;
 	}

 	if (*content == EXFAT_FREE_CLUSTER) {
-		exfat_fs_error(sb,
+		exfat_fs_error_ratelimit(sb,
 			"invalid access to FAT free cluster (entry 0x%08x)",
 			loc);
 		return -EIO;
 	}

 	if (*content == EXFAT_BAD_CLUSTER) {
-		exfat_fs_error(sb,
+		exfat_fs_error_ratelimit(sb,
 			"invalid access to FAT bad cluster (entry 0x%08x)",
 			loc);
 		return -EIO;
 	}

 	if (*content != EXFAT_EOF_CLUSTER && !is_valid_cluster(sbi, *content)) {
-		exfat_fs_error(sb,
+		exfat_fs_error_ratelimit(sb,
 			"invalid access to FAT (entry 0x%08x) bogus content (0x%08x)",
 			loc, *content);
 		return -EIO;
-- 
2.51.0


* [PATCH AUTOSEL 6.17] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.10] exfat: limit log print for IO error Sasha Levin
@ 2025-10-25 16:01 ` Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2025-10-25 16:01 UTC (permalink / raw)
  To: patches, stable; +Cc: Al Viro, Christian Brauner, Sasha Levin, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

[ Upstream commit d29da1a8f119130e6fc7d5d71029d402dabe2cb0 ]

We want to mount beneath the given location.  For that operation to
make sense, location must be the root of some mount that has something
under it.  Currently we let it proceed if those requirements are not met,
with rather meaningless results, and have that bogosity caught further
down the road; let's fail early instead - do_lock_mount() doesn't make
sense unless those conditions hold, and checking them there makes
things simpler.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What it fixes
  - Prevents nonsensical MOVE_MOUNT_BENEATH operations from proceeding
    when the destination is not a mount root or when the topmost mount
    has no parent (i.e., namespace root). Previously this could “proceed
    with meaningless results” and fail later; now it fails early with
    -EINVAL as intended.

- Key changes
  - Early validation in do_lock_mount():
    - Adds `if (unlikely(beneath) && !path_mounted(path)) return
      -EINVAL;` so non-mount-root targets are rejected immediately
      (fs/namespace.c:do_lock_mount()).
    - Adds a parent check under `mount_lock` in the ‘beneath’ path: `if
      (unlikely(!mnt_has_parent(m))) { ... return -EINVAL; }` to reject
      attempts beneath a namespace root before proceeding
      (fs/namespace.c:do_lock_mount()).
  - De-duplication: Removes the equivalent checks from
    can_move_mount_beneath(), centralizing them where the mountpoint and
    parent are actually determined
    (fs/namespace.c:can_move_mount_beneath()).

- Context in current tree
  - The tree already performs an early `beneath && !path_mounted(path)`
    rejection in do_lock_mount (see `fs/namespace.c:2732`), so
    moving/keeping this check in do_lock_mount is aligned with the
    patch’s intent.
  - The explicit `mnt_has_parent()` guard is not currently enforced at
    lock acquisition time in do_lock_mount; adding it there (while
    holding `mount_lock`) closes a race and ensures the operation only
    proceeds when a real parent exists.
  - can_move_mount_beneath in this tree already focuses on
    propagation/relationship checks and does not contain those
    path/parent assertions (see around `fs/namespace.c:3417`), so
    consolidating sanity checks into do_lock_mount is consistent and low
    risk.

- Why it’s a good stable candidate
  - Bug fix: Enforces semantic preconditions for MOVE_MOUNT_BENEATH,
    avoiding misleading or late failures.
  - Small and contained: Changes are limited to fs/namespace.c, mostly
    simple condition checks and code movement.
  - No feature or architectural change: Just earlier, clearer
    validation; the end result remains a failure for invalid usage.
  - Concurrency-safe: Parent check is done while holding `mount_lock`,
    reducing race windows between `mount_lock` and `namespace_sem`.

- Regression risk
  - Low. Users attempting invalid MOVE_MOUNT_BENEATH operations will now
    get -EINVAL earlier rather than later. Valid usages are unaffected.

 fs/namespace.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c8c2376bb2424..fa7c034ac4a69 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2785,12 +2785,19 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 	struct path under = {};
 	int err = -ENOENT;
 
+	if (unlikely(beneath) && !path_mounted(path))
+		return -EINVAL;
+
 	for (;;) {
 		struct mount *m = real_mount(mnt);
 
 		if (beneath) {
 			path_put(&under);
 			read_seqlock_excl(&mount_lock);
+			if (unlikely(!mnt_has_parent(m))) {
+				read_sequnlock_excl(&mount_lock);
+				return -EINVAL;
+			}
 			under.mnt = mntget(&m->mnt_parent->mnt);
 			under.dentry = dget(m->mnt_mountpoint);
 			read_sequnlock_excl(&mount_lock);
@@ -3462,8 +3469,6 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * @to:   mount under which to mount
  * @mp:   mountpoint of @to
  *
- * - Make sure that @to->dentry is actually the root of a mount under
- *   which we can mount another mount.
  * - Make sure that nothing can be mounted beneath the caller's current
  *   root or the rootfs of the namespace.
  * - Make sure that the caller can unmount the topmost mount ensuring
@@ -3485,12 +3490,6 @@ static int can_move_mount_beneath(const struct path *from,
 		     *mnt_to = real_mount(to->mnt),
 		     *parent_mnt_to = mnt_to->mnt_parent;
 
-	if (!mnt_has_parent(mnt_to))
-		return -EINVAL;
-
-	if (!path_mounted(to))
-		return -EINVAL;
-
 	if (IS_MNT_LOCKED(mnt_to))
 		return -EINVAL;
 
-- 
2.51.0

