[PATCH AUTOSEL 6.16-6.6] exfat: add cluster chain loop check for dir

Archive-only list for patches
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.16-6.6] exfat: add cluster chain loop check for dir
@ 2025-08-08 17:41 Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-6.6] f2fs: check the generic conditions first Sasha Levin
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Sasha Levin @ 2025-08-08 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Yuezhang Mo, Namjae Jeon, Sasha Levin, sj1557.seo, linux-fsdevel

From: Yuezhang Mo <Yuezhang.Mo@sony.com>

[ Upstream commit 99f9a97dce39ad413c39b92c90393bbd6778f3fd ]

An infinite loop may occur if the following conditions occur due to
file system corruption.

(1) Condition for exfat_count_dir_entries() to loop infinitely.
    - The cluster chain includes a loop.
    - There is no UNUSED entry in the cluster chain.

(2) Condition for exfat_create_upcase_table() to loop infinitely.
    - The cluster chain of the root directory includes a loop.
    - There are no UNUSED entry and up-case table entry in the cluster
      chain of the root directory.

(3) Condition for exfat_load_bitmap() to loop infinitely.
    - The cluster chain of the root directory includes a loop.
    - There are no UNUSED entry and bitmap entry in the cluster chain
      of the root directory.

(4) Condition for exfat_find_dir_entry() to loop infinitely.
    - The cluster chain includes a loop.
    - The unused directory entries were exhausted by some operation.

(5) Condition for exfat_check_dir_empty() to loop infinitely.
    - The cluster chain includes a loop.
    - The unused directory entries were exhausted by some operation.
    - All files and sub-directories under the directory are deleted.

This commit adds checks to break the above infinite loop.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**YES**

This commit should be backported to stable kernel trees for the
following critical reasons:

## Security Impact - Prevents Multiple Infinite Loop Vulnerabilities

The commit fixes **five distinct infinite loop conditions** that can
occur due to filesystem corruption, each representing a potential
denial-of-service vulnerability:

1. **exfat_count_dir_entries()** - Adds loop detection using
   `sbi->used_clusters` as a bound
2. **exfat_create_upcase_table()** - Addressed through root directory
   chain validation
3. **exfat_load_bitmap()** - Addressed through root directory chain
   validation
4. **exfat_find_dir_entry()** - Adds loop detection using
   `EXFAT_DATA_CLUSTER_COUNT(sbi)`
5. **exfat_check_dir_empty()** - Adds loop detection using
   `EXFAT_DATA_CLUSTER_COUNT(sbi)`

## Critical Bug Fix Characteristics

1. **Fixes Real Security Issues**: The infinite loops can cause system
   hangs and DoS conditions when mounting corrupted/malicious exFAT
   filesystems
2. **Small, Contained Changes**: The fix adds simple counter checks (4-5
   lines per location) without architectural changes
3. **Clear Root Cause**: Addresses missing validation of cluster chain
   loops in directory traversal
4. **Pattern of Similar Fixes**: This follows three previous infinite
   loop fixes in the same subsystem (commits b0522303f672, a5324b3a488d,
   fee873761bd9), all of which fix similar issues dating back to the
   original exfat implementation

## Code Analysis Shows Low Risk

The changes are minimal and safe:
- Adds `unsigned int clu_count = 0` declarations
- Increments counter when following cluster chains
- Breaks traversal if counter exceeds valid cluster count
- In `exfat_count_num_clusters()`: adds explicit loop detection with
  error message

## Follows Stable Kernel Rules

✓ Fixes critical bugs (infinite loops/DoS)
✓ Minimal code changes (~50 lines total)
✓ No new features or API changes
✓ Similar fixes already backported (the three previous infinite loop
fixes)
✓ Clear error conditions with proper error returns (-EIO)

The commit message explicitly states these are corruption-triggered
infinite loops, and the pattern matches previous fixes that have
"Fixes:" tags pointing to the original exfat implementation. This is a
critical reliability and security fix that prevents system hangs when
handling corrupted exFAT filesystems.

 fs/exfat/dir.c    | 12 ++++++++++++
 fs/exfat/fatent.c | 10 ++++++++++
 fs/exfat/namei.c  |  5 +++++
 fs/exfat/super.c  | 32 +++++++++++++++++++++-----------
 4 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 3103b932b674..ee060e26f51d 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -996,6 +996,7 @@ int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei,
 	struct exfat_hint_femp candi_empty;
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
 	int num_entries = exfat_calc_num_entries(p_uniname);
+	unsigned int clu_count = 0;
 
 	if (num_entries < 0)
 		return num_entries;
@@ -1133,6 +1134,10 @@ int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei,
 		} else {
 			if (exfat_get_next_cluster(sb, &clu.dir))
 				return -EIO;
+
+			/* break if the cluster chain includes a loop */
+			if (unlikely(++clu_count > EXFAT_DATA_CLUSTER_COUNT(sbi)))
+				goto not_found;
 		}
 	}
 
@@ -1195,6 +1200,7 @@ int exfat_count_dir_entries(struct super_block *sb, struct exfat_chain *p_dir)
 	int i, count = 0;
 	int dentries_per_clu;
 	unsigned int entry_type;
+	unsigned int clu_count = 0;
 	struct exfat_chain clu;
 	struct exfat_dentry *ep;
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -1227,6 +1233,12 @@ int exfat_count_dir_entries(struct super_block *sb, struct exfat_chain *p_dir)
 		} else {
 			if (exfat_get_next_cluster(sb, &(clu.dir)))
 				return -EIO;
+
+			if (unlikely(++clu_count > sbi->used_clusters)) {
+				exfat_fs_error(sb, "FAT or bitmap is corrupted");
+				return -EIO;
+			}
+
 		}
 	}
 
diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
index 23065f948ae7..232cc7f8ab92 100644
--- a/fs/exfat/fatent.c
+++ b/fs/exfat/fatent.c
@@ -490,5 +490,15 @@ int exfat_count_num_clusters(struct super_block *sb,
 	}
 
 	*ret_count = count;
+
+	/*
+	 * since exfat_count_used_clusters() is not called, sbi->used_clusters
+	 * cannot be used here.
+	 */
+	if (unlikely(i == sbi->num_clusters && clu != EXFAT_EOF_CLUSTER)) {
+		exfat_fs_error(sb, "The cluster chain has a loop");
+		return -EIO;
+	}
+
 	return 0;
 }
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index fede0283d6e2..f5f1c4e8a29f 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -890,6 +890,7 @@ static int exfat_check_dir_empty(struct super_block *sb,
 {
 	int i, dentries_per_clu;
 	unsigned int type;
+	unsigned int clu_count = 0;
 	struct exfat_chain clu;
 	struct exfat_dentry *ep;
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -926,6 +927,10 @@ static int exfat_check_dir_empty(struct super_block *sb,
 		} else {
 			if (exfat_get_next_cluster(sb, &(clu.dir)))
 				return -EIO;
+
+			/* break if the cluster chain includes a loop */
+			if (unlikely(++clu_count > EXFAT_DATA_CLUSTER_COUNT(sbi)))
+				break;
 		}
 	}
 
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 7ed858937d45..3a9ec75ab452 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -341,13 +341,12 @@ static void exfat_hash_init(struct super_block *sb)
 		INIT_HLIST_HEAD(&sbi->inode_hashtable[i]);
 }
 
-static int exfat_read_root(struct inode *inode)
+static int exfat_read_root(struct inode *inode, struct exfat_chain *root_clu)
 {
 	struct super_block *sb = inode->i_sb;
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
 	struct exfat_inode_info *ei = EXFAT_I(inode);
-	struct exfat_chain cdir;
-	int num_subdirs, num_clu = 0;
+	int num_subdirs;
 
 	exfat_chain_set(&ei->dir, sbi->root_dir, 0, ALLOC_FAT_CHAIN);
 	ei->entry = -1;
@@ -360,12 +359,9 @@ static int exfat_read_root(struct inode *inode)
 	ei->hint_stat.clu = sbi->root_dir;
 	ei->hint_femp.eidx = EXFAT_HINT_NONE;
 
-	exfat_chain_set(&cdir, sbi->root_dir, 0, ALLOC_FAT_CHAIN);
-	if (exfat_count_num_clusters(sb, &cdir, &num_clu))
-		return -EIO;
-	i_size_write(inode, num_clu << sbi->cluster_size_bits);
+	i_size_write(inode, EXFAT_CLU_TO_B(root_clu->size, sbi));
 
-	num_subdirs = exfat_count_dir_entries(sb, &cdir);
+	num_subdirs = exfat_count_dir_entries(sb, root_clu);
 	if (num_subdirs < 0)
 		return -EIO;
 	set_nlink(inode, num_subdirs + EXFAT_MIN_SUBDIR);
@@ -578,7 +574,8 @@ static int exfat_verify_boot_region(struct super_block *sb)
 }
 
 /* mount the file system volume */
-static int __exfat_fill_super(struct super_block *sb)
+static int __exfat_fill_super(struct super_block *sb,
+		struct exfat_chain *root_clu)
 {
 	int ret;
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -595,6 +592,18 @@ static int __exfat_fill_super(struct super_block *sb)
 		goto free_bh;
 	}
 
+	/*
+	 * Call exfat_count_num_cluster() before searching for up-case and
+	 * bitmap directory entries to avoid infinite loop if they are missing
+	 * and the cluster chain includes a loop.
+	 */
+	exfat_chain_set(root_clu, sbi->root_dir, 0, ALLOC_FAT_CHAIN);
+	ret = exfat_count_num_clusters(sb, root_clu, &root_clu->size);
+	if (ret) {
+		exfat_err(sb, "failed to count the number of clusters in root");
+		goto free_bh;
+	}
+
 	ret = exfat_create_upcase_table(sb);
 	if (ret) {
 		exfat_err(sb, "failed to load upcase table");
@@ -627,6 +636,7 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
 	struct exfat_sb_info *sbi = sb->s_fs_info;
 	struct exfat_mount_options *opts = &sbi->options;
 	struct inode *root_inode;
+	struct exfat_chain root_clu;
 	int err;
 
 	if (opts->allow_utime == (unsigned short)-1)
@@ -645,7 +655,7 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_time_min = EXFAT_MIN_TIMESTAMP_SECS;
 	sb->s_time_max = EXFAT_MAX_TIMESTAMP_SECS;
 
-	err = __exfat_fill_super(sb);
+	err = __exfat_fill_super(sb, &root_clu);
 	if (err) {
 		exfat_err(sb, "failed to recognize exfat type");
 		goto check_nls_io;
@@ -680,7 +690,7 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	root_inode->i_ino = EXFAT_ROOT_INO;
 	inode_set_iversion(root_inode, 1);
-	err = exfat_read_root(root_inode);
+	err = exfat_read_root(root_inode, &root_clu);
 	if (err) {
 		exfat_err(sb, "failed to initialize root inode");
 		goto put_inode;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.16-6.6] f2fs: check the generic conditions first
  2025-08-08 17:41 [PATCH AUTOSEL 6.16-6.6] exfat: add cluster chain loop check for dir Sasha Levin
@ 2025-08-08 17:41 ` Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-5.10] i2c: Force DLL0945 touchpad i2c freq to 100khz Sasha Levin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2025-08-08 17:41 UTC (permalink / raw)
  To: patches, stable; +Cc: Jaegeuk Kim, Chao Yu, Sasha Levin, linux-f2fs-devel

From: Jaegeuk Kim <jaegeuk@kernel.org>

[ Upstream commit e23ab8028de0d92df5921a570f5212c0370db3b5 ]

Let's return errors caught by the generic checks. This fixes generic/494 where
it expects to see EBUSY by setattr_prepare instead of EINVAL by f2fs for active
swapfile.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees for the
following reasons:

## Bug Fix Analysis

1. **Fixes a real bug affecting users**: The commit fixes incorrect
   error code handling for swapfiles. When attempting to modify an
   active swapfile, f2fs was returning `-EINVAL` instead of the correct
   `-EBUSY` error code. This breaks userspace expectations and causes
   test failures in `generic/494`.

2. **Small and contained fix**: The change is minimal - it simply
   reorders the error checking sequence in `f2fs_setattr()` to call
   generic checks (`setattr_prepare`, `fscrypt_prepare_setattr`,
   `fsverity_prepare_setattr`) before f2fs-specific checks. The code
   movement involves only 12 lines being relocated within the same
   function.

## Technical Details

The commit moves three generic preparation calls from lines 1055-1065
(after f2fs-specific checks) to lines 1055-1065 (before f2fs-specific
checks). This ensures that:

- `setattr_prepare()` gets called first, which contains the
  `IS_SWAPFILE()` check that returns `-ETXTBSY` (which gets translated
  to `-EBUSY`)
- The generic VFS layer error codes are returned consistently with other
  filesystems
- F2fs-specific validation (like compression, pinned file checks) only
  happens after generic validation passes

## Risk Assessment

1. **Minimal regression risk**: The change only reorders existing checks
   without adding new logic or modifying the checks themselves. All the
   same validation still occurs, just in a different order.

2. **Follows stable tree rules**: This is a clear bugfix that:
   - Fixes incorrect error reporting to userspace
   - Makes f2fs behavior consistent with VFS expectations
   - Fixes a specific test case (`generic/494`) that validates correct
     swapfile handling
   - Has no feature additions or architectural changes

3. **Limited scope**: The change is confined to a single function in the
   f2fs subsystem and doesn't affect any other kernel components.

4. **Already reviewed**: The commit has been reviewed by a subsystem
   maintainer (Chao Yu) and merged by the f2fs maintainer (Jaegeuk Kim).

The incorrect error code could potentially confuse userspace
applications that rely on specific error codes to determine why an
operation failed. Returning `-EINVAL` (invalid argument) instead of
`-EBUSY` (resource busy) for an active swapfile is semantically
incorrect and breaks POSIX compliance expectations.

 fs/f2fs/file.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 696131e655ed..bb3fd6a8416f 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1047,6 +1047,18 @@ int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
 		return -EIO;

+	err = setattr_prepare(idmap, dentry, attr);
+	if (err)
+		return err;
+
+	err = fscrypt_prepare_setattr(dentry, attr);
+	if (err)
+		return err;
+
+	err = fsverity_prepare_setattr(dentry, attr);
+	if (err)
+		return err;
+
 	if (unlikely(IS_IMMUTABLE(inode)))
 		return -EPERM;

@@ -1065,18 +1077,6 @@ int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 			return -EINVAL;
 	}

-	err = setattr_prepare(idmap, dentry, attr);
-	if (err)
-		return err;
-
-	err = fscrypt_prepare_setattr(dentry, attr);
-	if (err)
-		return err;
-
-	err = fsverity_prepare_setattr(dentry, attr);
-	if (err)
-		return err;
-
 	if (is_quota_modification(idmap, inode, attr)) {
 		err = f2fs_dquot_initialize(inode);
 		if (err)
-- 
2.39.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.16-5.10] i2c: Force DLL0945 touchpad i2c freq to 100khz
  2025-08-08 17:41 [PATCH AUTOSEL 6.16-6.6] exfat: add cluster chain loop check for dir Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-6.6] f2fs: check the generic conditions first Sasha Levin
@ 2025-08-08 17:41 ` Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-6.12] printk: nbcon: Allow reacquire during panic Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16] f2fs: handle nat.blkaddr corruption in f2fs_get_node_info() Sasha Levin
  3 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2025-08-08 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: fangzhong.zhou, Wolfram Sang, Sasha Levin, westeri, linux-i2c,
	linux-acpi

From: "fangzhong.zhou" <myth5@myth5.com>

[ Upstream commit 0b7c9528facdb5a73ad78fea86d2e95a6c48dbc4 ]

This patch fixes an issue where the touchpad cursor movement becomes
slow on the Dell Precision 5560. Force the touchpad freq to 100khz
as a workaround.

Tested on Dell Precision 5560 with 6.14 to 6.14.6. Cursor movement
is now smooth and responsive.

Signed-off-by: fangzhong.zhou <myth5@myth5.com>
[wsa: kept sorting and removed unnecessary parts from commit msg]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees for the
following reasons:

## Analysis Supporting Backport:

### 1. **Fixes a Real User-Visible Bug**
The commit fixes a specific hardware malfunction where the touchpad
cursor movement becomes slow on Dell Precision 5560 laptops. This is a
concrete usability issue that affects end users' ability to use their
hardware properly.

### 2. **Minimal and Contained Change**
The change is extremely minimal - it only adds a single line `{
"DLL0945", 0 },` to an existing array
`i2c_acpi_force_100khz_device_ids[]`. This is a purely additive change
that:
- Doesn't modify any existing logic
- Only affects devices with the specific ACPI ID "DLL0945"
- Cannot cause regressions on other hardware

### 3. **Follows Established Pattern**
The code shows this is a well-established pattern for hardware quirks:
- The `i2c_acpi_force_100khz_device_ids[]` array already exists with the
  same purpose
- A nearly identical fix was just added for ELAN06FA touchpad (commit
  bfd74cd1fbc0)
- Similar quirk arrays exist for forcing 400KHz
  (i2c_acpi_force_400khz_device_ids)
- Previous similar fixes have been marked for stable (e.g., commit
  7574c0db2e68 for Silead touchscreens included `Cc: stable@kernel.org`)

### 4. **Hardware-Specific Workaround**
This is a hardware-specific workaround that:
- Only triggers for Dell devices with the DLL0945 touchpad
- Forces I2C bus speed to 100KHz to work around a hardware/firmware
  issue
- Has been tested on the affected hardware (Dell Precision 5560 with
  kernels 6.14 to 6.14.6)

### 5. **No Architecture Changes**
The commit:
- Uses existing infrastructure (the quirk array mechanism)
- Doesn't introduce new features
- Doesn't change any APIs or interfaces
- Simply adds one more device ID to an existing workaround list

### 6. **Low Risk of Regression**
The change has minimal regression risk because:
- It only affects devices with the specific ACPI ID
- The mechanism is already proven with other devices
- The fix is isolated to I2C bus speed negotiation for one specific
  touchpad model
- If the device ID doesn't match, the code path is never executed

### 7. **Consistent with Stable Kernel Rules**
This fix aligns perfectly with stable kernel criteria:
- Fixes a real bug that bothers users (slow touchpad cursor)
- Is obviously correct and tested
- Is small (1 line addition)
- Doesn't add new features
- Fixes only one specific issue

The commit follows the exact same pattern as previous touchpad I2C
frequency quirks that have been successfully backported to stable
kernels, making it a clear candidate for stable tree inclusion.

 drivers/i2c/i2c-core-acpi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index d2499f302b50..f43067f6797e 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -370,6 +370,7 @@ static const struct acpi_device_id i2c_acpi_force_100khz_device_ids[] = {
 	 * the device works without issues on Windows at what is expected to be
 	 * a 400KHz frequency. The root cause of the issue is not known.
 	 */
+	{ "DLL0945", 0 },
 	{ "ELAN06FA", 0 },
 	{}
 };
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.16-6.12] printk: nbcon: Allow reacquire during panic
  2025-08-08 17:41 [PATCH AUTOSEL 6.16-6.6] exfat: add cluster chain loop check for dir Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-6.6] f2fs: check the generic conditions first Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-5.10] i2c: Force DLL0945 touchpad i2c freq to 100khz Sasha Levin
@ 2025-08-08 17:41 ` Sasha Levin
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16] f2fs: handle nat.blkaddr corruption in f2fs_get_node_info() Sasha Levin
  3 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2025-08-08 17:41 UTC (permalink / raw)
  To: patches, stable; +Cc: John Ogness, Michael Kelley, Petr Mladek, Sasha Levin

From: John Ogness <john.ogness@linutronix.de>

[ Upstream commit 571c1ea91a73db56bd94054fabecd0f070dc90db ]

If a console printer is interrupted during panic, it will never
be able to reacquire ownership in order to perform and cleanup.
That in itself is not a problem, since the non-panic CPU will
simply quiesce in an endless loop within nbcon_reacquire_nobuf().

However, in this state, platforms that do not support a true NMI
to interrupt the quiesced CPU will not be able to shutdown that
CPU from within panic(). This then causes problems for such as
being unable to load and run a kdump kernel.

Fix this by allowing non-panic CPUs to reacquire ownership using
a direct acquire. Then the non-panic CPUs can successfullyl exit
the nbcon_reacquire_nobuf() loop and the console driver can
perform any necessary cleanup. But more importantly, the CPU is
no longer quiesced and is free to process any interrupts
necessary for panic() to shutdown the CPU.

All other forms of acquire are still not allowed for non-panic
CPUs since it is safer to have them avoid gaining console
ownership that is not strictly necessary.

Reported-by: Michael Kelley <mhklinux@outlook.com>
Closes: https://lore.kernel.org/r/SN6PR02MB4157A4C5E8CB219A75263A17D46DA@SN6PR02MB4157.namprd02.prod.outlook.com
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250606185549.900611-1-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees for the
following reasons:

## Bug Fix for Critical System Functionality

1. **Fixes a real bug affecting kdump functionality**: The commit
   addresses a specific problem where platforms without true NMI support
   cannot properly shutdown CPUs during panic, preventing kdump kernels
   from loading. This is a critical debugging and recovery feature that
   many production systems rely on.

2. **Clear regression/breakage scenario**: The commit message clearly
   describes how the current behavior causes non-panic CPUs to get stuck
   in an endless loop in `nbcon_reacquire_nobuf()`, preventing proper
   CPU shutdown during panic. This is a functional regression that
   affects system reliability.

## Safe and Contained Fix

3. **Minimal and targeted change**: The fix is confined to the nbcon
   (new console) subsystem, specifically modifying only the acquire
   logic to allow reacquire during panic. The diff shows only 41
   insertions and 22 deletions, mostly adding a `is_reacquire` parameter
   to existing functions.

4. **No architectural changes**: The commit doesn't introduce new
   features or change the fundamental design. It merely adjusts the
   existing acquire logic to handle a specific edge case during panic.

5. **Conservative approach**: The fix maintains safety by:
   - Only allowing direct reacquire for non-panic CPUs (not all acquire
     types)
   - Preserving the check for `unsafe_takeover` state
   - Keeping all other panic-time restrictions in place

## Well-Tested and Reviewed

6. **Proper testing and review**: The commit has been:
   - Reported by Michael Kelley with a specific reproducer
   - Reviewed by Petr Mladek (printk maintainer)
   - Tested by the original reporter
   - Already included upstream (commit
     571c1ea91a73db56bd94054fabecd0f070dc90db)

## Code Analysis

The key changes in `nbcon_context_try_acquire_direct()`:
```c
-static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
- struct nbcon_state *cur)
+static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
+                                           struct nbcon_state *cur,
bool is_reacquire)
```

And the critical logic change:
```c
-if (other_cpu_in_panic())
+if (other_cpu_in_panic() &&
+    (!is_reacquire || cur->unsafe_takeover)) {
     return -EPERM;
+}
```

This allows reacquire during panic only when it's a genuine reacquire
attempt and no unsafe takeover has occurred, which is a safe and
necessary exception to handle the described bug.

The commit follows stable kernel rules by fixing an important bug with
minimal risk and without introducing new features.

 kernel/printk/nbcon.c | 63 ++++++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index fd12efcc4aed..e7a3af81b173 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -214,8 +214,9 @@ static void nbcon_seq_try_update(struct nbcon_context *ctxt, u64 new_seq)
 
 /**
  * nbcon_context_try_acquire_direct - Try to acquire directly
- * @ctxt:	The context of the caller
- * @cur:	The current console state
+ * @ctxt:		The context of the caller
+ * @cur:		The current console state
+ * @is_reacquire:	This acquire is a reacquire
  *
  * Acquire the console when it is released. Also acquire the console when
  * the current owner has a lower priority and the console is in a safe state.
@@ -225,17 +226,17 @@ static void nbcon_seq_try_update(struct nbcon_context *ctxt, u64 new_seq)
  *
  * Errors:
  *
- *	-EPERM:		A panic is in progress and this is not the panic CPU.
- *			Or the current owner or waiter has the same or higher
- *			priority. No acquire method can be successful in
- *			this case.
+ *	-EPERM:		A panic is in progress and this is neither the panic
+ *			CPU nor is this a reacquire. Or the current owner or
+ *			waiter has the same or higher priority. No acquire
+ *			method can be successful in these cases.
  *
  *	-EBUSY:		The current owner has a lower priority but the console
  *			in an unsafe state. The caller should try using
  *			the handover acquire method.
  */
 static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
-					    struct nbcon_state *cur)
+					    struct nbcon_state *cur, bool is_reacquire)
 {
 	unsigned int cpu = smp_processor_id();
 	struct console *con = ctxt->console;
@@ -243,14 +244,20 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 
 	do {
 		/*
-		 * Panic does not imply that the console is owned. However, it
-		 * is critical that non-panic CPUs during panic are unable to
-		 * acquire ownership in order to satisfy the assumptions of
-		 * nbcon_waiter_matches(). In particular, the assumption that
-		 * lower priorities are ignored during panic.
+		 * Panic does not imply that the console is owned. However,
+		 * since all non-panic CPUs are stopped during panic(), it
+		 * is safer to have them avoid gaining console ownership.
+		 *
+		 * If this acquire is a reacquire (and an unsafe takeover
+		 * has not previously occurred) then it is allowed to attempt
+		 * a direct acquire in panic. This gives console drivers an
+		 * opportunity to perform any necessary cleanup if they were
+		 * interrupted by the panic CPU while printing.
 		 */
-		if (other_cpu_in_panic())
+		if (other_cpu_in_panic() &&
+		    (!is_reacquire || cur->unsafe_takeover)) {
 			return -EPERM;
+		}
 
 		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
 			return -EPERM;
@@ -301,8 +308,9 @@ static bool nbcon_waiter_matches(struct nbcon_state *cur, int expected_prio)
 	 * Event #1 implies this context is EMERGENCY.
 	 * Event #2 implies the new context is PANIC.
 	 * Event #3 occurs when panic() has flushed the console.
-	 * Events #4 and #5 are not possible due to the other_cpu_in_panic()
-	 * check in nbcon_context_try_acquire_direct().
+	 * Event #4 occurs when a non-panic CPU reacquires.
+	 * Event #5 is not possible due to the other_cpu_in_panic() check
+	 *          in nbcon_context_try_acquire_handover().
 	 */
 
 	return (cur->req_prio == expected_prio);
@@ -431,6 +439,16 @@ static int nbcon_context_try_acquire_handover(struct nbcon_context *ctxt,
 	WARN_ON_ONCE(ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio);
 	WARN_ON_ONCE(!cur->unsafe);
 
+	/*
+	 * Panic does not imply that the console is owned. However, it
+	 * is critical that non-panic CPUs during panic are unable to
+	 * wait for a handover in order to satisfy the assumptions of
+	 * nbcon_waiter_matches(). In particular, the assumption that
+	 * lower priorities are ignored during panic.
+	 */
+	if (other_cpu_in_panic())
+		return -EPERM;
+
 	/* Handover is not possible on the same CPU. */
 	if (cur->cpu == cpu)
 		return -EBUSY;
@@ -558,7 +576,8 @@ static struct printk_buffers panic_nbcon_pbufs;
 
 /**
  * nbcon_context_try_acquire - Try to acquire nbcon console
- * @ctxt:	The context of the caller
+ * @ctxt:		The context of the caller
+ * @is_reacquire:	This acquire is a reacquire
  *
  * Context:	Under @ctxt->con->device_lock() or local_irq_save().
  * Return:	True if the console was acquired. False otherwise.
@@ -568,7 +587,7 @@ static struct printk_buffers panic_nbcon_pbufs;
  * in an unsafe state. Otherwise, on success the caller may assume
  * the console is not in an unsafe state.
  */
-static bool nbcon_context_try_acquire(struct nbcon_context *ctxt)
+static bool nbcon_context_try_acquire(struct nbcon_context *ctxt, bool is_reacquire)
 {
 	unsigned int cpu = smp_processor_id();
 	struct console *con = ctxt->console;
@@ -577,7 +596,7 @@ static bool nbcon_context_try_acquire(struct nbcon_context *ctxt)
 
 	nbcon_state_read(con, &cur);
 try_again:
-	err = nbcon_context_try_acquire_direct(ctxt, &cur);
+	err = nbcon_context_try_acquire_direct(ctxt, &cur, is_reacquire);
 	if (err != -EBUSY)
 		goto out;
 
@@ -913,7 +932,7 @@ void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt)
 {
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 
-	while (!nbcon_context_try_acquire(ctxt))
+	while (!nbcon_context_try_acquire(ctxt, true))
 		cpu_relax();
 
 	nbcon_write_context_set_buf(wctxt, NULL, 0);
@@ -1101,7 +1120,7 @@ static bool nbcon_emit_one(struct nbcon_write_context *wctxt, bool use_atomic)
 		cant_migrate();
 	}
 
-	if (!nbcon_context_try_acquire(ctxt))
+	if (!nbcon_context_try_acquire(ctxt, false))
 		goto out;
 
 	/*
@@ -1486,7 +1505,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 	ctxt->prio			= nbcon_get_default_prio();
 	ctxt->allow_unsafe_takeover	= allow_unsafe_takeover;
 
-	if (!nbcon_context_try_acquire(ctxt))
+	if (!nbcon_context_try_acquire(ctxt, false))
 		return -EPERM;
 
 	while (nbcon_seq_read(con) < stop_seq) {
@@ -1762,7 +1781,7 @@ bool nbcon_device_try_acquire(struct console *con)
 	ctxt->console	= con;
 	ctxt->prio	= NBCON_PRIO_NORMAL;
 
-	if (!nbcon_context_try_acquire(ctxt))
+	if (!nbcon_context_try_acquire(ctxt, false))
 		return false;
 
 	if (!nbcon_context_enter_unsafe(ctxt))
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH AUTOSEL 6.16] f2fs: handle nat.blkaddr corruption in f2fs_get_node_info()
  2025-08-08 17:41 [PATCH AUTOSEL 6.16-6.6] exfat: add cluster chain loop check for dir Sasha Levin
                   ` (2 preceding siblings ...)
  2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-6.12] printk: nbcon: Allow reacquire during panic Sasha Levin
@ 2025-08-08 17:41 ` Sasha Levin
  3 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2025-08-08 17:41 UTC (permalink / raw)
  To: patches, stable; +Cc: Chao Yu, Jaegeuk Kim, Sasha Levin, linux-f2fs-devel

From: Chao Yu <chao@kernel.org>

[ Upstream commit 1773f63d108b1b9b9d053d8c95f8300c556f93b8 ]

F2FS-fs (dm-55): access invalid blkaddr:972878540
Call trace:
 dump_backtrace+0xec/0x128
 show_stack+0x18/0x28
 dump_stack_lvl+0x40/0x88
 dump_stack+0x18/0x24
 __f2fs_is_valid_blkaddr+0x360/0x3b4
 f2fs_is_valid_blkaddr+0x10/0x20
 f2fs_get_node_info+0x21c/0x60c
 __write_node_page+0x15c/0x734
 f2fs_sync_node_pages+0x4f8/0x700
 f2fs_write_checkpoint+0x4a8/0x99c
 __checkpoint_and_complete_reqs+0x7c/0x20c
 issue_checkpoint_thread+0x4c/0xd8
 kthread+0x11c/0x1b0
 ret_from_fork+0x10/0x20

If nat.blkaddr is corrupted, during checkpoint, f2fs_sync_node_pages()
will loop to flush node page w/ corrupted nat.blkaddr.

Although, it tags SBI_NEED_FSCK, checkpoint can not persist it due
to deadloop.

Let's call f2fs_handle_error(, ERROR_INCONSISTENT_NAT) to record such
error into superblock, it expects fsck can detect the error and repair
inconsistent nat.blkaddr after device reboot.

Note that, let's add sanity check in f2fs_get_node_info() to detect
in-memory nat.blkaddr inconsistency, but only if CONFIG_F2FS_CHECK_FS
is enabled.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Based on my analysis of the commit, here is my assessment:

**Backport Status: YES**

This commit should be backported to stable kernel trees for the
following reasons:

1. **Critical Bug Fix**: This fixes a **deadloop/infinite loop
   condition** that can occur during checkpoint operations when NAT
   (Node Address Table) entries have corrupted block addresses. The
   commit message clearly shows a real-world stack trace where
   `f2fs_sync_node_pages()` enters an infinite loop trying to flush
   nodes with corrupted nat.blkaddr during checkpoint.

2. **Data Integrity and System Availability**: The deadloop prevents the
   checkpoint from completing, which means:
   - The filesystem cannot persist the `SBI_NEED_FSCK` flag that would
     trigger repair on next mount
   - The system may hang or become unresponsive during checkpoint
     operations
   - Filesystem metadata corruption cannot be properly handled

3. **Small and Contained Fix**: The patch is relatively small (21
   insertions, 8 deletions) and focused on a single function
   `f2fs_get_node_info()`. The changes are:
   - Move the sanity check earlier in the code path
   - Add proper error handling with `f2fs_handle_error()` and
     `ERROR_INCONSISTENT_NAT`
   - Return `-EFSCORRUPTED` instead of `-EFAULT` for better error
     semantics
   - Add conditional caching logic to avoid caching corrupted entries

4. **Low Risk of Regression**:
   - The main sanity check logic already existed but was incomplete
   - The new error handling path only triggers when corruption is
     detected
   - The CONFIG_F2FS_CHECK_FS conditional check adds extra validation
     only when debugging is enabled
   - The change doesn't alter normal operation flow for valid NAT
     entries

5. **Follows Stable Rules**: This fix meets stable kernel criteria:
   - Fixes a real bug that users have hit (stack trace provided)
   - The fix is minimal and targeted
   - No new features are introduced
   - The risk of regression is low

6. **Corruption Handling**: The commit properly handles filesystem
   corruption by:
   - Setting the `SBI_NEED_FSCK` flag
   - Recording the error in the superblock via `f2fs_handle_error()`
   - Providing detailed error logging for debugging
   - Returning appropriate error codes to prevent further damage

The deadloop condition this fixes is particularly severe as it can make
the system unresponsive and prevent proper error recovery, making this
an important candidate for stable backporting.

 fs/f2fs/node.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index bfe104db284e..2fd287f2bca4 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -555,8 +555,8 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
 	struct f2fs_nat_entry ne;
 	struct nat_entry *e;
 	pgoff_t index;
-	block_t blkaddr;
 	int i;
+	bool need_cache = true;
 
 	ni->flag = 0;
 	ni->nid = nid;
@@ -569,6 +569,10 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
 		ni->blk_addr = nat_get_blkaddr(e);
 		ni->version = nat_get_version(e);
 		f2fs_up_read(&nm_i->nat_tree_lock);
+		if (IS_ENABLED(CONFIG_F2FS_CHECK_FS)) {
+			need_cache = false;
+			goto sanity_check;
+		}
 		return 0;
 	}
 
@@ -594,7 +598,7 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
 	up_read(&curseg->journal_rwsem);
 	if (i >= 0) {
 		f2fs_up_read(&nm_i->nat_tree_lock);
-		goto cache;
+		goto sanity_check;
 	}
 
 	/* Fill node_info from nat page */
@@ -609,14 +613,23 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
 	ne = nat_blk->entries[nid - start_nid];
 	node_info_from_raw_nat(ni, &ne);
 	f2fs_folio_put(folio, true);
-cache:
-	blkaddr = le32_to_cpu(ne.block_addr);
-	if (__is_valid_data_blkaddr(blkaddr) &&
-		!f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC_ENHANCE))
-		return -EFAULT;
+sanity_check:
+	if (__is_valid_data_blkaddr(ni->blk_addr) &&
+		!f2fs_is_valid_blkaddr(sbi, ni->blk_addr,
+					DATA_GENERIC_ENHANCE)) {
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		f2fs_err_ratelimited(sbi,
+			"f2fs_get_node_info of %pS: inconsistent nat entry, "
+			"ino:%u, nid:%u, blkaddr:%u, ver:%u, flag:%u",
+			__builtin_return_address(0),
+			ni->ino, ni->nid, ni->blk_addr, ni->version, ni->flag);
+		f2fs_handle_error(sbi, ERROR_INCONSISTENT_NAT);
+		return -EFSCORRUPTED;
+	}
 
 	/* cache nat entry */
-	cache_nat_entry(sbi, nid, &ne);
+	if (need_cache)
+		cache_nat_entry(sbi, nid, &ne);
 	return 0;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-08-08 17:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-08 17:41 [PATCH AUTOSEL 6.16-6.6] exfat: add cluster chain loop check for dir Sasha Levin
2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-6.6] f2fs: check the generic conditions first Sasha Levin
2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-5.10] i2c: Force DLL0945 touchpad i2c freq to 100khz Sasha Levin
2025-08-08 17:41 ` [PATCH AUTOSEL 6.16-6.12] printk: nbcon: Allow reacquire during panic Sasha Levin
2025-08-08 17:41 ` [PATCH AUTOSEL 6.16] f2fs: handle nat.blkaddr corruption in f2fs_get_node_info() Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox