Linux-mtd Archive on lore.kernel.org
* [bug report] mtd: bad block counter inflated when repeatedly marking the same block
@ 2025-09-01  9:26 Wang Zhaolong
  2025-09-01 12:40 ` Zhihao Cheng
  2025-09-01 12:46 ` Miquel Raynal
  0 siblings, 2 replies; 5+ messages in thread
From: Wang Zhaolong @ 2025-09-01  9:26 UTC (permalink / raw)
  To: miquel.raynal, richard, vigneshr, linux-mtd
  Cc: yi.zhang, yangerkun, chengzhihao1

Hi all,

I’d like to report a mismatch between bad-block statistics and actual
on-flash state when repeatedly calling MEMSETBADBLOCK on the same
eraseblock.

Summary
- Repeatedly marking the same block bad (e.g., 5 times) makes
   /sys/class/mtd/mtdX/bad_blocks increase by 5.
- After reboot, the counter returns to the correct value of 1.
- So the runtime counter (ecc_stats.badblocks) is inflated.

Repro (with nandsim.ko)

```bash
# ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
# modprobe nandsim id_bytes=$ID
# ~/mtd-utils/mtd_markbad /dev/mtd1 10 1 # Repeat 5 times
......
# ~/mtd-utils/mtd_markbad /dev/mtd1 10 1

# -- The statistics now report 5 bad blocks.
# cat /sys/class/mtd/mtd1/bad_blocks
5

# -- In fact, a scan finds only 1 bad block.
# ubiformat -v /dev/mtd1  | grep "bad eraseblock"
ubiformat: 1 bad eraseblocks found, numbers: 10
```

Root cause analysis (kernel-side)

```
mtd_block_markbad
   mtd->_block_markbad()
     nand_block_markbad
       ret = nand_block_isbad
       return 0; // ret > 0, block is already bad
   mtd->ecc_stats.badblocks++;  // No new bad block was marked, but it is still counted.
```

Relevant code
- drivers/mtd/nand/raw/nand_base.c:nand_block_markbad()
- drivers/mtd/mtdcore.c:mtd_block_markbad()

nand_block_markbad() returns 0 both for “newly marked” and “already bad”.
mtdcore cannot tell whether this call actually added a new bad block,
but still increments ecc_stats.badblocks.
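
To make the call flow above concrete, here is a minimal userspace model of it. This is only a sketch and not kernel code: `struct toy_flash` and all `toy_*` functions are hypothetical names invented for illustration, and the model assumes nothing beyond the behaviour described above (the driver hook returns 0 in both cases, the core counts every success).

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16

/* Toy stand-in for the flash; true means the block carries a bad-block marker. */
struct toy_flash {
    bool bad[NBLOCKS];
    unsigned int badblocks;   /* models ecc_stats.badblocks */
};

/* Models the driver hook: returns 0 for "newly marked" and for "already bad". */
static int toy_driver_markbad(struct toy_flash *f, size_t blk)
{
    if (f->bad[blk])
        return 0;             /* already bad: nothing is written, still success */
    f->bad[blk] = true;
    return 0;
}

/* Models the core wrapper: every successful call bumps the counter. */
static int toy_core_markbad(struct toy_flash *f, size_t blk)
{
    int ret = toy_driver_markbad(f, blk);

    if (ret)
        return ret;
    f->badblocks++;
    return 0;
}

/* Counts the markers actually present, like a rescan after reboot would. */
static unsigned int toy_scan(const struct toy_flash *f)
{
    unsigned int n = 0;

    for (size_t i = 0; i < NBLOCKS; i++)
        n += f->bad[i] ? 1 : 0;
    return n;
}
```

Calling `toy_core_markbad()` five times on block 10 of a zero-initialized `struct toy_flash` leaves `badblocks` at 5 while `toy_scan()` reports 1, which mirrors the repro.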

Possible fixes (high level)
- Core-side conservative fix (minimal ABI change):
   * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
     calling _block_markbad(), and (if available) probe again after success.
   * Only increment ecc_stats.badblocks if the state transitioned from
     “good” to “bad”.

- Teach *_block_markbad() to return a distinct positive code for
   “already bad” vs “newly marked”, so the core can increment only on
   “newly marked”.
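
The core-side option could look roughly like the following userspace sketch. It is not a patch against mtdcore.c: `struct toy_flash` and the `toy_*` functions are hypothetical, and returning early when the first probe fails is an assumption made here, not something decided in this report.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16

/* Toy stand-in for the flash; true means the block carries a bad-block marker. */
struct toy_flash {
    bool bad[NBLOCKS];
    unsigned int badblocks;   /* models ecc_stats.badblocks */
};

/* Models _block_isbad(): 1 if bad, 0 if good (a real driver may return < 0). */
static int toy_isbad(const struct toy_flash *f, size_t blk)
{
    return f->bad[blk] ? 1 : 0;
}

/* Models _block_markbad(): returns 0 for "newly marked" and for "already bad". */
static int toy_driver_markbad(struct toy_flash *f, size_t blk)
{
    f->bad[blk] = true;
    return 0;
}

/* Core wrapper with a probe before and after the driver call. */
static int toy_core_markbad_fixed(struct toy_flash *f, size_t blk)
{
    int was_bad = toy_isbad(f, blk);
    int ret;

    if (was_bad < 0)
        return was_bad;       /* assumption: propagate a failed probe */

    ret = toy_driver_markbad(f, blk);
    if (ret)
        return ret;

    /* Count only a confirmed transition from good to bad. */
    if (!was_bad && toy_isbad(f, blk) > 0)
        f->badblocks++;
    return 0;
}
```

With this shape, marking block 10 five times leaves the counter at 1, at the cost of up to two extra isbad probes per call.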

What I want to know is:
- Would the core-side pre/post _block_isbad check be acceptable as a short-term fix?
- Any objections regarding the extra isbad IO in the markbad path?
- Longer-term, is there interest in explicit API/return-code semantics
   to differentiate “already bad” vs “newly marked”?
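
For the longer-term question, one possible shape of the return-code semantics is sketched below as a userspace model. The constants `TOY_NEWLY_MARKED` and `TOY_ALREADY_BAD`, the struct, and the `toy_*` functions are all hypothetical; no such codes exist in the kernel today. Folding the positive code back to 0 in the core is also an assumption, made so that callers keep seeing plain success.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16

/* Hypothetical return codes for the driver hook. */
#define TOY_NEWLY_MARKED 0
#define TOY_ALREADY_BAD  1

/* Toy stand-in for the flash; true means the block carries a bad-block marker. */
struct toy_flash {
    bool bad[NBLOCKS];
    unsigned int badblocks;   /* models ecc_stats.badblocks */
};

/* Driver hook that reports whether it actually wrote a new marker. */
static int toy_driver_markbad_rc(struct toy_flash *f, size_t blk)
{
    if (f->bad[blk])
        return TOY_ALREADY_BAD;
    f->bad[blk] = true;
    return TOY_NEWLY_MARKED;
}

/* Core wrapper: counts only newly marked blocks, hides the positive code. */
static int toy_core_markbad_rc(struct toy_flash *f, size_t blk)
{
    int ret = toy_driver_markbad_rc(f, blk);

    if (ret < 0)
        return ret;           /* a real driver error would pass through */
    if (ret == TOY_NEWLY_MARKED)
        f->badblocks++;
    return 0;
}
```

Compared with the probe-based variant, this avoids the extra isbad IO but requires every `_block_markbad()` implementation to adopt the new convention.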

I’m very interested in helping resolve this issue and would be grateful
for any guidance or suggestions.

Best regards,
Wang Zhaolong


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
