Re: [bug report] mtd: bad block counter inflated when repeatedly marking the same block

From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Cc: richard@nod.at,  vigneshr@ti.com,  linux-mtd@lists.infradead.org,
	yi.zhang@huawei.com,  yangerkun@huawei.com,
	 chengzhihao1@huawei.com
Subject: Re: [bug report] mtd: bad block counter inflated when repeatedly marking the same block
Date: Mon, 01 Sep 2025 14:46:34 +0200	[thread overview]
Message-ID: <87y0qynxw5.fsf@bootlin.com> (raw)
In-Reply-To: <ef573188-9815-4a6b-bad1-3d8ff7c9b16f@huaweicloud.com> (Wang Zhaolong's message of "Mon, 1 Sep 2025 17:26:45 +0800")

Hello,

On 01/09/2025 at 17:26:45 +08, Wang Zhaolong <wangzhaolong@huaweicloud.com> wrote:

> Hi all,
>
> I’d like to report a mismatch between bad-block statistics and actual
> on-flash state when repeatedly calling MEMSETBADBLOCK on the same
> eraseblock.
>
> Summary
> - Repeatedly marking the same block bad (e.g., 5 times) makes
>   /sys/class/mtd/mtdX/bad_blocks increase by 5.
> - After reboot,  the statistical value ture to the correct value of 1.
> - So the runtime counter (ecc_stats.badblocks) is inflated.
>
> Repro (with nandsim.ko)
>
> ```bash
> # ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
> # modprobe nandsim id_bytes=$ID
> # ~/mtd-utils/mtd_markbad /dev/mtd1 10 1 # Repeat 5 times
> ......
> # ~/mtd-utils/mtd_markbad /dev/mtd1 10 1
>
> # -- It can be observed that 5 bad blocks will appear in the statistical information.
> # cat /sys/class/mtd/mtd1/bad_blocks
> 5
>
> # -- In fact, we can only scan 1 bad block.
> # ubiformat -v /dev/mtd1  | grep "bad eraseblock"
> ubiformat: 1 bad eraseblocks found, numbers: 10
> ```
>
> Root cause analysis (kernel-side)
>
> ```
> mtd_block_markbad
>   mtd->_block_markbad()
>     nand_block_markbad
>       ret = nand_block_isbad
>       return 0; // ret > 0
>   mtd->ecc_stats.badblocks++;  // No bad blocks was marked but was counted.
>   Relevant code
> - drivers/mtd/nand/raw/nand_base.c:nand_block_markbad()
> - drivers/mtd/mtdcore.c:mtd_block_markbad()
> ```
>
> nand_block_markbad() returns 0 both for “newly marked” and “already bad”.
> mtdcore cannot tell whether this call actually added a new bad block,
> but still increments ecc_stats.badblocks.
>
> Possible fixes (high level)
> - Core-side conservative fix (minimal ABI change):
>   * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
>     calling _block_markbad(), and (if available) probe again after
> success.

Sounds reasonable to me.

>   * Only increment ecc_stats.badblocks if the state transitioned from
>     “good” to “bad”.
>
> - Teach *_block_markbad() to return a distinct positive code for
>   “already bad” vs “newly marked”, so the core can increment only on
>   “newly marked”.

The subsystems have no other way to tell rather than probing the state
of the marker. I believe it is best to do it in mtd_block_markbad() in
the common mtdcore.c rather than in each implementation.

> What I want to know is:
> - Would the core-side pre/post _block_isbad check be acceptable as a short-term fix?
> - Any objections regarding the extra isbad IO in the markbad path?

Fine by me, it's not a hot path. If we end up here, we are already doing
damage control.;

> - Longer-term, is there interest in an explicit API/return-code semantics
>   to differentiate “already bad” vs “newly marked”?

If you mean userspace API, I'd say not necessarily. Querying the stats
is probably the way to go. However in the kernel, while I would in
theory not be opposed to it, I don't see how one could implement that in
a more efficient way than probing the marker as discussed above in a
complex-free manner.

> I’m very interested in helping resolve this issue and would be grateful
> for any guidance or suggestions.

MTD folks in Cc, your feedback is also welcome ;-)

Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/