[bug report] mtd: bad block counter inflated when repeatedly marking the same block

Linux-mtd Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [bug report] mtd: bad block counter inflated when repeatedly marking the same block
@ 2025-09-01  9:26 Wang Zhaolong
  2025-09-01 12:40 ` Zhihao Cheng
  2025-09-01 12:46 ` Miquel Raynal
  0 siblings, 2 replies; 5+ messages in thread
From: Wang Zhaolong @ 2025-09-01  9:26 UTC (permalink / raw)
  To: miquel.raynal, richard, vigneshr, linux-mtd
  Cc: yi.zhang, yangerkun, chengzhihao1

Hi all,

I’d like to report a mismatch between bad-block statistics and actual
on-flash state when repeatedly calling MEMSETBADBLOCK on the same
eraseblock.

Summary
- Repeatedly marking the same block bad (e.g., 5 times) makes
   /sys/class/mtd/mtdX/bad_blocks increase by 5.
- After reboot,  the statistical value ture to the correct value of 1.
- So the runtime counter (ecc_stats.badblocks) is inflated.

Repro (with nandsim.ko)

```bash
# ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
# modprobe nandsim id_bytes=$ID
# ~/mtd-utils/mtd_markbad /dev/mtd1 10 1 # Repeat 5 times
......
# ~/mtd-utils/mtd_markbad /dev/mtd1 10 1

# -- It can be observed that 5 bad blocks will appear in the statistical information.
# cat /sys/class/mtd/mtd1/bad_blocks
5

# -- In fact, we can only scan 1 bad block.
# ubiformat -v /dev/mtd1  | grep "bad eraseblock"
ubiformat: 1 bad eraseblocks found, numbers: 10
```

Root cause analysis (kernel-side)

```
mtd_block_markbad
   mtd->_block_markbad()
     nand_block_markbad
       ret = nand_block_isbad
       return 0; // ret > 0
   mtd->ecc_stats.badblocks++;  // No bad blocks was marked but was counted.
   
Relevant code
- drivers/mtd/nand/raw/nand_base.c:nand_block_markbad()
- drivers/mtd/mtdcore.c:mtd_block_markbad()
```

nand_block_markbad() returns 0 both for “newly marked” and “already bad”.
mtdcore cannot tell whether this call actually added a new bad block,
but still increments ecc_stats.badblocks.

Possible fixes (high level)
- Core-side conservative fix (minimal ABI change):
   * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
     calling _block_markbad(), and (if available) probe again after success.
   * Only increment ecc_stats.badblocks if the state transitioned from
     “good” to “bad”.

- Teach *_block_markbad() to return a distinct positive code for
   “already bad” vs “newly marked”, so the core can increment only on
   “newly marked”.

What I want to know is:
- Would the core-side pre/post _block_isbad check be acceptable as a short-term fix?
- Any objections regarding the extra isbad IO in the markbad path?
- Longer-term, is there interest in an explicit API/return-code semantics
   to differentiate “already bad” vs “newly marked”?

I’m very interested in helping resolve this issue and would be grateful
for any guidance or suggestions.

Best regards,
Wang Zhaolong


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] mtd: bad block counter inflated when repeatedly marking the same block
  2025-09-01  9:26 [bug report] mtd: bad block counter inflated when repeatedly marking the same block Wang Zhaolong
@ 2025-09-01 12:40 ` Zhihao Cheng
  2025-09-01 14:14   ` Wang Zhaolong
  2025-09-01 12:46 ` Miquel Raynal
  1 sibling, 1 reply; 5+ messages in thread
From: Zhihao Cheng @ 2025-09-01 12:40 UTC (permalink / raw)
  To: Wang Zhaolong, miquel.raynal, richard, vigneshr, linux-mtd
  Cc: yi.zhang, yangerkun

在 2025/9/1 17:26, Wang Zhaolong 写道:
> Hi all,
> 
> I’d like to report a mismatch between bad-block statistics and actual
> on-flash state when repeatedly calling MEMSETBADBLOCK on the same
> eraseblock.
> 
> Summary
> - Repeatedly marking the same block bad (e.g., 5 times) makes
>    /sys/class/mtd/mtdX/bad_blocks increase by 5.
> - After reboot,  the statistical value ture to the correct value of 1.
> - So the runtime counter (ecc_stats.badblocks) is inflated.
> 
> Repro (with nandsim.ko)
> 
> ```bash
> # ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
> # modprobe nandsim id_bytes=$ID
> # ~/mtd-utils/mtd_markbad /dev/mtd1 10 1 # Repeat 5 times
> ......
> # ~/mtd-utils/mtd_markbad /dev/mtd1 10 1
> 
> # -- It can be observed that 5 bad blocks will appear in the statistical 
> information.
> # cat /sys/class/mtd/mtd1/bad_blocks
> 5
> 
> # -- In fact, we can only scan 1 bad block.
> # ubiformat -v /dev/mtd1  | grep "bad eraseblock"
> ubiformat: 1 bad eraseblocks found, numbers: 10
> ```
> 
> Root cause analysis (kernel-side)
> 
> ```
> mtd_block_markbad
>    mtd->_block_markbad()
>      nand_block_markbad
>        ret = nand_block_isbad
>        return 0; // ret > 0
>    mtd->ecc_stats.badblocks++;  // No bad blocks was marked but was 
> counted.
> Relevant code
> - drivers/mtd/nand/raw/nand_base.c:nand_block_markbad()
> - drivers/mtd/mtdcore.c:mtd_block_markbad()
> ```
> 
> nand_block_markbad() returns 0 both for “newly marked” and “already bad”.
> mtdcore cannot tell whether this call actually added a new bad block,
> but still increments ecc_stats.badblocks.
> 
> Possible fixes (high level)
> - Core-side conservative fix (minimal ABI change):
>    * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
>      calling _block_markbad(), and (if available) probe again after 
> success.
>    * Only increment ecc_stats.badblocks if the state transitioned from
>      “good” to “bad”.

IMHO, the return value of _block_markbad() is used to identify whether 
the block becomes bad successfully after the operation, the extra 
message(eg. whether the block is already bad or newly marked as bad) 
could be passed out from a pointer-type function parameter.
Since there is only one place (mtd_block_markbad) invokes the 
_block_markbad(), I prefer the conservative(simple) fix.

> 
> - Teach *_block_markbad() to return a distinct positive code for
>    “already bad” vs “newly marked”, so the core can increment only on
>    “newly marked”.
> 
> What I want to know is:
> - Would the core-side pre/post _block_isbad check be acceptable as a 
> short-term fix?
> - Any objections regarding the extra isbad IO in the markbad path?
> - Longer-term, is there interest in an explicit API/return-code semantics
>    to differentiate “already bad” vs “newly marked”?
> 
> I’m very interested in helping resolve this issue and would be grateful
> for any guidance or suggestions.
> 
> Best regards,
> Wang Zhaolong
> 
> .


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] mtd: bad block counter inflated when repeatedly marking the same block
  2025-09-01  9:26 [bug report] mtd: bad block counter inflated when repeatedly marking the same block Wang Zhaolong
  2025-09-01 12:40 ` Zhihao Cheng
@ 2025-09-01 12:46 ` Miquel Raynal
  2025-09-01 14:16   ` Wang Zhaolong
  1 sibling, 1 reply; 5+ messages in thread
From: Miquel Raynal @ 2025-09-01 12:46 UTC (permalink / raw)
  To: Wang Zhaolong
  Cc: richard, vigneshr, linux-mtd, yi.zhang, yangerkun, chengzhihao1

Hello,

On 01/09/2025 at 17:26:45 +08, Wang Zhaolong <wangzhaolong@huaweicloud.com> wrote:

> Hi all,
>
> I’d like to report a mismatch between bad-block statistics and actual
> on-flash state when repeatedly calling MEMSETBADBLOCK on the same
> eraseblock.
>
> Summary
> - Repeatedly marking the same block bad (e.g., 5 times) makes
>   /sys/class/mtd/mtdX/bad_blocks increase by 5.
> - After reboot,  the statistical value ture to the correct value of 1.
> - So the runtime counter (ecc_stats.badblocks) is inflated.
>
> Repro (with nandsim.ko)
>
> ```bash
> # ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
> # modprobe nandsim id_bytes=$ID
> # ~/mtd-utils/mtd_markbad /dev/mtd1 10 1 # Repeat 5 times
> ......
> # ~/mtd-utils/mtd_markbad /dev/mtd1 10 1
>
> # -- It can be observed that 5 bad blocks will appear in the statistical information.
> # cat /sys/class/mtd/mtd1/bad_blocks
> 5
>
> # -- In fact, we can only scan 1 bad block.
> # ubiformat -v /dev/mtd1  | grep "bad eraseblock"
> ubiformat: 1 bad eraseblocks found, numbers: 10
> ```
>
> Root cause analysis (kernel-side)
>
> ```
> mtd_block_markbad
>   mtd->_block_markbad()
>     nand_block_markbad
>       ret = nand_block_isbad
>       return 0; // ret > 0
>   mtd->ecc_stats.badblocks++;  // No bad blocks was marked but was counted.
>   Relevant code
> - drivers/mtd/nand/raw/nand_base.c:nand_block_markbad()
> - drivers/mtd/mtdcore.c:mtd_block_markbad()
> ```
>
> nand_block_markbad() returns 0 both for “newly marked” and “already bad”.
> mtdcore cannot tell whether this call actually added a new bad block,
> but still increments ecc_stats.badblocks.
>
> Possible fixes (high level)
> - Core-side conservative fix (minimal ABI change):
>   * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
>     calling _block_markbad(), and (if available) probe again after
> success.

Sounds reasonable to me.

>   * Only increment ecc_stats.badblocks if the state transitioned from
>     “good” to “bad”.
>
> - Teach *_block_markbad() to return a distinct positive code for
>   “already bad” vs “newly marked”, so the core can increment only on
>   “newly marked”.

The subsystems have no other way to tell rather than probing the state
of the marker. I believe it is best to do it in mtd_block_markbad() in
the common mtdcore.c rather than in each implementation.

> What I want to know is:
> - Would the core-side pre/post _block_isbad check be acceptable as a short-term fix?
> - Any objections regarding the extra isbad IO in the markbad path?

Fine by me, it's not a hot path. If we end up here, we are already doing
damage control.;

> - Longer-term, is there interest in an explicit API/return-code semantics
>   to differentiate “already bad” vs “newly marked”?

If you mean userspace API, I'd say not necessarily. Querying the stats
is probably the way to go. However in the kernel, while I would in
theory not be opposed to it, I don't see how one could implement that in
a more efficient way than probing the marker as discussed above in a
complex-free manner.

> I’m very interested in helping resolve this issue and would be grateful
> for any guidance or suggestions.

MTD folks in Cc, your feedback is also welcome ;-)

Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] mtd: bad block counter inflated when repeatedly marking the same block
  2025-09-01 12:40 ` Zhihao Cheng
@ 2025-09-01 14:14   ` Wang Zhaolong
  0 siblings, 0 replies; 5+ messages in thread
From: Wang Zhaolong @ 2025-09-01 14:14 UTC (permalink / raw)
  To: Zhihao Cheng, miquel.raynal, richard, vigneshr, linux-mtd
  Cc: yi.zhang, yangerkun


>>
>> Possible fixes (high level)
>> - Core-side conservative fix (minimal ABI change):
>>    * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
>>      calling _block_markbad(), and (if available) probe again after success.
>>    * Only increment ecc_stats.badblocks if the state transitioned from
>>      “good” to “bad”.
> 
> IMHO, the return value of _block_markbad() is used to identify whether the block becomes bad successfully after the operation, the extra message(eg. whether the block is already bad or newly marked as bad) could be passed out from a pointer-type function parameter.
> Since there is only one place (mtd_block_markbad) invokes the _block_markbad(), I prefer the conservative(simple) fix.
> 

Thanks for the clarification!

I agree with your interpretation of _block_markbad()’s return value
(0 = success, <0 = error).

Given there is only one caller (mtd_block_markbad), I’ll proceed with
the conservative fix.

Thanks again!

Best regards,
Wang Zhaolong


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] mtd: bad block counter inflated when repeatedly marking the same block
  2025-09-01 12:46 ` Miquel Raynal
@ 2025-09-01 14:16   ` Wang Zhaolong
  0 siblings, 0 replies; 5+ messages in thread
From: Wang Zhaolong @ 2025-09-01 14:16 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: richard, vigneshr, linux-mtd, yi.zhang, yangerkun, chengzhihao1


>> Possible fixes (high level)
>> - Core-side conservative fix (minimal ABI change):
>>    * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
>>      calling _block_markbad(), and (if available) probe again after
>> success.
> 
> Sounds reasonable to me.
> 
>>    * Only increment ecc_stats.badblocks if the state transitioned from
>>      “good” to “bad”.
>>
>> - Teach *_block_markbad() to return a distinct positive code for
>>    “already bad” vs “newly marked”, so the core can increment only on
>>    “newly marked”.
> 
> The subsystems have no other way to tell rather than probing the state
> of the marker. I believe it is best to do it in mtd_block_markbad() in
> the common mtdcore.c rather than in each implementation.
> 

Thank you for your guidance!

>> What I want to know is:
>> - Would the core-side pre/post _block_isbad check be acceptable as a short-term fix?
>> - Any objections regarding the extra isbad IO in the markbad path?
> 
> Fine by me, it's not a hot path. If we end up here, we are already doing
> damage control.;

I’ll prepare an RFC patch implementing the pre/post _block_isbad() checks
in mtd_block_markbad() and only increment ecc_stats.badblocks on a confirmed
good→bad transition. As discussed, this keeps the logic in mtdcore and
avoids per-driver changes.

For devices lacking _block_isbad(), I’ll take the conservative approach
(no increment unless we can confirm the transition) and call that out in the
commit message to gather feedback.

> If you mean userspace API, I'd say not necessarily. Querying the stats
> is probably the way to go. However in the kernel, while I would in
> theory not be opposed to it, I don't see how one could implement that in
> a more efficient way than probing the marker as discussed above in a
> complex-free manner.
> 

It sounds very reasonable.

I will include reproduction methods and test cases based on both nandsim and
physical devices. Thank you again for your guidance!

Best regards,
Wang Zhaolong


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-09-01 17:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-01  9:26 [bug report] mtd: bad block counter inflated when repeatedly marking the same block Wang Zhaolong
2025-09-01 12:40 ` Zhihao Cheng
2025-09-01 14:14   ` Wang Zhaolong
2025-09-01 12:46 ` Miquel Raynal
2025-09-01 14:16   ` Wang Zhaolong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox