linux-mtd.lists.infradead.org archive mirror
* ubifs mount fails due to corrupt empty space
@ 2012-01-13  2:24 Gabriel Matni
  2012-01-13  9:58 ` Matthieu CASTET
  2012-01-16 11:14 ` Artem Bityutskiy
  0 siblings, 2 replies; 4+ messages in thread
From: Gabriel Matni @ 2012-01-13  2:24 UTC (permalink / raw)
  To: linux-mtd


[-- Attachment #1.1: Type: text/plain, Size: 4038 bytes --]

Hello list,

Using kernel 3.2, UBIFS failed to mount a file system because it
encountered corrupt empty space while replaying the journal. Note the
bit-flip in the supposedly empty space shown in the log below. The storage
medium is an SLC NAND flash.

The stack dump below shows that replay_log_leb() called ubifs_scan() and
could not handle the -EUCLEAN error it returned. ubifs_scan() returned this
error because it detected the corrupt empty space.

The fs had been unmounted cleanly, so no fs recovery was needed during the
mount (need_recovery=0). As a result, the log LEB in question could not be
recovered, and the mount was aborted. The same problem was reported with a
NOR flash in the following thread:
http://lists.infradead.org/pipermail/linux-mtd/2009-March/024953.html
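
The decision described above can be modelled outside the kernel. The following is a minimal user-space sketch, assuming only the condition visible in the attached patch context; the enum and the function name classify_scan_result() are hypothetical and do not exist in fs/ubifs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value, defined here only for non-Linux hosts */
#endif

/* Hypothetical outcome labels, not kernel identifiers. */
enum replay_action {
	REPLAY_CONTINUE,     /* scan succeeded, keep replaying the log LEB */
	REPLAY_ABORT_MOUNT,  /* error is propagated and the mount fails */
	REPLAY_TRY_RECOVERY  /* fall through to the log LEB recovery path */
};

/*
 * Models the unpatched check in replay_log_leb():
 *     if (PTR_ERR(sleb) != -EUCLEAN || !c->need_recovery)
 *             return PTR_ERR(sleb);
 * scan_err is 0 on success or a negative errno value.
 */
static enum replay_action classify_scan_result(int scan_err, bool need_recovery)
{
	assert(scan_err <= 0);
	if (scan_err == 0)
		return REPLAY_CONTINUE;
	if (scan_err != -EUCLEAN || !need_recovery)
		return REPLAY_ABORT_MOUNT;
	return REPLAY_TRY_RECOVERY;
}
```

With scan_err = -EUCLEAN and need_recovery = false, which is the cleanly unmounted case reported here, the sketch yields REPLAY_ABORT_MOUNT. The attached patch drops the need_recovery term, so the same input would instead reach the recovery path.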

I applied the attached patch and the fs mounted successfully, giving it a
second life.

My main questions are:
1) What should be the expected behavior of ubifs in this case?
2) Is there a systematic way to recover from a situation like that?
3) Is the usage of this patch safe?
4) Any suggestions?

Thank you,

Gabriel Matni

Kernel log when attempting to mount the file system (2 UBI volumes on mtd12
each having a ubifs):

<5>[   68.901714] UBI: attaching mtd12 to ubi12
<5>[   68.905773] UBI: physical eraseblock size:   131072 bytes (128 KiB)
<5>[   68.912640] UBI: logical eraseblock size:    126976 bytes
<5>[   68.918070] UBI: smallest flash I/O unit:    2048
<5>[   68.923217] UBI: VID header offset:          2048 (aligned 2048)
<5>[   68.929255] UBI: data offset:                4096
<5>[   69.853538] UBI: max. sequence number:       52629
<5>[   69.906701] UBI: attached mtd12 to ubi12
<5>[   69.911170] UBI: MTD device name:            ""
<5>[   69.916702] UBI: MTD device size:            192 MiB
<5>[   69.922134] UBI: number of good PEBs:        1528
<5>[   69.926871] UBI: number of bad PEBs:         8
<5>[   69.931756] UBI: number of corrupted PEBs:   0
<5>[   69.936239] UBI: max. allowed volumes:       128
<5>[   69.941285] UBI: wear-leveling threshold:    4096
<5>[   69.946023] UBI: number of internal volumes: 1
<5>[   69.950900] UBI: number of user volumes:     2
<5>[   69.955378] UBI: available PEBs:             0
<5>[   69.960254] UBI: total number of reserved PEBs: 1528
<5>[   69.965253] UBI: number of PEBs reserved for bad PEB handling: 30
<5>[   69.971786] UBI: max/mean erase counter: 165/36
<5>[   69.976348] UBI: image sequence number:  1271161585
<5>[   69.981721] UBI: background thread "ubi_bgt12d" started, PID 891
<3>[   70.085510] UBIFS error (pid 893): ubifs_scan: corrupt empty space at LEB 5:3775
<3>[   70.093329] UBIFS error (pid 893): ubifs_scanned_corruption: corruption at LEB 5:3775
<7>[   70.101696] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
<7>[   70.101847] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
etc...
<3>[   70.144549] UBIFS error (pid 893): ubifs_scan: LEB 5 scanning failed
<5>[   70.445773] UBI: mtd12 is detached from ubi12
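
The first word of the dump, ffffffef, differs from the erased pattern ffffffff in a single bit (0xef is 1110 1111 in binary). A minimal sketch of how such a deviation could be quantified follows; count_zero_bits() is a hypothetical helper written for illustration, not a UBIFS or MTD function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the bits that read as 0 in a buffer that is expected to be
 * erased (all 0xff). Each zero bit is one deviation from the erased state.
 */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		uint8_t flipped = (uint8_t)~buf[i]; /* 1 where the bit reads 0 */

		while (flipped) {
			flipped &= (uint8_t)(flipped - 1); /* clear lowest set bit */
			zeros++;
		}
	}
	return zeros;
}
```

A buffer of 0xff bytes with one byte changed to 0xef gives a count of 1, which is what a single bit-flip in empty space looks like. Written data would normally produce a far larger count.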

Stack dump at the location of interest (inside ubifs_scan):

<4>[   60.988112] [<c000c29c>] (dump_backtrace+0x0/0x114) from [<c0234460>] (dump_stack+0x18/0x1c)
<4>[   60.996995]  r7:c613a000 r6:00000ebf r5:0001e140 r4:c8be7ebf
<4>[   61.003266] [<c0234448>] (dump_stack+0x0/0x1c) from [<c00f3fd4>] (ubifs_scan+0x334/0x3a0)
<4>[   61.011959] [<c00f3ca0>] (ubifs_scan+0x0/0x3a0) from [<c00f417c>] (replay_log_leb+0x94/0x7e8)
<4>[   61.021016] [<c00f40e8>] (replay_log_leb+0x0/0x7e8) from [<c00f5aa0>] (ubifs_replay_journal+0x198/0x398)
<4>[   61.031029] [<c00f5908>] (ubifs_replay_journal+0x0/0x398) from [<c00e802c>] (ubifs_fill_super+0xbdc/0x1854)
<4>[   61.041283] [<c00e7450>] (ubifs_fill_super+0x0/0x1854) from [<c00e9138>] (ubifs_mount+0x494/0x564)
<4>[   61.050769] [<c00e8ca4>] (ubifs_mount+0x0/0x564) from [<c00816cc>] (mount_fs+0x14/0x40)

[-- Attachment #1.2: Type: text/html, Size: 4731 bytes --]

[-- Attachment #2: ubifs-corrupt-empty-space.patch --]
[-- Type: application/octet-stream, Size: 544 bytes --]

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index ccabaf1..c6543ed 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -833,7 +833,7 @@ static int replay_log_leb(struct ubifs_info *c, int lnum, int offs, void *sbuf)
 	dbg_mnt("replay log LEB %d:%d", lnum, offs);
 	sleb = ubifs_scan(c, lnum, offs, sbuf, c->need_recovery);
 	if (IS_ERR(sleb)) {
-		if (PTR_ERR(sleb) != -EUCLEAN || !c->need_recovery)
+		if (PTR_ERR(sleb) != -EUCLEAN)
 			return PTR_ERR(sleb);
 		/*
 		 * Note, the below function will recover this log LEB only if


* Re: ubifs mount fails due to corrupt empty space
  2012-01-13  2:24 ubifs mount fails due to corrupt empty space Gabriel Matni
@ 2012-01-13  9:58 ` Matthieu CASTET
  2012-01-16 11:16   ` Artem Bityutskiy
  2012-01-16 11:14 ` Artem Bityutskiy
  1 sibling, 1 reply; 4+ messages in thread
From: Matthieu CASTET @ 2012-01-13  9:58 UTC (permalink / raw)
  To: Gabriel Matni; +Cc: linux-mtd@lists.infradead.org

Gabriel Matni wrote:
> Hello list,
> 
> Using kernel 3.2, ubifs failed to mount a fs properly due to a corrupt
> empty space encountered while replaying the journal. Please notice the
> bitflip occurring in the supposedly empty space shown in the log below.
> The storage medium is an SLC NAND flash.
> 
> The stack dump shown below shows that replay_log_leb() was behind the
> call to ubifs_scan(), and could not deal with the -EUCLEAN error
> returned. ubifs_scan() returned this error because it detected the
> corrupt empty space.
> 
> The fs was unmounted cleanly, resulting in no need for fs recovery to
> happen during the mount (need_recovery=0). Therefore the recovery of the
> log_leb in question couldn't be performed, causing the mount to abort.
> This problem was also seen in the following thread:
> http://lists.infradead.org/pipermail/linux-mtd/2009-March/024953.html
> with a NOR flash
>  
> I applied the attached patch and the fs was successfully mounted giving
> the fs a second life.
> 
> My main questions are:
> 1) What should be the expected behavior of ubifs in this case?
> 2) Is there a systematic way to recover from a situation like that?
UBIFS expects empty space to be protected by ECC. But if the ECC of an
empty page is not all 0xff, this needs a hack in the NAND driver.
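
One possible shape for such a driver hack is sketched below, under the assumption that the check runs only when the ECC engine reports an uncorrectable error on a chunk. fixup_erased_chunk(), zero_bits() and the max_bitflips threshold are hypothetical names chosen for illustration; this is not an existing MTD API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count zero bits in buf, stopping early once the count exceeds limit. */
static unsigned int zero_bits(const uint8_t *buf, size_t len,
			      unsigned int limit)
{
	unsigned int zeros = 0;

	for (size_t i = 0; i < len && zeros <= limit; i++) {
		uint8_t flipped = (uint8_t)~buf[i]; /* 1 where the bit reads 0 */

		while (flipped) {
			flipped &= (uint8_t)(flipped - 1);
			zeros++;
		}
	}
	return zeros;
}

/*
 * Treat a chunk as erased when its data and ECC bytes together contain
 * at most max_bitflips zero bits. In that case the data is rewritten to
 * all 0xff so upper layers see clean empty space. Returns true if the
 * chunk was treated as erased, false if it must be handled as real data.
 */
static bool fixup_erased_chunk(uint8_t *data, size_t data_len,
			       const uint8_t *ecc, size_t ecc_len,
			       unsigned int max_bitflips)
{
	unsigned int zeros;

	assert(data != NULL && ecc != NULL);
	zeros = zero_bits(data, data_len, max_bitflips);
	if (zeros <= max_bitflips)
		zeros += zero_bits(ecc, ecc_len, max_bitflips - zeros);
	if (zeros > max_bitflips)
		return false; /* too many zero bits for an erased chunk */

	memset(data, 0xff, data_len);
	return true;
}
```

The threshold would presumably be tied to the ECC strength of the chip, so that an erased chunk gets roughly the same tolerance as a written one. Counting the ECC bytes as well guards against mistaking a written chunk whose data happens to be mostly 0xff for an erased one.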



Matthieu


* Re: ubifs mount fails due to corrupt empty space
  2012-01-13  2:24 ubifs mount fails due to corrupt empty space Gabriel Matni
  2012-01-13  9:58 ` Matthieu CASTET
@ 2012-01-16 11:14 ` Artem Bityutskiy
  1 sibling, 0 replies; 4+ messages in thread
From: Artem Bityutskiy @ 2012-01-16 11:14 UTC (permalink / raw)
  To: Gabriel Matni; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 963 bytes --]

On Thu, 2012-01-12 at 21:24 -0500, Gabriel Matni wrote:
> 3) Is the usage of this patch safe?

No, it is wrong and unsafe.

> 4) Any suggestions?

Did you check whether your driver protects empty space? If not, try to make
it do so. The other way is to teach UBI/UBIFS to allow for bit-flips in
empty space.

> Kernel log when attempting to mount the file system (2 UBI volumes on
> mtd12 each having a ubifs):
> 
> <5>[   68.901714] UBI: attaching mtd12 to ubi12

mtd12? When people have so many MTD devices, it is often an indication of
bad design. Why do you have so many?

Also, if you can share the mtd12 image with me, I can take a closer look,
but I doubt I will be able to suggest anything new.

Please check your driver first and let us know whether it protects the
empty space.

Did you validate your driver with mtd tests, BTW?
(http://www.linux-mtd.infradead.org/doc/general.html#L_mtd_tests)

-- 
Best Regards,
Artem Bityutskiy


* Re: ubifs mount fails due to corrupt empty space
  2012-01-13  9:58 ` Matthieu CASTET
@ 2012-01-16 11:16   ` Artem Bityutskiy
  0 siblings, 0 replies; 4+ messages in thread
From: Artem Bityutskiy @ 2012-01-16 11:16 UTC (permalink / raw)
  To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org, Gabriel Matni

[-- Attachment #1: Type: text/plain, Size: 543 bytes --]

On Fri, 2012-01-13 at 10:58 +0100, Matthieu CASTET wrote:
> > My main questions are:
> > 1) What should be the expected behavior of ubifs in this case?
> > 2) Is there a systematic way to recover from a situation like that?
> UBIFS expects empty space to be protected by ECC. But if the ECC of an
> empty page is not all 0xff, this needs a hack in the NAND driver.

I guess one could also teach UBI/UBIFS to do this. I do not have the time,
but if someone with a clue does, I could help with suggestions and
reviews.

-- 
Best Regards,
Artem Bityutskiy
