public inbox for linux-mtd@lists.infradead.org
* corruption of JFFS2 filesystem, csize is set to 0 after moving a block
@ 2007-04-26 14:54 Hans-Christian Egtvedt
  2007-04-26 15:43 ` David Woodhouse
  0 siblings, 1 reply; 6+ messages in thread
From: Hans-Christian Egtvedt @ 2007-04-26 14:54 UTC
  To: linux-mtd

Hello,

When I stress the JFFS2 filesystem by copying files around on the root
(/) I end up with a corrupted filesystem after a reboot. The system just
hangs after the kernel is done booting:
Freeing init memory: 56K (90000000 - 9000e000)

Where I should get:
init started:  BusyBox v1.4.2 (2007-04-17 15:34:55 CEST) multi-call
binary
etc...

I copy and remove files until I reach "cp: write error: No space left on
device"

I extracted the filesystem from my flash device (Atmel AT49BV642D) and
did a dump. Here I can see that some of the nodes have a csize set to 0
for vital files such as libdl-0.9.28.so.

Any pointers to where I should start debugging, what can go wrong?

I can provide jffs2dump's, logs or images if needed.

-- 
Best regards
Hans-Christian Egtvedt


* Re: corruption of JFFS2 filesystem, csize is set to 0 after moving a block
  2007-04-26 14:54 corruption of JFFS2 filesystem, csize is set to 0 after moving a block Hans-Christian Egtvedt
@ 2007-04-26 15:43 ` David Woodhouse
  2007-04-27  9:13   ` Hans-Christian Egtvedt
  0 siblings, 1 reply; 6+ messages in thread
From: David Woodhouse @ 2007-04-26 15:43 UTC
  To: Hans-Christian Egtvedt; +Cc: linux-mtd

On Thu, 2007-04-26 at 16:54 +0200, Hans-Christian Egtvedt wrote:
> Hello,
> 
> When I stress the JFFS2 filesystem by copying files around on the root
> (/) I end up with a corrupted filesystem after a reboot. The system just
> hangs after the kernel is done booting:
> Freeing init memory: 56K (90000000 - 9000e000)
> 
> Where I should get:
> init started:  BusyBox v1.4.2 (2007-04-17 15:34:55 CEST) multi-call
> binary
> etc...
> 
> I copy and remove files until I reach "cp: write error: No space left on
> device"
> 
> I extracted the filesystem from my flash device (Atmel AT49BV642D) and
> did a dump. Here I can see that some of the nodes have a csize set to 0
> for vital files such as libdl-0.9.28.so.

There's not necessarily anything wrong with that.

> Any pointers to where I should start debugging, what can go wrong?
> 
> I can provide jffs2dump's, logs or images if needed.

Take a copy of the image, then work out where the kernel is stuck. Use
SysRq-P and/or SysRq-T, and if it's in JFFS2 try running with
CONFIG_JFFS2_FS_DEBUG=1 (and with 'verbose' on the command line), and
capture all the output on a serial console.

-- 
dwmw2


* Re: corruption of JFFS2 filesystem, csize is set to 0 after moving a block
  2007-04-26 15:43 ` David Woodhouse
@ 2007-04-27  9:13   ` Hans-Christian Egtvedt
  2007-04-27  9:31     ` Haavard Skinnemoen
  2007-04-27  9:46     ` David Woodhouse
  0 siblings, 2 replies; 6+ messages in thread
From: Hans-Christian Egtvedt @ 2007-04-27  9:13 UTC
  To: David Woodhouse; +Cc: linux-mtd

On Thu, 2007-04-26 at 16:43 +0100, David Woodhouse wrote:
> On Thu, 2007-04-26 at 16:54 +0200, Hans-Christian Egtvedt wrote:
> > Hello,
> > 
> > When I stress the JFFS2 filesystem by copying files around on the root
> > (/) I end up with a corrupted filesystem after a reboot. The system just
> > hangs after the kernel is done booting:
> > Freeing init memory: 56K (90000000 - 9000e000)
> > 
> > Where I should get:
> > init started:  BusyBox v1.4.2 (2007-04-17 15:34:55 CEST) multi-call
> > binary
> > etc...
> > 
> > I copy and remove files until I reach "cp: write error: No space left on
> > device"
> > 
> > I extracted the filesystem from my flash device (Atmel AT49BV642D) and
> > did a dump. Here I can see that some of the nodes have a csize set to 0
> > for vital files such as libdl-0.9.28.so.
> 
> There's not necessarily anything wrong with that.

An excerpt of the filesystem dump from before:
         Dirent     node at 0x0013c7e0, totlen 0x0000003b, #pino     7, version   148, #ino       150, nsize       19, name ld-uClibc-0.9.28.so
         Inode      node at 0x0013c81c, totlen 0x00000a14, #ino    150, version     1, isize    13108, csize     2512, dsize     4092, offset        0
         Inode      node at 0x0013d230, totlen 0x00000c57, #ino    150, version     2, isize    13108, csize     3091, dsize     4092, offset     4092
         Inode      node at 0x0013de88, totlen 0x00000b21, #ino    150, version     3, isize    13108, csize     2781, dsize     4092, offset     8184
         Inode      node at 0x0013e9ac, totlen 0x000001e0, #ino    150, version     4, isize    13108, csize      412, dsize      832, offset    12276

After:
         Dirent     node at 0x006c7bf0, totlen 0x0000003b, #pino     7, version   171, #ino       150, nsize       19, name ld-uClibc-0.9.28.so
         Inode      node at 0x006c7c2c, totlen 0x00000a14, #ino    150, version     5, isize    13108, csize     2512, dsize     4092, offset        0
         Inode      node at 0x006c8640, totlen 0x00000044, #ino    150, version     6, isize    13108, csize        0, dsize     4092, offset     4092
         Inode      node at 0x006c8684, totlen 0x00000044, #ino    150, version     7, isize    13108, csize        0, dsize     4092, offset     8184
         Inode      node at 0x006c86c8, totlen 0x00000044, #ino    150, version     8, isize    13108, csize        0, dsize      832, offset    12276

Is a csize of 0 correct for these nodes?

If the node header is correct, could it be that the node data has been
corrupted in some way?
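
To pick such nodes out of a large jffs2dump listing, a small Python sketch along these lines could help. It assumes the line layout shown in the dump excerpts above; `parse_inode_line` and `empty_payload_nodes` are names made up for illustration and are not part of any JFFS2 tool.

```python
import re

# Assumption: inode lines look exactly like the jffs2dump excerpts in this thread.
INODE_RE = re.compile(
    r"Inode\s+node at (0x[0-9a-fA-F]+), totlen (0x[0-9a-fA-F]+), "
    r"#ino\s+(\d+), version\s+(\d+), isize\s+(\d+), "
    r"csize\s+(\d+), dsize\s+(\d+), offset\s+(\d+)"
)


def parse_inode_line(line):
    """Return the fields of one 'Inode node' line as a dict, or None."""
    m = INODE_RE.search(line)
    if m is None:
        return None
    addr = int(m.group(1), 16)
    totlen = int(m.group(2), 16)
    ino, version, isize, csize, dsize, offset = (int(g) for g in m.groups()[2:])
    return {
        "addr": addr, "totlen": totlen, "ino": ino, "version": version,
        "isize": isize, "csize": csize, "dsize": dsize, "offset": offset,
    }


def empty_payload_nodes(lines):
    """Nodes that claim uncompressed data (dsize > 0) but store none (csize == 0)."""
    nodes = (parse_inode_line(line) for line in lines)
    return [n for n in nodes if n and n["csize"] == 0 and n["dsize"] > 0]


if __name__ == "__main__":
    sample = [
        "Inode      node at 0x006c7c2c, totlen 0x00000a14, #ino    150, version     5, isize    13108, csize     2512, dsize     4092, offset        0",
        "Inode      node at 0x006c8640, totlen 0x00000044, #ino    150, version     6, isize    13108, csize        0, dsize     4092, offset     4092",
    ]
    for node in empty_payload_nodes(sample):
        print("ino %d version %d at 0x%08x" % (node["ino"], node["version"], node["addr"]))
```

A node flagged this way is not automatically corrupt; the sketch only narrows down which entries deserve a closer look.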

> > Any pointers to where I should start debugging, what can go wrong?
> > 
> > I can provide jffs2dump's, logs or images if needed.
> 
> Take a copy of the image, then work out where the kernel is stuck. Use
> SysRq-P and/or SysRq-T, and if it's in JFFS2 try running with
> CONFIG_JFFS2_FS_DEBUG=1 (and with 'verbose' on the command line), and
> capture all the output on a serial console.

The system is in do_signal, which most likely means that the init
process has received an unexpected signal. I assume this is due to one of
the core libraries being corrupted.

JFFS2 log with debug=1:

jffs2_scan_dirent_node(): Node at 0x006c7bf0
[JFFS2 DBG] (1) jffs2_link_node_ref: Last node at 903008c4 is (006c7bac,902febd8)
[JFFS2 DBG] (1) jffs2_link_node_ref: New ref is 903008d0 (fffffffe becomes 006c7bf2,00000000) len 0x3c
[JFFS2 DBG] (1) jffs2_add_fd_to_list: add dirent "ld-uClibc-0.9.28.so", ino #150
jffs2_scan_inode_node(): Node at 0x006c7c2c
[JFFS2 DBG] (1) jffs2_add_ino_cache: add 902febc0 (ino #150)
[JFFS2 DBG] (1) jffs2_link_node_ref: Last node at 903008d0 is (006c7bf2,902e7704)
[JFFS2 DBG] (1) jffs2_link_node_ref: New ref is 903008dc (fffffffe becomes 006c7c2c,00000000) len 0xa14
Node is ino #150, version 5. Range 0x0-0xffc
Fewer than 68 bytes (inode node) left to end of buf. Reading 0x1000 at 0x006c8640
jffs2_scan_inode_node(): Node at 0x006c8640
[JFFS2 DBG] (1) jffs2_link_node_ref: Last node at 903008dc is (006c7c2c,902febc0)
[JFFS2 DBG] (1) jffs2_link_node_ref: New ref is 903008e8 (fffffffe becomes 006c8640,00000000) len 0x44
Node is ino #150, version 6. Range 0xffc-0x1ff8
jffs2_scan_inode_node(): Node at 0x006c8684
[JFFS2 DBG] (1) jffs2_link_node_ref: Last node at 903008e8 is (006c8640,903008dc)
[JFFS2 DBG] (1) jffs2_link_node_ref: New ref is 903008f4 (fffffffe becomes 006c8684,00000000) len 0x44
Node is ino #150, version 7. Range 0x1ff8-0x2ff4
jffs2_scan_inode_node(): Node at 0x006c86c8
[JFFS2 DBG] (1) jffs2_link_node_ref: Last node at 903008f4 is (006c8684,903008e8)
[JFFS2 DBG] (1) jffs2_link_node_ref: New ref is 90300900 (fffffffe becomes 006c86c8,00000000) len 0x44
Node is ino #150, version 8. Range 0x2ff4-0x3334

What else should I look for in the log file? At 21 MB it is too big to
attach to a message on this list.

-- 
Best regards
Hans-Christian Egtvedt


* Re: corruption of JFFS2 filesystem, csize is set to 0 after moving a block
  2007-04-27  9:13   ` Hans-Christian Egtvedt
@ 2007-04-27  9:31     ` Haavard Skinnemoen
  2007-04-27  9:46     ` David Woodhouse
  1 sibling, 0 replies; 6+ messages in thread
From: Haavard Skinnemoen @ 2007-04-27  9:31 UTC
  To: Hans-Christian Egtvedt; +Cc: linux-mtd, David Woodhouse

On Fri, 27 Apr 2007 11:13:49 +0200
Hans-Christian Egtvedt <hcegtvedt@norway.atmel.com> wrote:

> The system is in do_signal, which most likely means that the init
> process has received an unexpected signal. I assume this is due to one of
> the core libraries being corrupted.

FWIW, the avr32 update I'm about to push out will change the behaviour
of the exception handling code to panic when this happens instead of
trying to deliver the signal forever. I can try to backport it to
whatever version you're running, but I'm pretty sure your analysis is
correct (more specifically, I think init got a SIGBUS signal it didn't
want.)

Haavard


* Re: corruption of JFFS2 filesystem, csize is set to 0 after moving a block
  2007-04-27  9:13   ` Hans-Christian Egtvedt
  2007-04-27  9:31     ` Haavard Skinnemoen
@ 2007-04-27  9:46     ` David Woodhouse
  2007-04-27 11:52       ` Hans-Christian Egtvedt
  1 sibling, 1 reply; 6+ messages in thread
From: David Woodhouse @ 2007-04-27  9:46 UTC
  To: Hans-Christian Egtvedt; +Cc: linux-mtd

On Fri, 2007-04-27 at 11:13 +0200, Hans-Christian Egtvedt wrote:
> 
> An excerpt of the filesystem dump from before:
>          Dirent     node at 0x0013c7e0, totlen 0x0000003b, #pino     7, version   148, #ino       150, nsize       19, name ld-uClibc-0.9.28.so
>          Inode      node at 0x0013c81c, totlen 0x00000a14, #ino    150, version     1, isize    13108, csize     2512, dsize     4092, offset        0
>          Inode      node at 0x0013d230, totlen 0x00000c57, #ino    150, version     2, isize    13108, csize     3091, dsize     4092, offset     4092
>          Inode      node at 0x0013de88, totlen 0x00000b21, #ino    150, version     3, isize    13108, csize     2781, dsize     4092, offset     8184
>          Inode      node at 0x0013e9ac, totlen 0x000001e0, #ino    150, version     4, isize    13108, csize      412, dsize      832, offset    12276

Those are suspect. Why 4092 bytes not 4096? The node with version 2
claims to be 4092 bytes starting from 4092, which is invalid because it
crosses a page boundary.
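
The boundary problem can be checked with a few lines of Python. This is only a sketch: the 4096-byte page size is the value implied by the question above, and `crosses_page` is a hypothetical helper, not a JFFS2 function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as implied by "4092 bytes not 4096"


def crosses_page(offset, dsize, page_size=PAGE_SIZE):
    """True if the byte range [offset, offset + dsize) spans more than one page."""
    if dsize == 0:
        return False
    first_page = offset // page_size
    last_page = (offset + dsize - 1) // page_size
    return first_page != last_page


if __name__ == "__main__":
    # (version, offset, dsize) taken from the "before" dump quoted above
    nodes = [(1, 0, 4092), (2, 4092, 4092), (3, 8184, 4092), (4, 12276, 832)]
    for version, offset, dsize in nodes:
        state = "crosses a page boundary" if crosses_page(offset, dsize) else "fits in one page"
        print("version %d: offset %5d, dsize %4d: %s" % (version, offset, dsize, state))
```

With the four nodes from the dump, version 1 stays inside the first page, while versions 2, 3 and 4 each straddle a multiple of 4096.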

> After:
>          Dirent     node at 0x006c7bf0, totlen 0x0000003b, #pino     7, version   171, #ino       150, nsize       19, name ld-uClibc-0.9.28.so
>          Inode      node at 0x006c7c2c, totlen 0x00000a14, #ino    150, version     5, isize    13108, csize     2512, dsize     4092, offset        0
>          Inode      node at 0x006c8640, totlen 0x00000044, #ino    150, version     6, isize    13108, csize        0, dsize     4092, offset     4092
>          Inode      node at 0x006c8684, totlen 0x00000044, #ino    150, version     7, isize    13108, csize        0, dsize     4092, offset     8184
>          Inode      node at 0x006c86c8, totlen 0x00000044, #ino    150, version     8, isize    13108, csize        0, dsize      832, offset    12276 

Ok, in that case I agree that a csize of zero also looks suspicious.
Matches the node 'totlen' though. What's the compression type?
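
The totlen match can be verified by hand: the debug log earlier in the thread gives 68 bytes as the size of an inode node ("Fewer than 68 bytes (inode node) left"), and every totlen in the dumps equals that figure plus csize. A minimal Python sketch of the check, where the 68-byte header length is taken from that log line and `totlen_matches` is a hypothetical helper name:

```python
INODE_HEADER_LEN = 68  # from the log line "Fewer than 68 bytes (inode node) left"


def totlen_matches(totlen, csize, header_len=INODE_HEADER_LEN):
    """True if the node length is exactly the header plus the stored payload."""
    return totlen == header_len + csize


if __name__ == "__main__":
    # (version, totlen, csize) from the "before" and "after" dumps quoted above
    nodes = [
        (1, 0x0A14, 2512), (2, 0x0C57, 3091), (3, 0x0B21, 2781), (4, 0x01E0, 412),
        (5, 0x0A14, 2512), (6, 0x0044, 0), (7, 0x0044, 0), (8, 0x0044, 0),
    ]
    for version, totlen, csize in nodes:
        print("version %d: totlen %4d, csize %4d, consistent: %s"
              % (version, totlen, csize, totlen_matches(totlen, csize)))
```

So the headers of versions 6 to 8 are self-consistent: a totlen of 0x44 is a bare header with no payload behind it, which leaves the compression type as the field that says how the data is supposed to be reconstructed.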

Did you use 'mkfs.jffs2 -s 4092'?

-- 
dwmw2


* Re: corruption of JFFS2 filesystem, csize is set to 0 after moving a block
  2007-04-27  9:46     ` David Woodhouse
@ 2007-04-27 11:52       ` Hans-Christian Egtvedt
  0 siblings, 0 replies; 6+ messages in thread
From: Hans-Christian Egtvedt @ 2007-04-27 11:52 UTC
  To: David Woodhouse; +Cc: linux-mtd

On Fri, 2007-04-27 at 10:46 +0100, David Woodhouse wrote:
> On Fri, 2007-04-27 at 11:13 +0200, Hans-Christian Egtvedt wrote:

<cut jffs2dump initial image>

> Those are suspect. Why 4092 bytes not 4096? The node with version 2
> claims to be 4092 bytes starting from 4092, which is invalid because it
> crosses a page boundary.

Let me quote Homer Jay Simpson, "DOH!".

<cut jffs2dump corrupted image>

> Ok, in that case I agree that a csize of zero also looks suspicious.
> Matches the node 'totlen' though. What's the compression type?
> 
> Did you use 'mkfs.jffs2 -s 4092'?

I have no idea how I ended up with this number, but rebuilding the
image with pagesize=4096 gives a fully working image.
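
For a build script, one way to avoid repeating the mistake is to assemble the mkfs.jffs2 command in one place and refuse odd page sizes. The Python sketch below only builds the argument list; the directory and image names are placeholders, the power-of-two guard is an extra assumption of this sketch (mkfs.jffs2 itself accepted 4092), and target-specific options such as erase block size and endianness are deliberately left out and would have to be added for a real image.

```python
import shlex


def mkfs_jffs2_command(root_dir, output, pagesize=4096):
    """Build an argv list for mkfs.jffs2 (hypothetical helper, does not run it)."""
    # Guard added by this sketch: reject sizes such as 4092 that are not a power of two.
    if pagesize <= 0 or pagesize & (pagesize - 1):
        raise ValueError("page size %d is not a power of two" % pagesize)
    return [
        "mkfs.jffs2",
        "--root=%s" % root_dir,
        "--output=%s" % output,
        "--pagesize=%d" % pagesize,
    ]


if __name__ == "__main__":
    # "rootfs" and "rootfs.jffs2" are placeholder names, not taken from the thread.
    cmd = mkfs_jffs2_command("rootfs", "rootfs.jffs2")
    print(shlex.join(cmd))
    # To actually build the image, the list could be passed to subprocess.run(cmd, check=True).
```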

Many thanks for your help.

-- 
Best regards
Hans-Christian Egtvedt
