* Problems with JFFS2 FS during parallel write operations
@ 2008-11-05  8:43 Ostendorf, Rainer
  2008-12-01  7:36 ` Ostendorf, Rainer
  0 siblings, 1 reply; 4+ messages in thread
From: Ostendorf, Rainer @ 2008-11-05  8:43 UTC (permalink / raw)
  To: linux-mtd

Hi list,

I am currently working on an Atmel AT91RM9200-based embedded system. For persistent storage, a 32 MByte Spansion S29GL256P NOR flash device is connected to the parallel bus. The system first loads the U-Boot bootloader from that flash, and U-Boot then loads the combined kernel and ramdisk image from a JFFS2 filesystem on the same flash. The kernel running on the board is Linux 2.6.23-rc3, adapted to my hardware.

There is a silicon bug in the processor that leads to address line A24 not being driven by the bus interface. As a workaround, I connected A25 instead of A24 to the flash and address the flash with a 16 MB offset.
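
A minimal sketch of why the 16 MB offset yields one contiguous 32 MB window, assuming the flash is mapped at 0x11000000 (the base that physmap reports in the kernel log further down). It treats everything as byte addresses and ignores the one-bit shift between CPU and flash pin names on a 16-bit bus. The macro and function names are made up for illustration:

```c
#include <stdint.h>

#define LOW_BITS_MASK 0x00FFFFFFu  /* CPU A0..A23, assumed wired straight through */
#define CPU_A25       (1u << 25)   /* CPU line wired to the flash in place of A24 */
#define FLASH_TOP_BIT (1u << 24)   /* selects the upper 16 MB half of the flash */

/* Translate a CPU physical address inside the flash window
 * to a byte offset inside the flash device. */
static uint32_t cpu_addr_to_flash_offset(uint32_t cpu_addr)
{
    uint32_t offset = cpu_addr & LOW_BITS_MASK;

    /* CPU A24 is never looked at, since it is not connected. */
    if (cpu_addr & CPU_A25)
        offset |= FLASH_TOP_BIT;
    return offset;
}
```

Under these assumptions, 0x11000000-0x11FFFFFF (A25 low) selects the lower half of the flash and 0x12000000-0x12FFFFFF (A25 high) selects the upper half, so the window is contiguous although A24 is unused.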

During normal operation the system runs perfectly stably, but when I start two processes in parallel, each writing large amounts of data to the flash device, I get error messages from the JFFS2 filesystem:

...
argh. node added in wrong place
argh. node added in wrong place
...

This message repeats about 15-20 times while copying two files of about 6 MByte each to the flash in parallel over Ethernet (SCP). When I then reboot the system, the U-Boot bootloader reports the following errors while scanning the JFFS2 filesystem for the image file:

### JFFS2 loading '/images/boot.img' to 0x21000000
Scanning JFFS2 FS: | Unknown node type: e002 len 4164 offset 0x7d1bc
.| Unknown node type: e002 len 4164 offset 0xf4dd5c
\ Unknown node type: e002 len 4164 offset 0x100a120
/ Unknown node type: e002 len 4164 offset 0x10a27e0
| Unknown node type: e002 len 4164 offset 0x11563c0
- Unknown node type: e002 len 4164 offset 0x11f4bec
/ Unknown node type: e002 len 4164 offset 0x12aa0cc
| Unknown node type: e002 len 4164 offset 0x1349aa4
/ Unknown node type: e002 len 4164 offset 0x14bd484
| Unknown node type: e002 len 4164 offset 0x155163c
- Unknown node type: e002 len 4164 offset 0x15f1b5c
| Unknown node type: e002 len 4164 offset 0x166c750
\ Unknown node type: e002 len 4164 offset 0x173d7c0
- Unknown node type: e002 len 4164 offset 0x1cefb58
/ Unknown node type: e002 len 4164 offset 0x1d902fc
\ Unknown node type: e002 len 4164 offset 0x1e35aec
| Unknown node type: e002 len 4164 offset 0x1f76b54
 done.
### JFFS2 load complete: 6410357 bytes loaded to 0x21000000
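
For reference, the value 0xe002 in the scan output can be split into its flag bits and type number. This is a small sketch that assumes the constants from the kernel's include/linux/jffs2.h; the helper functions are hypothetical and exist only for illustration:

```c
#include <stdint.h>

/* Constants as defined in the kernel's include/linux/jffs2.h */
#define JFFS2_COMPAT_MASK      0xc000
#define JFFS2_NODE_ACCURATE    0x2000
#define JFFS2_FEATURE_INCOMPAT 0xc000

/* Hypothetical helpers, for illustration only */

/* Compatibility class encoded in the two top bits. */
static uint16_t nodetype_compat(uint16_t nodetype)
{
    return nodetype & JFFS2_COMPAT_MASK;
}

/* Non-zero if the "accurate" (not obsoleted) bit is set. */
static int nodetype_is_accurate(uint16_t nodetype)
{
    return (nodetype & JFFS2_NODE_ACCURATE) != 0;
}

/* Type number left after masking off the flag bits. */
static uint16_t nodetype_number(uint16_t nodetype)
{
    return nodetype & (uint16_t)~(JFFS2_COMPAT_MASK | JFFS2_NODE_ACCURATE);
}
```

For 0xe002 this gives the INCOMPAT class, the accurate bit set, and type number 2, which corresponds to JFFS2_NODETYPE_INODE in that header. So the nodes being reported are inode (file data) nodes rather than some unknown format.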

U-Boot detects that the checksum of the loaded image is wrong, and the system no longer boots from flash memory/JFFS2. After booting over Ethernet, the kernel gives the following error messages when trying to mount the root filesystem:

Linux version 2.6.23-rc3 (armdev@arm-workstation) (gcc version 4.1.1) #3 Mon Nov 3 15:07:56 CET 2008
CPU: ARM920T [41129200] revision 0 (ARMv4T), cr=c0003177

Memory policy: ECC disabled, Data cache writeback
Clocks: CPU 180 MHz, master 60 MHz, main 20.000 MHz
CPU0: D VIVT write-back cache
CPU0: I cache: 16384 bytes, associativity 64, 32 byte lines, 8 sets
CPU0: D cache: 16384 bytes, associativity 64, 32 byte lines, 8 sets
Built 1 zonelists in Zone order.  Total pages: 8128
Kernel command line: console=/dev/ttyS0,115200n8 mtdparts=physmap-flash.0:128k(u-boot),128k(env),-(User)

[...]

physmap platform flash device: 02000000 at 11000000
physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
 Amd/Fujitsu Extended Query Table at 0x0040
physmap-flash.0: CFI does not contain boot bank location. Assuming top.
number of CFI chips: 1
cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
3 cmdlinepart partitions found on MTD device physmap-flash.0
Creating 3 MTD partitions on "physmap-flash.0":
0x00000000-0x00020000 : "u-boot"
0x00020000-0x00040000 : "env"
0x00040000-0x02000000 : "User"

[...]

syslogd starting
klogd starting
mounting flash file system...jffs2_scan_inode_node(): CRC failed on node at 0x0003d1bc: Read 0x218d1014, calculated 0x4418ef99
jffs2_scan_eraseblock(): Node at 0x0009286c {0x1985, 0xe002, 0x00001040) has invalid CRC 0x00914828 (calculated 0x0bb3cf3a)
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00092870: 0x1040 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00092874: 0x4828 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0009287c: 0x0004 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00092880: 0x81a4 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00092888: 0xd075 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0009288c: 0xdf2b instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00092890: 0xdf2b instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00092894: 0xdf2b instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00092898: 0x8000 instead
Further such events for this erase block will not be printed
Old JFFS2 bitmask found at 0x00092bb4
You cannot use older JFFS2 filesystems with newer kernels
jffs2_scan_inode_node(): CRC failed on node at 0x00f0dd5c: Read 0xa1003010, calculated 0xc503cba7
jffs2_scan_inode_node(): CRC failed on node at 0x00fca120: Read 0x40484a39, calculated 0x6589f6f0
jffs2_scan_inode_node(): CRC failed on node at 0x010627e0: Read 0x0082951c, calculated 0xdeb63380
jffs2_scan_inode_node(): CRC failed on node at 0x011163c0: Read 0x31850c5d, calculated 0xf9ff6bb4
jffs2_scan_inode_node(): CRC failed on node at 0x011b4bec: Read 0x2e2a8811, calculated 0xa736b9ce
jffs2_scan_inode_node(): CRC failed on node at 0x0126a0cc: Read 0x15c8c20c, calculated 0x70566992
jffs2_scan_inode_node(): CRC failed on node at 0x01309aa4: Read 0x40914000, calculated 0xfb3b404a
jffs2_scan_eraseblock(): Node at 0x013a6fdc {0x1985, 0xe002, 0x00000044) has invalid CRC 0x04111040 (calculated 0x98f7fb1d)
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a6fe0: 0x0044 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a6fe4: 0x1040 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a6fec: 0x0801 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a6ff0: 0x81a4 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a6ff8: 0xd075 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a6ffc: 0xdee6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a7000: 0xdee6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a7004: 0xdee6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x013a7010: 0x1000 instead
Further such events for this erase block will not be printed
jffs2_scan_inode_node(): CRC failed on node at 0x0147d484: Read 0x4a5a0490, calculated 0xdb37c1be
jffs2_scan_inode_node(): CRC failed on node at 0x0151163c: Read 0x90010230, calculated 0x243c5ff2
jffs2_scan_inode_node(): CRC failed on node at 0x015b1b5c: Read 0x0b340e00, calculated 0x52cba956
jffs2_scan_inode_node(): CRC failed on node at 0x0162c750: Read 0x20028308, calculated 0xb06ff75e
jffs2_scan_inode_node(): CRC failed on node at 0x016fd7c0: Read 0x23084020, calculated 0x93ff7ffd
jffs2_scan_inode_node(): CRC failed on node at 0x01cafb58: Read 0x000354b0, calculated 0xd537d003
jffs2_scan_inode_node(): CRC failed on node at 0x01d502fc: Read 0x0a100622, calculated 0x844de98d
jffs2_scan_inode_node(): CRC failed on node at 0x01df5aec: Read 0xc1007800, calculated 0x21619a2e
jffs2_scan_eraseblock(): Node at 0x01e95594 {0x1985, 0xe002, 0x00000040) has invalid CRC 0x84100825 (calculated 0x17956c4a)
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e95598: 0x0040 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e9559c: 0x0825 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e955a4: 0x00c6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e955a8: 0x81a4 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e955b0: 0xd075 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e955b4: 0xdf42 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e955b8: 0xdf42 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e955bc: 0xdf42 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x01e955c0: 0x8000 instead
Further such events for this erase block will not be printed
jffs2_scan_inode_node(): CRC failed on node at 0x01f36b54: Read 0x14282080, calculated 0x62aa15fe
ok
[...]
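
Comparing the two logs, every offset printed by U-Boot appears to be exactly 0x40000 above one of the kernel's "CRC failed" offsets, and 0x40000 is where the "User" partition starts. A short sketch of that relation, assuming U-Boot prints chip-absolute offsets while the kernel prints offsets relative to the partition (the helper name is hypothetical):

```c
#include <stdint.h>

/* Start of the "User" partition, taken from the partition list in the log */
#define USER_PART_START 0x00040000u

/* Hypothetical helper: convert a chip-absolute flash offset
 * to an offset inside the "User" partition. */
static uint32_t chip_to_user_offset(uint32_t chip_offset)
{
    return chip_offset - USER_PART_START;
}
```

For example, the first U-Boot offset 0x7d1bc maps to 0x3d1bc and the last one, 0x1f76b54, maps to 0x1f36b54; both appear in the kernel output. If the assumption holds, the 17 U-Boot messages and the 17 jffs2_scan_inode_node() messages refer to the same nodes.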


Before being mounted as a JFFS2 filesystem for the first time, the flash was completely erased (using the IC's embedded erase algorithm). I also tried erasing it with the flash_eraseall command with the "-j" option set and saw no difference. Do I need to prepare the memory in any other way than just erasing it before mounting it as JFFS2?

So far I have checked the timings of the flash IC; they seem to be OK. I also copied a big file with random content to the flash filesystem and erased it again several times, checking its MD5 sum after each cycle; it was always OK. The problem only occurred when the JFFS2 filesystem was accessed in parallel.

As I don't know exactly where to look next: has anyone seen such behavior before? Is it possible that this is a kernel bug (perhaps a race condition), or is it more likely a hardware problem?

Many thanks in advance for any hint!

regards,
Rainer


Benning Elektrotechnik und Elektronik GmbH & Co. KG Bocholt
Handelsregister Coesfeld HRA-Nr. 4661
Persönlich haftende Gesellschaft: Benning GmbH
Handelsregister Coesfeld HRB-Nr. 7772
Geschäftsführer: Th. Benning


* RE: Problems with JFFS2 FS during parallel write operations
  2008-11-05  8:43 Problems with JFFS2 FS during parallel write operations Ostendorf, Rainer
@ 2008-12-01  7:36 ` Ostendorf, Rainer
  2008-12-01 12:08   ` Cal Page
  2008-12-01 12:48   ` Geert Uytterhoeven
  0 siblings, 2 replies; 4+ messages in thread
From: Ostendorf, Rainer @ 2008-12-01  7:36 UTC (permalink / raw)
  To: linux-mtd

Hi list,

I'm still working on the problem described in my previous post:

[...]
> During normal operation the system runs perfectly stable, but when i 
> start two processes running parallel, writing huge amounts of data 
> to the flash device, i get error 
> messages from the JFFS2 filesystem:
> 
> ...
> argh. node added in wrong place
> argh. node added in wrong place
> ...
> 
> This message repeats for about 15-20 times while copying parallel 2 
> files of about 6MByte to the flash via Ethernet (SCP). 
[...]

As I can only reproduce the corruption of the JFFS2 FS during parallel access by at least two processes writing to the flash, I assume that the cause of the corruption is some kind of race condition. Have there been any known problems/bugs with race conditions that could lead to an error like this? Would it help to upgrade to a newer kernel version? Any hints and help would be greatly appreciated.

best regards,
Rainer




* Re: Problems with JFFS2 FS during parallel write operations
  2008-12-01  7:36 ` Ostendorf, Rainer
@ 2008-12-01 12:08   ` Cal Page
  2008-12-01 12:48   ` Geert Uytterhoeven
  1 sibling, 0 replies; 4+ messages in thread
From: Cal Page @ 2008-12-01 12:08 UTC (permalink / raw)
  To: Ostendorf, Rainer; +Cc: linux-mtd

You might check the locks in mtd. I noticed they were pretty low. It 
might make sense in your case to move them to mtdpart.c or even up into 
yaffs_mtdif2.c for the read/write/erase functions.

Cal Page

Ostendorf, Rainer wrote:
> Hi list,
>
> i'm still working on the problem described in my previous post:
>
> [...]
>   
>> During normal operation the system runs perfectly stable, but when i 
>> start two processes running parallel, writing huge amounts of data 
>> to the flash device, i get error 
>> messages from the JFFS2 filesystem:
>>
>> ...
>> argh. node added in wrong place
>> argh. node added in wrong place
>> ...
>>
>> This message repeats for about 15-20 times while copying parallel 2 
>> files of about 6MByte to the flash via Ethernet (SCP). 
>>     
> [...]
>
> As I can only reproduce the corruption of the JFFS2 FS during parallel access by at least two processes writing to the flash, I assume that the cause of the corruption is some kind of race condition. Have there been any known problems/bugs with race conditions that could lead to an error like this? Would it help to upgrade to a newer kernel version? Any hints and help would be greatly appreciated.
>
> best regards,
> Rainer


* RE: Problems with JFFS2 FS during parallel write operations
  2008-12-01  7:36 ` Ostendorf, Rainer
  2008-12-01 12:08   ` Cal Page
@ 2008-12-01 12:48   ` Geert Uytterhoeven
  1 sibling, 0 replies; 4+ messages in thread
From: Geert Uytterhoeven @ 2008-12-01 12:48 UTC (permalink / raw)
  To: Ostendorf, Rainer; +Cc: linux-mtd

On Mon, 1 Dec 2008, Ostendorf, Rainer wrote:
> i'm still working on the problem described in my previous post:
> 
> [...]
> > During normal operation the system runs perfectly stable, but when i 
> > start two processes running parallel, writing huge amounts of data 
> > to the flash device, i get error 
> > messages from the JFFS2 filesystem:
> > 
> > ...
> > argh. node added in wrong place
> > argh. node added in wrong place
> > ...
> > 
> > This message repeats for about 15-20 times while copying parallel 2 
> > files of about 6MByte to the flash via Ethernet (SCP). 
> [...]
> 
> As I can only reproduce the corruption of the JFFS2 FS during parallel access by at least two processes writing to the flash, I assume that the cause of the corruption is some kind of race condition. Have there been any known problems/bugs with race conditions that could lead to an error like this? Would it help to upgrade to a newer kernel version? Any hints and help would be greatly appreciated.

Do you use LZO compression? If yes, it could have been this one:

commit dc8a0843a435b2c0891e7eaea64faaf1ebec9b11
Author: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Date:   Wed Nov 5 23:21:16 2008 +0100

    [JFFS2] fix race condition in jffs2_lzo_compress()
    
    deflate_mutex protects the globals lzo_mem and lzo_compress_buf.  However,
    jffs2_lzo_compress() unlocks deflate_mutex _before_ it has copied out the
    compressed data from lzo_compress_buf.  Correct this by moving the mutex
    unlock after the copy.
    
    In addition, document what deflate_mutex actually protects.
    
    Cc: stable@kernel.org
    Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
    Acked-by: Richard Purdie <rpurdie@openedhand.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

diff --git a/fs/jffs2/compr_lzo.c b/fs/jffs2/compr_lzo.c
index 47b0457..90cb60d 100644
--- a/fs/jffs2/compr_lzo.c
+++ b/fs/jffs2/compr_lzo.c
@@ -19,7 +19,7 @@
 
 static void *lzo_mem;
 static void *lzo_compress_buf;
-static DEFINE_MUTEX(deflate_mutex);
+static DEFINE_MUTEX(deflate_mutex);	/* for lzo_mem and lzo_compress_buf */
 
 static void free_workspace(void)
 {
@@ -49,18 +49,21 @@ static int jffs2_lzo_compress(unsigned char *data_in, unsigned char *cpage_out,
 
 	mutex_lock(&deflate_mutex);
 	ret = lzo1x_1_compress(data_in, *sourcelen, lzo_compress_buf, &compress_size, lzo_mem);
-	mutex_unlock(&deflate_mutex);
-
 	if (ret != LZO_E_OK)
-		return -1;
+		goto fail;
 
 	if (compress_size > *dstlen)
-		return -1;
+		goto fail;
 
 	memcpy(cpage_out, lzo_compress_buf, compress_size);
-	*dstlen = compress_size;
+	mutex_unlock(&deflate_mutex);
 
+	*dstlen = compress_size;
 	return 0;
+
+ fail:
+	mutex_unlock(&deflate_mutex);
+	return -1;
 }
 
 static int jffs2_lzo_decompress(unsigned char *data_in, unsigned char *cpage_out,
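
The pattern behind the fix can be shown outside the kernel as well. The following is only a userspace sketch with POSIX threads, not the JFFS2 code: work_buf and work_mutex stand in for lzo_compress_buf and deflate_mutex, and toy_transform() is a made-up placeholder for the real compressor. The point is that the result is copied out of the shared buffer while the lock is still held:

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define WORK_BUF_SIZE 64

/* Shared scratch buffer, standing in for lzo_compress_buf */
static unsigned char work_buf[WORK_BUF_SIZE];
/* Protects work_buf, standing in for deflate_mutex */
static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Toy "compressor": XORs the input into work_buf and returns the length.
 * The caller must hold work_mutex and ensure len <= WORK_BUF_SIZE. */
static size_t toy_transform(const unsigned char *in, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        work_buf[i] = in[i] ^ 0x5a;
    return len;
}

/* Returns 0 on success, -1 if the input or the result does not fit.
 * On entry *out_len is the capacity of out; on success it is the result size. */
static int transform_copy_out(const unsigned char *in, size_t len,
                              unsigned char *out, size_t *out_len)
{
    size_t produced;
    int ret = -1;

    if (len > WORK_BUF_SIZE)
        return -1;

    pthread_mutex_lock(&work_mutex);
    produced = toy_transform(in, len);
    if (produced <= *out_len) {
        /* Copy while still holding the lock, so that no other caller
         * can overwrite work_buf between the transform and the copy. */
        memcpy(out, work_buf, produced);
        *out_len = produced;
        ret = 0;
    }
    pthread_mutex_unlock(&work_mutex);
    return ret;
}
```

If the unlock were moved in front of the memcpy(), a second thread could refill work_buf before the first one had copied its result, and the first caller would silently receive the other caller's data. That is the kind of corruption the commit message describes.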


With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010

