* JFFS2 Corruption. @ 2004-02-19 16:48 simon 2004-02-23 11:07 ` simon 0 siblings, 1 reply; 38+ messages in thread From: simon @ 2004-02-19 16:48 UTC (permalink / raw) To: linux-mtd

I am having problems using JFFS2 filesystems on a NAND device. I have had these problems in the past and decided not to use this method for storing my data. I would really like to do this and I was wondering if anyone else had been able to run such a system reliably ? The NAND device is a 128MB SMC and I have downloaded the mtd code from CVS within the last week.

I am using mtdpart to provide partitions but for the moment I am only using mtd1 as a root file system. To build this I perform the following steps.

1. Boot system via network and mount nfs root
2. eraseall /dev/mtd1
3. mount -t jffs2 /dev/mtdblock1 /smc
4. cd /smc
5. tar xvzf /rootfilesystem.tgz
6. umount /smc

I then reboot using the SMC as my root filesystem. As the system gets rebooted and incurs more writes I get the following kinds of message

Empty flash at 0x00469ffcb ends at 0x0046a000

or

jffs2_scan_dirent_node(): Node CRC failed on node at 0x0046a7f0 read 0xffffffff calculated 0xdec8161b

I have written in before about these messages and I understand that the first is of no concern but the second relates to write data which may not have been written to the SMC.

The thing is I have had this happen on two completely different hardware designs, one an X86 board and the other a PPC.

I have checked and the OS umounts root as it shuts down. Also I cannot find fsck.jffs2 in the util directory. Does it exist ?

In trying to debug this I noticed that the device was still reporting busy when nand_command was entered. I have put a line of code in to delay until the device is ready at the start of the routine. It is my understanding that if the device is busy when you try to select/deselect or send a command the outcome cannot be predicted. If this is not the case I would like to understand why. If it is, how does any of the NAND code work ?

Any help greatly appreciated

Cheers Simon.

__________________________
Simon Haynes - Baydel
Phone : 44 (0) 1372 378811
Email : simon@baydel.com
__________________________
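A note on the second log message above: a stored CRC that reads back as 0xffffffff is consistent with erased NAND, i.e. the four CRC bytes were never programmed. The following is a minimal Python sketch of that distinction, not JFFS2 code. It assumes JFFS2 computes node CRCs as a CRC-32 seeded with 0 and without the final inversion that zlib applies; the function names are made up for illustration.

```python
import zlib

ERASED_WORD = 0xFFFFFFFF  # erased NAND reads back as all ones


def jffs2_style_crc32(data: bytes, seed: int = 0) -> int:
    """CRC-32 without zlib's pre/post inversion (assumed to match JFFS2)."""
    return (zlib.crc32(data, seed ^ 0xFFFFFFFF) ^ 0xFFFFFFFF) & 0xFFFFFFFF


def classify_node_crc(stored_crc: int, payload: bytes) -> str:
    """Give a rough reading of a node CRC check, in the spirit of the log."""
    if stored_crc == jffs2_style_crc32(payload):
        return "ok"
    if stored_crc == ERASED_WORD:
        # The CRC field still holds the erased pattern.
        return "crc field never programmed (write did not complete)"
    return "crc mismatch (data differs from what was checksummed)"
```

Under those assumptions, "read 0xffffffff calculated 0xdec8161b" would fall into the second case: the node body was on the flash, but its CRC field still looked erased.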
* Re: JFFS2 Corruption. 2004-02-19 16:48 JFFS2 Corruption simon @ 2004-02-23 11:07 ` simon 2004-02-24 9:48 ` simon 0 siblings, 1 reply; 38+ messages in thread From: simon @ 2004-02-23 11:07 UTC (permalink / raw) To: linux-mtd

As I have been trying to use this as a root filesystem I checked how the filesystem is dismounted. It seems that the halt script remounts root read only. However I found that this does not cause mtdblock to flush its cache. I have written a small program to perform an ioctl BLKFLSBUF and I call it after the system mounts root read only. This seems to help but it is not the whole story.

I have also trawled the archives. There are some articles which list similar problems. It is proposed that these are related to unaligned memory access. Can anyone explain why this matters and how to prevent it.

My current architecture is PPC.

Many Thanks

Simon.

On 19 Feb 2004 at 16:48, simon@baydel.com wrote:
> [...]
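The "small program to perform an ioctl BLKFLSBUF" is not shown in the thread. A minimal sketch of what such a helper might look like in Python follows. The request value is an assumption based on the x86/ARM ioctl encoding of _IO(0x12, 97); PowerPC, the poster's architecture, encodes _IO() differently, so the constant should be checked against <linux/fs.h> on the target. The device path in the usage comment is only an example.

```python
import fcntl
import os

# BLKFLSBUF is _IO(0x12, 97) in <linux/fs.h>. Python's fcntl module does not
# export it, so the value is spelled out here. Assumption: x86/ARM-style
# ioctl numbering; other architectures (e.g. PowerPC) encode _IO() differently.
BLKFLSBUF = (0x12 << 8) | 97


def flush_block_device(path: str) -> None:
    """Ask the kernel to flush its buffers for the block device at `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKFLSBUF, 0)
    finally:
        os.close(fd)


# Example call (needs root and a real device node):
#     flush_block_device("/dev/mtdblock1")
```

As the later replies point out, JFFS2 does not write through mtdblock at all, so a flush like this only affects whatever the block driver itself has cached.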
* Re: JFFS2 Corruption. 2004-02-23 11:07 ` simon @ 2004-02-24 9:48 ` simon 2004-02-24 12:00 ` David Woodhouse 0 siblings, 1 reply; 38+ messages in thread From: simon @ 2004-02-24 9:48 UTC (permalink / raw) To: linux-mtd

> On 19 Feb 2004 at 16:48, simon@baydel.com wrote:
> > [...]

On 23 Feb 2004 at 11:07, simon@baydel.com wrote:
> [...]

Committed the sin of posting a reply with the reply text first, sorry.

I have only managed to get this to fail if the jffs2 filesystem is mounted as root. I do not seem to be able to get it to close and unmount the filesystem at shutdown. I guess the BLKFLSBUF I do only flushes the buffer that is created when I open the device and not the one that was created when the kernel opened the device.

My current thoughts are to create a ro root and an additional rw filesystem. Is there a better way that I can do this ?

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-24 9:48 ` simon @ 2004-02-24 12:00 ` David Woodhouse 2004-02-24 12:54 ` Simon Haynes 2004-02-24 13:04 ` Simon Haynes 0 siblings, 2 replies; 38+ messages in thread From: David Woodhouse @ 2004-02-24 12:00 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Tue, 2004-02-24 at 09:48 +0000, simon@baydel.com wrote:
> Committed the sin of posting a reply with the reply text first, sorry.

And this time you committed the sin of including _far_ more of the previous mail(s) than was necessary. But I'm not feeling cruel today so I'll not continue to ignore you :)

> I have only managed to get this to fail if the jffs2 filesystem is mounted as root. I do
> not seem to be able to get it to close and unmount the filesystem at shutdown. I
> guess the BLKFLSBUF I do only flushes the buffer that is created when I open the
> device and not the one that was created when the kernel opened the device.

JFFS2 doesn't actually _use_ the mtdblock device. If you look closely at the code, you'll see we never read or write to/from it, we just use the minor number as an argument to get_mtd_device(). In fact, it's perfectly possible to use any _other_ device driver instead of the mtdblock device, as long as it has the major number which JFFS2 is looking for.

I wonder if the rootfs-mounting is opening the _actual_ block device and doing some I/O, and that's later getting flushed, causing corruption. Although I can't comprehend why a failed attempt to mount, for example, ext2 would cause the mtdblock device to consider its buffer _dirty_ and try to write it back on close.

What happens if you use the 'mtdblock_ro' device instead? That shares the major number, but doesn't share the caching.

--
dwmw2
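The point about the block device being used only for its numbers can be sketched in a few lines of Python. This illustrates the idea only and is not the kernel's lookup code. The major number 31 for mtdblock is stated here as an assumption, and the function name is made up.

```python
import os

MTD_BLOCK_MAJOR = 31  # assumed: major number conventionally used by mtdblock


def mtd_index_from_rdev(rdev: int, expected_major: int = MTD_BLOCK_MAJOR) -> int:
    """Return the MTD device index implied by a block device number.

    Only the major is checked and only the minor is used; nothing is read
    from or written to the block device itself.
    """
    if os.major(rdev) != expected_major:
        raise ValueError("not the major number the filesystem expects")
    return os.minor(rdev)
```

With this model, /dev/mtdblock1 (31, 1) selects MTD device 1, and any other node with the expected major and minor 1 would select the same device.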
* Re: JFFS2 Corruption. 2004-02-24 12:00 ` David Woodhouse @ 2004-02-24 12:54 ` Simon Haynes 2004-02-24 13:04 ` Simon Haynes 1 sibling, 0 replies; 38+ messages in thread From: Simon Haynes @ 2004-02-24 12:54 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 12:00 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 09:48 +0000, simon@baydel.com wrote:
> > Committed the sin of posting a reply with the reply text first, sorry.
>
> And this time you committed the sin of including _far_ more of the
> previous mail(s) than was necessary. But I'm not feeling cruel today so
> I'll not continue to ignore you :)
>
> > I have only managed to get this to fail if the jffs2 filesystem is
> > mounted as root. I do not seem to be able to get it to close and unmount

Thanks for the reply, I am really struggling with this. I appreciate that this stuff keeps you really busy and I will try to make it as easy for you as I can.

I only seem to get this problem if I use the SMC as root. I have put a printk in mtdblock.c which tells me when the flush and release routines are called. After trawling the archives I also set the debugging level to 1. Now I get the message

mtdblock_open \n ok

when the kernel mounts root.

If I boot from the network, mount the SMC, build a JFFS2 filesystem and copy files, then on umount I get a release message. I can then reboot via the network and make changes to the JFFS2 filesystem, many times, and all is ok.

If I boot my system and I pass the root=/dev/mtdblock1 argument to the kernel it comes up fine on the SMC. This modifies mainly files in /var. If I then reboot the system I do not get any messages from the mtdblock flush or release routines. Next time the filesystem is mounted it is corrupt.

I have looked in the util/MAKEDEV file and the readme but I don't know how to select the mtdblock_ro device.

I take it what I am trying to do is possible and is suitable for a production environment ?

Thanks again

Simon
* Re: JFFS2 Corruption. 2004-02-24 12:00 ` David Woodhouse 2004-02-24 12:54 ` Simon Haynes @ 2004-02-24 13:04 ` Simon Haynes 2004-02-24 13:40 ` Simon Haynes 1 sibling, 1 reply; 38+ messages in thread From: Simon Haynes @ 2004-02-24 13:04 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 12:54 pm, Simon Haynes wrote:
> [...]
> Thanks for the reply I am really struggling with this. I appreciate that
> this stuff keeps you really busy and I will try to make it as easy for you
> as I can.

Ah, I see the ro stuff is a kernel change. I will give it a go.

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-24 13:04 ` Simon Haynes @ 2004-02-24 13:40 ` Simon Haynes 2004-02-24 14:22 ` David Woodhouse 0 siblings, 1 reply; 38+ messages in thread From: Simon Haynes @ 2004-02-24 13:40 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 1:04 pm, Simon Haynes wrote:
> [...]
> Ah, I see the ro stuff is a kernel change. I will give it a go.

I don't understand. I changed the kernel to use the read only device. I expected this to work rw without caching but it does not. I have already tried mounting root ro via the caching mtd block and although my system does not fully start I can't see how I would get corruption ?

I have looked at the mtdblock_ro code and it seems you allow writing if certain flag bits are set. Do I need to set these somewhere ?

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-24 13:40 ` Simon Haynes @ 2004-02-24 14:22 ` David Woodhouse 2004-02-24 14:25 ` Simon Haynes 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-02-24 14:22 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Tue, 2004-02-24 at 13:40 +0000, Simon Haynes wrote:
> I don't understand. I changed the kernel to use the read only device. I
> expected this to work rw without caching but it does not.

You can't use flash RW without caching. It _has_ to read/modify/erase/writeback to write to flash.

But for JFFS2, you _don't_ use flash RW through mtdblock. It operates directly and should work fine. What failure mode do you observe?

> I have already
> tried mounting root ro via the caching mtd block and although my system does
> not fully start I can't see how I would get corruption ?

As I said -- I don't know. Maybe there's a bug which causes the mtdblock device to consider its cache dirty, and write it out to the detriment of the real data which JFFS2 has already put there on the flash. That's why I was asking you to test mtdblock_ro.

> I have looked at the mtdblock_ro code and it seems you allow writing if
> certain flag bits are set. Do I need to set these somewhere ?

No. Leave them turned off. We don't actually want to write via the mtdblock device -- that's the whole _point_ in this experiment.

--
dwmw2
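The read/modify/erase/writeback cycle mentioned above can be illustrated with a toy model in Python. This is not mtdblock code: the class, the function and the 16 KiB erase size are all assumptions chosen for the illustration. It shows why a block-style overwrite needs a cached copy of the whole erase block, and why a stale cache written back later could clobber data placed on the flash in the meantime.

```python
ERASE_SIZE = 16 * 1024  # illustrative erase block size


class ToyFlash:
    """Toy flash: erase sets a block to 0xFF, programming only clears bits."""

    def __init__(self, blocks: int, erase_size: int = ERASE_SIZE):
        self.erase_size = erase_size
        self.data = bytearray(b"\xff" * (blocks * erase_size))

    def erase(self, block: int) -> None:
        start = block * self.erase_size
        self.data[start:start + self.erase_size] = b"\xff" * self.erase_size

    def program(self, offset: int, payload: bytes) -> None:
        for i, byte in enumerate(payload):
            self.data[offset + i] &= byte  # bits can go 1 -> 0 only


def block_write(flash: ToyFlash, offset: int, payload: bytes) -> None:
    """Overwrite bytes inside one erase block via read/modify/erase/writeback."""
    block = offset // flash.erase_size
    start = block * flash.erase_size
    rel = offset - start
    if rel + len(payload) > flash.erase_size:
        raise ValueError("write must stay within one erase block")
    cache = bytearray(flash.data[start:start + flash.erase_size])  # read
    cache[rel:rel + len(payload)] = payload                        # modify
    flash.erase(block)                                             # erase
    flash.program(start, bytes(cache))                             # write back
```

In this model, programming 0x0f and then 0xf0 to the same byte without an erase leaves 0x00, whereas block_write() produces the intended value and preserves the rest of the block.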
* Re: JFFS2 Corruption. 2004-02-24 14:22 ` David Woodhouse @ 2004-02-24 14:25 ` Simon Haynes 2004-02-24 14:56 ` David Woodhouse 0 siblings, 1 reply; 38+ messages in thread From: Simon Haynes @ 2004-02-24 14:25 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 2:22 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 13:40 +0000, Simon Haynes wrote:
> > I don't understand. I changed the kernel to use the read only device. I
> > expected this to work rw without caching but it does not.
>
> You can't use flash RW without caching. It _has_ to
> read/modify/erase/writeback to write to flash.
>
> But for JFFS2, you _don't_ use flash RW through mtdblock. It operates
> directly and should work fine. What failure mode do you observe?

The problem is when I mount the erased partition using

mount -t jffs2 /dev/mtdblock1 /smc

/smc is mounted read only and I cannot write to it.

surely this is the same as not remounting root read write ?

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-24 14:25 ` Simon Haynes @ 2004-02-24 14:56 ` David Woodhouse 2004-02-24 14:58 ` Simon Haynes 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-02-24 14:56 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Tue, 2004-02-24 at 14:25 +0000, Simon Haynes wrote:
> /smc is mounted read only and I cannot write to it.
>
> surely this is the same as not remounting root read write ?

Hmmm. What happens if you mount -oremount,rw?

--
dwmw2
* Re: JFFS2 Corruption. 2004-02-24 14:56 ` David Woodhouse @ 2004-02-24 14:58 ` Simon Haynes 2004-02-24 15:35 ` David Woodhouse 0 siblings, 1 reply; 38+ messages in thread From: Simon Haynes @ 2004-02-24 14:58 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 2:56 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 14:25 +0000, Simon Haynes wrote:
> > /smc is mounted read only and I cannot write to it.
> >
> > surely this is the same as not remounting root read write ?
>
> Hmmm. What happens if you mount -oremount,rw?

I have just created the fs with a rw mtdblock kernel and rebooted with a ro mtdblock. This attempts to remount rw but it cannot.

I have tried booting from the network with a ro mtdblock kernel and performing the following operation.

-bash-2.05b# mount -t jffs2 /dev/mtdblock2 /smc
mount: block device /dev/mtdblock2 is write-protected, mounting read-only
-bash-2.05b# mount -oremount,rw /smc
mount: block device /dev/mtdblock2 is write-protected, mounting read-only
-bash-2.05b# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / nfs rw,v2,rsize=4096,wsize=4096,hard,udp,nolock,addr=192.9.200.22 0 0
/proc /proc proc rw 0 0
/dev/mtdblock2 /smc jffs2 ro 0 0
-bash-2.05b#

I guess this is not how things are supposed to work. Are there any kernel settings I should check ?

Cheers Simon.
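Checking the mount state by eye, as in the /proc/mounts listing above, can also be scripted, for example from a shutdown script that wants to confirm the remount really happened. A minimal Python sketch follows; the function names are made up, and it assumes the whitespace-separated /proc/mounts layout shown in the transcript.

```python
from typing import List


def mount_options(mounts_text: str, mount_point: str) -> List[str]:
    """Return the option list for mount_point from /proc/mounts-style text.

    If the mount point appears more than once, the last entry wins.
    An empty list means the mount point was not found.
    """
    options: List[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text: str, mount_point: str) -> bool:
    """True if the given mount point is listed with the 'ro' option."""
    return "ro" in mount_options(mounts_text, mount_point)
```

On a live system the text would come from reading /proc/mounts; fed the listing above, is_read_only(text, "/smc") returns True.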
* Re: JFFS2 Corruption. 2004-02-24 14:58 ` Simon Haynes @ 2004-02-24 15:35 ` David Woodhouse 2004-02-24 15:47 ` Simon Haynes 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-02-24 15:35 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Tue, 2004-02-24 at 14:58 +0000, Simon Haynes wrote:
> I have just created the fs with a rw mtdblock kernel and rebooted with a ro
> mtdblock. This attempts to remount rw but it cannot.

Hmm. OK, hack jffs2 to use the same major number as ramdisk, then try to mount /dev/ram2 instead :)

--
dwmw2
* Re: JFFS2 Corruption. 2004-02-24 15:35 ` David Woodhouse @ 2004-02-24 15:47 ` Simon Haynes 2004-02-24 16:14 ` David Woodhouse 0 siblings, 1 reply; 38+ messages in thread From: Simon Haynes @ 2004-02-24 15:47 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 3:35 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 14:58 +0000, Simon Haynes wrote:
> > I have just created the fs with a rw mtdblock kernel and rebooted with a
> > ro mtdblock. This attempts to remount rw but it cannot.
>
> Hmm. OK, hack jffs2 to use the same major number as ramdisk, then try to
> mount /dev/ram2 instead :)

I hacked fs/jffs2/super.c and fs/jffs2/super_v24.c. Only the v24 got compiled. I guess this is because my kernel is 2.4.21. I had to also include ramdisk in the kernel as it was previously missing ? Does this effect MTD ?

This mounted rw.

-bash-2.05b# ls -l /dev/ram2
brw-rw---- 1 root disk 1, 2 Jan 31 2003 /dev/ram2
-bash-2.05b# mount -t jffs2 /dev/ram2 /smc
-bash-2.05b# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / nfs rw,v2,rsize=4096,wsize=4096,hard,udp,nolock,addr=192.9.200.22 0 0
/proc /proc proc rw 0 0
/dev/ram2 /smc jffs2 rw 0 0
-bash-2.05b#
* Re: JFFS2 Corruption. 2004-02-24 15:47 ` Simon Haynes @ 2004-02-24 16:14 ` David Woodhouse 2004-02-24 16:17 ` Simon Haynes 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-02-24 16:14 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Tue, 2004-02-24 at 15:47 +0000, Simon Haynes wrote:
> I hacked fs/jffs2/super.c and fs/jffs2/super_v24.c. Only the v24 got
> compiled. I guess this is because my kernel is 2.4.21. I had to also include
> ramdisk in the kernel as it was previously missing ? Does this effect MTD ?

No, it neither effects nor affects the MTD drivers.

> This mounted rw.

OK... now can you reproduce the same _problem_ that way?

--
dwmw2
* Re: JFFS2 Corruption. 2004-02-24 16:14 ` David Woodhouse @ 2004-02-24 16:17 ` Simon Haynes 2004-02-24 16:51 ` David Woodhouse 2004-02-24 16:55 ` David Woodhouse 0 siblings, 2 replies; 38+ messages in thread From: Simon Haynes @ 2004-02-24 16:17 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 4:14 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 15:47 +0000, Simon Haynes wrote:
> > I hacked fs/jffs2/super.c and fs/jffs2/super_v24.c. Only the v24 got
> > compiled. I guess this is because my kernel is 2.4.21. I had to also
> > include ramdisk in the kernel as it was previously missing ? Does this
> > effect MTD ?
>
> No, it neither effects nor affects the MTD drivers.
>
> > This mounted rw.
>
> OK... now can you reproduce the same _problem_ that way?

My problem only occurs across reboots. As the ramdisk is not persistent storage how can I try and reproduce the problem. I have tried to reproduce the problem on mtd devices without rebooting and I have not managed to do this.

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-24 16:17 ` Simon Haynes @ 2004-02-24 16:51 ` David Woodhouse 2004-02-24 17:05 ` Simon Haynes 2004-02-24 17:12 ` Simon Haynes 2004-02-24 16:55 ` David Woodhouse 1 sibling, 2 replies; 38+ messages in thread From: David Woodhouse @ 2004-02-24 16:51 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Tue, 2004-02-24 at 16:17 +0000, Simon Haynes wrote:
> My problem only occurs across reboots. As the ramdisk is not persistent
> storage how can I try and reproduce the problem. I have tried to reproduce
> the problem on mtd devices without rebooting and I have not managed to do
> this.

The ramdisk isn't _involved_ as a form of storage. JFFS2 doesn't _use_ the block device for _anything_ except for looking at its minor number to decide which actual MTD device to use.

Hack JFFS2 to always get_mtd_device number 2 and boot with root=/dev/ram0

--
dwmw2
* Re: JFFS2 Corruption. 2004-02-24 16:51 ` David Woodhouse @ 2004-02-24 17:05 ` Simon Haynes 2004-02-24 18:05 ` David Woodhouse 2004-02-24 17:12 ` Simon Haynes 1 sibling, 1 reply; 38+ messages in thread From: Simon Haynes @ 2004-02-24 17:05 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 4:51 pm, David Woodhouse wrote:
> Hack JFFS2 to always get_mtd_device number 2 and boot with
> root=/dev/ram0

Sorry again for being so stupid. I know you have only told me half a dozen times that JFFS2 does not use the block read and write routines.

I have done this and rebooted several times. Normally I would expect a CRC error within a couple of reboots. As yet I have no CRC errors but I do get a large number of "Empty Flash" messages, about 15 per reboot. I do not see any of these if I boot via the network and mount and dismount the smc ?

I will keep rebooting overnight and hack the kernel to panic on a CRC error.

Would you recommend changing the printk for the "Empty Flash" messages to a different level and using the ramdisk as a permanent solution ?
* Re: JFFS2 Corruption. 2004-02-24 17:05 ` Simon Haynes @ 2004-02-24 18:05 ` David Woodhouse 2004-02-24 18:04 ` Simon Haynes 2004-02-25 9:49 ` simon 0 siblings, 2 replies; 38+ messages in thread From: David Woodhouse @ 2004-02-24 18:05 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Tue, 2004-02-24 at 17:05 +0000, Simon Haynes wrote:
> Would you recommend changing the printk for the "Empty Flash" messages to a
> different level and using the ramdisk as a permanent solution ?

Changing the printk level for the 'Empty Flash' messages does seem appropriate -- or preferably finding a way to eliminate the ones which we don't want. Are you mounting the fs read-only before rebooting? If not, the occasional CRC failure is acceptable. You get those with an unclean restart. It doesn't indicate data loss; it indicates that one particular log entry which you were writing _while_ you rebooted was lost. That was not data which userspace thought was already on the medium.

The problem with mtdblock is interesting. Can you make it BUG() when it dirties its cache and put it back to how it was, using mtdblock?

--
dwmw2
* Re: JFFS2 Corruption. 2004-02-24 18:05 ` David Woodhouse @ 2004-02-24 18:04 ` Simon Haynes 2004-02-25 9:49 ` simon 1 sibling, 0 replies; 38+ messages in thread From: Simon Haynes @ 2004-02-24 18:04 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 6:05 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 17:05 +0000, Simon Haynes wrote:
> > Would you recommend changing the printk for the "Empty Flash" messages to
> > a different level and using the ramdisk as a permanent solution ?
>
> Changing the printk level for the 'Empty Flash' messages does seem
> appropriate -- or preferably finding a way to eliminate the ones which
> we don't want. Are you mounting the fs read-only before rebooting? If
> not, the occasional CRC failure is acceptable. You get those with an
> unclean restart. It doesn't indicate data loss; it indicates that one
> particular log entry which you were writing _while_ you rebooted was
> lost. That was not data which userspace thought was already on the
> medium.
>
> The problem with mtdblock is interesting. Can you make it BUG() when it
> dirties its cache and put it back to how it was, using mtdblock?

I do remount the filesystem read only as part of a shutdown. I also cat /proc/mounts so that I can check it has happened. So the filesystem should be clean. After this halt is called. I think this tries to write to utmp. If the kernel already has the device open for write will this be allowed ?

I guess it is strange that the read only mtdblock device prevents write access via jffs2. I don't know if the cache in mtdblock is ever being used but I can certainly put a BUG() in there.

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-24 18:05 ` David Woodhouse 2004-02-24 18:04 ` Simon Haynes @ 2004-02-25 9:49 ` simon 2004-02-25 10:25 ` David Woodhouse 1 sibling, 1 reply; 38+ messages in thread From: simon @ 2004-02-25 9:49 UTC (permalink / raw) To: dwmw2; +Cc: linux-mtd

On 24 Feb 2004 at 18:05, David Woodhouse wrote:
> On Tue, 2004-02-24 at 17:05 +0000, Simon Haynes wrote:
> > Would you recommend changing the printk for the "Empty Flash"
> > messages to a different level and using the ramdisk as a permanent
> > solution ?
>
> Changing the printk level for the 'Empty Flash' messages does seem
> appropriate -- or preferably finding a way to eliminate the ones which

Yesterday I managed to reproduce the problem using /dev/ram. The first message I noticed was

jffs2_get_inode_nodes(): Data CRC failed on node at 0x00050748: Read 0x0b5da171, calculated 0xb1b1dbb

This happened some time after the kernel mounted root. On following reboots this message was not displayed but I did get one

jffs2_scan_dirent_node(): Node CRC failed on node at 0x0046a7f0 read 0xffffffff calculated 0xdec8161b

message.

As I do mount read only on shutdown I assume this is corruption ?

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-25 9:49 ` simon @ 2004-02-25 10:25 ` David Woodhouse 2004-02-26 11:08 ` Simon Haynes 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-02-25 10:25 UTC (permalink / raw) To: simon; +Cc: linux-mtd

On Wed, 2004-02-25 at 09:49 +0000, simon@baydel.com wrote:
> As I do mount read only on shutdown I assume this is corruption ?

Maybe. I doubt it's _harmful_ but I am very interested.

Any chance you can run with CONFIG_JFFS2_FS_DEBUG=1 and log _all_ the messages over a serial console, then see what it wrote at the offending address when you get a CRC error?

--
dwmw2
* Re: JFFS2 Corruption. 2004-02-25 10:25 ` David Woodhouse @ 2004-02-26 11:08 ` Simon Haynes 2004-02-26 11:55 ` David Woodhouse 2004-03-03 15:31 ` David Woodhouse 0 siblings, 2 replies; 38+ messages in thread From: Simon Haynes @ 2004-02-26 11:08 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd

On 25 Feb 2004 at 10:25, David Woodhouse wrote:
> On Wed, 2004-02-25 at 09:49 +0000, simon@baydel.com wrote:
> > As I do mount read only on shutdown I assume this is corruption ?
>
> Maybe. I doubt it's _harmful_ but I am very interested.

This has taken some time to produce. I have a 40Mb logfile which thankfully compresses to 4Mb.

To generate the log I booted the system via the network with JFFS2 patched to use /dev/ram MAJOR, but no JFFS2 debug. I erased the SMC and made a clean JFFS2 filesystem. I then copied all of my root files. I mounted and umounted this a few times and each time I created and deleted a few files. I did not get one error.

I then rebooted using a similar kernel with CONFIG_JFFS2_FS_DEBUG=1 and passed arguments "root=/dev/ram1 debug". It took about 6 hours before I could log in. I then halted the system. On the first reboot, fortunately, I did get a CRC error but I cannot find where this was previously written. The node is 0x000303f0. I also observe the "Empty flash XXXX ends at XXX" messages, which do not appear before the filesystem is used as rootfs and restarted.

Beyond that I don't really know what I am looking for in the log. I can mail it to you personally but as I said it's 4Mb compressed.

Cheers Simon.
* Re: JFFS2 Corruption. 2004-02-26 11:08 ` Simon Haynes @ 2004-02-26 11:55 ` David Woodhouse 2004-03-03 15:31 ` David Woodhouse 1 sibling, 0 replies; 38+ messages in thread From: David Woodhouse @ 2004-02-26 11:55 UTC (permalink / raw) To: Simon Haynes; +Cc: linux-mtd On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote: > Beyond that I don't really know what I am looking for in the log. I can mail it to you > personally but as I said it's 4Mb compressed. Lemme see it. Although if there's a node at 0x000303f0 and no existence of 000303f0 in the logfile, your logfile is missing bits. Also take a dump of the offending file system. -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption. 2004-02-26 11:08 ` Simon Haynes 2004-02-26 11:55 ` David Woodhouse @ 2004-03-03 15:31 ` David Woodhouse 2004-03-08 15:10 ` Simon Haynes 1 sibling, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-03-03 15:31 UTC (permalink / raw) To: Simon Haynes; +Cc: linux-mtd On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote: > Beyond that I don't really know what I am looking for in the log. I can mail it to you > personally but as I said it's 4Mb compressed. (for the benefit of the peanut gallery...) The CRC error was harmless. It was a node being garbage-collected, which we didn't bother to finish writing because it contained only a redundant copy of data which existed elsewhere on the flash already. We could try to be slightly quieter about such things, but then we might actually miss something which _is_ a problem. Better to be concerned when there isn't a problem, than to be blissfully unaware when there _is_ one. Perhaps. -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption.
  2004-03-03 15:31 ` David Woodhouse
@ 2004-03-08 15:10 ` Simon Haynes
  2004-03-09 15:33 ` Simon Haynes
  0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-03-08 15:10 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Wednesday 03 Mar 2004 3:31 pm, David Woodhouse wrote:
> On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote:
> > Beyond that I don't really know what I am looking for in the log. I can
>
> We could try to be slightly quieter about such things, but then we might
> actually miss something which _is_ a problem. Better to be concerned
> when there isn't a problem, than to be blissfully unaware when there
> _is_ one. Perhaps.

As you suggested I have changed the JFFS2 remount code to flush wbuf when
the filesystem is mounted read only. As I said via IRC I have performed tens
of reboots and I have not seen any CRC messages.

Now on occasions, when the kernel is mounting JFFS2 as root on NAND I get.

NAND device: Manufacturer ID: 0xec, Chip ID: 0x79 (Samsung NAND 128MiB 3,3V)
Creating 3 MTD partitions on "NAND 128MiB 3,3V":
0x00000000-0x01000000 : "Boot / config partition"
mtd: Giving out device 0 to Boot / config partition
0x01000000-0x05000000 : "JFFS2 Root Filesystem partition"
mtd: Giving out device 1 to JFFS2 Root Filesystem partition
0x05000000-0x08000000 : "Write Cache Backup partition"
mtd: Giving out device 2 to Write Cache Backup partition
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 4096 bind 4096)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
jffs2: Erase block size too small (16KiB). Using virtual blocks size (32KiB) instead
ofs 0x00c00400 has already been seen. Skipping
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00408: 0x273c instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00420: 0x404d instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00424: 0x404d instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00428: 0x404d instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c0043c: 0x0f33 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00440: 0x88f8 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00444: 0x4d61 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00448: 0x2039 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c0044c: 0x343a instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00450: 0x3a32 instead
Further such events for this erase block will not be printed

Can you suggest what might be going on ?

Cheers

Simon.
* Re: JFFS2 Corruption.
  2004-03-08 15:10 ` Simon Haynes
@ 2004-03-09 15:33 ` Simon Haynes
  2004-03-16 16:14 ` David Woodhouse
  0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-03-09 15:33 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Monday 08 Mar 2004 3:10 pm, Simon Haynes wrote:
> On Wednesday 03 Mar 2004 3:31 pm, David Woodhouse wrote:
> > On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote:
> > > Beyond that I don't really know what I am looking for in the log. I can
> ofs 0x00c00400 has already been seen. Skipping
> jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00408:
> 0x273c instead

Since lunch interrupted our conversation yesterday I have tried a few things.

The SMC partition I was using was 64 Mb and contained a bad block. I modified
the kernel to make this partition 48Mb. I booted root over nfs and tried to
rebuild root on the SMC. I can no longer mount the original SMC so I tried a
new one which worked OK.

-bash-2.05b# /mtd/eraseall /dev/mtd1 > /dev/null
nand_erase: attempt to erase a bad block at page 0x0001ee60
/mtd/eraseall: /dev/mtd1: MTD Erase failure: Input/output error
-bash-2.05b# mount -t jffs2 /dev/ram1 /smc
Cowardly refusing to erase blocks on filesystem with no valid JFFS2 nodes
empty_blocks 2047, bad_blocks 0, c->nr_blocks 2048
mount: wrong fs type, bad option, bad superblock on /dev/ram1,
       or too many mounted file systems
-bash-2.05b# /mtd/eraseall /dev/mtd1 > /dev/null
-bash-2.05b# mount -t jffs2 /dev/ram1 /smc
-bash-2.05b#

After 5 reboots the new SMC gave this magic bitmask failure.

jffs2: Erase block size too small (16KiB). Using virtual blocks size (32KiB) instead
ofs 0x000a8400 has already been seen. Skipping, jeb 0xa8000, sector size 0x8000
saved ofs 0x000a8000, previous 0xa7fff, buf_len 0x7c00, scanned 0x0
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a8408: 0x273c instead, 0x0
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a841c: 0x000b instead, 0x0
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a8420: 0x404d
. . .

It would appear that the first buffer location does not contain 0xFFFFFFFF
and cleanmarkerfound is set but I don't know what this means to JFFS2.

I then switched the partition back to 64Mb, set the kernel to use the read
only block device and hacked the MAJOR number for JFFS2 to be the same as
/dev/ram. I have rebooted the system at least 20 times and as yet I have not
seen any errors.

I am unsure as to where to go next besides trying more reboots. Any ideas ?

Cheers

Simon.
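The block figures in the log above follow from simple arithmetic on the
32KiB virtual block size. What follows is only a sketch: the helper names
are hypothetical and are not JFFS2 functions, and it assumes the virtual
block size is a power of two. A 64Mb partition divided into 32KiB blocks
gives 2048 blocks, the same number mount printed as c->nr_blocks, and the
offset 0x000a8400 falls 0x400 bytes into the block starting at 0xa8000
(the "jeb" value in the log).

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of JFFS2. */

/* Number of virtual erase blocks in a partition of part_size bytes. */
static uint32_t nr_virtual_blocks(uint32_t part_size, uint32_t vblock_size)
{
	return part_size / vblock_size;
}

/* Start offset of the virtual block containing ofs.
 * Assumes vblock_size is a power of two. */
static uint32_t block_start(uint32_t ofs, uint32_t vblock_size)
{
	return ofs & ~(vblock_size - 1);
}

/* Offset of ofs within its virtual block (same power-of-two assumption). */
static uint32_t block_offset(uint32_t ofs, uint32_t vblock_size)
{
	return ofs & (vblock_size - 1);
}
```

With these, nr_virtual_blocks(0x05000000 - 0x01000000, 0x8000) gives 2048
for the partition bounds shown in the earlier boot log.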
* Re: JFFS2 Corruption. 2004-03-09 15:33 ` Simon Haynes @ 2004-03-16 16:14 ` David Woodhouse 2004-03-19 10:37 ` Simon Haynes 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-03-16 16:14 UTC (permalink / raw) To: simon; +Cc: linux-mtd On Tue, 2004-03-09 at 15:33 +0000, Simon Haynes wrote: > -bash-2.05b# mount -t jffs2 /dev/ram1 /smc > Cowardly refusing to erase blocks on filesystem with no valid JFFS2 nodes > empty_blocks 2047, bad_blocks 0, c->nr_blocks 2048 > mount: wrong fs type, bad option, bad superblock on /dev/ram1, > or too many mounted file systems This I believe we put down to user error? > After 5 reboots the new SMC gave this magic bitmask failure. > > jffs2: Erase block size too small (16KiB). Using virtual blocks size (32KiB) > instead > ofs 0x000a8400 has already been seen. Skipping, jeb 0xa8000, sector size > 0x8000 > saved ofs 0x000a8000, previous 0xa7fff, buf_len 0x7c00, scanned 0x0 > jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a8408: 0x273c > instead, 0x0 That was a bug in the scanning code. Should be fixed in v1.58 of scan.c in CVS. Please could you try that and let me know if it works? -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption.
  2004-03-16 16:14 ` David Woodhouse
@ 2004-03-19 10:37 ` Simon Haynes
  2004-03-19 11:11 ` David Woodhouse
  0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-03-19 10:37 UTC (permalink / raw)
  To: David Woodhouse, linux-mtd; +Cc: linux-mtd

On 16 Mar 2004 at 16:14, David Woodhouse wrote:
> On Tue, 2004-03-09 at 15:33 +0000, Simon Haynes wrote:
> > -bash-2.05b# mount -t jffs2 /dev/ram1 /smc
> > Cowardly refusing to erase blocks on filesystem with no valid JFFS2
>
> This I believe we put down to user error?

Yes, this was a problem I introduced when trying to track down why I got the
Magic bitmask messages.

> > After 5 reboots the new SMC gave this magic bitmask failure.
> >
> > jffs2: Erase block size too small (16KiB). Using virtual blocks size
> > (32KiB) instead ofs 0x000a8400 has already been seen. Skipping, jeb
>
> That was a bug in the scanning code. Should be fixed in v1.58 of
> scan.c in CVS. Please could you try that and let me know if it works?
>

I have not yet had a chance to download scan.c from CVS. However I did make
the simple change you suggested and everything worked fine.

Thank you, thank you, thank you

Simon.

__________________________
Simon Haynes - Baydel
Phone : 44 (0) 1372 378811
Email : simon@baydel.com
__________________________
* Re: JFFS2 Corruption. 2004-03-19 10:37 ` Simon Haynes @ 2004-03-19 11:11 ` David Woodhouse 0 siblings, 0 replies; 38+ messages in thread From: David Woodhouse @ 2004-03-19 11:11 UTC (permalink / raw) To: Simon Haynes; +Cc: linux-mtd On Fri, 2004-03-19 at 10:37 +0000, Simon Haynes wrote: > > That was a bug in the scanning code. Should be fixed in v1.58 of > > scan.c in CVS. Please could you try that and let me know if it works? > > > I have not had chance a yet to download scan from CVS. However I did make the > simple change you suggested and everything worked fine. What I committed was slightly more complicated -- and also broken. Thomas fixed it though. :) -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption.
  2004-02-24 16:51 ` David Woodhouse
  2004-02-24 17:05 ` Simon Haynes
@ 2004-02-24 17:12 ` Simon Haynes
  1 sibling, 0 replies; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 17:12 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Tuesday 24 Feb 2004 5:05 pm, Simon Haynes wrote:
> On Tuesday 24 Feb 2004 4:51 pm, David Woodhouse wrote:
> > Hack JFFS2 to always get_mtd_device number 2 and boot with
> > root=/dev/ram0
>
> Sorry again for being so stupid. I know you have only told me half a dozen
> times that JFFS2 does not use the block read and write routines.
>
> I have done this and rebooted several times. Normally I would expect a CRC
> error within a couple of reboots. As yet I have no CRC errors but I do get
> a large number of "Empty Flash" messages, about 15 per reboot. I do not see
> any of these if I boot via the network and mount and dismount the smc ?
>
> I will keep rebooting over night and hack the kernel to panic on a CRC
> error.
>
> Would you recommend changing the printk for the "Empty Flash" messages to a
> different level and using the ramdisk as a permanent solution ?

After several more reboots I got the message

jffs2_get_inode_nodes(): Data CRC failed on node at 0x00050748: Read 0x0b5da171, calculated 0xb1b1dbb

This actually came out after I had logged in ?

I will also look into changing my list to irc.freenode.net.

Many Thanks

Simon.
* Re: JFFS2 Corruption. 2004-02-24 16:17 ` Simon Haynes 2004-02-24 16:51 ` David Woodhouse @ 2004-02-24 16:55 ` David Woodhouse 1 sibling, 0 replies; 38+ messages in thread From: David Woodhouse @ 2004-02-24 16:55 UTC (permalink / raw) To: simon; +Cc: linux-mtd Btw if you join #mtd on irc.freenode.net you may experience lower latency in getting responses. -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* JFFS2 corruption
@ 2004-02-05 13:15 Florian Schirmer
  2004-02-08 10:37 ` David Woodhouse
  2004-02-08 11:28 ` David Woodhouse
  0 siblings, 2 replies; 38+ messages in thread
From: Florian Schirmer @ 2004-02-05 13:15 UTC (permalink / raw)
  To: linux-mtd

Hi,

i'm having a strange problem with a JFFS2 filesystem: I always get messages
about incorrect CRC values while mounting the filesystem. What i did:

<create a file which contains nothing but lots of 0x42 called dummy.bin>
eraseall -j /dev/mtdblock/3
mount -t jffs2 /dev/mtdblock /mnt
cp /dummy.bin /mnt/dummy1.bin
cp /dummy.bin /mnt/dummy2.bin
cp /dummy.bin /mnt/dummy3.bin
cp /dummy.bin /mnt/dummy4.bin
sync
cat /dev/mtdblock/3 > /dump1.raw
umount /mnt
cat /dev/mtdblock/3 > /dump2.raw
mount -t jffs2 /dev/mtdblock /mnt
<Multiple messages about invalid JFFS2 CRC values>
cat /dev/mtdblock/3 > /dump3.raw

cmp -l /dump1.raw /dump2.raw => files are the same
cmp -l /dump2.raw /dump3.raw => Multiple differences.

Did some additional tests:

umount /mnt
cat /dev/mtdblock/3 > /dump2.raw
mount -t jffs2 /dev/mtdblock /mnt

Mounts just fine. But there are huge chunks of 0x00 in my dummy files on the
jffs2 partition.

Any ideas on how to track down this problem? Basic MTD/flash stuff seems to
work just fine. I can work with a cramfs image on this partition without any
errors. So i assume something is going wrong in the JFFS2 code.

Specs:

Build Host: x86
Target: MIPS (LE)
Flash: NOR (AMD)
MTD/JFFS2 Codebase: current CVS

Thanks,
Florian
* Re: JFFS2 corruption 2004-02-05 13:15 JFFS2 corruption Florian Schirmer @ 2004-02-08 10:37 ` David Woodhouse 2004-02-08 11:38 ` Florian Schirmer 2004-02-08 11:28 ` David Woodhouse 1 sibling, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-02-08 10:37 UTC (permalink / raw) To: Florian Schirmer; +Cc: linux-mtd On Thu, 2004-02-05 at 14:15 +0100, Florian Schirmer wrote: > Hi, > > i'm having a strange problem with a JFFS2 filesystem: I always get messages > about incorrect CRC values while mounting the filesystem. What i did: Please set CONFIG_JFFS2_FS_DEBUG=1, then reproduce this with full logging -- you'll need to do it over a serial console to make sure you catch everything. -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 corruption
  2004-02-08 10:37 ` David Woodhouse
@ 2004-02-08 11:38 ` Florian Schirmer
  2004-02-08 11:53 ` David Woodhouse
  0 siblings, 1 reply; 38+ messages in thread
From: Florian Schirmer @ 2004-02-08 11:38 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

Hi,

> > i'm having a strange problem with a JFFS2 filesystem: I always get
> > messages about incorrect CRC values while mounting the filesystem. What i
> > did:
>
> Please set CONFIG_JFFS2_FS_DEBUG=1, then reproduce this with full
> logging -- you'll need to do it over a serial console to make sure you
> catch everything.

Thanks for your suggestion. I'm a little bit further finding this problem. It
turns out to be a cmdset2 issue and not a jffs2 problem. I'm still digging
what is going on.

What i've got so far is: jffs2_read_flash returns bogus data. In the case of
a bad crc the last odd byte is wrong. Looks like a byte access instead of a
word access. (Buswidth of the flash: 2) If you look at this:

<7>Node at 000bfbd4 (0) is a data node
<7>version 269, highest_version now 274
<4>crc'ing unpointed
<4>Node data dump (66 bytes)
<4>0x0000: 0x78 (0x78)
<4>0x0001: 0x5E (0x5E)
<4>0x0002: 0x72 (0x72)
<4>0x0003: 0x72 (0x72)
<4>0x0004: 0x1A (0x1A)
<4>0x0005: 0x05 (0x05)
<4>0x0006: 0xA3 (0xA3)
<4>0x0007: 0x21 (0x21)
<4>0x0008: 0x30 (0x30)
<4>0x0009: 0x1A (0x1A)
<4>0x000A: 0x02 (0x02)
<4>0x000B: 0xC3 (0xC3)
<4>0x000C: 0x31 (0x31)
<4>0x000D: 0x04 (0x04)
<4>0x000E: 0x00 (0x00)
<4>0x000F: 0x02 (0x02)
<4>0x0010: 0x68 (0x68)
<4>0x0011: 0xD4 (0xD4)
<4>0x0012: 0x4F (0x4F)
<4>0x0013: 0xA3 (0xA3)
<4>0x0014: 0x21 (0x21)
<4>0x0015: 0x30 (0x30)
<4>0x0016: 0x1A (0x1A)
<4>0x0017: 0x02 (0x02)
<4>0x0018: 0xC3 (0xC3)
<4>0x0019: 0x23 (0x23)
<4>0x001A: 0x04 (0x04)
<4>0x001B: 0x00 (0x00)
<4>0x001C: 0x02 (0x02)
<4>0x001D: 0x68 (0x68)
<4>0x001E: 0xD4 (0xD4)
<4>0x001F: 0x17 (0x17)
<4>0x0020: 0xA3 (0xA3)
<4>0x0021: 0x21 (0x21)
<4>0x0022: 0x30 (0x30)
<4>0x0023: 0x1A (0x1A)
<4>0x0024: 0x02 (0x02)
<4>0x0025: 0x43 (0x43)
<4>0x0026: 0x36 (0x36)
<4>0x0027: 0x04 (0x04)
<4>0x0028: 0x00 (0x00)
<4>0x0029: 0x02 (0x02)
<4>0x002A: 0x68 (0x68)
<4>0x002B: 0xD4 (0xD4)
<4>0x002C: 0xE1 (0xE1)
<4>0x002D: 0xA3 (0xA3)
<4>0x002E: 0x21 (0x21)
<4>0x002F: 0x30 (0x30)
<4>0x0030: 0x1A (0x1A)
<4>0x0031: 0x02 (0x02)
<4>0x0032: 0x43 (0x43)
<4>0x0033: 0x29 (0x29)
<4>0x0034: 0x04 (0x04)
<4>0x0035: 0x00 (0x00)
<4>0x0036: 0x02 (0x02)
<4>0x0037: 0x68 (0x68)
<4>0x0038: 0x84 (0x84)
<4>0x0039: 0xB8 (0xB8)
<4>0x003A: 0x15 (0x15)
<4>0x003B: 0x20 (0x20)
<4>0x003C: 0xC0 (0xC0)
<4>0x003D: 0x00 (0x00)
<4>0x003E: 0x0F (0x0F)
<4>0x003F: 0x3C (0x3C)
<4>0x0040: 0x20 (0x20)
<4>0x0041: 0x20 (0x3D)
<5>jffs2_get_inode_nodes(): Data CRC failed on node at 0x000bfbd4: Read 0xc073b413, calculated 0xa375d8ca
<7>Obsoleting previously unchecked node at 0x000bfbd4 of len 88: Dirtying

you will see that the last byte (offset 0x41) mirrors byte at offset 0x40. So
i suspect a bug in the cmdset2 driver here. The value in parentheses is the
value received by directly accessing the flash using the memory address and
is always correct.

If the node data size is odd then the 2nd byte from behind is affected:

<4>0x002F: 0x0F (0x0F)
<4>0x0030: 0x3C (0x3C)
<4>0x0031: 0x3C (0x20)
<4>0x0032: 0x3D (0x3D)

Am still collecting patterns to determine whether it is really a bug in the
mtd core or just a cache issue or whatever.

BTW: Is %zd a legal printk formatting tag? Doesn't seem to work for me. I can
commit a fix (removing the z) for that if you want.

Regards,
Florian
* Re: JFFS2 corruption 2004-02-08 11:38 ` Florian Schirmer @ 2004-02-08 11:53 ` David Woodhouse 2004-02-08 17:02 ` Florian Schirmer 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-02-08 11:53 UTC (permalink / raw) To: Florian Schirmer; +Cc: linux-mtd On Sun, 2004-02-08 at 12:38 +0100, Florian Schirmer wrote: > Hi, > > > > i'm having a strange problem with a JFFS2 filesystem: I always get > > > messages about incorrect CRC values while mounting the filesystem. What i > > > did: > > > > Please set CONFIG_JFFS2_FS_DEBUG=1, then reproduce this with full > > logging -- you'll need to do it over a serial console to make sure you > > catch everything. > > Thanks for your suggestion. I'm a little bit further finding this problem. It > turns out to be a cmdset2 issue and not a jffs2 problem. I'm still digging > what is going on. > > What i've got so far is: jffs2_read_flash returns bogous data. In the case of > a bad crc the last odd byte is wrong. Looks like a byte access instead of a > word access. (Buswidth of the flash: 2) If you look at this: Interesting. That would be a problem in map_copy_from(), which is either just memcpy_fromio() or a function provided by your map driver, depending on whether you have CONFIG_MTD_COMPLEX_MAPPINGS enabled. Can you check what memcpy_fromio() does directly? Try changing the definition of map_copy_from() in include/linux/mtd/map.h or drivers/mtd/maps/map_funcs.c or your map driver (depending on your configuration) to copy a byte at a time, and see if that fixes it. The '%zd' modifier is the correct way to print a size_t; the old %Z was a gcc-ism. It's supported by the 2.6 and I thought also the current 2.4 kernels; what kernel are you using, precisely? It's a trivial patch to add it. -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
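The byte-at-a-time experiment suggested above could look roughly like the
following. This is a user-space sketch only, under the assumption that the
flash window is reachable through an ordinary pointer; copy_from_bytewise is
a hypothetical name, not the kernel's map_copy_from(), and in a real map
driver each load would go through an I/O accessor such as readb() rather
than a plain dereference.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a map driver's copy routine: every load is a
 * single 8-bit access, so no wider or mixed-width transfers are issued.
 * The volatile qualifier keeps the compiler from merging the loads.
 */
static void copy_from_bytewise(void *to, const volatile uint8_t *from,
			       size_t len)
{
	uint8_t *dst = to;
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = from[i];
}
```

If the CRC errors disappear with a copy of this shape, that points at the
access pattern of the optimised copy rather than at JFFS2 itself.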
* Re: JFFS2 corruption
  2004-02-08 11:53 ` David Woodhouse
@ 2004-02-08 17:02 ` Florian Schirmer
  2004-02-08 17:13 ` David Woodhouse
  0 siblings, 1 reply; 38+ messages in thread
From: Florian Schirmer @ 2004-02-08 17:02 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]

Hi,

> Interesting. That would be a problem in map_copy_from(), which is either
> just memcpy_fromio() or a function provided by your map driver,
> depending on whether you have CONFIG_MTD_COMPLEX_MAPPINGS enabled.
>
> Can you check what memcpy_fromio() does directly? Try changing the
> definition of map_copy_from() in include/linux/mtd/map.h or
> drivers/mtd/maps/map_funcs.c or your map driver (depending on your
> configuration) to copy a byte at a time, and see if that fixes it.

Yeah. You're absolutely right. I should have checked cmdset2 before claiming
it is broken.

memcpy_fromio is memcpy on MIPS. I've checked the memcpy implementation in
arch/mips/lib/memcpy.S and it is using byte transfers for remaining byte
transfers. So i expect at least the MIPS and SH platforms to be broken for
non buswidth==1 setups. Haven't checked other archs yet.

I'm wondering what assumptions can be made about memcpy's access patterns?
Looks like nobody even guarantees a specific access type. But we need to make
sure every access to the flash matches the bus width, don't we?

My proposed solution would be to handle remaining transfers in
maps.h/map_funcs.c already. So that the arch code only has to handle full
transfers. See attached patch. I have to admit that it's very hard/impossible
to detect the remaining byte size. BITS_PER_LONG / 2 seems at least the best
guess we can make. This will handle SH, MIPS32 and MIPS64 just fine.

> The '%zd' modifier is the correct way to print a size_t; the old %Z was
> a gcc-ism. It's supported by the 2.6 and I thought also the current 2.4
> kernels; what kernel are you using, precisely? It's a trivial patch to
> add it.

I'm still using what was provided by Broadcom/Linksys (2.4.20). So don't worry
about that. I've ported the Broadcom code over to current 2.4.x but went back
to 2.4.20 due to the jffs2 corruption.

Regards,
Florian

[-- Attachment #2: mtd-memcpy.diff --]
[-- Type: text/x-diff, Size: 1113 bytes --]

--- linux-bcm/include/linux/mtd/map.h-old	2003-05-28 14:42:22.000000000 +0200
+++ linux-bcm/include/linux/mtd/map.h	2004-02-08 17:43:55.398928752 +0100
@@ -156,7 +156,30 @@ static inline void map_write64(struct ma
 
 static inline void map_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
 {
-	memcpy_fromio(to, map->virt + from, len);
+#define MTD_ALIGN (4 - 1)
+//#define MTD_ALIGN ((BITS_PER_LONG / 2) - 1)
+	u64 buf;
+	ssize_t transfer;
+	ssize_t done = (map->buswidth == 1) ? len : (len & ~MTD_ALIGN);
+	memcpy_fromio(to, map->virt + from, done);
+	while (done < len) {
+		switch(map->buswidth) {
+		case 2:
+			buf = map_read16(map, from + done);
+			break;
+		case 4:
+			buf = map_read32(map, from + done);
+			break;
+		case 8:
+			buf = map_read64(map, from + done);
+			break;
+		}
+		transfer = len - done;
+		if (transfer > map->buswidth)
+			transfer = map->buswidth;
+		memcpy((void *)((unsigned long)to + done), &buf, transfer);
+		done += transfer;
+	}
 }
 
 static inline void map_copy_to(struct map_info *map, unsigned long to, const void *from, ssize_t len)
* Re: JFFS2 corruption 2004-02-08 17:02 ` Florian Schirmer @ 2004-02-08 17:13 ` David Woodhouse 0 siblings, 0 replies; 38+ messages in thread From: David Woodhouse @ 2004-02-08 17:13 UTC (permalink / raw) To: Florian Schirmer; +Cc: linux-mtd On Sun, 2004-02-08 at 18:02 +0100, Florian Schirmer wrote: > memcopy_fromio is memcpy on MIPS. I've checked the memcpy implementation in > arch/mips/lib/memcpy.S and it is using byte transfers for remaining byte > transfers. So i expect at least the MIPS and SH plattform to be broken for > non buswidth==1 setups. Haven't checked other archs yet. I don't understand. Why should it be broken? If your bus controller is set up right for that region, byte loads ought to work fine. It'll _issue_ a wider load, then pick the right byte from it. What's happening is that it's picking the _wrong_ byte from it. Is the bus controller set up to byte-swap transparently when we do 16-bit access? Can you show the result of a single 32-bit load, two consecutive 16-bit loads, and four consecutive 8-bit loads all starting at the same address? > I'm wondering what assumptions can be made about memcpy's access patterns? > Looks like nobody even guarantees a specific access type. But we need to make > sure every access to the flash matches the bus width, don't we? I don't think so. The bus controller really ought to handle this for us, for reading. -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
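One way to evaluate the three sets of loads requested above is to check that
they describe the same four bytes. The sketch below only does the
comparison; how the values are obtained (for example with readl(), readw()
and readb() on the mapped flash address) is left out. It assumes a
little-endian target, as in the MIPS (LE) setup described earlier, and
loads_consistent_le is a hypothetical name.

```c
#include <stdint.h>

/*
 * Returns 1 if one 32-bit load, two consecutive 16-bit loads and four
 * consecutive 8-bit loads from the same start address agree with each
 * other under little-endian byte order, 0 otherwise.
 */
static int loads_consistent_le(uint32_t w32, const uint16_t w16[2],
			       const uint8_t w8[4])
{
	/* Rebuild the 32-bit word from the two halfwords, low half first. */
	uint32_t from16 = (uint32_t)w16[0] | ((uint32_t)w16[1] << 16);
	/* Rebuild it again from the four bytes, lowest address first. */
	uint32_t from8 = (uint32_t)w8[0] | ((uint32_t)w8[1] << 8) |
			 ((uint32_t)w8[2] << 16) | ((uint32_t)w8[3] << 24);

	return w32 == from16 && w32 == from8;
}
```

A mirrored byte of the kind seen in the node dump (0x20 returned where 0x3D
is stored) makes the 8-bit reconstruction differ from the wider loads, so
the function returns 0 for it.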
* Re: JFFS2 corruption
  2004-02-05 13:15 JFFS2 corruption Florian Schirmer
  2004-02-08 10:37 ` David Woodhouse
@ 2004-02-08 11:28 ` David Woodhouse
  1 sibling, 0 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-08 11:28 UTC (permalink / raw)
  To: Florian Schirmer; +Cc: linux-mtd

On Thu, 2004-02-05 at 14:15 +0100, Florian Schirmer wrote:
> i'm having a strange problem with a JFFS2 filesystem: I always get messages
> about incorrect CRC values while mounting the filesystem. What i did:

Also apply this.

Index: fs/jffs2/write.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/write.c,v
retrieving revision 1.80
diff -u -p -r1.80 write.c
--- fs/jffs2/write.c	27 Jan 2004 13:21:50 -0000	1.80
+++ fs/jffs2/write.c	8 Feb 2004 11:29:14 -0000
@@ -97,6 +97,12 @@ struct jffs2_full_dnode *jffs2_write_dno
 	int retried = 0;
 	unsigned long cnt = 2;
 
+	if (data && datalen &&
+	    (je32_to_cpu(ri->data_crc) != crc32(0, data, datalen))) {
+		printk(KERN_CRIT "Eep. Data CRC not correct in jffs2_write_dnode()\n");
+		BUG();
+	}
+
 	D1(if(je32_to_cpu(ri->hdr_crc) != crc32(0, ri, sizeof(struct jffs2_unknown_node)-4)) {
 		printk(KERN_CRIT "Eep. CRC not correct in jffs2_write_dnode()\n");
 		BUG();

-- 
dwmw2
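The check added by the patch above compares the CRC stored in the node
header with one recomputed over the data just before writing. A user-space
sketch of the same idea follows. It assumes that crc32(0, data, len) is the
reflected CRC-32 (polynomial 0xEDB88320) with the seed used as given and no
final inversion; that is an assumption about the kernel helper, not
something stated in this thread. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32; seed is used as-is, no final XOR (assumed). */
static uint32_t crc32_le_raw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return crc;
}

/* Returns 1 if the stored CRC matches the data about to be written. */
static int data_crc_ok(uint32_t stored_crc, const void *data, size_t len)
{
	return stored_crc == crc32_le_raw(0, data, len);
}
```

A mismatch at this point would mean the data was already wrong in memory
before it reached the flash, which separates a write-side problem from a
read-side one.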