* JFFS2 corruption
@ 2004-02-05 13:15 Florian Schirmer
2004-02-08 10:37 ` David Woodhouse
2004-02-08 11:28 ` David Woodhouse
0 siblings, 2 replies; 38+ messages in thread
From: Florian Schirmer @ 2004-02-05 13:15 UTC (permalink / raw)
To: linux-mtd
Hi,
I'm having a strange problem with a JFFS2 filesystem: I always get messages
about incorrect CRC values while mounting the filesystem. What I did:
<create a file which contains nothing but lots of 0x42 called dummy.bin>
eraseall -j /dev/mtdblock/3
mount -t jffs2 /dev/mtdblock/3 /mnt
cp /dummy.bin /mnt/dummy1.bin
cp /dummy.bin /mnt/dummy2.bin
cp /dummy.bin /mnt/dummy3.bin
cp /dummy.bin /mnt/dummy4.bin
sync
cat /dev/mtdblock/3 > /dump1.raw
umount /mnt
cat /dev/mtdblock/3 > /dump2.raw
mount -t jffs2 /dev/mtdblock/3 /mnt
<Multiple messages about invalid JFFS2 CRC values>
cat /dev/mtdblock/3 > /dump3.raw
cmp -l /dump1.raw /dump2.raw => files are the same
cmp -l /dump2.raw /dump3.raw => Multiple differences.
Did some additional tests:
umount /mnt
cat /dev/mtdblock/3 > /dump2.raw
mount -t jffs2 /dev/mtdblock/3 /mnt
Mounts just fine. But there are huge chunks of 0x00 in my dummy files on the
jffs2 partition.
Any ideas on how to track down this problem? Basic MTD/flash stuff seems to
work just fine. I can work with a cramfs image on this partition without any
errors. So I assume something is going wrong in the JFFS2 code.
Specs:
Build Host: x86
Target: MIPS (LE)
Flash: NOR (AMD)
MTD/JFFS2 Codebase: current CVS
Thanks,
Florian
^ permalink raw reply [flat|nested] 38+ messages in thread
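The symptom reported in this message (files that should contain only 0x42 showing chunks of 0x00) can be checked mechanically. The following is a minimal sketch and not part of the original thread: `find_bad_run` is a hypothetical helper name, and it assumes the file contents have already been read into a buffer.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): locate the first run of bytes
 * that differ from the expected fill value.
 * Returns the length of that run and stores its start offset in *start;
 * returns 0 (and leaves *start untouched) if every byte matches. */
size_t find_bad_run(const unsigned char *buf, size_t len,
                    unsigned char expected, size_t *start)
{
    size_t i = 0;

    /* Skip the leading bytes that match. */
    while (i < len && buf[i] == expected)
        i++;
    if (i == len)
        return 0;

    *start = i;
    /* Measure how far the mismatch extends. */
    while (i < len && buf[i] != expected)
        i++;
    return i - *start;
}
```

Calling it with `expected` set to 0x42 on each copied file would report where the first corrupted region starts and how long it is.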
* Re: JFFS2 corruption
2004-02-05 13:15 JFFS2 corruption Florian Schirmer
@ 2004-02-08 10:37 ` David Woodhouse
2004-02-08 11:38 ` Florian Schirmer
2004-02-08 11:28 ` David Woodhouse
1 sibling, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-02-08 10:37 UTC (permalink / raw)
To: Florian Schirmer; +Cc: linux-mtd
On Thu, 2004-02-05 at 14:15 +0100, Florian Schirmer wrote:
> Hi,
>
> i'm having a strange problem with a JFFS2 filesystem: I always get messages
> about incorrect CRC values while mounting the filesystem. What i did:
Please set CONFIG_JFFS2_FS_DEBUG=1, then reproduce this with full
logging -- you'll need to do it over a serial console to make sure you
catch everything.
--
dwmw2
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 corruption
2004-02-05 13:15 JFFS2 corruption Florian Schirmer
2004-02-08 10:37 ` David Woodhouse
@ 2004-02-08 11:28 ` David Woodhouse
1 sibling, 0 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-08 11:28 UTC (permalink / raw)
To: Florian Schirmer; +Cc: linux-mtd
On Thu, 2004-02-05 at 14:15 +0100, Florian Schirmer wrote:
> i'm having a strange problem with a JFFS2 filesystem: I always get messages
> about incorrect CRC values while mounting the filesystem. What i did:
Also apply this.
Index: fs/jffs2/write.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/write.c,v
retrieving revision 1.80
diff -u -p -r1.80 write.c
--- fs/jffs2/write.c 27 Jan 2004 13:21:50 -0000 1.80
+++ fs/jffs2/write.c 8 Feb 2004 11:29:14 -0000
@@ -97,6 +97,12 @@ struct jffs2_full_dnode *jffs2_write_dno
int retried = 0;
unsigned long cnt = 2;
+ if (data && datalen &&
+ (je32_to_cpu(ri->data_crc) != crc32(0, data, datalen))) {
+ printk(KERN_CRIT "Eep. Data CRC not correct in jffs2_write_dnode()\n");
+ BUG();
+ }
+
D1(if(je32_to_cpu(ri->hdr_crc) != crc32(0, ri, sizeof(struct jffs2_unknown_node)-4)) {
printk(KERN_CRIT "Eep. CRC not correct in jffs2_write_dnode()\n");
BUG();
--
dwmw2
^ permalink raw reply [flat|nested] 38+ messages in thread
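The patch in this message compares the CRC stored in the node header with a CRC computed over the data just before writing. A user-space sketch of that kind of check follows. It is an illustration only: `crc32_le_sketch` and `data_crc_ok` are hypothetical names, and the seed-in, no-final-XOR convention is assumed to correspond to the `crc32(0, data, datalen)` call in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise little-endian CRC-32 (reflected polynomial 0xEDB88320).
 * The seed is passed in and no final XOR is applied; this is assumed
 * to mirror the crc32(0, data, len) call style used in the patch. */
uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

/* Returns 1 if the stored CRC matches the data, 0 otherwise.
 * The check is skipped when there is no data, as in the patch. */
int data_crc_ok(uint32_t stored_crc, const unsigned char *data, size_t datalen)
{
    if (!data || !datalen)
        return 1;
    return stored_crc == crc32_le_sketch(0, data, datalen);
}
```

If this check passes at write time but the mount-time scan still reports a bad data CRC, the data was intact when handed to the write path, which points at the flash access layer rather than at JFFS2 itself.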
* Re: JFFS2 corruption
2004-02-08 10:37 ` David Woodhouse
@ 2004-02-08 11:38 ` Florian Schirmer
2004-02-08 11:53 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Florian Schirmer @ 2004-02-08 11:38 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
Hi,
> > i'm having a strange problem with a JFFS2 filesystem: I always get
> > messages about incorrect CRC values while mounting the filesystem. What i
> > did:
>
> Please set CONFIG_JFFS2_FS_DEBUG=1, then reproduce this with full
> logging -- you'll need to do it over a serial console to make sure you
> catch everything.
Thanks for your suggestion. I've made some progress tracking this down. It
turns out to be a cmdset2 issue and not a jffs2 problem. I'm still digging into
what is going on.
What I've got so far: jffs2_read_flash returns bogus data. In the case of
a bad CRC the last odd byte is wrong. It looks like a byte access instead of a
word access (bus width of the flash: 2). If you look at this:
<7>Node at 000bfbd4 (0) is a data node
<7>version 269, highest_version now 274
<4>crc'ing unpointed
<4>Node data dump (66 bytes)
<4>0x0000: 0x78 (0x78)
<4>0x0001: 0x5E (0x5E)
<4>0x0002: 0x72 (0x72)
<4>0x0003: 0x72 (0x72)
<4>0x0004: 0x1A (0x1A)
<4>0x0005: 0x05 (0x05)
<4>0x0006: 0xA3 (0xA3)
<4>0x0007: 0x21 (0x21)
<4>0x0008: 0x30 (0x30)
<4>0x0009: 0x1A (0x1A)
<4>0x000A: 0x02 (0x02)
<4>0x000B: 0xC3 (0xC3)
<4>0x000C: 0x31 (0x31)
<4>0x000D: 0x04 (0x04)
<4>0x000E: 0x00 (0x00)
<4>0x000F: 0x02 (0x02)
<4>0x0010: 0x68 (0x68)
<4>0x0011: 0xD4 (0xD4)
<4>0x0012: 0x4F (0x4F)
<4>0x0013: 0xA3 (0xA3)
<4>0x0014: 0x21 (0x21)
<4>0x0015: 0x30 (0x30)
<4>0x0016: 0x1A (0x1A)
<4>0x0017: 0x02 (0x02)
<4>0x0018: 0xC3 (0xC3)
<4>0x0019: 0x23 (0x23)
<4>0x001A: 0x04 (0x04)
<4>0x001B: 0x00 (0x00)
<4>0x001C: 0x02 (0x02)
<4>0x001D: 0x68 (0x68)
<4>0x001E: 0xD4 (0xD4)
<4>0x001F: 0x17 (0x17)
<4>0x0020: 0xA3 (0xA3)
<4>0x0021: 0x21 (0x21)
<4>0x0022: 0x30 (0x30)
<4>0x0023: 0x1A (0x1A)
<4>0x0024: 0x02 (0x02)
<4>0x0025: 0x43 (0x43)
<4>0x0026: 0x36 (0x36)
<4>0x0027: 0x04 (0x04)
<4>0x0028: 0x00 (0x00)
<4>0x0029: 0x02 (0x02)
<4>0x002A: 0x68 (0x68)
<4>0x002B: 0xD4 (0xD4)
<4>0x002C: 0xE1 (0xE1)
<4>0x002D: 0xA3 (0xA3)
<4>0x002E: 0x21 (0x21)
<4>0x002F: 0x30 (0x30)
<4>0x0030: 0x1A (0x1A)
<4>0x0031: 0x02 (0x02)
<4>0x0032: 0x43 (0x43)
<4>0x0033: 0x29 (0x29)
<4>0x0034: 0x04 (0x04)
<4>0x0035: 0x00 (0x00)
<4>0x0036: 0x02 (0x02)
<4>0x0037: 0x68 (0x68)
<4>0x0038: 0x84 (0x84)
<4>0x0039: 0xB8 (0xB8)
<4>0x003A: 0x15 (0x15)
<4>0x003B: 0x20 (0x20)
<4>0x003C: 0xC0 (0xC0)
<4>0x003D: 0x00 (0x00)
<4>0x003E: 0x0F (0x0F)
<4>0x003F: 0x3C (0x3C)
<4>0x0040: 0x20 (0x20)
<4>0x0041: 0x20 (0x3D)
<5>jffs2_get_inode_nodes(): Data CRC failed on node at 0x000bfbd4: Read
0xc073b413, calculated 0xa375d8ca
<7>Obsoleting previously unchecked node at 0x000bfbd4 of len 88: Dirtying
you will see that the last byte (offset 0x41) mirrors the byte at offset 0x40. So
I suspect a bug in the cmdset2 driver here. The value in parentheses is the one
read by accessing the flash directly through its memory address, and it is
always correct. If the node data size is odd, then the second-to-last byte is
affected:
<4>0x002F: 0x0F (0x0F)
<4>0x0030: 0x3C (0x3C)
<4>0x0031: 0x3C (0x20)
<4>0x0032: 0x3D (0x3D)
I am still collecting patterns to determine whether it is really a bug in the mtd
core or just a cache issue or whatever.
BTW: Is %zd a legal printk format specifier? It doesn't seem to work for me. I can
commit a fix (removing the z) for that if you want.
Regards,
Florian
^ permalink raw reply [flat|nested] 38+ messages in thread
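The dump in this message pairs each byte returned through the driver with the byte read directly from the flash mapping. A small sketch of how such a comparison could be automated is shown here. Both function names are hypothetical, and the buffers are assumed to have been filled by the two read paths beforehand.

```c
#include <stddef.h>

/* Hypothetical helper: compare the bytes returned by the driver path with
 * the bytes read directly from the mapped flash. Returns the number of
 * mismatching offsets and stores the first one in *first (if any). */
size_t count_mismatches(const unsigned char *via_driver,
                        const unsigned char *direct,
                        size_t len, size_t *first)
{
    size_t bad = 0;

    for (size_t i = 0; i < len; i++) {
        if (via_driver[i] != direct[i]) {
            if (bad == 0)
                *first = i;
            bad++;
        }
    }
    return bad;
}

/* Returns 1 if the byte at offset i equals the other byte of the same
 * 16-bit word, which is the pattern visible in the log (0x41 repeats 0x40).
 * The caller must ensure that offset (i ^ 1) is inside the buffer. */
int mirrors_word_partner(const unsigned char *via_driver, size_t i)
{
    return via_driver[i] == via_driver[i ^ 1];
}
```

Running this over many failing nodes would show whether every bad byte is a copy of its word partner, which is the kind of pattern being collected in the message.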
* Re: JFFS2 corruption
2004-02-08 11:38 ` Florian Schirmer
@ 2004-02-08 11:53 ` David Woodhouse
2004-02-08 17:02 ` Florian Schirmer
0 siblings, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-02-08 11:53 UTC (permalink / raw)
To: Florian Schirmer; +Cc: linux-mtd
On Sun, 2004-02-08 at 12:38 +0100, Florian Schirmer wrote:
> Hi,
>
> > > i'm having a strange problem with a JFFS2 filesystem: I always get
> > > messages about incorrect CRC values while mounting the filesystem. What i
> > > did:
> >
> > Please set CONFIG_JFFS2_FS_DEBUG=1, then reproduce this with full
> > logging -- you'll need to do it over a serial console to make sure you
> > catch everything.
>
> Thanks for your suggestion. I've made some progress tracking this down. It
> turns out to be a cmdset2 issue and not a jffs2 problem. I'm still digging into
> what is going on.
>
> What I've got so far: jffs2_read_flash returns bogus data. In the case of
> a bad CRC the last odd byte is wrong. It looks like a byte access instead of a
> word access (bus width of the flash: 2). If you look at this:
Interesting. That would be a problem in map_copy_from(), which is either
just memcpy_fromio() or a function provided by your map driver,
depending on whether you have CONFIG_MTD_COMPLEX_MAPPINGS enabled.
Can you check what memcpy_fromio() does directly? Try changing the
definition of map_copy_from() in include/linux/mtd/map.h or
drivers/mtd/maps/map_funcs.c or your map driver (depending on your
configuration) to copy a byte at a time, and see if that fixes it.
The '%zd' modifier is the correct way to print a size_t; the old %Z was
a gcc-ism. It's supported by the 2.6 and I thought also the current 2.4
kernels; what kernel are you using, precisely? It's a trivial patch to
add it.
--
dwmw2
^ permalink raw reply [flat|nested] 38+ messages in thread
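The experiment suggested in this message, replacing the bulk copy with a byte-at-a-time copy, could look roughly like the following. This is a user-space sketch under the assumption that the source is a plain pointer; in the kernel the loop would sit inside `map_copy_from()` and use the appropriate I/O accessors. `copy_bytewise` is a hypothetical name.

```c
#include <stddef.h>

/* Copy one byte at a time from a (possibly memory-mapped) source.
 * The volatile qualifier keeps the compiler from merging the loads
 * into wider accesses. */
void copy_bytewise(void *to, const volatile void *from, size_t len)
{
    unsigned char *dst = to;
    const volatile unsigned char *src = from;

    while (len--)
        *dst++ = *src++;
}
```

If the corruption pattern changes or disappears with this copy loop, that narrows the problem to how particular access widths are handled on the bus.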
* Re: JFFS2 corruption
2004-02-08 11:53 ` David Woodhouse
@ 2004-02-08 17:02 ` Florian Schirmer
2004-02-08 17:13 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Florian Schirmer @ 2004-02-08 17:02 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]
Hi,
> Interesting. That would be a problem in map_copy_from(), which is either
> just memcpy_fromio() or a function provided by your map driver,
> depending on whether you have CONFIG_MTD_COMPLEX_MAPPINGS enabled.
>
> Can you check what memcpy_fromio() does directly? Try changing the
> definition of map_copy_from() in include/linux/mtd/map.h or
> drivers/mtd/maps/map_funcs.c or your map driver (depending on your
> configuration) to copy a byte at a time, and see if that fixes it.
Yeah. You're absolutely right. I should have checked cmdset2 before claiming
it is broken.
memcpy_fromio is memcpy on MIPS. I've checked the memcpy implementation in
arch/mips/lib/memcpy.S and it uses byte accesses for the remaining tail
bytes. So I expect at least the MIPS and SH platforms to be broken for
setups with buswidth != 1. I haven't checked other archs yet.
I'm wondering what assumptions can be made about memcpy's access patterns?
Looks like nobody even guarantees a specific access type. But we need to make
sure every access to the flash matches the bus width, don't we?
My proposed solution would be to handle the remaining tail bytes in
map.h/map_funcs.c already, so that the arch code only has to handle full
transfers. See attached patch. I have to admit that it's very hard/impossible to
detect the remaining byte size. BITS_PER_LONG / 2 seems to be the best
guess we can make. This will handle SH, MIPS32 and MIPS64 just fine.
> The '%zd' modifier is the correct way to print a size_t; the old %Z was
> a gcc-ism. It's supported by the 2.6 and I thought also the current 2.4
> kernels; what kernel are you using, precisely? It's a trivial patch to
> add it.
I'm still using what was provided by Broadcom/Linksys (2.4.20). So don't worry
about that. I've ported the Broadcom code over to current 2.4.x but went back
to 2.4.20 due to the jffs2 corruption.
Regards,
Florian
[-- Attachment #2: mtd-memcpy.diff --]
[-- Type: text/x-diff, Size: 1113 bytes --]
--- linux-bcm/include/linux/mtd/map.h-old 2003-05-28 14:42:22.000000000 +0200
+++ linux-bcm/include/linux/mtd/map.h 2004-02-08 17:43:55.398928752 +0100
@@ -156,7 +156,30 @@ static inline void map_write64(struct ma
static inline void map_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
{
- memcpy_fromio(to, map->virt + from, len);
+#define MTD_ALIGN (4 - 1)
+//#define MTD_ALIGN ((BITS_PER_LONG / 2) - 1)
+ u64 buf;
+ ssize_t transfer;
+ ssize_t done = (map->buswidth == 1) ? len : (len & ~MTD_ALIGN);
+ memcpy_fromio(to, map->virt + from, done);
+ while (done < len) {
+ switch(map->buswidth) {
+ case 2:
+ buf = map_read16(map, from + done);
+ break;
+ case 4:
+ buf = map_read32(map, from + done);
+ break;
+ case 8:
+ buf = map_read64(map, from + done);
+ break;
+ }
+ transfer = len - done;
+ if (transfer > map->buswidth)
+ transfer = map->buswidth;
+ memcpy((void *)((unsigned long)to + done), &buf, transfer);
+ done += transfer;
+ }
}
static inline void map_copy_to(struct map_info *map, unsigned long to, const void *from, ssize_t len)
^ permalink raw reply [flat|nested] 38+ messages in thread
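The idea behind the attached patch is that every access to the flash should be a full bus-width read, with the wanted bytes picked out afterwards. A simplified user-space sketch of that idea for a 16-bit bus follows. It is not the patch itself: `bus_read16` and `copy_from_16bit_bus` are hypothetical names, the flash is simulated by an ordinary byte array, and the array is assumed to cover every whole word that is touched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit-wide flash read: it always loads a
 * whole aligned word, never a single byte. */
static uint16_t bus_read16(const unsigned char *flash, size_t word_index)
{
    uint16_t w;

    memcpy(&w, flash + word_index * 2, sizeof(w));
    return w;
}

/* Copy len bytes starting at byte offset 'from' using only 16-bit reads. */
void copy_from_16bit_bus(unsigned char *to, const unsigned char *flash,
                         size_t from, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t addr = from + done;
        size_t skip = addr % 2;          /* bytes to skip inside the word */
        size_t n = 2 - skip;             /* bytes usable from this word */
        uint16_t w = bus_read16(flash, addr / 2);
        unsigned char bytes[2];

        memcpy(bytes, &w, sizeof(w));    /* keep the in-memory byte order */
        if (n > len - done)
            n = len - done;              /* do not run past the request */
        memcpy(to + done, bytes + skip, n);
        done += n;
    }
}
```

Unlike the patch, this sketch also copes with an odd start offset, at the cost of never using a fast bulk copy for the aligned middle part.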
* Re: JFFS2 corruption
2004-02-08 17:02 ` Florian Schirmer
@ 2004-02-08 17:13 ` David Woodhouse
0 siblings, 0 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-08 17:13 UTC (permalink / raw)
To: Florian Schirmer; +Cc: linux-mtd
On Sun, 2004-02-08 at 18:02 +0100, Florian Schirmer wrote:
> memcpy_fromio is memcpy on MIPS. I've checked the memcpy implementation in
> arch/mips/lib/memcpy.S and it uses byte accesses for the remaining tail
> bytes. So I expect at least the MIPS and SH platforms to be broken for
> setups with buswidth != 1. I haven't checked other archs yet.
I don't understand. Why should it be broken? If your bus controller is
set up right for that region, byte loads ought to work fine. It'll
_issue_ a wider load, then pick the right byte from it.
What's happening is that it's picking the _wrong_ byte from it. Is the
bus controller set up to byte-swap transparently when we do 16-bit
access?
Can you show the result of a single 32-bit load, two consecutive 16-bit
loads, and four consecutive 8-bit loads all starting at the same
address?
> I'm wondering what assumptions can be made about memcpy's access patterns?
> Looks like nobody even guarantees a specific access type. But we need to make
> sure every access to the flash matches the bus width, don't we?
I don't think so. The bus controller really ought to handle this for us,
for reading.
--
dwmw2
^ permalink raw reply [flat|nested] 38+ messages in thread
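The diagnostic requested in this message (one 32-bit load, two 16-bit loads and four 8-bit loads from the same address) can be sketched as below. This is an assumption-laden illustration: `probe_loads` and `probe_consistent` are hypothetical names, and in the real test the address would be the mapped flash window rather than ordinary memory.

```c
#include <stdint.h>
#include <string.h>

/* Results of reading the same four bytes with three access widths. */
struct load_probe {
    uint32_t w32;     /* one 32-bit load */
    uint16_t w16[2];  /* two consecutive 16-bit loads */
    uint8_t  w8[4];   /* four consecutive 8-bit loads */
};

/* addr must be 4-byte aligned. The volatile pointers make the compiler
 * issue each load with the stated width. This is kernel-style access;
 * build with -fno-strict-aliasing, as the kernel does. */
struct load_probe probe_loads(const volatile void *addr)
{
    const volatile uint32_t *p32 = addr;
    const volatile uint16_t *p16 = addr;
    const volatile uint8_t  *p8  = addr;
    struct load_probe r;

    r.w32 = p32[0];
    r.w16[0] = p16[0];
    r.w16[1] = p16[1];
    for (int i = 0; i < 4; i++)
        r.w8[i] = p8[i];
    return r;
}

/* Returns 1 if all three access widths saw the same bytes in memory order. */
int probe_consistent(const struct load_probe *r)
{
    return memcmp(&r->w32, r->w16, 4) == 0 &&
           memcmp(&r->w32, r->w8, 4) == 0;
}
```

On ordinary RAM the three views always agree. A disagreement on the flash window would suggest that the bus controller selects or swaps bytes differently depending on access width.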
* JFFS2 Corruption.
@ 2004-02-19 16:48 simon
2004-02-23 11:07 ` simon
0 siblings, 1 reply; 38+ messages in thread
From: simon @ 2004-02-19 16:48 UTC (permalink / raw)
To: linux-mtd
I am having problems using JFFS2 filesystems on a NAND device. I have had
these problems in the past and decided not to use this method for storing my data.
I would really like to do this and I was wondering if anyone else had been able to
run such a system reliably? The NAND device is a 128MB SMC and I have
downloaded the mtd code from CVS within the last week.
I am using mtdpart to provide partitions but for the moment I am only using mtd1 as
a root file system. To build this I perform the following steps.
1. Boot system via network and mount nfs root
2. eraseall /dev/mtd1
3. mount -t jffs2 /dev/mtdblock1 /smc
4. cd /smc
5. tar xvzf /rootfilesystem.tgz
6. umount /smc
I then reboot using the SMC as my root filesystem. As the system gets rebooted
and incurs more writes I get the following kinds of messages:
Empty flash at 0x00469ffcb ends at 0x0046a000
or
jffs2_scan_dirent_node(): Node CRC failed on node at 0x0046a7f0
read 0xffffffff calculated 0xdec8161b
I have written in before about these messages and I understand that the first is of
no concern but the second relates to data which may not have been written to
the SMC.
The thing is, I have had this happen on two completely different hardware designs:
one an x86 board and the other a PPC.
I have checked and the OS umounts root as it shuts down. Also I cannot find
fsck.jffs2 in the util directory. Does it exist?
In trying to debug this I noticed that the device was still reporting busy when
nand_command was entered. I have put a line of code in to delay until the device is
ready at the start of the routine. It is my understanding that if the device is busy
when you try to select/deselect or send a command, the outcome cannot be
predicted. If this is not the case I would like to understand why. If it is, how does any
of the NAND code work?
Any help greatly appreciated.
Cheers Simon.
__________________________
Simon Haynes - Baydel
Phone : 44 (0) 1372 378811
Email : simon@baydel.com
__________________________
^ permalink raw reply [flat|nested] 38+ messages in thread
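The workaround described in this message, waiting for the device to report ready before a command is issued, can be sketched as a bounded polling loop. The names below are hypothetical and the ready check is abstracted behind a callback; a real driver would read the ready/busy line and would probably also insert a delay between polls.

```c
/* Hypothetical ready-check callback: returns non-zero when the device
 * reports ready. In a real driver this would read the ready/busy pin. */
typedef int (*dev_ready_fn)(void *ctx);

/* Poll until the device is ready or max_polls attempts are used up.
 * Returns 0 on success, -1 on timeout. */
int wait_until_ready(dev_ready_fn dev_ready, void *ctx, unsigned int max_polls)
{
    while (max_polls--) {
        if (dev_ready(ctx))
            return 0;
    }
    return -1;
}

/* Simulated device for illustration: stays busy for *ctx more polls. */
int fake_ready_after(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

Bounding the loop matters: an unbounded wait at the top of a command routine would hang the system if the ready line were stuck.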
* Re: JFFS2 Corruption.
2004-02-19 16:48 JFFS2 Corruption simon
@ 2004-02-23 11:07 ` simon
2004-02-24 9:48 ` simon
0 siblings, 1 reply; 38+ messages in thread
From: simon @ 2004-02-23 11:07 UTC (permalink / raw)
To: linux-mtd
As I have been trying to use this as a root filesystem, I looked into the
unmounting of the filesystem. It seems that the halt script remounts root read-only.
However, I found that this does not cause mtdblock to flush its cache. I have written a
small program to perform an ioctl BLKFLSBUF and I call it after the system remounts
root read-only. This seems to help but it is not the whole story.
I have also trawled the archives. There are some articles which list similar problems.
It is proposed that these are related to unaligned memory access. Can anyone
explain why this matters and how to prevent it?
My current architecture is PPC.
Many Thanks
Simon.
On 19 Feb 2004 at 16:48, simon@baydel.com wrote:
> I am having problems using JFFS2 filesystems on a NAND device. I have
> had these problems in the past and decided not to use this method for
> storing my data. I would really like to do this and I was wondering if
> anyone else had been able to run such a system reliably? The NAND
> device is a 128MB SMC and I have downloaded the mtd code from CVS
> within the last week.
>
> I am using mtdpart to provide partitions but for the moment I am only
> using mtd1 as a root file system. To build this I perform the
> following steps.
>
>
> 1. Boot system via network and mount nfs root
> 2. eraseall /dev/mtd1
> 3. mount -t jffs2 /dev/mtdblock1 /smc
> 4. cd /smc
> 5. tar xvzf /rootfilesystem.tgz
> 6. umount /smc
>
> I then reboot using the SMC as my root filesystem. As the system gets
> rebooted and incurs more writes I get the following kinds of messages:
>
> Empty flash at 0x00469ffcb ends at 0x0046a000
>
> or
>
> jffs2_scan_dirent_node(): Node CRC failed on node at 0x0046a7f0
> read 0xffffffff calculated 0xdec8161b
>
> I have written in before about these messages and I understand that
> the first is of no concern but the second relates to write data which
> may not have been written to the SMC.
>
> The thing is, I have had this happen on two completely different
> hardware designs: one an x86 board and the other a PPC.
>
> I have checked and the OS umounts root as it shuts down. Also I cannot
> find fsck.jffs2 in the util directory. Does it exist ?
>
> In trying to debug this I noticed that the device was still reporting
> busy when nand_command was entered. I have put a line of code in to
> delay until the device is ready at the start of the routine. It is my
> understanding that if the device is busy when you try and
> select/deselect or send a command the outcome cannot be predicted. If
> this is not the case I would like to understand why. If it is how does
> any of the NAND code work ?
>
> Any help greatly appreciated
>
>
> Cheers Simon.
^ permalink raw reply [flat|nested] 38+ messages in thread
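The "small program to perform an ioctl BLKFLSBUF" mentioned in this message is not shown in the thread. A minimal sketch of what such a program's core might look like on Linux follows; `flush_block_device` is a hypothetical name and the device path is whatever node the caller passes in.

```c
#include <fcntl.h>
#include <linux/fs.h>   /* BLKFLSBUF */
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the kernel to flush the buffers of a block device.
 * Returns 0 on success, -1 if the device cannot be opened or the
 * ioctl fails. */
int flush_block_device(const char *path)
{
    int fd = open(path, O_RDONLY);
    int ret;

    if (fd < 0)
        return -1;
    ret = ioctl(fd, BLKFLSBUF, 0);
    close(fd);
    return ret < 0 ? -1 : 0;
}
```

A shutdown script could call a wrapper around this with the mtdblock node after root has been remounted read-only, which is the sequence the message describes.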
* Re: JFFS2 Corruption.
2004-02-23 11:07 ` simon
@ 2004-02-24 9:48 ` simon
2004-02-24 12:00 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: simon @ 2004-02-24 9:48 UTC (permalink / raw)
To: linux-mtd
> On 19 Feb 2004 at 16:48, simon@baydel.com wrote:
>
> > I am having problems using JFFS2 filesystems on a NAND device. I
> > have had these problems in the past and decided not to use this
> > method for storing my data. I would really like to do this and I was
> > wondering if anyone else had been able to run such a system
> > reliably? The NAND device is a 128MB SMC and I have downloaded the
> > mtd code from CVS within the last week.
> >
> > I am using mtdpart to provide partitions but for the moment I am
> > only using mtd1 as a root file system. To build this I perform the
> > following steps.
> >
> >
> > 1. Boot system via network and mount nfs root
> > 2. eraseall /dev/mtd1
> > 3. mount -t jffs2 /dev/mtdblock1 /smc
> > 4. cd /smc
> > 5. tar xvzf /rootfilesystem.tgz
> > 6. umount /smc
> >
> > I then reboot using the SMC as my root filesystem. As the system
> > gets rebooted and incurs more writes I get the following kinds of
> > messages:
> >
> > Empty flash at 0x00469ffcb ends at 0x0046a000
> >
> > or
> >
> > jffs2_scan_dirent_node(): Node CRC failed on node at 0x0046a7f0 read
> > 0xffffffff calculated 0xdec8161b
> >
> > I have written in before about these messages and I understand that
> > the first is of no concern but the second relates to write data
> > which may not have been written to the SMC.
> >
> > The thing is, I have had this happen on two completely different
> > hardware designs: one an x86 board and the other a PPC.
> >
> > I have checked and the OS umounts root as it shuts down. Also I
> > cannot find fsck.jffs2 in the util directory. Does it exist ?
> >
> > In trying to debug this I noticed that the device was still
> > reporting busy when nand_command was entered. I have put a line of
> > code in to delay until the device is ready at the start of the
> > routine. It is my understanding that if the device is busy when you
> > try and select/deselect or send a command the outcome cannot be
> > predicted. If this is not the case I would like to understand why.
> > If it is how does any of the NAND code work ?
> >
> > Any help greatly appreciated
> >
> >
> > Cheers Simon.
> >
> >
> >
On 23 Feb 2004 at 11:07, simon@baydel.com wrote:
> As I have been trying to use this as a root filesystem, I looked into
> the unmounting of the filesystem. It seems that the halt script
> remounts root read-only. However, I found that this does not
> cause mtdblock to flush its cache. I have written a small program to
> perform an ioctl BLKFLSBUF and I call it after the system remounts root
> read-only. This seems to help but it is not the whole story.
>
> I have also trawled the archives. There are some articles which list
> similar problems. It is proposed that these are related to unaligned
> memory access. Can anyone explain why this matters and how to prevent
> it.
>
> My current architecture is PPC.
>
> Many Thanks
>
>
> Simon.
>
>
Committed the sin of posting a reply with the reply text first, sorry.
I have only managed to get this to fail if the jffs2 filesystem is mounted as root. I do
not seem to be able to get it to close and unmount the filesystem at shutdown. I
guess the BLKFLSBUF I do only flushes the buffer that is created when I open the
device and not the one that was created when the kernel opened the device.
My current thoughts are to create a ro root and an additional rw filesystem. Is there a
better way that I can do this?
Cheers
Simon.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption.
2004-02-24 9:48 ` simon
@ 2004-02-24 12:00 ` David Woodhouse
2004-02-24 12:54 ` Simon Haynes
2004-02-24 13:04 ` Simon Haynes
0 siblings, 2 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 12:00 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-02-24 at 09:48 +0000, simon@baydel.com wrote:
> Committed the sin of posting a reply with the reply text first, sorry.
And this time you committed the sin of including _far_ more of the
previous mail(s) than was necessary. But I'm not feeling cruel today so
I'll not continue to ignore you :)
> I have only managed to get this to fail if the jffs2 filesystem is mounted as root. I do
> not seem to be able to get it to close and unmount the filesystem at shutdown. I
> guess the BLKFLSBUF I do only flushes the buffer that is created when I open the
> device and not the one that was created when the kernel opened the device.
JFFS2 doesn't actually _use_ the mtdblock device. If you look closely at
the code, you'll see we never read or write to/from it, we just use the
minor number as an argument to get_mtd_device().
In fact, it's perfectly possible to use any _other_ device driver
instead of the mtdblock device, as long as it has the major number which
JFFS2 is looking for.
I wonder if the rootfs-mounting is opening the _actual_ block device and
doing some I/O, and that's later getting flushed, causing corruption.
Although I can't comprehend why a failed attempt to mount, for example,
ext2 would cause the mtdblock device to consider its buffer _dirty_ and
try to write it back on close.
What happens if you use the 'mtdblock_ro' device instead? That shares
the major number, but doesn't share the caching.
--
dwmw2
^ permalink raw reply [flat|nested] 38+ messages in thread
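This message explains that JFFS2 only looks at the device node's major and minor numbers and passes the minor to `get_mtd_device()`. A user-space sketch of reading those numbers from a node follows. The function name is hypothetical, and the value 31 for the mtdblock major is stated here as an assumption about the conventional assignment, not something given in the thread.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() */
#include <sys/types.h>

/* Assumption: 31 is the block major conventionally assigned to mtdblock. */
#define MTD_BLOCK_MAJOR_ASSUMED 31

/* Returns the MTD index (the minor number) if path is a block device node
 * with the assumed mtdblock major, or -1 otherwise. */
int mtd_index_from_node(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode))
        return -1;
    if (major(st.st_rdev) != MTD_BLOCK_MAJOR_ASSUMED)
        return -1;
    return (int)minor(st.st_rdev);
}
```

This only inspects the node's metadata and never opens the device, which is consistent with the point that no I/O through the block device is needed to identify the MTD partition.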
* Re: JFFS2 Corruption.
2004-02-24 12:00 ` David Woodhouse
@ 2004-02-24 12:54 ` Simon Haynes
2004-02-24 13:04 ` Simon Haynes
1 sibling, 0 replies; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 12:54 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 12:00 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 09:48 +0000, simon@baydel.com wrote:
> > Committed the sin of posting a reply with the reply text first, sorry.
>
> And this time you committed the sin of including _far_ more of the
> previous mail(s) than was necessary. But I'm not feeling cruel today so
> I'll not continue to ignore you :)
>
> > I have only managed to get this to fail if the jffs2 filesystem is
> > mounted as root. I do not seem to be able to get it to close and unmount
Thanks for the reply. I am really struggling with this. I appreciate that this
stuff keeps you really busy and I will try to make it as easy for you as I
can.
I only seem to get this problem if I use the SMC as root. I have put a printk
in mtdblock.c which tells me when the flush and release routines are called.
After trawling the archives I also set the debugging level to 1. Now I get
the message mtdblock_open \n ok when the kernel mounts root.
If I boot from the network, mount the SMC, build a JFFS2 filesystem and copy
files, on umount I get a release message. I can then reboot via the network
and make changes to the JFFS2 filesystem, many times, and all is OK. If I
boot my system and pass the root=/dev/mtdblock1 argument to the kernel, it
comes up fine on the SMC. This modifies mainly files in /var. If I then
reboot the system I do not get any messages from the mtdblock flush or
release routines. Next time the filesystem is mounted it is corrupt.
I have looked in the util/MAKEDEV file and the readme but I don't know how
to select the mtdblock_ro device.
I take it what I am trying to do is possible and is suitable for a production
environment?
Thanks again
Simon
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption.
2004-02-24 12:00 ` David Woodhouse
2004-02-24 12:54 ` Simon Haynes
@ 2004-02-24 13:04 ` Simon Haynes
2004-02-24 13:40 ` Simon Haynes
1 sibling, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 13:04 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 12:54 pm, Simon Haynes wrote:
> On Tuesday 24 Feb 2004 12:00 pm, David Woodhouse wrote:
> > On Tue, 2004-02-24 at 09:48 +0000, simon@baydel.com wrote:
> > > Committed the sin of posting a reply with the reply text first, sorry.
> >
> > And this time you committed the sin of including _far_ more of the
> > previous mail(s) than was necessary. But I'm not feeling cruel today so
> > I'll not continue to ignore you :)
> >
> > > I have only managed to get this to fail if the jffs2 filesystem is
> > > mounted as root. I do not seem to be able to get it to close and
> > > unmount
>
> Thanks for the reply I am really struggling with this. I appreciate that
> this stuff keeps you really busy and I will try to make it as easy for you
> as I can.
>
Ah, I see the ro stuff is a kernel change. I will give it a go.
Cheers Simon.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption.
2004-02-24 13:04 ` Simon Haynes
@ 2004-02-24 13:40 ` Simon Haynes
2004-02-24 14:22 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 13:40 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 1:04 pm, Simon Haynes wrote:
> On Tuesday 24 Feb 2004 12:54 pm, Simon Haynes wrote:
> > On Tuesday 24 Feb 2004 12:00 pm, David Woodhouse wrote:
> > > On Tue, 2004-02-24 at 09:48 +0000, simon@baydel.com wrote:
> > > > Committed the sin of posting a reply with the reply text first, sorry.
> > >
> > > And this time you committed the sin of including _far_ more of the
> > > previous mail(s) than was necessary. But I'm not feeling cruel today so
> > > I'll not continue to ignore you :)
> > >
> > > > I have only managed to get this to fail if the jffs2 filesystem is
> > > > mounted as root. I do not seem to be able to get it to close and
> > > > unmount
> >
> > Thanks for the reply I am really struggling with this. I appreciate that
> > this stuff keeps you really busy and I will try to make it as easy for
> > you as I can.
>
> Ah, I see the ro stuff is a kernel change. I will give it a go.
>
I don't understand. I changed the kernel to use the read-only device. I
expected this to work rw without caching but it does not. I have already
tried mounting root ro via the caching mtdblock device and, although my system does
not fully start, I can't see how I would get corruption.
I have looked at the mtdblock_ro code and it seems you allow writing if
certain flag bits are set. Do I need to set these somewhere?
Cheers Simon.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: JFFS2 Corruption.
2004-02-24 13:40 ` Simon Haynes
@ 2004-02-24 14:22 ` David Woodhouse
2004-02-24 14:25 ` Simon Haynes
0 siblings, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 14:22 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-02-24 at 13:40 +0000, Simon Haynes wrote:
> I don't understand. I changed the kernel to use the read only device. I
> expected this to work rw without caching but it does not.
You can't use flash RW without caching. It _has_ to
read/modify/erase/writeback to write to flash.
But for JFFS2, you _don't_ use flash RW through mtdblock. It operates
directly and should work fine. What failure mode do you observe?
> I have already
> tried mounting root ro via the caching mtd block and although my system does
> not fully start I can't see how I would get corruption ?
As I said -- I don't know. Maybe there's a bug which causes the mtdblock
device to consider its cache dirty, and write it out to the detriment of
the real data which JFFS2 has already put there on the flash. That's why
I was asking you to test mtdblock_ro.
> I have looked at the mtdblock_ro code and it seems you allow writing if
> certain flag bits are set. Do I need to set these somewhere?
No. Leave them turned off. We don't actually want to write via the
mtdblock device -- that's the whole _point_ in this experiment.
--
dwmw2
^ permalink raw reply [flat|nested] 38+ messages in thread
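The read/modify/erase/writeback cycle mentioned in this message can be illustrated with a simulated erase block. This is a sketch only: the block size is a tiny made-up value, the function names are hypothetical, and real flash programming goes through the chip driver rather than through array operations.

```c
#include <stddef.h>
#include <string.h>

#define ERASE_BLOCK_SIZE 16  /* tiny illustrative size, not a real geometry */

/* Simulated flash programming: bits can only go from 1 to 0. */
static void flash_program(unsigned char *block, const unsigned char *src)
{
    for (size_t i = 0; i < ERASE_BLOCK_SIZE; i++)
        block[i] &= src[i];
}

/* Simulated erase: every bit in the block returns to 1. */
static void flash_erase(unsigned char *block)
{
    memset(block, 0xFF, ERASE_BLOCK_SIZE);
}

/* Overwrite len bytes at offset inside one erase block.
 * Returns 0 on success, -1 if the range does not fit in the block. */
int flash_rmw_write(unsigned char *block, size_t offset,
                    const unsigned char *data, size_t len)
{
    unsigned char cache[ERASE_BLOCK_SIZE];

    if (offset > ERASE_BLOCK_SIZE || len > ERASE_BLOCK_SIZE - offset)
        return -1;

    memcpy(cache, block, ERASE_BLOCK_SIZE);   /* read */
    memcpy(cache + offset, data, len);        /* modify */
    flash_erase(block);                       /* erase */
    flash_program(block, cache);              /* write back */
    return 0;
}
```

The cached copy is why a block-device emulation on flash needs a cache at all, and also why a stale or wrongly dirty cache written back later could overwrite data that something else had already put on the flash.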
* Re: JFFS2 Corruption.
2004-02-24 14:22 ` David Woodhouse
@ 2004-02-24 14:25 ` Simon Haynes
2004-02-24 14:56 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 14:25 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 2:22 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 13:40 +0000, Simon Haynes wrote:
> > I don't understand. I changed the kernel to use the read only device. I
> > expected this to work rw without caching but it does not.
>
> You can't use flash RW without caching. It _has_ to
> read/modify/erase/writeback to write to flash.
>
> But for JFFS2, you _don't_ use flash RW through mtdblock. It operates
> directly and should work fine. What failure mode do you observe?
The problem is when I mount the erased partition using
mount -t jffs2 /dev/mtdblock1 /smc
/smc is mounted read only and I cannot write to it.
surely this is the same as not remounting root read write ?
Cheers Simon.
* Re: JFFS2 Corruption.
2004-02-24 14:25 ` Simon Haynes
@ 2004-02-24 14:56 ` David Woodhouse
2004-02-24 14:58 ` Simon Haynes
0 siblings, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 14:56 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-02-24 at 14:25 +0000, Simon Haynes wrote:
> /smc is mounted read only and I cannot write to it.
>
> surely this is the same as not remounting root read write ?
Hmmm. What happens if you mount -oremount,rw?
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-24 14:56 ` David Woodhouse
@ 2004-02-24 14:58 ` Simon Haynes
2004-02-24 15:35 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 14:58 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 2:56 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 14:25 +0000, Simon Haynes wrote:
> > /smc is mounted read only and I cannot write to it.
> >
> > surely this is the same as not remounting root read write ?
>
> Hmmm. What happens if you mount -oremount,rw?
I have just created the fs with a rw mtdblock kernel and rebooted with a ro
mtdblock. This attempts to remount rw but it cannot.
I have tried booting from the network with a ro mtdblock kernel and
performing the following operation.
-bash-2.05b# mount -t jffs2 /dev/mtdblock2 /smc
mount: block device /dev/mtdblock2 is write-protected, mounting read-only
-bash-2.05b# mount -oremount,rw /smc
mount: block device /dev/mtdblock2 is write-protected, mounting read-only
-bash-2.05b# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / nfs rw,v2,rsize=4096,wsize=4096,hard,udp,nolock,addr=192.9.200.22
0 0
/proc /proc proc rw 0 0
/dev/mtdblock2 /smc jffs2 ro 0 0
-bash-2.05b#
I guess this is not how things are supposed to work. Are there any kernel
settings I should check ?
Cheers
Simon.
* Re: JFFS2 Corruption.
2004-02-24 14:58 ` Simon Haynes
@ 2004-02-24 15:35 ` David Woodhouse
2004-02-24 15:47 ` Simon Haynes
0 siblings, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 15:35 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-02-24 at 14:58 +0000, Simon Haynes wrote:
> I have just created the fs with a rw mtdblock kernel and rebooted with a ro
> mtdblock. This attempts to remount rw but it cannot.
Hmm. OK, hack jffs2 to use the same major number as ramdisk, then try to
mount /dev/ram2 instead :)
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-24 15:35 ` David Woodhouse
@ 2004-02-24 15:47 ` Simon Haynes
2004-02-24 16:14 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 15:47 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 3:35 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 14:58 +0000, Simon Haynes wrote:
> > I have just created the fs with a rw mtdblock kernel and rebooted with a
> > ro mtdblock. This attempts to remount rw but it cannot.
>
> Hmm. OK, hack jffs2 to use the same major number as ramdisk, then try to
> mount /dev/ram2 instead :)
I hacked fs/jffs2/super.c and fs/jffs2/super_v24.c. Only the v24 got
compiled. I guess this is because my kernel is 2.4.21. I had to also include
ramdisk in the kernel as it was previously missing ? Does this effect MTD ?
This mounted rw.
-bash-2.05b# ls -l /dev/ram2
brw-rw---- 1 root disk 1, 2 Jan 31 2003 /dev/ram2
-bash-2.05b# mount -t jffs2 /dev/ram2 /smc
-bash-2.05b# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / nfs rw,v2,rsize=4096,wsize=4096,hard,udp,nolock,addr=192.9.200.22
0 0
/proc /proc proc rw 0 0
/dev/ram2 /smc jffs2 rw 0 0
-bash-2.05b#
* Re: JFFS2 Corruption.
2004-02-24 15:47 ` Simon Haynes
@ 2004-02-24 16:14 ` David Woodhouse
2004-02-24 16:17 ` Simon Haynes
0 siblings, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 16:14 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-02-24 at 15:47 +0000, Simon Haynes wrote:
> I hacked fs/jffs2/super.c and fs/jffs2/super_v24.c. Only the v24 got
> compiled. I guess this is because my kernel is 2.4.21. I had to also include
> ramdisk in the kernel as it was previously missing ? Does this effect MTD ?
No, it neither effects nor affects the MTD drivers.
> This mounted rw.
OK... now can you reproduce the same _problem_ that way?
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-24 16:14 ` David Woodhouse
@ 2004-02-24 16:17 ` Simon Haynes
2004-02-24 16:51 ` David Woodhouse
2004-02-24 16:55 ` David Woodhouse
0 siblings, 2 replies; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 16:17 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 4:14 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 15:47 +0000, Simon Haynes wrote:
> > I hacked fs/jffs2/super.c and fs/jffs2/super_v24.c. Only the v24 got
> > compiled. I guess this is because my kernel is 2.4.21. I had to also
> > include ramdisk in the kernel as it was previously missing ? Does this
> > effect MTD ?
>
> No, it neither effects nor affects the MTD drivers.
>
> > This mounted rw.
>
> OK... now can you reproduce the same _problem_ that way?
My problem only occurs across reboots. As the ramdisk is not persistent
storage how can I try and reproduce the problem. I have tried to reproduce
the problem on mtd devices without rebooting and I have not managed to do
this.
Cheers
Simon.
* Re: JFFS2 Corruption.
2004-02-24 16:17 ` Simon Haynes
@ 2004-02-24 16:51 ` David Woodhouse
2004-02-24 17:05 ` Simon Haynes
2004-02-24 17:12 ` Simon Haynes
2004-02-24 16:55 ` David Woodhouse
1 sibling, 2 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 16:51 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-02-24 at 16:17 +0000, Simon Haynes wrote:
> My problem only occurs across reboots. As the ramdisk is not persistent
> storage how can I try and reproduce the problem. I have tried to reproduce
> the problem on mtd devices without rebooting and I have not managed to do
> this.
The ramdisk isn't _involved_ as a form of storage. JFFS2 doesn't _use_
the block device for _anything_ except for looking at its minor number
to decide which actual MTD device to use.
Hack JFFS2 to always get_mtd_device number 2 and boot with
root=/dev/ram0
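A rough illustration of why the ramdisk trick works, assuming only what is said above: the block device node is used to select an MTD device by minor number and nothing else. This is a Python model, not JFFS2 code; `mtd_index_from_dev` and the `expected_major` parameter are hypothetical names, and the value 31 for the mtdblock major is the conventional Linux assignment, stated here as an assumption.

```python
import os

RAMDISK_MAJOR = 1     # matches the "1, 2" shown by ls -l /dev/ram2 in this thread
MTDBLOCK_MAJOR = 31   # conventional mtdblock major; an assumption, not from the thread


def mtd_index_from_dev(dev, expected_major):
    """Return the MTD index selected by a block device number.

    The major number only decides whether the device is accepted at all;
    the minor number alone picks which MTD device is used.
    """
    if os.major(dev) != expected_major:
        raise ValueError("device major %d not accepted" % os.major(dev))
    return os.minor(dev)


# With the filesystem hacked to accept the ramdisk major, /dev/ram2 (1, 2)
# selects the same MTD device as /dev/mtdblock2 would have.
print(mtd_index_from_dev(os.makedev(RAMDISK_MAJOR, 2), RAMDISK_MAJOR))
```

Nothing is ever read from or written to the ramdisk in this model, which is the point being made about persistence.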
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-24 16:17 ` Simon Haynes
2004-02-24 16:51 ` David Woodhouse
@ 2004-02-24 16:55 ` David Woodhouse
1 sibling, 0 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 16:55 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
Btw if you join #mtd on irc.freenode.net you may experience lower
latency in getting responses.
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-24 16:51 ` David Woodhouse
@ 2004-02-24 17:05 ` Simon Haynes
2004-02-24 18:05 ` David Woodhouse
2004-02-24 17:12 ` Simon Haynes
1 sibling, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 17:05 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 4:51 pm, David Woodhouse wrote:
> Hack JFFS2 to always get_mtd_device number 2 and boot with
> root=/dev/ram0
Sorry again for being so stupid. I know you have only told me half a dozen
times that JFFS2 does not use the block read and write routines.
I have done this and rebooted several times. Normally I would expect a CRC
error within a couple of reboots. As yet I have no CRC errors but I do get a
large number of "Empty Flash" messages, about 15 per reboot. I do not see any
of these if I boot via the network and mount and dismount the smc ?
I will keep rebooting over night and hack the kernel to panic on a CRC error.
Would you recommend changing the printk for the "Empty Flash" messages to a
different level and using the ramdisk as a permanent solution ?
* Re: JFFS2 Corruption.
2004-02-24 16:51 ` David Woodhouse
2004-02-24 17:05 ` Simon Haynes
@ 2004-02-24 17:12 ` Simon Haynes
1 sibling, 0 replies; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 17:12 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 5:05 pm, Simon Haynes wrote:
> On Tuesday 24 Feb 2004 4:51 pm, David Woodhouse wrote:
> > Hack JFFS2 to always get_mtd_device number 2 and boot with
> > root=/dev/ram0
>
> Sorry again for being so stupid. I know you have only told me half a dozen
> times that JFFS2 does not use the block read and write routines.
>
> I have done this and rebooted several times. Normally I would expect a CRC
> error within a couple of reboots. As yet I have no CRC errors but I do get
> a large number of "Empty Flash" messages, about 15 per reboot. I do not see
> any of these if I boot via the network and mount and dismount the smc ?
>
> I will keep rebooting over night and hack the kernel to panic on a CRC
> error.
>
> Would you recommend changing the printk for the "Empty Flash" messages to a
> different level and using the ramdisk as a permanent solution ?
After several more reboots I got the message
jffs2_get_inode_nodes(): Data CRC failed on node at 0x00050748: Read
0x0b5da171, calculated 0xb1b1dbb
This actually came out after I had logged in ?
I will also look into changing my list to irc.freenode.net.
Many Thanks
Simon.
* Re: JFFS2 Corruption.
2004-02-24 18:05 ` David Woodhouse
@ 2004-02-24 18:04 ` Simon Haynes
2004-02-25 9:49 ` simon
1 sibling, 0 replies; 38+ messages in thread
From: Simon Haynes @ 2004-02-24 18:04 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Tuesday 24 Feb 2004 6:05 pm, David Woodhouse wrote:
> On Tue, 2004-02-24 at 17:05 +0000, Simon Haynes wrote:
> > Would you recommend changing the printk for the "Empty Flash" messages to
> > a different level and using the ramdisk as a permanent solution ?
>
> Changing the printk level for the 'Empty Flash' messages does seem
> appropriate -- or preferably finding a way to eliminate the ones which
> we don't want. Are you mounting the fs read-only before rebooting? If
> not, the occasional CRC failure is acceptable. You get those with an
> unclean restart. It doesn't indicate data loss; it indicates that one
> particular log entry which you were writing _while_ you rebooted was
> lost. That was not data which userspace thought was already on the
> medium.
>
> The problem with mtdblock is interesting. Can you make it BUG() when it
> dirties its cache and put it back to how it was, using mtdblock?
I do remount the filesystem read only as part of a shutdown. I also cat
/proc/mounts so that I can check it has happened. So the filesystem should be
clean. After this halt is called. I think this tries to write to utmp. If the
kernel already has the device open for write will this be allowed ?
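The check described above (remount read-only, then look at /proc/mounts) can be automated along these lines. This is a sketch only: `mount_options` and `is_read_only` are hypothetical helper names, and the sample text is a shortened copy of the /proc/mounts output posted earlier in this thread.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option list of the last entry mounted on mountpoint, or None."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            result = fields[3].split(",")
    return result


def is_read_only(mounts_text, mountpoint):
    opts = mount_options(mounts_text, mountpoint)
    return opts is not None and "ro" in opts


SAMPLE = """\
rootfs / rootfs rw 0 0
/proc /proc proc rw 0 0
/dev/mtdblock2 /smc jffs2 ro 0 0
"""

print(is_read_only(SAMPLE, "/smc"))
```

On a live system the text would come from reading /proc/mounts just before halt is called.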
I guess it is strange that the read only mtdblock device prevents write
access via jffs2. I don't know if the cache in mtdblock is ever being used
but I can certainly put a BUG() in there.
Cheers
Simon.
* Re: JFFS2 Corruption.
2004-02-24 17:05 ` Simon Haynes
@ 2004-02-24 18:05 ` David Woodhouse
2004-02-24 18:04 ` Simon Haynes
2004-02-25 9:49 ` simon
0 siblings, 2 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-24 18:05 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-02-24 at 17:05 +0000, Simon Haynes wrote:
> Would you recommend changing the printk for the "Empty Flash" messages to a
> different level and using the ramdisk as a permanent solution ?
Changing the printk level for the 'Empty Flash' messages does seem
appropriate -- or preferably finding a way to eliminate the ones which
we don't want. Are you mounting the fs read-only before rebooting? If
not, the occasional CRC failure is acceptable. You get those with an
unclean restart. It doesn't indicate data loss; it indicates that one
particular log entry which you were writing _while_ you rebooted was
lost. That was not data which userspace thought was already on the
medium.
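To make that concrete, here is a toy log in Python, assuming a much simpler node layout than JFFS2 really uses (a length word, the payload, then a CRC) and `zlib.crc32` as a stand-in for the filesystem's own checksum. The names `make_node`, `interrupted` and `scan` are illustrative. A write cut short leaves erased 0xFF bytes where the tail of the node should be, so the stored CRC reads back as 0xffffffff and the node is rejected, while every earlier node stays valid.

```python
import struct
import zlib


def make_node(payload):
    """Build a toy log node: 32-bit length, payload, 32-bit CRC over both."""
    header = struct.pack("<I", len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack("<I", crc)


def interrupted(node, written):
    """Model a write cut short: unwritten bytes remain erased (0xFF)."""
    return node[:written] + b"\xff" * (len(node) - written)


def scan(log):
    """Return (payloads of valid nodes, offsets of nodes whose CRC failed)."""
    good, bad_offsets = [], []
    offset = 0
    while offset + 8 <= len(log):
        (length,) = struct.unpack_from("<I", log, offset)
        end = offset + 4 + length + 4
        if end > len(log):
            break  # length field itself is unusable; stop scanning
        (stored,) = struct.unpack_from("<I", log, end - 4)
        if stored == zlib.crc32(log[offset:end - 4]):
            good.append(log[offset + 4:end - 4])
        else:
            bad_offsets.append(offset)
        offset = end
    return good, bad_offsets


log = make_node(b"old data") + interrupted(make_node(b"new data"), 10)
good, bad = scan(log)
print(good, bad)
```

The rejected node was never acknowledged as complete, so discarding it loses nothing that had been reported as written.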
The problem with mtdblock is interesting. Can you make it BUG() when it
dirties its cache and put it back to how it was, using mtdblock?
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-24 18:05 ` David Woodhouse
2004-02-24 18:04 ` Simon Haynes
@ 2004-02-25 9:49 ` simon
2004-02-25 10:25 ` David Woodhouse
1 sibling, 1 reply; 38+ messages in thread
From: simon @ 2004-02-25 9:49 UTC (permalink / raw)
To: dwmw2; +Cc: linux-mtd
On 24 Feb 2004 at 18:05, David Woodhouse wrote:
> On Tue, 2004-02-24 at 17:05 +0000, Simon Haynes wrote:
> > Would you recommend changing the printk for the "Empty Flash"
> > messages to a different level and using the ramdisk as a permanent
> > solution ?
>
> Changing the printk level for the 'Empty Flash' messages does seem
> appropriate -- or preferably finding a way to eliminate the ones which
Yesterday I managed to reproduce the problem using /dev/ram. The first message I
noticed was
jffs2_get_inode_nodes(): Data CRC failed on node at 0x00050748: Read
0x0b5da171, calculated 0xb1b1dbb
This happened some time after the kernel mounted root. On following reboots this
message was not displayed but I did get one
jffs2_scan_dirent_node(): Node CRC failed on node at 0x0046a7f0
read 0xffffffff calculated 0xdec8161b
message.
As I do mount read only on shutdown I assume this is corruption ?
Cheers
Simon.
__________________________
Simon Haynes - Baydel
Phone : 44 (0) 1372 378811
Email : simon@baydel.com
__________________________
* Re: JFFS2 Corruption.
2004-02-25 9:49 ` simon
@ 2004-02-25 10:25 ` David Woodhouse
2004-02-26 11:08 ` Simon Haynes
0 siblings, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-02-25 10:25 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Wed, 2004-02-25 at 09:49 +0000, simon@baydel.com wrote:
> As I do mount read only on shutdown I assume this is corruption ?
Maybe. I doubt it's _harmful_ but I am very interested.
Any chance you can run with CONFIG_JFFS2_FS_DEBUG=1 and log _all_ the
messages over a serial console, then see what it wrote at the offending
address when you get a CRC error?
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-25 10:25 ` David Woodhouse
@ 2004-02-26 11:08 ` Simon Haynes
2004-02-26 11:55 ` David Woodhouse
2004-03-03 15:31 ` David Woodhouse
0 siblings, 2 replies; 38+ messages in thread
From: Simon Haynes @ 2004-02-26 11:08 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On 25 Feb 2004 at 10:25, David Woodhouse wrote:
> On Wed, 2004-02-25 at 09:49 +0000, simon@baydel.com wrote:
> > As I do mount read only on shutdown I assume this is corruption ?
>
> Maybe. I doubt it's _harmful_ but I am very interested.
>
This has taken some time to produce. I have a 40Mb logfile which thankfully
compresses to 4Mb. To generate the log I booted the system via the network with
JFFS2 patched to use /dev/ram MAJOR, but no JFFS2 debug. I erased the SMC and
made a clean JFFS2 filesystem. I then copied all of my root files. I mounted and
umounted this a few times and each time I created and deleted a few files. I did not
get one error.
I then rebooted using a similar kernel with CONFIG_JFFS2_FS_DEBUG=1 and
passed arguments "root=/dev/ram1 debug".
It took about 6 hours before I could log in. I then halted the system. On the first
reboot, fortunately, I did get a CRC error but I cannot find where this was previously
written. The node is 0x000303f0.
I also observe the "Empty flash XXXX ends at XXX" messages, which do not appear before the
filesystem is used as rootfs and restarted.
Beyond that I don't really know what I am looking for in the log. I can mail it to you
personally but as I said it's 4Mb compressed.
Cheers
Simon.
__________________________
Simon Haynes - Baydel
Phone : 44 (0) 1372 378811
Email : simon@baydel.com
__________________________
* Re: JFFS2 Corruption.
2004-02-26 11:08 ` Simon Haynes
@ 2004-02-26 11:55 ` David Woodhouse
2004-03-03 15:31 ` David Woodhouse
1 sibling, 0 replies; 38+ messages in thread
From: David Woodhouse @ 2004-02-26 11:55 UTC (permalink / raw)
To: Simon Haynes; +Cc: linux-mtd
On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote:
> Beyond that I don't really know what I am looking for in the log. I can mail it to you
> personally but as I said it's 4Mb compressed.
Lemme see it. Although if there's a node at 0x000303f0 and no existence
of 000303f0 in the logfile, your logfile is missing bits.
Also take a dump of the offending file system.
--
dwmw2
* Re: JFFS2 Corruption.
2004-02-26 11:08 ` Simon Haynes
2004-02-26 11:55 ` David Woodhouse
@ 2004-03-03 15:31 ` David Woodhouse
2004-03-08 15:10 ` Simon Haynes
1 sibling, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-03-03 15:31 UTC (permalink / raw)
To: Simon Haynes; +Cc: linux-mtd
On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote:
> Beyond that I don't really know what I am looking for in the log. I can mail it to you
> personally but as I said it's 4Mb compressed.
(for the benefit of the peanut gallery...)
The CRC error was harmless. It was a node being garbage-collected, which
we didn't bother to finish writing because it contained only a redundant
copy of data which existed elsewhere on the flash already.
We could try to be slightly quieter about such things, but then we might
actually miss something which _is_ a problem. Better to be concerned
when there isn't a problem, than to be blissfully unaware when there
_is_ one. Perhaps.
--
dwmw2
* Re: JFFS2 Corruption.
2004-03-03 15:31 ` David Woodhouse
@ 2004-03-08 15:10 ` Simon Haynes
2004-03-09 15:33 ` Simon Haynes
0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-03-08 15:10 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Wednesday 03 Mar 2004 3:31 pm, David Woodhouse wrote:
> On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote:
> > Beyond that I don't really know what I am looking for in the log. I can
>
> We could try to be slightly quieter about such things, but then we might
> actually miss something which _is_ a problem. Better to be concerned
> when there isn't a problem, than to be blissfully unaware when there
> _is_ one. Perhaps.
As you suggested I have changed the JFFS2 remount code to flush wbuf when the
filesystem is mounted read only. As I said via IRC I have performed tens of
reboots and I have not seen any CRC messages.
Now on occasions, when the kernel is mounting JFFS2 as root on NAND I get.
NAND device: Manufacturer ID: 0xec, Chip ID: 0x79 (Samsung NAND 128MiB 3,3V)
Creating 3 MTD partitions on "NAND 128MiB 3,3V":
0x00000000-0x01000000 : "Boot / config partition"
mtd: Giving out device 0 to Boot / config partition
0x01000000-0x05000000 : "JFFS2 Root Filesystem partition"
mtd: Giving out device 1 to JFFS2 Root Filesystem partition
0x05000000-0x08000000 : "Write Cache Backup partition"
mtd: Giving out device 2 to Write Cache Backup partition
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 4096 bind 4096)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
jffs2: Erase block size too small (16KiB). Using virtual blocks size (32KiB) instead
ofs 0x00c00400 has already been seen. Skipping
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00408: 0x273c instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00420: 0x404d instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00424: 0x404d instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00428: 0x404d instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c0043c: 0x0f33 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00440: 0x88f8 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00444: 0x4d61 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00448: 0x2039 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c0044c: 0x343a instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00450: 0x3a32 instead
Further such events for this erase block will not be printed
Can you suggest what might be going on ?
Cheers
Simon.
* Re: JFFS2 Corruption.
2004-03-08 15:10 ` Simon Haynes
@ 2004-03-09 15:33 ` Simon Haynes
2004-03-16 16:14 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-03-09 15:33 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Monday 08 Mar 2004 3:10 pm, Simon Haynes wrote:
> On Wednesday 03 Mar 2004 3:31 pm, David Woodhouse wrote:
> > On Thu, 2004-02-26 at 11:08 +0000, Simon Haynes wrote:
> > > Beyond that I don't really know what I am looking for in the log. I can
> ofs 0x00c00400 has already been seen. Skipping
> jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00c00408:
> 0x273c instead
Since lunch interrupted our conversation yesterday I have tried a few
things. The SMC partition I was using was 64 Mb and contained a bad block. I
modified the kernel to make this partition 48Mb. I booted root over nfs and
tried to rebuild root on the SMC. I can no longer mount the original SMC so I
tried a new one which worked OK.
-bash-2.05b# /mtd/eraseall /dev/mtd1 > /dev/null
nand_erase: attempt to erase a bad block at page 0x0001ee60
/mtd/eraseall: /dev/mtd1: MTD Erase failure: Input/output error
-bash-2.05b# mount -t jffs2 /dev/ram1 /smc
Cowardly refusing to erase blocks on filesystem with no valid JFFS2 nodes
empty_blocks 2047, bad_blocks 0, c->nr_blocks 2048
mount: wrong fs type, bad option, bad superblock on /dev/ram1,
or too many mounted file systems
-bash-2.05b# /mtd/eraseall /dev/mtd1 > /dev/null
-bash-2.05b# mount -t jffs2 /dev/ram1 /smc
-bash-2.05b#
After 5 reboots the new SMC gave this magic bitmask failure.
jffs2: Erase block size too small (16KiB). Using virtual blocks size (32KiB) instead
ofs 0x000a8400 has already been seen. Skipping, jeb 0xa8000, sector size 0x8000
saved ofs 0x000a8000, previous 0xa7fff, buf_len 0x7c00, scanned 0x0
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a8408: 0x273c instead, 0x0
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a841c: 0x000b instead, 0x0
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a8420: 0x404d
.
.
.
It would appear that the first buffer location does not contain 0xFFFFFFFF
and cleanmarkerfound is set but I don't know what this means to JFFS2.
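For readers trying to interpret these messages, the following Python sketch shows the general shape of such a scan, judging only from the log output above: the block is walked in 4-byte steps, erased words (0xFFFFFFFF) are skipped, and any other word that does not start with the magic 0x1985 is reported. It is not the code from scan.c. The header layout used (16-bit magic, 16-bit node type, 32-bit total length), the little-endian byte order and the name `scan_block` are simplifying assumptions.

```python
import struct

JFFS2_MAGIC = 0x1985      # magic bitmask named in the messages above
EMPTY_WORD = 0xFFFFFFFF   # what erased flash reads back as


def scan_block(buf, base=0):
    """Return the 'magic not found' messages a simplified scan would emit."""
    messages = []
    ofs = 0
    while ofs + 4 <= len(buf):
        (word,) = struct.unpack_from("<I", buf, ofs)
        if word == EMPTY_WORD:
            ofs += 4          # erased space, nothing to report
            continue
        (magic,) = struct.unpack_from("<H", buf, ofs)
        if magic != JFFS2_MAGIC:
            messages.append(
                "Magic bitmask 0x%04x not found at 0x%08x: 0x%04x instead"
                % (JFFS2_MAGIC, base + ofs, magic)
            )
            ofs += 4
            continue
        if ofs + 8 > len(buf):
            break
        # Assumed header: the total node length follows magic and node type.
        (totlen,) = struct.unpack_from("<I", buf, ofs + 4)
        ofs += max(4, (totlen + 3) & ~3)   # skip the node, keep 4-byte alignment
    return messages


block = (
    b"\xff" * 8                                  # erased space
    + struct.pack("<HHI", JFFS2_MAGIC, 0, 12)    # a 12-byte node header...
    + b"\x00" * 4                                # ...with a placeholder CRC field
    + struct.pack("<HH", 0x273C, 0)              # stray data, not a node
    + b"\xff" * 4
)
for line in scan_block(block, base=0x00C00400):
    print(line)
```

In this model a run of such messages at consecutive offsets means the scanner is walking through data it did not expect to find at that position, whether that data is stale or the starting offset is wrong.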
I then switched the partition back to 64Mb, set the kernel to use the read
only block device and hacked the MAJOR number for JFFS2 to be the same as
/dev/ram. I have rebooted the system at least 20 times and as yet I have not
seen any errors.
I am unsure as to where to go next besides trying more reboots.
Any ideas ?
Cheers
Simon.
* Re: JFFS2 Corruption.
2004-03-09 15:33 ` Simon Haynes
@ 2004-03-16 16:14 ` David Woodhouse
2004-03-19 10:37 ` Simon Haynes
0 siblings, 1 reply; 38+ messages in thread
From: David Woodhouse @ 2004-03-16 16:14 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-03-09 at 15:33 +0000, Simon Haynes wrote:
> -bash-2.05b# mount -t jffs2 /dev/ram1 /smc
> Cowardly refusing to erase blocks on filesystem with no valid JFFS2 nodes
> empty_blocks 2047, bad_blocks 0, c->nr_blocks 2048
> mount: wrong fs type, bad option, bad superblock on /dev/ram1,
> or too many mounted file systems
This I believe we put down to user error?
> After 5 reboots the new SMC gave this magic bitmask failure.
>
> jffs2: Erase block size too small (16KiB). Using virtual blocks size (32KiB)
> instead
> ofs 0x000a8400 has already been seen. Skipping, jeb 0xa8000, sector size
> 0x8000
> saved ofs 0x000a8000, previous 0xa7fff, buf_len 0x7c00, scanned 0x0
> jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000a8408: 0x273c
> instead, 0x0
That was a bug in the scanning code. Should be fixed in v1.58 of scan.c
in CVS. Please could you try that and let me know if it works?
--
dwmw2
* Re: JFFS2 Corruption.
2004-03-16 16:14 ` David Woodhouse
@ 2004-03-19 10:37 ` Simon Haynes
2004-03-19 11:11 ` David Woodhouse
0 siblings, 1 reply; 38+ messages in thread
From: Simon Haynes @ 2004-03-19 10:37 UTC (permalink / raw)
To: David Woodhouse, linux-mtd; +Cc: linux-mtd
On 16 Mar 2004 at 16:14, David Woodhouse wrote:
> On Tue, 2004-03-09 at 15:33 +0000, Simon Haynes wrote:
> > -bash-2.05b# mount -t jffs2 /dev/ram1 /smc
> > Cowardly refusing to erase blocks on filesystem with no valid JFFS2
>
> This I believe we put down to user error?
Yes, this was a problem I introduced when trying to track down why I got the Magic
bitmask messages.
>
> > After 5 reboots the new SMC gave this magic bitmask failure.
> >
> > jffs2: Erase block size too small (16KiB). Using virtual blocks size
> > (32KiB) instead ofs 0x000a8400 has already been seen. Skipping, jeb
>
> That was a bug in the scanning code. Should be fixed in v1.58 of
> scan.c in CVS. Please could you try that and let me know if it works?
>
I have not had a chance yet to download scan from CVS. However I did make the
simple change you suggested and everything worked fine.
Thank you, thank you, thank you
Simon.
__________________________
Simon Haynes - Baydel
Phone : 44 (0) 1372 378811
Email : simon@baydel.com
__________________________
* Re: JFFS2 Corruption.
2004-03-19 10:37 ` Simon Haynes
@ 2004-03-19 11:11 ` David Woodhouse
0 siblings, 0 replies; 38+ messages in thread
From: David Woodhouse @ 2004-03-19 11:11 UTC (permalink / raw)
To: Simon Haynes; +Cc: linux-mtd
On Fri, 2004-03-19 at 10:37 +0000, Simon Haynes wrote:
> > That was a bug in the scanning code. Should be fixed in v1.58 of
> > scan.c in CVS. Please could you try that and let me know if it works?
> >
> I have not had a chance yet to download scan from CVS. However I did make the
> simple change you suggested and everything worked fine.
What I committed was slightly more complicated -- and also broken.
Thomas fixed it though. :)
--
dwmw2
Thread overview: 38+ messages
2004-02-19 16:48 JFFS2 Corruption simon
2004-02-23 11:07 ` simon
2004-02-24 9:48 ` simon
2004-02-24 12:00 ` David Woodhouse
2004-02-24 12:54 ` Simon Haynes
2004-02-24 13:04 ` Simon Haynes
2004-02-24 13:40 ` Simon Haynes
2004-02-24 14:22 ` David Woodhouse
2004-02-24 14:25 ` Simon Haynes
2004-02-24 14:56 ` David Woodhouse
2004-02-24 14:58 ` Simon Haynes
2004-02-24 15:35 ` David Woodhouse
2004-02-24 15:47 ` Simon Haynes
2004-02-24 16:14 ` David Woodhouse
2004-02-24 16:17 ` Simon Haynes
2004-02-24 16:51 ` David Woodhouse
2004-02-24 17:05 ` Simon Haynes
2004-02-24 18:05 ` David Woodhouse
2004-02-24 18:04 ` Simon Haynes
2004-02-25 9:49 ` simon
2004-02-25 10:25 ` David Woodhouse
2004-02-26 11:08 ` Simon Haynes
2004-02-26 11:55 ` David Woodhouse
2004-03-03 15:31 ` David Woodhouse
2004-03-08 15:10 ` Simon Haynes
2004-03-09 15:33 ` Simon Haynes
2004-03-16 16:14 ` David Woodhouse
2004-03-19 10:37 ` Simon Haynes
2004-03-19 11:11 ` David Woodhouse
2004-02-24 17:12 ` Simon Haynes
2004-02-24 16:55 ` David Woodhouse
-- strict thread matches above, loose matches on Subject: below --
2004-02-05 13:15 JFFS2 corruption Florian Schirmer
2004-02-08 10:37 ` David Woodhouse
2004-02-08 11:38 ` Florian Schirmer
2004-02-08 11:53 ` David Woodhouse
2004-02-08 17:02 ` Florian Schirmer
2004-02-08 17:13 ` David Woodhouse
2004-02-08 11:28 ` David Woodhouse