* Kernel oops with an unclean unmounted filesystem
From: Enrico Scholz @ 2003-02-19 12:35 UTC
To: linux-mtd
Hello,
after creating and/or modifying a file in a JFFS2 filesystem, I
turned off the device without unmounting/syncing the device.
Because the JFFS documentation states
| JFFS is aimed at providing a crash/powerdown-safe filesystem
JFFS2 should cope with such a situation.
However, when restarting the device and mounting the filesystem,
the kernel oopses:
| jffs2_get_inode_nodes() for ino 283 returned -12
| Checked all inodes but still 0x44 bytes of unchecked space?
| kernel BUG at fs/jffs2/gc.c:140!
| Unable to handle kernel NULL pointer dereference at virtual address 00000000
| ...
| Backtrace:
| [<c0026030>] (__bug+0x0/0x58) from [<c00c6f50>] (jffs2_garbage_collect_pass+0x240/0x66c)
| r4 = C1E38000
| [<c00c6d10>] (jffs2_garbage_collect_pass+0x0/0x66c) from [<c00c9ff8>] (jffs2_garbage_collect_thread+0x1c4/0x200)
| r8 = FFFFFFFF r7 = 00000000 r6 = 00000000 r5 = C1F2E000
| r4 = C1E38000
| [<c00c9e34>] (jffs2_garbage_collect_thread+0x0/0x200) from [<c0022040>] (kernel_thread+0x40/0x48)
| r6 = C1F2E014 r5 = 00000000 r4 = C1F2E000
| Code: 1b005243 e59f0014 eb005241 e3a03000 (e5833000)
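The -12 in the first log line follows the kernel convention of returning negated errno values. A minimal sketch for decoding such a value on a host machine follows; the helper name `describe_kernel_error` is made up for illustration, and Python's errno table reflects the host platform (assumed here to be Linux), not necessarily the target.

```python
import errno
import os


def describe_kernel_error(ret):
    """Map a negative kernel-style return value to (errno name, message).

    Hypothetical helper for illustration; it uses the host's errno table.
    """
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, text = describe_kernel_error(-12)
print(name, "-", text)
```

On Linux this names the error ENOMEM, which would be consistent with an allocation request that could not be satisfied, though the log alone does not say why.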
I am running the 2.5.59 kernel + rmk patches on an ARM XScale
platform. I tried the recent jffs2 fs-driver from CVS also but
the oops still happens. The unclean "unmounting" happened with
the original 2.5.59 JFFS2 driver.
I get an I/O error when reading the questionable directory before
the gc-thread dies.
Is there a way to recover, or do I have to recreate the
filesystem from scratch?
Enrico
--
q: If you were young again, would you start writing TeX again or would
you use Microsoft Word, or another word processor?
a: I hope to die before I have to use Microsoft Word.
-- Harald Koenig <koenig@tat.physik.uni-tuebingen.de> asking D.E.Knuth
* Kernel oops with an unclean unmounted filesystem
From: Thomas Gleixner @ 2003-02-19 14:42 UTC
To: linux-mtd
On Wednesday 19 February 2003 13:35, Enrico Scholz wrote:
> Indeed, when restarting the device and mounting the filesystem
> the kernel oopses:
> | jffs2_get_inode_nodes() for ino 283 returned -12
> | Checked all inodes but still 0x44 bytes of unchecked space?
> | kernel BUG at fs/jffs2/gc.c:140!
> | Unable to handle kernel NULL pointer dereference at virtual address
> | 00000000 ...
> | Backtrace:
> | [<c0026030>] (__bug+0x0/0x58) from [<c00c6f50>]
> | (jffs2_garbage_collect_pass+0x240/0x66c) r4 = C1E38000
> | [<c00c6d10>] (jffs2_garbage_collect_pass+0x0/0x66c) from [<c00c9ff8>]
> | (jffs2_garbage_collect_thread+0x1c4/0x200) r8 = FFFFFFFF r7 = 00000000
> | r6 = 00000000 r5 = C1F2E000
> | r4 = C1E38000
> | [<c00c9e34>] (jffs2_garbage_collect_thread+0x0/0x200) from [<c0022040>]
> | (kernel_thread+0x40/0x48) r6 = C1F2E014 r5 = 00000000 r4 = C1F2E000
> | Code: 1b005243 e59f0014 eb005241 e3a03000 (e5833000)
Strange
> I am running the 2.5.59 kernel + rmk patches on an ARM XScale
> platform. I tried the recent jffs2 fs-driver from CVS also but
> the oops still happens. The unclean "unmounting" happened with
> the original 2.5.59 JFFS2 driver.
> I get an I/O error when reading the questionable directory before
> the gc-thread dies.
> Does there exists a way to recover or have I to recreate the
> filesystem from scratch?
There should be a way, but first we have to know how it happens.
Can you please
1. build a device image (cat /dev/mtdX >image)
2. turn on debugging in JFFS2 (JFFS2_DEBUG = 1)
3. set debug level to 9 before mounting the fs (echo 9 >/proc/sys/kernel/printk)
4. get logs from serial console (not from klogd) and record them.
Send the logs and image to me in private.
--
Thomas
________________________________________________________________________
linutronix - competence in embedded & realtime linux
http://www.linutronix.de
mail: tglx at linutronix.de
* Kernel oops with an unclean unmounted filesystem
From: Thomas Gleixner @ 2003-02-19 20:22 UTC
To: linux-mtd
On Wednesday 19 February 2003 13:35, Enrico Scholz wrote:
> Hello,
>
> after creating and/or modifying a file in a JFFS2 filesystem, I
> turned off the device without unmounting/syncing the device.
What exactly did you do there? I mean the creating and/or modifying.
The problem is the node is written with valid CRC, so it seems to be
correct, but the node content is totally crap. Compressed data size =
0x6b6b6b6b.
--
Thomas
* Kernel oops with an unclean unmounted filesystem
From: Enrico Scholz @ 2003-02-19 20:18 UTC
To: linux-mtd
Thomas Gleixner <tglx@linutronix.de> writes:
>> after creating and/or modifying a file in a JFFS2 filesystem,
>> I turned off the device without unmounting/syncing the device.
> What exactly did you do there ? I mean creating and/or modifying.
AFAIR I did:
1. rebooted the device and used the JFFS2 fs as the root-fs
2. cd /etc/minit/getty
3. - mv depends depends_
- { echo 'getty/sleep'; cat depends_; } >depends
- rm -f depends_
4. - mkdir sleep
- ln -s /bin/sleep sleep/run
- echo 1 >sleep/params
- touch sleep/sync
5. waited 1-2 seconds
6. pressed the reset-button of the device
I am not sure about the order of the operations in 3 and 4, or if
3 and 4 were swapped.
The used toolset was busybox linked against glibc-2.3.1.
Because I wanted to keep a reproducible testcase I have not tried
to repeat it.
Enrico
* Kernel oops with an unclean unmounted filesystem
From: Thomas Gleixner @ 2003-02-20 11:48 UTC
To: linux-mtd
On Wednesday 19 February 2003 21:18, Enrico Scholz wrote:
> 1. rebooted the device and used the JFFS2 fs as the root-fs
> 2. cd /etc/minit/getty
> 3. - mv depends depends_
> - { echo 'getty/sleep'; cat depends_; } >depends
> - rm -f depends_
> 4. - mkdir sleep
> - ln -s /bin/sleep sleep/run
> - echo 1 >sleep/params
> - touch sleep/sync
> 5. waited 1-2 seconds
> 6. pressed the reset-button of the device
> I am not sure about the order of the operations in 3 and 4, or if
> 3 and 4 were swapped.
No, 'touch sleep/sync' was your last action.
Current CVS has a sanity check for this, so it should not Oops any more.
It will give you just a warning.
Please try, if you can reproduce it, by using the same procedure.
Nobody can explain how it got there.
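For readers wondering what a sanity check of this kind could look like, here is a minimal sketch. It is not the code from CVS. The function name and the default header length are assumptions; the 0x44 value is borrowed from the "unchecked space" figure in the log and is only a guess at the size of an inode node without data.

```python
def node_sizes_plausible(totlen, csize, header_len=0x44):
    """Return True if the compressed data size can fit inside the node.

    totlen     -- total node length as read from flash
    csize      -- compressed data size as read from flash
    header_len -- assumed size of the node header (hypothetical default)
    """
    if totlen < header_len:
        return False  # node is shorter than its own header
    return csize <= totlen - header_len


# A node with no data passes; the bogus size from this thread does not.
print(node_sizes_plausible(0x44, 0))
print(node_sizes_plausible(0x44, 0x6B6B6B6B))
```

A check like this lets the reader of a node warn about it and skip it instead of trusting a size that a valid CRC happens to cover.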
--
Thomas
* Kernel oops with an unclean unmounted filesystem
From: Enrico Scholz @ 2003-02-24 12:21 UTC
To: linux-mtd
Thomas Gleixner <tglx@linutronix.de> writes:
>> ...
>> 4. - mkdir sleep
>> - ln -s /bin/sleep sleep/run
>> - echo 1 >sleep/params
>> - touch sleep/sync
>> 5. waited 1-2 seconds
>> 6. pressed the reset-button of the device
> ...
> No touch sleep/sync was your last action.
> Current CVS has a sanity check for this, so it should not Oops any more.
Thanks; with a recent CVS version (20030223) it no longer oopses
when mounting the filesystem.
When accessing the questionable directory (ls
/etc/minit/getty/sleep) the first time, the 'ls' process hangs in
D state. After rebooting, 'ls' does not hang anymore but gives an
I/O error at sleep/sync. I cannot remove this file, but because
the corruption seems to be caused by the write path of JFFS2
(see below), this is not a problem.
> It will give you just a warning. Please try, if you can
> reproduce it, by using the same procedure.
I can reproduce it 100% on a 2.5.59 kernel with any 'touch foo'
command. When using the CVS version of jffs2 (and commenting out
the unpoint() stuff, which does not compile with 2.5), things seem
to be fine and the filesystem does not become corrupted.
Thanks for your help
Enrico
* Kernel oops with an unclean unmounted filesystem
From: David Woodhouse @ 2003-02-20 13:34 UTC
To: linux-mtd
On Wed, 2003-02-19 at 20:22, Thomas Gleixner wrote:
> What exactly did you do there ? I mean creating and/or modifying.
> The problem is the node is written with valid CRC, so it seems to be
> correct, but the node content is totally crap. Compressed data size =
> 0x6b6b6b6b.
That's slab poisoning. Looking at the actual node, the 'offset',
'csize', 'dsize', 'usercompr' and 'flags' fields all seem to be filled
with 0x6B, but other fields are OK.
I suspect that we're allocating a jffs2_raw_inode structure, and we're
being given a slab address that someone else has already allocated and
freed. We're filling in some of the fields (offset, csize, etc), and
then said 'someone else' is freeing it _again_. At which point the slab
debugging code memsets it to all 0x6B.
Then we fill in the rest of the fields, calculate the crcs and write it
to the flash, blissfully unaware that someone stomped on the offset,
csize, etc fields after we'd set them up.
What other drivers are present in your system? Can you reproduce this?
We can stick debugging checks in the write path to check for 0x6B in
bogus places, and try to debug further.
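The suspected sequence can be modelled in user space. The sketch below is a toy illustration only: the five-field layout is hypothetical and much smaller than the real jffs2_raw_inode, and zlib.crc32 stands in for whatever CRC the driver uses. It shows why the CRC still validates after the poisoning, and what a check for 0x6B in the affected fields could look like.

```python
import struct
import zlib

POISON = 0x6B  # byte the slab debugging code writes into freed objects

# Hypothetical, simplified node: offset, csize, dsize, ino, version.
# This is NOT the real jffs2_raw_inode layout.
FIELDS = ("offset", "csize", "dsize", "ino", "version")
LAYOUT = struct.Struct("<IIIII")


def build_node(double_free_hits):
    """Build a node, optionally letting a stray free poison it midway."""
    buf = bytearray(LAYOUT.size)
    # Step 1: fill in the early fields (offset, csize, dsize).
    struct.pack_into("<III", buf, 0, 0, 0, 0)
    # Step 2: someone else frees the same object again, so slab
    # poisoning overwrites the whole buffer with 0x6B.
    if double_free_hits:
        buf[:] = bytes([POISON]) * len(buf)
    # Step 3: fill in the remaining fields (ino, version).
    struct.pack_into("<II", buf, 12, 283, 1)
    # Step 4: the CRC covers whatever is in the buffer at this point.
    crc = zlib.crc32(bytes(buf)) & 0xFFFFFFFF
    return bytes(buf) + struct.pack("<I", crc)


def crc_ok(node):
    """Check the trailing CRC against the node body."""
    body = node[:-4]
    (crc,) = struct.unpack("<I", node[-4:])
    return (zlib.crc32(body) & 0xFFFFFFFF) == crc


def poisoned_fields(node):
    """Return the names of fields that consist entirely of poison bytes."""
    values = LAYOUT.unpack(node[:LAYOUT.size])
    return [n for n, v in zip(FIELDS, values) if v == 0x6B6B6B6B]


bad = build_node(double_free_hits=True)
print(crc_ok(bad), poisoned_fields(bad))
```

Because the CRC is calculated after the damage, it cannot catch this kind of corruption; only a plausibility check on the field values can.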
--
dwmw2
* Kernel oops with an unclean unmounted filesystem
From: Enrico Scholz @ 2003-02-24 12:37 UTC
To: linux-mtd
David Woodhouse <dwmw2@infradead.org> writes:
>> What exactly did you do there ? I mean creating and/or
>> modifying. The problem is the node is written with valid
>> CRC, so it seems to be correct, but the node content is
>> totally crap. Compressed data size = 0x6b6b6b6b.
>
> That's slab poisoning...
>
> What other drivers are present in your system?
I am running a heavily patched 2.5.59 kernel (rmk + ported PXA +
specific architecture patches) with potentially broken hardware
(PXA250 B1 stepping) so it is possible that other parts of the
system are guilty.
But because the problem disappears with a recent CVS version of
JFFS2, it is probably caused by JFFS2 itself. Since it is fixed in
CVS, I do not know whether it is worth tracing further; if you give
me patches adding additional debugging checks, I can try to get the
necessary information.
> Can you reproduce this?
Yes; 100% when doing a 'touch foobar' on the original 2.5.59
kernel (JFFS2 is not touched by my patches).
Thanks for your explanations
Enrico