* Reiser3 bug in 2.6.11.11
From: Konstantin Münning @ 2005-11-14 23:13 UTC
To: reiserfs-list
Hello!
A few days ago I encountered a reiser3 bug in vanilla kernel 2.6.11.11.
I have no idea whether it has been fixed in a more recent kernel, but
here is some info in case somebody is interested.
In short: some values seem to go unchecked, so a corrupted fs generates
kernel oopses.
Details: on a laptop, something corrupted the fs (probably in connection
with swsusp, but that's only a guess, as I only ran into it a few days
later), which made the kernel oops/panic/hang shortly after the first
accesses to the disk. GRUB seemed to have no problems, and initial
access was OK: init started and the first system startup messages
appeared. But then a bunch of oopses scrolled by so fast that I could
not tell which part of the kernel caused the first error, and then the
keyboard stopped responding, so I couldn't scroll back. At that point
only powering down was possible. I can't tell whether it happened while
the fs was still RO or when/after it was remounted RW. And as there was
no network or disk access at that point, recovering any information was
not possible.
But maybe the reiserfsck log files can help identify the culprit
(could it be something to do with the blocksize messages?):
reiserfsck:
---------------------------------
bad_path: block 8435, pointer 11: The used space (3888) of the child
block (32773) is not equal to the (blocksize (4096) - free space (224) -
header size (24))
bad_path: block 2283225, pointer 29: The used space (4072) of the child
block (6160385) is not equal to the (blocksize (4096) - free space (180)
- header size (24))
block 1049101: The number of items (59) is incorrect, should be (57)
the problem in the internal node occured (1049101), whole subtree is
skipped
bad_path: block 3145901, pointer 40: The used space (2432) of the child
block (557840) is not equal to the (blocksize (4096) - free space (1740)
- header size (24))
vpf-10640: The on-disk and the correct bitmaps differs.
---------------------------------
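The three bad_path lines can be checked by hand: reiserfsck expects the
used space recorded for a child block to equal blocksize - free space -
header size. Below is a minimal sketch of that arithmetic, using only the
numbers printed in the log above. The function and variable names are
made up for illustration and are not reiserfsprogs identifiers.

```python
# Illustrative only: re-does the arithmetic from the bad_path messages.
# All names here are hypothetical, not taken from reiserfsprogs.

def expected_used_space(blocksize, free_space, header_size):
    """Used space implied by the child block's own header fields."""
    return blocksize - free_space - header_size

# (parent block, pointer, child block, recorded used, blocksize, free, header)
BAD_PATHS = [
    (8435, 11, 32773, 3888, 4096, 224, 24),
    (2283225, 29, 6160385, 4072, 4096, 180, 24),
    (3145901, 40, 557840, 2432, 4096, 1740, 24),
]

def mismatches(entries):
    """Return (child block, recorded, expected) for every inconsistent entry."""
    result = []
    for _parent, _ptr, child, recorded, blocksize, free, header in entries:
        expected = expected_used_space(blocksize, free, header)
        if recorded != expected:
            result.append((child, recorded, expected))
    return result

if __name__ == "__main__":
    for child, recorded, expected in mismatches(BAD_PATHS):
        print(f"child {child}: recorded {recorded}, expected {expected}, "
              f"off by {recorded - expected}")
```

All three entries disagree: the recorded values are 40, 180 and 100 bytes
larger than what the child blocks' own free-space fields imply.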
reiserfsck --rebuild-tree:
---------------------------------
####### Pass 0 #######
block 1049101: The number of items (59) is incorrect, should be (57) -
corrected
block 1049101: The free space (65504) is incorrect, should be (68) -
corrected
block 1545017: The number of items (2) is incorrect, should be (0) -
corrected
block 1545017: The free space (43432) is incorrect, should be (4072) -
corrected
block 4131356: The number of items (7) is incorrect, should be (0) -
corrected
block 4131356: The free space (0) is incorrect, should be (4072) - corrected
508677 directory entries were hashed with "r5" hash.
####### Pass 1 #######
####### Pass 2 #######
####### Pass 3 #########
vpf-10650: The directory [2 5300] has the wrong size in the StatData
(5544) - corrected to (5504)
vpf-10680: The file [397629 106971] has the wrong block count in the
StatData (0) - corrected to (8)
rebuild_semantic_pass: The entry [397629 111711] ("xinetd.pid") in
directory [397629 403911] points to nowhere - is removed
vpf-10680: The file [397629 111702] has the wrong block count in the
StatData (8) - corrected to (0)
vpf-10650: The directory [397629 403911] has the wrong size in the
StatData (432) - corrected to (400)
vpf-10650: The directory [102361 1849502] has the wrong size in the
StatData (840) - corrected to (808)
####### Pass 3a (lost+found pass) #########
---------------------------------
As you can see, it seems to be a tiny corruption, but with devastating
results ;-). No data seems to have been lost after --rebuild-tree.
Have a nice day,
--
Konstantin Münning
* Re: Reiser3 bug in 2.6.11.11
From: evilninja @ 2005-11-16 23:36 UTC
To: Konstantin Münning; +Cc: reiserfs-list
Konstantin Münning wrote:
> init started and the first system startup messages appeared. But then a
> bunch of oopses scrolled by so fast that I could not tell which part of
> the kernel caused the first error, and then the keyboard stopped
> responding, so I couldn't scroll back. At that point only powering down
> was possible. I can't tell whether it happened while the fs was still RO
> or when/after it was remounted RW. And as there was no network or disk
> access at that point, recovering any information was not possible.
Can you reproduce the oopses and redirect the error/oops messages to a
serial console or netconsole? Does the error go away with a current
kernel? Perhaps some reiserfs guru can decode them....
--
BOFH excuse #293:
You must've hit the wrong any key.
* Re: Reiser3 bug in 2.6.11.11
From: Konstantin Münning @ 2005-11-17 10:19 UTC
To: reiserfs-list; +Cc: evilninja@gmx.net
Hi!
evilninja@gmx.net wrote:
> Konstantin Münning wrote:
>
>> init started and the first system startup messages appeared. But then a
>> bunch of oopses scrolled by so fast that I could not tell which part of
>> the kernel caused the first error, and then the keyboard stopped
>> responding, so I couldn't scroll back. At that point only powering down
>> was possible. I can't tell whether it happened while the fs was still RO
>> or when/after it was remounted RW. And as there was no network or disk
>> access at that point, recovering any information was not possible.
>
>
> Can you reproduce the oopses and redirect the error/oops messages to a
> serial console or netconsole? Does the error go away with a current
> kernel? Perhaps some reiserfs guru can decode them....
Sorry, I had no time to experiment, as it is an "in-production" laptop
that had to be functional again quickly, so I currently have no way to
reproduce the fault. Redirecting the output would require another
kernel, and the device has no serial or network port :-(.
Maybe extracting the metadata (debugreiserfs -p) would have been good
for debugging, but at that point I had nowhere to store it. If this
happens a second time, I will find a way to save it.
As fsck reported only a few fs errors, looking over the range/overflow
checks in the code that deals with used space/block size might give a
hint...
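To make the suggestion concrete: two of the free-space values that the
rebuild-tree run corrected in pass 0 (65504 and 43432) are larger than
the whole 4096-byte block, so a plain range check would already reject
them. Below is a rough sketch of such a check, written as a Python model
rather than kernel C. The helper names are hypothetical, the 24-byte
header size is taken from the fsck messages, and nothing here claims to
reflect what the reiserfs kernel code actually validates.

```python
# Model of a range check on a block header's free-space field.
# Hypothetical helpers, not actual reiserfs kernel code.

def free_space_in_range(free_space, blocksize=4096, header_size=24):
    """True if free_space could physically fit in a block of this size."""
    return 0 <= free_space <= blocksize - header_size

# Free-space values found on disk in pass 0, keyed by block number
# (copied from the fsck log earlier in this thread).
PASS0_FREE_SPACE = {
    1049101: 65504,
    1545017: 43432,
    4131356: 0,
}

def out_of_range_blocks(values, blocksize=4096, header_size=24):
    """Return the block numbers whose free-space value is impossible."""
    return sorted(
        block for block, free in values.items()
        if not free_space_in_range(free, blocksize, header_size)
    )

if __name__ == "__main__":
    print(out_of_range_blocks(PASS0_FREE_SPACE))
```

This flags blocks 1049101 and 1545017. Block 4131356 (free space 0)
passes, because its value was only wrong in relation to the item count,
so a range check alone would not catch every case in the log.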
Have a nice day,
--
Konstantin Münning