* Strange problems/bugs with reiserfs and reiserfschk
@ 2005-08-07 22:02 Konstantin Münning
From: Konstantin Münning @ 2005-08-07 22:02 UTC
To: reiserfs-list
Hi Folks!
There seems to be something I would call a bug in ReiserFS, at least in
kernel 2.6.11.11, which can cause the system/computer to freeze. It is
caused by a corruption of the FS, but at that point I expected to have
some inaccessible files, which I already know from FS corruptions, and not
a hung system. If someone thinks it's worth investigating, please
read further.
Yes, I know that working with a corrupt FS is not a good idea, but my
intention was simply to save as many of the files as possible before
doing a rebuild-tree, just in case it's all gone after that. As I said,
my experience with corrupt ReiserFS was good, with the knowledge that
some files/directories would be inaccessible. But this time the system
was rendered unusable when accessing certain directories - no more
mount/umount or even sync were possible (they simply did not return), so
there was no way to shut down the machine. IMHO this should be considered
a severe bug - refusing to read corrupt portions of a FS is OK, but
rendering the system unusable is bad.
I'm wondering what kind of information I can provide so the source of
this can be found. Here is some, but if you want more, please tell me:
Kernel 2.6.11.11, Gentoo-Linux, SMP (HyperThreading P4, 3GHz)
Here are some portions of /var/log/messages which may show what it's about,
the messages just before the system became unusable:
**************
Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
does not match to the expected one 1
(****snip****)
Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
does not match to the expected one 1
Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
does not match tond in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warmatch to the expected one 1
Aug 5 22:17:22 master unparseable log message: "<nd in block 27594920.
Fsck?"
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node levematch
to the expected ond in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: nmatch to the
expected one 1
Aug 5 22:17:22 master unparseable log message: "<nd in block 27594920.
Fsck?"
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node lmatch to
the expected ond in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node match to
the expected ond in block 27594920. Fsck?
(****snip****)
Aug 5 22:17:26 master ReiserFS: warning:nd in block 27608085. Fsck?
Aug 5 22:17:26 master ReiserFS: warninmatch to the expected onend in
block 2760 nd in block 27608085. Fsck?
Aug 5 22:17:26 master ReiserFS: warning: is_tree_node: nodematch to the
expected nd in block 27608085. Fsck?
(****snip****)
Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 122418791. Fsck?
Aug 5 22:18:03 master ReiserFS: warning: is_tree_node: node level 18499
does not match to the expected one 1
Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 122418791. Fsck?
Aug 5 22:18:03 master init_special_inode: bogus i_mode (177777)
Aug 5 22:18:03 master ReiserFS: warning: is_tree_node: node level 65471
does not match to the expected one 1
Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 82406293. Fsck?
**************
The interesting point is that the messages get weird at some
point - see the portion after the first (****snip****). It is as if something is
overwriting an internal buffer. Maybe it is caused by the high
frequency of messages or by some race condition between processors? I have
no idea if this is an indication of the suspected bug, but that seems
likely to me. The last portion shows the last messages just before the
next boot of the computer. In case you ask - CPU/memory of that server
are fine as far as Memtest86(+) can tell. So, what's next?
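To get an overview of which blocks the kernel is complaining about, the intact warning lines can be tallied with a short script. The following is a minimal sketch in Python and not part of any ReiserFS tooling: the function name `summarize` and both regular expressions are assumptions modeled only on the log excerpt quoted above, and truncated or interleaved lines are simply not counted. It also prints each reported node level in hex, which makes it easier to see how far a value such as 57345 is from the expected level 1.

```python
import re
from collections import Counter

# Patterns modeled on the intact warning lines in the excerpt (assumption:
# the real log uses exactly this wording; garbled lines will not match).
BLOCK_RE = re.compile(
    r"vs-5150: search_by_key:\s+invalid\s+format found in block (\d+)"
)
LEVEL_RE = re.compile(r"is_tree_node: node level (\d+)")


def summarize(log_text):
    """Return (Counter block -> warning count, sorted distinct node levels)."""
    blocks = Counter(int(m.group(1)) for m in BLOCK_RE.finditer(log_text))
    levels = sorted({int(m.group(1)) for m in LEVEL_RE.finditer(log_text)})
    return blocks, levels


if __name__ == "__main__":
    # Two lines copied from the excerpt, including the mail line wrap.
    sample = (
        "Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:\n"
        "invalid format found in block 27594920. Fsck?\n"
        "Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345\n"
        "does not match to the expected one 1\n"
    )
    blocks, levels = summarize(sample)
    for block, count in blocks.most_common():
        print(f"block {block}: {count} warning(s)")
    for level in levels:
        print(f"node level {level} = {level:#06x}")
```

Running it over the real log would only require passing the contents of /var/log/messages to `summarize`.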
Now to the second part. After giving up on saving more data (well, I saved
the important 30% of these 400GB) I started a reiserfsck --rebuild-tree.
It worked quite well until near the end. There it seems to be frozen
and consumes 100% CPU. Here is some data:
reiserfsprogs-3.6.19, messages of reiserfsck:
**************
.pass1: block 145817616, item 1, entry 0: The entry ".." of the [259961
259992 0x2 DIR (3)] is hashed with not set whereas proper hash is "r5" -
deleted
100% left 0, 212 /sec
Flushing..finished
268526 leaves read
203192 inserted
- pointers in indirect items pointing to
metadata 1890 (zeroed)
65334 not inserted
non-unique pointers in indirect items (zeroed) 28669
####### Pass 2 #######
Pass 2:
0%....20%....40%..vpf-10260: The file we are inserting the new item
(2198390 2199894 0x381 DRCT (2), len 1088, location 3008 entry count
65535, fsck need 0, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (2195955 2195990 0x1
DRCT (2), len 792, location 3304 entry count 65535, fsck need 0, format
new) into has no StatData, insertion was skipped
(****snip****)
vpf-10260: The file we are inserting the new item (563378 563928 0x1
DRCT (2), len 1384, location 2712 entry count 65535, fsck need 0, format
new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (563378 564627 0xa9
DRCT (2), len 688, location 3408 entry count 65535, fsck need 0, format
new) into has no StatData, insertion was skipped
left 32269, 422 /sec
**************
And that's the point where it has been sitting for hours now, consuming 100%
CPU. I didn't try to abort and start again (yet) as it takes about 20
hours to get this far (the device is big and not so fast), so I'm still
hoping it may finish by itself...
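One way to judge whether pass 2 is slow or stuck is to take the last progress line at face value. The sketch below assumes that `left N, R /sec` means N units remain and R units are processed per second; that reading is an assumption, not something confirmed by the reiserfsck documentation, and the function name `seconds_remaining` is hypothetical.

```python
import re

# Matches progress lines such as "left 32269, 422 /sec".
PROGRESS_RE = re.compile(r"left (\d+), (\d+) /sec")


def seconds_remaining(progress_line):
    """Estimate the seconds left from a 'left N, R /sec' line, or None."""
    match = PROGRESS_RE.search(progress_line)
    if match is None:
        return None
    left, rate = int(match.group(1)), int(match.group(2))
    if rate == 0:
        return None  # no rate reported, no estimate possible
    return left / rate


if __name__ == "__main__":
    estimate = seconds_remaining("left 32269, 422 /sec")
    print(f"estimated time remaining: {estimate:.0f} s")
```

For `left 32269, 422 /sec` this gives roughly 76 seconds, so a counter that has not moved for hours points to a stall rather than to a slow but progressing run.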
So, is it worth investigating, and if so, what other info could I provide?
If nobody is interested I will create a fresh FS on the drive and forget
about it. I have my important data so that would be OK.
Keep doing the great job!
Konstantin
* Re: Strange problems/bugs with reiserfs and reiserfschk
@ 2005-08-08 11:51 Konstantin Münning
From: Konstantin Münning @ 2005-08-08 11:51 UTC
To: reiserfs-list
Hi.
This processor produces a lot of heat, but this is only a question of how
you cool it. The system has no trouble with heat or stability. It's a
server which is constantly running, and except for the mentioned problem
there are no other troubles. I can compile things for hours on that
system, so memory and/or heat shouldn't be the problem. When doing disk
access the CPU is mostly idle. Working with lots of files on several
drives has not produced any problems. There are no recent hardware or
software upgrades which coincide with the "bug". The only thing which
coincides is the FS corruption. The other drives are still fine.
michael chang wrote:
> On 8/7/05, Konstantin Münning <konstantin@muenning.com> wrote:
>
>>There seems to be something I would call a bug in ReiserFS at least in
>>kernel 2.6.11.11 which can cause the system/computer to freeze. It is
>>caused by a corruption of the FS but at that point I expected to have
>
>>Kernel 2.6.11.11, Gentoo-Linux, SMP (HyperThreading P4, 3GHz)
>
> <snip>
>
> If memory serves me right, any 3GHz processor will get very hot, very
> fast. Is it possible that it got hot and started messing up data?
> Maybe consider downclocking your cpu or using CPUFreq (or similar) and
> see if there isn't data loss running at e.g. 2.5 or 1.5 GHz. Either
> that, or don't run it for more than a few hours at a time. I'm pretty
> sure 2.6.11.11 has CPUFreq in it somewheres. Something to look at in
> the future.
>
> Of course, this is all speculation. I have absolutely no idea.
* Re: Strange problems/bugs with reiserfs and reiserfschk
@ 2005-08-13 12:36 Konstantin Münning
From: Konstantin Münning @ 2005-08-13 12:36 UTC
To: reiserfs-list
Hi Everyone.
OK, there definitely seems to be some kind of bug in reiserfsck 3.6.19.
Or is it a feature? ;-)
I tried once again with reiserfsck --rebuild-tree to repair the FS and
here it is again. Near the end of pass 2 (about 20h after starting)
counting stopped at "left 32022, 500 /sec" but there was heavy access
to the drive. After about an hour reiserfsck started consuming 100% CPU
and is doing some minimal access to the drive (the drive light blinks
every second or so, SCSI reports about 40 commands for each of these
accesses).
What could be causing this? Is the drive too large for reiserfsck? I
wouldn't believe that 0,6TB is, but it is at least consuming quite a lot
of memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15185 root 39 19 76304 51m 688 R 99.9 10.3 4591:30 reiserfsck
I have left it like this for the last 3 days just to make sure that it's not
my lack of patience. But now still... So any advice? How can I find out
what is causing reiserfsck to hang? Or would I have to build a debug
version and check for myself?
The drive itself is working, I checked several times, and the server is
working all the time as well. Here is some reiserfs output in case it helps
somebody to have an idea:
(***snip***)
block 125371211: The number of items (1) is incorrect, should be (0) - corrected
block 125371211: The free space (0) is incorrect, should be (4072) - corrected
block 125506454: The number of items (1) is incorrect, should be (0) - corrected
block 125506454: The free space (0) is incorrect, should be (4072) - corrected
pass0: vpf-10160: block 129485584: item 2: No "." entry found in the first item of a directory
pass0: vpf-10160: block 129485584: item 4: No "." entry found in the first item of a directory
pass0: vpf-10160: block 133169584: item 14: No "." entry found in the first item of a directory
pass0: vpf-10160: block 133169648: item 25: No "." entry found in the first item of a directory
pass0: vpf-10160: block 133170064: item 10: No "." entry found in the first item of a directory
pass0: vpf-10160: block 134316400: item 18: No "." entry found in the first item of a directory
pass0: vpf-10560: block 145031172, item 7: Wrong order of items - change the object_id of the key [2237738 2237741 0x1 DRCT (2)] to 2237740
pass0: vpf-10160: block 145817616: item 1: No "." entry found in the first item of a directory
left 0, 2623 /sec
914346 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 94209553
Leaves among those 3522616
- corrected leaves 319
- leaves all contents of which could not be saved and deleted 15
pointers in indirect items to wrong area 24 (zeroed)
Objectids found 984719
Pass 1 (will try to insert 3522601 leaves):
####### Pass 1 #######
Looking for allocable blocks .. finished
0%....20%....40%....is_leaf_bad: block 35452246, item 25: The corrupted item found (878203 878203 0x0 SD (0), len 44, location 3056 entry count 65535, fsck need 1, format new)
is_leaf_bad: block 35452246, item 26: The corrupted item found (878203 878203 0x1 DRCT (2), len 1464, location 1592 entry count 65535, fsck need 1, format new)
is_leaf_bad: WARNING: The leaf (35452246) is formatted badly. Will be handled on the the pass2.
60%....80%....100% left 0, 701 /sec
Flushing..finished
3522601 leaves read
3489821 inserted
32780 not inserted
non-unique pointers in indirect items (zeroed) 656
####### Pass 2 #######
Pass 2:
0%....20%....40%..rewrite_file: 2 items of file [2340286 2340312] moved to [2340286 16]
vpf-10260: The file we are inserting the new item (432679 432760 0xf001 IND (1), len 160, location 3936 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (432679 432828 0x2a001 IND (1), len 132, location 3964 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
(***snip***)
vpf-10260: The file we are inserting the new item (526132 526780 0x1 IND (1), len 8, location 4088 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (557044 557073 0x1 IND (1), len 4, location 4092 entry count 0, fsck need 3, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (558483 558492 0x20001 IND (1), len 96, location 4000 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (759159 759160 0x1 IND (1), len 3208, location 888 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
left 32022, 500 /sec
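The `TIME+` column in the top output can be cross-checked against the "last 3 days" mentioned in the message. This is a small sketch under the assumption that the field is printed as minutes:seconds, as in the quoted line; the function name `cpu_time_hours` is hypothetical.

```python
def cpu_time_hours(time_field):
    """Convert a top TIME+ value of the form 'minutes:seconds' to hours."""
    minutes, seconds = time_field.split(":")
    return (int(minutes) * 60 + float(seconds)) / 3600


if __name__ == "__main__":
    hours = cpu_time_hours("4591:30")
    print(f"{hours:.1f} h of CPU time, about {hours / 24:.1f} days")
```

Under that assumption, `4591:30` corresponds to about 76.5 hours of CPU time, a little over three days, which fits a process that has been spinning at close to 100% CPU for most of that period.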
* Re: Strange problems/bugs with reiserfs and reiserfschk
@ 2005-08-15 13:20 Vitaly Fertman
From: Vitaly Fertman @ 2005-08-15 13:20 UTC
To: reiserfs-list; +Cc: Konstantin Münning
On Saturday 13 August 2005 16:36, Konstantin Münning wrote:
> Hi Everyone.
>
> OK, there seems definitely to be some kind of bug in reiserfsck 3.6.19.
> Or is it a feature? ;-)
>
> I tried once again with reiserfsck --rebuild-tree to repair the FS and
> here it is again. About the end of pass 2 (about 20h after starting)
> counting stopped at "left 32022, 500 /sec" but there was heavy access
> of the drive. After about an hour reiserfsck started consuming 100% CPU
> and is doing some minimal access to the drive (the drive light blinks
> every second or so, SCSI reports about 40 commands for each of these
> accesses).
>
> What could be causing this? Is the drive too large for reiserfsck? I
> wouldn't believe that 0,6TB are but it is consuming at least quite a lot
> of memory:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 15185 root 39 19 76304 51m 688 R 99.9 10.3 4591:30 reiserfsck
>
> I have left it like this the last 3 days just to make sure that it's not
> my lack of patience. But now still... So any advice? How to find out
> what is causing reiserfsck to hang? Or would I have to build a debug
> version and check for myself?
if some file item offsets are corrupted, fsck can work for too long on
pass2. or it also can be a bug. I will send you a version of reiserfsprogs
that has an optimization fix for the former. if it fails, email me and provide
the metadata please:
debugreiserfs -p <device> | bzip2 -c > <device>.bz2
> The drive itself is working, I checked several times, the server is
> working all the time as well. Here some reiserfs output if it helps
> somebody to have an idea:
>
> (***snip***)
> block 125371211: The number of items (1) is incorrect, should be (0) - corrected
> block 125371211: The free space (0) is incorrect, should be (4072) - corrected
> block 125506454: The number of items (1) is incorrect, should be (0) - corrected
> block 125506454: The free space (0) is incorrect, should be (4072) - corrected
> pass0: vpf-10160: block 129485584: item 2: No "." entry found in the first item of a directory
> pass0: vpf-10160: block 129485584: item 4: No "." entry found in the first item of a directory
> pass0: vpf-10160: block 133169584: item 14: No "." entry found in the first item of a directory
> pass0: vpf-10160: block 133169648: item 25: No "." entry found in the first item of a directory
> pass0: vpf-10160: block 133170064: item 10: No "." entry found in the first item of a directory
> pass0: vpf-10160: block 134316400: item 18: No "." entry found in the first item of a directory
> pass0: vpf-10560: block 145031172, item 7: Wrong order of items - change the object_id of the key [2237738 2237741 0x1 DRCT (2)] to 2237740
> pass0: vpf-10160: block 145817616: item 1: No "." entry found in the first item of a directory
> left 0, 2623 /sec
> 914346 directory entries were hashed with "r5" hash.
> "r5" hash is selected
> Flushing..finished
> Read blocks (but not data blocks) 94209553
> Leaves among those 3522616
> - corrected leaves 319
> - leaves all contents of which could not be saved and deleted 15
> pointers in indirect items to wrong area 24 (zeroed)
> Objectids found 984719
>
> Pass 1 (will try to insert 3522601 leaves):
> ####### Pass 1 #######
> Looking for allocable blocks .. finished
> 0%....20%....40%....is_leaf_bad: block 35452246, item 25: The corrupted item found (878203 878203 0x0 SD (0), len 44, location 3056 entry count 65535, fsck need 1, format new)
> is_leaf_bad: block 35452246, item 26: The corrupted item found (878203 878203 0x1 DRCT (2), len 1464, location 1592 entry count 65535, fsck need 1, format new)
> is_leaf_bad: WARNING: The leaf (35452246) is formatted badly. Will be handled on the the pass2.
> 60%....80%....100% left 0, 701 /sec
> Flushing..finished
> 3522601 leaves read
> 3489821 inserted
> 32780 not inserted
> non-unique pointers in indirect items (zeroed) 656
> ####### Pass 2 #######
>
> Pass 2:
> 0%....20%....40%..rewrite_file: 2 items of file [2340286 2340312] moved to [2340286 16]
> vpf-10260: The file we are inserting the new item (432679 432760 0xf001 IND (1), len 160, location 3936 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
> vpf-10260: The file we are inserting the new item (432679 432828 0x2a001 IND (1), len 132, location 3964 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
> (***snip***)
> vpf-10260: The file we are inserting the new item (526132 526780 0x1 IND (1), len 8, location 4088 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
> vpf-10260: The file we are inserting the new item (557044 557073 0x1 IND (1), len 4, location 4092 entry count 0, fsck need 3, format new) into has no StatData, insertion was skipped
> vpf-10260: The file we are inserting the new item (558483 558492 0x20001 IND (1), len 96, location 4000 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
> vpf-10260: The file we are inserting the new item (759159 759160 0x1 IND (1), len 3208, location 888 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
> left 32022, 500 /sec
--
Thanks,
Vitaly Fertman
* Re: Strange problems/bugs with reiserfs and reiserfschk
@ 2005-08-23 21:30 Konstantin Münning
From: Konstantin Münning @ 2005-08-23 21:30 UTC
To: reiserfs-list; +Cc: Vitaly Fertman
Hi Vitaly!
Thank you for the reiserfsck 3.9.20. It did in fact produce different results on
that drive. I ran it in gdb (as I did with 3.6.19 to see what/where
the trouble may be) and the result is:
(***snip***)
vpf-10680: The file [641222 641239] has the wrong block count in the
StatData (1528) - corrected to (1520)
vpf-10680: The file [641222 641241] has the wrong block count in the
StatData (47192) - corrected to (47168)
vpf-10680: The file [641222 641242] has the wrong block count in the
StatData (16624) - corrected to (16528)
are_file_items_correct: All bytes we look for must be first items byte
(position 0).
Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()
(***snip***)
Hmm... Ugly ;-).
Vitaly Fertman wrote:
> if some file item offsets are corrupted, fsck can work for too long on
> pass2. or it also can be a bug. I will send you a version of reiserfsprogs
> that has an optimization fix for the former. if it fails, email me and provide
> the metadata please:
> debugreiserfs -p <device> | bzip2 -c > <device>.bz2
Do you need the metadata or the full logfile, or should I send you
something more/else for that?
Thanks and have a nice day,
Konstantin
* Re: Strange problems/bugs with reiserfs and reiserfschk
@ 2005-08-24 10:47 Vitaly Fertman
From: Vitaly Fertman @ 2005-08-24 10:47 UTC
To: Konstantin Münning; +Cc: reiserfs-list
On Wednesday 24 August 2005 01:30, Konstantin Münning wrote:
> Hi Vitaly!
>
> Thank you for the reiserfsck 3.9.20. It in fact had different results on
> that drive. I had it run in gdb (as I did with 3.6.19 to see what/where
> the trouble may be) and the result is:
>
> (***snip***)
> vpf-10680: The file [641222 641239] has the wrong block count in the
> StatData (1528) - corrected to (1520)
> vpf-10680: The file [641222 641241] has the wrong block count in the
> StatData (47192) - corrected to (47168)
> vpf-10680: The file [641222 641242] has the wrong block count in the
> StatData (16624) - corrected to (16528)
>
> are_file_items_correct: All bytes we look for must be first items byte
> (position 0).
>
> Program received signal SIGABRT, Aborted.
> 0xffffe410 in __kernel_vsyscall ()
> (***snip***)
>
> Hmm... Ugly ;-).
>
> Vitaly Fertman wrote:
>
> > if some file item offsets are corrupted, fsck can work for too long on
> > pass2. or it also can be a bug. I will send you a version of reiserfsprogs
> > that has an optimization fix for former. if it fails email me and provide
> > the metadata please:
> > debugreiserfs -p <device> | bzip2 -c > <device>.bz2
>
> Do you need the metadata or the full logfile or should I send you
> something more/else for that?
yes, metadata would be enough.
--
Thanks,
Vitaly Fertman
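The metadata dump requested in this thread is the pipeline `debugreiserfs -p <device> | bzip2 -c > <device>.bz2`. For anyone who wants to script it, the following is a minimal Python sketch of the same pipeline. The function name `dump_metadata` and its `command` parameter are hypothetical; only the `debugreiserfs -p` invocation itself comes from the thread, and the command is parameterized so the function can be tried with a harmless stand-in before pointing it at a real device.

```python
import bz2
import subprocess


def dump_metadata(device, out_path, command=("debugreiserfs", "-p")):
    """Run `command device` and store its stdout bzip2-compressed in out_path."""
    with subprocess.Popen([*command, device], stdout=subprocess.PIPE) as proc, \
            bz2.open(out_path, "wb") as out:
        # Stream in 1 MiB chunks so a large dump is never held in memory.
        for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
            out.write(chunk)
    # Leaving the with-block has waited for the child process to exit.
    if proc.returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {proc.returncode}")
    return out_path
```

A call such as `dump_metadata("/dev/sdc1", "sdc1-meta.bz2")` would need root privileges and an unmounted or otherwise quiescent file system; the device path here is taken from the kernel log earlier in the thread and is only an example.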
* Re: Strange problems/bugs with reiserfs and reiserfschk
@ 2005-08-08 13:18 Vitaly Fertman
From: Vitaly Fertman @ 2005-08-08 13:18 UTC
To: reiserfs-list; +Cc: Konstantin Münning
On Monday 08 August 2005 02:02, Konstantin Münning wrote:
> Hi Folks!
>
> There seems to be something I would call a bug in ReiserFS at least in
> kernel 2.6.11.11 which can cause the system/computer to freeze. It is
> caused by a corruption of the FS but at that point I expected to have
> some inaccessible files which I already know from FS corruptions but not
> to hang the system. If someone thinks it's worth investigating, please
> read further.
>
> Yes, I know that working with a corrupt FS is nothing good but my
> intention was simply to save as much of the files as possible before
> doing a rebuild-tree just in case it's all gone after that. As I said,
> my experience with corrupt ReiserFS was good with the knowledge that
> some files/directories would be inaccessible. But this time the System
> was rendered unusable when accessing certain directories - no more
> mount/umount or even sync were possible (they simply did not return) so
> there was no way to shutdown the machine. IMHO this should be considered
> as a severe bug - refusing to read corrupt portions of a FS is OK but
> rendering the system unusable is bad.
>
> I'm wondering what kind of information I can provide so the source of
> this can be found. Here some but if you want more, please tell me:
would you run
debugreiserfs -p <device> | bzip2 -c > <device>-meta.bz2
this will pack the fs metadata, which is enough to try to reproduce the bug
locally.
> Kernel 2.6.11.11, Gentoo-Linux, SMP (HyperThreading P4, 3GHz)
>
> Here some portions of /var/log/messages which may show what's about:
> the messages just before the system got unusable:
>
> **************
> Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
> invalid format found in block 27594920. Fsck?
> Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
> does not match to the expected one 1
> (****snip****)
> Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
> invalid format found in block 27594920. Fsck?
> Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
> does not match to the expected one 1
> Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
> invalid format found in block 27594920. Fsck?
> Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
> does not match tond in block 27594920. Fsck?
> Aug 5 22:17:22 master ReiserFS: warmatch to the expected one 1
> Aug 5 22:17:22 master unparseable log message: "<nd in block 27594920.
> Fsck?"
> Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node levematch
> to the expected ond in block 27594920. Fsck?
> Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: nmatch to the
> expected one 1
> Aug 5 22:17:22 master unparseable log message: "<nd in block 27594920.
> Fsck?"
> Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node lmatch to
> the expected ond in block 27594920. Fsck?
> Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node match to
> the expected ond in block 27594920. Fsck?
> (****snip****)
> Aug 5 22:17:26 master ReiserFS: warning:nd in block 27608085. Fsck?
> Aug 5 22:17:26 master ReiserFS: warninmatch to the expected onend in
> block 2760 nd in block 27608085. Fsck?
> Aug 5 22:17:26 master ReiserFS: warning: is_tree_node: nodematch to the
> expected nd in block 27608085. Fsck?
> (****snip****)
> Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
> invalid format found in block 122418791. Fsck?
> Aug 5 22:18:03 master ReiserFS: warning: is_tree_node: node level 18499
> does not match to the expected one 1
> Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
> invalid format found in block 122418791. Fsck?
> Aug 5 22:18:03 master init_special_inode: bogus i_mode (177777)
> Aug 5 22:18:03 master ReiserFS: warning: is_tree_node: node level 65471
> does not match to the expected one 1
> Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
> invalid format found in block 82406293. Fsck?
> **************
>
> The interesting point is that the messages are getting weird at some
> point - see portion after the first (****snip****). As if something is
> overwriting an internal buffer or something. Maybe caused by high
> frequency of messages or some race condition between processors? I have
> no idea if this is an indication of the suspected bug but that seems
> likely to me. The last portion are the last messages just before the
> next boot of the computer. Just if you ask - CPU/Memory of that server
> are fine as long as Memtest86(+) can tell. So, what's next?
>
> Now to the second part. After giving up to save more data (well, I saved
> the important 30% of these 400GB) I started a reiserfsck --rebuild-tree.
> It worked quite good until about the end. There it seems to be frozen
> and consumes 100% CPU. Here some data:
>
> reiserfsprogs-3.6.19, messages of reiserfsck:
>
> **************
> .pass1: block 145817616, item 1, entry 0: The entry ".." of the [259961
> 259992 0x2 DIR (3)] is hashed with not set whereas proper hash is "r5" -
> deleted
> 100% left 0, 212 /sec
> Flushing..finished
> 268526 leaves read
> 203192 inserted
> - pointers in indirect items pointing to
> metadata 1890 (zeroed)
> 65334 not inserted
> non-unique pointers in indirect items (zeroed) 28669
> ####### Pass 2 #######
>
> Pass 2:
> 0%....20%....40%..vpf-10260: The file we are inserting the new item
> (2198390 2199894 0x381 DRCT (2), len 1088, location 3008 entry count
> 65535, fsck need 0, format new) into has no StatData, insertion was skipped
> vpf-10260: The file we are inserting the new item (2195955 2195990 0x1
> DRCT (2), len 792, location 3304 entry count 65535, fsck need 0, format
> new) into has no StatData, insertion was skipped
> (****snip****)
> vpf-10260: The file we are inserting the new item (563378 563928 0x1
> DRCT (2), len 1384, location 2712 entry count 65535, fsck need 0, format
> new) into has no StatData, insertion was skipped
> vpf-10260: The file we are inserting the new item (563378 564627 0xa9
> DRCT (2), len 688, location 3408 entry count 65535, fsck need 0, format
> new) into has no StatData, insertion was skipped
> left 32269, 422 /sec
> **************
>
> And that's the point where it's staying for hours now consuming 100%
> CPU. I didn't try to abort and start again (yet) as it takes about 20
> hours to get so far (the device is big and not so fast) so I'm still
> hoping it may finish by itself...
there are some optimizations ready for pass2. if fsck fails to
recover the fs, runs out of disk space, etc., email me and I will send you
the optimized reiserfsprogs version.
> So, is it worth investigating and if, what other info I could provide?
> If nobody is interested I will create a fresh FS on the drive and forget
> about it. I have my important data so that would be OK.
>
> Keep doing the great job!
> Konstantin
--
Thanks,
Vitaly Fertman
Thread overview: 7+ messages
2005-08-07 22:02 Strange problems/bugs with reiserfs and reiserfschk Konstantin Münning
[not found] ` <b14e81f0050807152830555213@mail.gmail.com>
2005-08-08 11:51 ` Konstantin Münning
2005-08-13 12:36 ` Konstantin Münning
2005-08-15 13:20 ` Vitaly Fertman
2005-08-23 21:30 ` Konstantin Münning
2005-08-24 10:47 ` Vitaly Fertman
2005-08-08 13:18 ` Vitaly Fertman