* rebuild-tree
@ 2003-12-07 20:46 Larry Weldon
2003-12-08 11:01 ` rebuild-tree Vitaly Fertman
2003-12-08 15:51 ` rebuild-tree Hans Reiser
0 siblings, 2 replies; 4+ messages in thread
From: Larry Weldon @ 2003-12-07 20:46 UTC
To: reiserfs mailing list
A client's production file server is set up using RAID-1 with 2 IDE
disks and reiserfs. The operating system is Mandrake 9.1. The system has
an APC 1050 SmartUPS on it - the place does have a history of power
glitches so _everything_ has a UPS.
Monday last I noticed the tape backup (I use tar) crashed at less than 1
minute with memory exhausted error. Examination showed a directory made
using a client machine as a 'backup' of the main job directory had an
error showing up as "cannot stat..." when running du. Since it was a
backup directory they just renamed it and tried to delete it using a DOS
shell. All the files and directories were successfully deleted except
the two offending files and their parent directories.
It took me all week to see what was wrong although it was plain...
After a successful backup, excluding the offending directory, I
unmounted and used reiserfsck --check which told me 1 item was badly
broken and to use reiserfsck --rebuild-tree which worked perfectly and
restored all the meta-data and files.
I would call that a nice job of recovering from some corruption. I did
not think to keep the output of the rebuild-tree function - I recall the
two bad files had some record of size 120 bytes (wrong) and it was reset
to 96 (correct).
I use reiserfs mostly because it was recommended by a friend. Now I find
out he has abandoned reiserfs because of:
http://www.wlug.org.nz/ReiserFS
which seems to be down right now so excerpt follows:
=======================================================================
Unfortunately, the tree structure used is also the weak point of
ReiserFS: if any of it gets corrupted, chances are that much more data
will be affected than under traditional FileSystems. Rather than losing
a single file to corruption of an inode, you may lose almost the entire
contents of your disk if metadata close to the root of the BTree is
affected. Fortunately, the likelihood of this happening due to bugs has
been dramatically reduced in more recent version of the driver. Hardware
failure caused corruption is still a serious problem, though.
========================================================================
Now, I can't just stop using reiserfs and I don't want to. I think there
is great merit in it. So, first, with the limited info I have given,
what might have happened to create the problem and how likely might it
be to happen again? Secondly, what is the *real* hazard of corruption
_higher_up_ in the tree which the article says might blow away the whole
partition?
Thanks and regards.
--
Larry Weldon <larry@welcoin.com>
* Re: rebuild-tree
From: Vitaly Fertman @ 2003-12-08 11:01 UTC
To: Larry Weldon, reiserfs mailing list
Hi Larry,
On Sunday 07 December 2003 23:46, Larry Weldon wrote:
> A client's production file server is set up using RAID-1 with 2 IDE
> disks and reiserfs. The operating system is Mandrake 9.1. The system has
> an APC 1050 SmartUPS on it - the place does have a history of power
> glitches so _everything_ has a UPS.
>
> Monday last I noticed the tape backup (I use tar) crashed at less than 1
> minute with memory exhausted error. Examination showed a directory made
> using a client machine as a 'backup' of the main job directory had an
> error showing up as "cannot stat..." when running du. Since it was a
> backup directory they just renamed it and tried to delete it using a DOS
> shell. All the files and directories were successfully deleted except
> the two offending files and their parent directories.
>
> It took me all week to see what was wrong although it was plain...
>
> After a successful backup, excluding the offending directory, I
> unmounted and used reiserfsck --check which told me 1 item was badly
> broken and to use reiserfsck --rebuild-tree which worked perfectly and
> restored all the meta-data and files.
>
> I would call that a nice job of recovering from some corruption. I did
> not think to keep the output of the rebuild-tree function - I recall the
> two bad files had some record of size 120 bytes (wrong) and it was reset
> to 96 (correct).
>
> I use reiserfs mostly because it was recommended by a friend. Now I find
> out he has abandoned reiserfs because of:
> http://www.wlug.org.nz/ReiserFS
> which seems to be down right now so excerpt follows:
> =======================================================================
> Unfortunately, the tree structure used is also the weak point of
> ReiserFS: if any of it gets corrupted, chances are that much more data
> will be affected than under traditional FileSystems. Rather than losing
> a single file to corruption of an inode, you may lose almost the entire
> contents of your disk if metadata close to the root of the BTree is
> affected. Fortunately, the likelihood of this happening due to bugs has
> been dramatically reduced in more recent version of the driver. Hardware
> failure caused corruption is still a serious problem, though.
> ========================================================================
>
> Now, I can't just stop using reiserfs and I don't want to. I think there
> is great merit in it. So, first, with the limited info I have given,
> what might have happened to create the problem and how likely might it
> be to happen again? Secondly, what is the *real* hazard of corruption
> _higher_up_ in the tree which the article says might blow away the whole
> partition?
>
> Thanks and regards.
rebuild-tree builds the BTree from scratch: it gathers all the leaf nodes
of the tree (only these formatted nodes contain the user's data), throws
away the others, and builds a new tree from the leaves. Thus if the only
corruption is in some node close to the root node of the BTree, you
will lose nothing.
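
As a rough illustration of why damage near the root need not cost any
data, here is a toy model in Python. It is only a sketch: the Block
class, the "kind" labels and the rebuild_tree function are hypothetical
names invented for this example, and they do not reflect the real
reiserfs on-disk format or the actual reiserfsck code.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical block model: "leaf", "internal", or "unformatted".
    kind: str
    # key -> data; only leaf blocks carry items in this toy model.
    items: dict = field(default_factory=dict)


def rebuild_tree(blocks):
    """Collect the items of every leaf block and ignore everything else."""
    recovered = {}
    for block in blocks:
        if block.kind == "leaf":
            recovered.update(block.items)
    # A sorted list of (key, data) pairs stands in for the new tree here.
    return sorted(recovered.items())


disk = [
    Block("internal"),                # pretend this is a corrupted root
    Block("leaf", {2: "b", 1: "a"}),
    Block("unformatted"),
    Block("leaf", {3: "c"}),
]
tree = rebuild_tree(disk)
print(tree)
```

In this model the internal block is never read, so its contents (good or
bad) cannot affect what is recovered from the leaves.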
It may also look as if you have lost the whole reiserfs partition if
you repartitioned the hard disk and shifted the start of the reiserfs
partition by, e.g., 1 track of 63 sectors. In that case not a single 4k
reiserfs formatted block will be found (4k is the default reiserfs
blocksize).
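
The arithmetic behind the 63-sector example can be checked with a few
lines of Python. This is a sketch that assumes the conventional 512-byte
sector size, which the message above does not state; the constant names
are made up for the example.

```python
SECTOR_SIZE = 512    # bytes per sector (assumed, not stated in the thread)
BLOCK_SIZE = 4096    # default reiserfs blocksize, per the message above
SHIFT_SECTORS = 63   # one track, as in the example above

shift_bytes = SHIFT_SECTORS * SECTOR_SIZE
misalignment = shift_bytes % BLOCK_SIZE

print(shift_bytes)    # size of the shift in bytes
print(misalignment)   # non-zero means old blocks no longer sit on 4k boundaries
```

Under that assumption the shift is not a whole number of 4k blocks, so
every old block would straddle two block positions as seen from the new
partition start.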
And of course, if a reiserfs filesystem is corrupted and you continue
using it, you may get more corruption there; and running on broken
hardware may also destroy a lot of data.
--
Thanks,
Vitaly Fertman
* Re: rebuild-tree
From: Hans Reiser @ 2003-12-08 15:51 UTC
To: Larry Weldon; +Cc: reiserfs mailing list
Larry Weldon wrote:
>A client's production file server is set up using RAID-1 with 2 IDE
>disks and reiserfs. The operating system is Mandrake 9.1.
>
which kernel does it use?
>=======================================================================
>Unfortunately, the tree structure used is also the weak point of
>ReiserFS: if any of it gets corrupted, chances are that much more data
>will be affected than under traditional FileSystems. Rather than losing
>a single file to corruption of an inode, you may lose almost the entire
>contents of your disk if metadata close to the root of the BTree is
>affected. Fortunately, the likelihood of this happening due to bugs has
>been dramatically reduced in more recent version of the driver. Hardware
>failure caused corruption is still a serious problem, though.
>========================================================================
>
>
The guy who wrote this does a nice job of telling one everything about
himself except his email address.
I don't know why he says this, and I don't think it is true. If you
trash the blocks with the root directory in any filesystem you are going
to end up with a lot of files in lost+found, yes?
Maybe others have a different experience or understanding of the remark.
With reiserfs, if you corrupt the internal nodes of the tree, you have
to run fsck before you can use it, and in this we are more fragile, but
once fsck has been run it is no more fragile than anything else
(assuming you use the latest fsck, because fsck took a long time to
stabilize).
>Now, I can't just stop using reiserfs and I don't want to. I think there
>is great merit in it. So, first, with the limited info I have given,
>what might have happened to create the problem and how likely might it
>be to happen again? Secondly, what is the *real* hazard of corruption
>_higher_up_ in the tree which the article says might blow away the whole
>partition?
>
>Thanks and regards.
>
>
--
Hans
* Re: rebuild-tree
From: Larry Weldon @ 2003-12-09 11:16 UTC
To: reiserfs mailing list
On Mon, 2003-12-08 at 10:51, Hans Reiser wrote:
> Larry Weldon wrote:
>
> >A client's production file server is set up using RAID-1 with 2 IDE
> >disks and reiserfs. The operating system is Mandrake 9.1.
> >
> which kernel does it use?
>
linux-2.4.19-32mdk
reiserfsck, 2002 - reiserfsprogs 3.6.3
[snip]
> With reiserfs, if you corrupt the internal nodes of the tree, you have
> to run fsck before you can use it, and in this we are more fragile, but
> once fsck has been run it is no more fragile than anything else
> (assuming you use the latest fsck, because fsck took a long time to
> stabilize).
>
I would be wise, then, to install the latest reiserfs on older servers as
well?
Thank you,
--
Larry Weldon <larry@welcoin.com>