* rebuild-tree
@ 2003-12-07 20:46 Larry Weldon
  2003-12-08 11:01 ` rebuild-tree Vitaly Fertman
  2003-12-08 15:51 ` rebuild-tree Hans Reiser
  0 siblings, 2 replies; 4+ messages in thread
From: Larry Weldon @ 2003-12-07 20:46 UTC (permalink / raw)
  To: reiserfs mailing list

A client's production file server is set up using RAID-1 with 2 IDE
disks and reiserfs. The operating system is Mandrake 9.1. The system has
an APC 1050 SmartUPS on it - the place does have a history of power
glitches so _everything_ has a UPS.

Monday last I noticed the tape backup (I use tar) crashed at less than 1
minute with memory exhausted error. Examination showed a directory made
using a client machine as a 'backup' of the main job directory had an
error showing up as "cannot stat..." when running du. Since it was a
backup directory they just renamed it and tried to delete it using a DOS
shell. All the files and directories were successfully deleted except
the two offending files and their parent directories.

It took me all week to see what was wrong although it was plain...

After a successful backup, excluding the offending directory, I
unmounted and used reiserfsck --check which told me 1 item was badly
broken and to use reiserfsck --rebuild-tree which worked perfectly and
restored all the meta-data and files.

I would call that a nice job of recovering from some corruption. I did
not think to keep the output of the rebuild-tree function - I recall the
two bad files had some record of size 120 bytes (wrong) and it was reset
to 96 (correct).

I use reiserfs mostly because it was recommended by a friend. Now I find
out he has abandoned reiserfs because of:
http://www.wlug.org.nz/ReiserFS
which seems to be down right now so excerpt follows:
=======================================================================
Unfortunately, the tree structure used is also the weak point of 
ReiserFS: if any of it gets corrupted, chances are that much more data
will be affected than under traditional FileSystems. Rather than losing
a single file to corruption of an inode, you may lose almost the entire
contents of your disk if metadata close to the root of the BTree is
affected. Fortunately, the likelihood of this happening due to bugs has
been dramatically reduced in more recent version of the driver. Hardware
failure caused corruption is still a serious problem, though.
========================================================================

Now, I can't just stop using reiserfs and I don't want to. I think there
is great merit in it. So, first, with the limited info I have given,
what might have happened to create the problem and how likely might it
be to happen again? Secondly, what is the *real* hazard of corruption
_higher_up_ in the tree which the article says might blow away the whole
partition?

Thanks and regards.
-- 
Larry Weldon <larry@welcoin.com>



* Re: rebuild-tree
  2003-12-07 20:46 rebuild-tree Larry Weldon
@ 2003-12-08 11:01 ` Vitaly Fertman
  2003-12-08 15:51 ` rebuild-tree Hans Reiser
  1 sibling, 0 replies; 4+ messages in thread
From: Vitaly Fertman @ 2003-12-08 11:01 UTC (permalink / raw)
  To: Larry Weldon, reiserfs mailing list

Hi Larry,

On Sunday 07 December 2003 23:46, Larry Weldon wrote:
> A client's production file server is set up using RAID-1 with 2 IDE
> disks and reiserfs. The operating system is Mandrake 9.1. The system has
> an APC 1050 SmartUPS on it - the place does have a history of power
> glitches so _everything_ has a UPS.
>
> Monday last I noticed the tape backup (I use tar) crashed at less than 1
> minute with memory exhausted error. Examination showed a directory made
> using a client machine as a 'backup' of the main job directory had an
> error showing up as "cannot stat..." when running du. Since it was a
> backup directory they just renamed it and tried to delete it using a DOS
> shell. All the files and directories were successfully deleted except
> the two offending files and their parent directories.
>
> It took me all week to see what was wrong although it was plain...
>
> After a successful backup, excluding the offending directory, I
> unmounted and used reiserfsck --check which told me 1 item was badly
> broken and to use reiserfsck --rebuild-tree which worked perfectly and
> restored all the meta-data and files.
>
> I would call that a nice job of recovering from some corruption. I did
> not think to keep the output of the rebuild-tree function - I recall the
> two bad files had some record of size 120 bytes (wrong) and it was reset
> to 96 (correct).
>
> I use reiserfs mostly because it was recommended by a friend. Now I find
> out he has abandoned reiserfs because of:
> http://www.wlug.org.nz/ReiserFS
> which seems to be down right now so excerpt follows:
> =======================================================================
> Unfortunately, the tree structure used is also the weak point of
> ReiserFS: if any of it gets corrupted, chances are that much more data
> will be affected than under traditional FileSystems. Rather than losing
> a single file to corruption of an inode, you may lose almost the entire
> contents of your disk if metadata close to the root of the BTree is
> affected. Fortunately, the likelihood of this happening due to bugs has
> been dramatically reduced in more recent version of the driver. Hardware
> failure caused corruption is still a serious problem, though.
> ========================================================================
>
> Now, I can't just stop using reiserfs and I don't want to. I think there
> is great merit in it. So, first, with the limited info I have given,
> what might have happened to create the problem and how likely might it
> be to happen again? Secondly, what is the *real* hazard of corruption
> _higher_up_ in the tree which the article says might blow away the whole
> partition?
>
> Thanks and regards.

rebuild-tree builds the BTree from scratch: it gathers all the leaf nodes
of the tree (only these formatted nodes contain the user's data), throws
away the others, and builds a new tree from the leaves. Thus, if the only
corruption is in some node close to the root node of the BTree, you
will lose nothing.
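
A minimal sketch of that idea, as a toy model only: it is not the reiserfs on-disk format and not the actual reiserfsck code. The block layout, the `kind` tag, and the names `rebuild_index` and `lookup` are hypothetical, chosen for illustration. It shows why damage to a non-leaf node costs no data when the index is rebuilt purely from the leaves.

```python
from bisect import bisect_right

# Toy model: a "disk" is a list of blocks, each a dict with a hypothetical
# "kind" tag and, for leaves, a sorted list of (key, value) items.
LEAF, INTERNAL = "leaf", "internal"


def rebuild_index(blocks):
    """Ignore every non-leaf block and build a fresh index from the leaves."""
    leaves = [b for b in blocks if b["kind"] == LEAF and b["items"]]
    leaves.sort(key=lambda b: b["items"][0][0])   # order leaves by first key
    index = [b["items"][0][0] for b in leaves]    # new top level: first keys
    return index, leaves


def lookup(index, leaves, key):
    """Find a key via the rebuilt index; return None if it is absent."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None
    return dict(leaves[pos]["items"]).get(key)


disk = [
    {"kind": INTERNAL, "items": None},  # damaged node near the root
    {"kind": LEAF, "items": [(30, "c"), (40, "d")]},
    {"kind": LEAF, "items": [(10, "a"), (20, "b")]},
]
index, leaves = rebuild_index(disk)
print(index, lookup(index, leaves, 40))
```

The damaged internal block is never read for its contents; every key stored in a leaf is still reachable through the rebuilt index.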

It may also look as though you have lost the whole reiserfs partition if
you repartitioned the hard disk and shifted the start of the reiserfs
partition by, e.g., 1 track of 63 sectors: in that case not a single 4k
reiserfs formatted block will be found (4k is the default reiserfs
blocksize).
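
The arithmetic behind the 63-sector case can be checked with a short sketch. It assumes 512-byte sectors, which the message does not state, and the helper name `misalignment` is hypothetical.

```python
SECTOR_BYTES = 512   # assumption: classic 512-byte IDE sectors
BLOCK_BYTES = 4096   # default reiserfs blocksize, per the message above


def misalignment(shift_sectors, sector_bytes=SECTOR_BYTES, block_bytes=BLOCK_BYTES):
    """Bytes by which old block boundaries are off after the partition start moves."""
    return (shift_sectors * sector_bytes) % block_bytes


# One track of 63 sectors is 32256 bytes, which is not a multiple of 4096.
print(misalignment(63))
# A shift of 64 sectors would leave the old blocks on 4k boundaries.
print(misalignment(64))
```

With a non-zero remainder, every 4k read taken relative to the new partition start straddles two of the old blocks, so no read ever returns one complete formatted block.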

And of course, if a reiserfs is corrupted and you continue using it, you
may get more corruption there; and running on broken hardware may also
destroy a lot of data.

-- 
Thanks,
Vitaly Fertman



* Re: rebuild-tree
  2003-12-07 20:46 rebuild-tree Larry Weldon
  2003-12-08 11:01 ` rebuild-tree Vitaly Fertman
@ 2003-12-08 15:51 ` Hans Reiser
  2003-12-09 11:16   ` rebuild-tree Larry Weldon
  1 sibling, 1 reply; 4+ messages in thread
From: Hans Reiser @ 2003-12-08 15:51 UTC (permalink / raw)
  To: Larry Weldon; +Cc: reiserfs mailing list

Larry Weldon wrote:

>A client's production file server is set up using RAID-1 with 2 IDE
>disks and reiserfs. The operating system is Mandrake 9.1.
>
which kernel does it use?

>=======================================================================
>Unfortunately, the tree structure used is also the weak point of 
>ReiserFS: if any of it gets corrupted, chances are that much more data
>will be affected than under traditional FileSystems. Rather than losing
>a single file to corruption of an inode, you may lose almost the entire
>contents of your disk if metadata close to the root of the BTree is
>affected. Fortunately, the likelihood of this happening due to bugs has
>been dramatically reduced in more recent version of the driver. Hardware
>failure caused corruption is still a serious problem, though.
>========================================================================
>  
>
The guy who wrote this does a nice job of telling one everything about 
himself except his email address.

I don't know why he says this, and I don't think it is true.  If you 
trash the blocks with the root directory in any filesystem you are going 
to end up with a lot of files in lost+found, yes?

Maybe others have a different experience or understanding of the remark.

With reiserfs, if you corrupt the internal nodes of the tree, you have 
to run fsck before you can use it, and in this we are more fragile, but 
once fsck has been run it is no more fragile than anything else 
(assuming you use the latest fsck, because fsck took a long time to 
stabilize).

>Now, I can't just stop using reiserfs and I don't want to. I think there
>is great merit in it. So, first, with the limited info I have given,
>what might have happened to create the problem and how likely might it
>be to happen again? Secondly, what is the *real* hazard of corruption
>_higher_up_ in the tree which the article says might blow away the whole
>partition?
>
>Thanks and regards.
>  
>


-- 
Hans




* Re: rebuild-tree
  2003-12-08 15:51 ` rebuild-tree Hans Reiser
@ 2003-12-09 11:16   ` Larry Weldon
  0 siblings, 0 replies; 4+ messages in thread
From: Larry Weldon @ 2003-12-09 11:16 UTC (permalink / raw)
  To: reiserfs mailing list

On Mon, 2003-12-08 at 10:51, Hans Reiser wrote:
> Larry Weldon wrote:
> 
> >A client's production file server is set up using RAID-1 with 2 IDE
> >disks and reiserfs. The operating system is Mandrake 9.1.
> >
> which kernel does it use?
> 

linux-2.4.19-32mdk
reiserfsck, 2002 - reiserfsprogs 3.6.3

[snip]

> With reiserfs, if you corrupt the internal nodes of the tree, you have 
> to run fsck before you can use it, and in this we are more fragile, but 
> once fsck has been run it is no more fragile than anything else 
> (assuming you use the latest fsck, because fsck took a long time to 
> stabilize).
> 

I would be wise, then, to install the latest reiserfs on older servers as
well?

Thank you,
-- 
Larry Weldon <larry@welcoin.com>


