Some ReiserFS failure situations and questions

* Some ReiserFS failure situations and questions
@ 2004-03-03 22:48 Ryan Underwood
  2004-03-04  9:37 ` Hans Reiser
  2004-03-05  6:58 ` Sander
  0 siblings, 2 replies; 7+ messages in thread
From: Ryan Underwood @ 2004-03-03 22:48 UTC (permalink / raw)
  To: reiserfs-list

This is on a Debian unstable system running 2.4.25.

I copied a filesystem to another partition and deleted the partition the
original filesystem was on, all while it was mounted R/W (stupid stupid
stupid).  Then I ended up with a corrupt copy, as well as a corrupt
original filesystem even when the partition was restored.  The copy was
completely unusable.  On the corrupted original, reiserfsck
--rebuild-tree gave up on one of the root trees and tried to reconstruct
things from the rest.

Now, I had previously copied this partition from another disk (and
deleted that partition, forgetting its geometry).  So, I thought perhaps
I could determine where that old partition began so I could retrieve the
filesystem from it.  First I used parted's partition search feature.
That yielded nothing for some reason.  Then I made some guesses and ran
reiserfsck to check.  I told reiserfsck to scan the entire partition,
which it did, but for some reason it wasn't able to locate the reiserfs
superblock which was still there. (I was certain that I placed the new
start-of-partition before the old reiserfs location, and that the space
had not been re-used.)  So each time it simply suggested to create a new
superblock, which was not helpful because it would throw away the old
filesystem information when doing so.  Would it be a good idea to add an
option to debugreiserfs to scan for reiserfs superblocks along an entire
disk device and report their LBA location, so that e.g. parted could be
used to recreate the partition at that location?
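
The scan suggested above can be sketched outside of debugreiserfs.  The
following Python is a minimal, unofficial sketch, not a reiserfsprogs
feature.  It assumes (these details are not from this thread) that a
new-format ReiserFS superblock sits 64 KiB into the partition, that the
magic string is 52 bytes into the superblock, and that the magic is one
of "ReIsErFs", "ReIsEr2Fs" or "ReIsEr3Fs"; older layouts may place the
superblock elsewhere.  The function name is hypothetical.

```python
import io

# Assumed on-disk layout (not confirmed anywhere in this thread):
SUPERBLOCK_OFFSET = 64 * 1024  # superblock position inside the partition
MAGIC_OFFSET = 52              # magic string position inside the superblock
MAGICS = (b"ReIsErFs", b"ReIsEr2Fs", b"ReIsEr3Fs")
SECTOR = 512                   # report positions in 512-byte sectors


def find_candidate_partition_starts(dev, chunk_size=1 << 20):
    """Yield (start_sector, magic) for every magic string whose implied
    partition start is non-negative and sector-aligned.

    `dev` is any binary file object (a raw device or a dd image).
    """
    overlap = max(len(m) for m in MAGICS) - 1
    base = 0      # absolute offset of buf[0] within the device
    buf = b""
    seen = set()  # the overlap can show the same hit twice; report it once
    while True:
        chunk = dev.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        for magic in MAGICS:
            pos = buf.find(magic)
            while pos != -1:
                start = base + pos - MAGIC_OFFSET - SUPERBLOCK_OFFSET
                hit = (start, magic)
                if start >= 0 and start % SECTOR == 0 and hit not in seen:
                    seen.add(hit)
                    yield start // SECTOR, magic
                pos = buf.find(magic, pos + 1)
        # Keep a short tail so a magic split across two reads is still found.
        keep = buf[-overlap:] if len(buf) > overlap else buf
        base += len(buf) - len(keep)
        buf = keep
```

To try it, pass a dd image or a raw device opened read-only in binary
mode, e.g. open("/dev/sdX", "rb") (path hypothetical).  Each reported
sector is only a candidate start for recreating the partition; copies
of the magic inside ordinary file data can produce false positives.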

In the end, I went with --rebuild-tree on the filesystem I had
destroyed.  It was only able to recover an estimated 1/10 of the files
and directory structure.  Everything else went in lost+found if it was
recovered at all.  Worse, most of the files it "recovered" turned out to
be corrupt: executable files with pieces of some other text file in
them, or full of zeros, etc.  There were also some unexplained phantom
files that turned up with names full of garbage (control characters and
such).  These were difficult to delete, but ones in e.g. /usr/lib would
make ldconfig complain and/or crash.

I'm curious why this chain of events resulted in such a disaster. (mount
filesystem R/W, then delete its partition via parted -> kernel re-reads
partition table)  I mean, this should normally never happen, but why
is a R/W mount with no files currently open (single user mode) corrupted
so badly by its partition being deleted while the reiserfs driver is
running?  Wouldn't hitting the reset button be an even worse thing to
do in any case?

I still have a dd dump of the corrupted partition and a log of
--rebuild-tree can be found at http://dbz.icequake.net/share/fsck.log.gz
.  If anyone has any ideas what I can do to better recover this
partition, I would appreciate it.

Previously I had a ReiserFS partition whose very beginning was
overwritten with garbage.  This resulted in a similar disaster.  I
wonder if there is a better way to recover a filesystem which has had
its beginning overwritten.  Or maybe the filesystem trees could be
mirrored elsewhere on the disk for better recovery options?

Anyway, I am a complete newbie at recovering a trashed ReiserFS, so any
general strategy or specific ideas for these two cases would help me
greatly.

----

I have another problem with reiserfs that happens occasionally.  On
reiserfs and only reiserfs partitions, when a crash and journal recovery
happens, occasionally two files that were open will have their contents
"exchanged".  For example, a piece of the text file I was editing ends
up in a licq contact file, or an emule download gets a piece of a web
page from opera in it.  Is this a common problem, and what is the best
way to keep it from happening?  It is irritating because even though the
filesystem is made into a consistent state by the journal recovery, I
usually have to do some manual hacking to figure out what happened to
a program's data that is causing it to misbehave, and something like
this usually turns up.  This sort of thing never occurred with ext3,
or at least I never noticed it.  The stranger thing is that it also
happens on files open for read only, not just those open for writing or
for R/W.

Thanks!

-- 
Ryan Underwood, <nemesis@icequake.net>

* Re: Some ReiserFS failure situations and questions
  2004-03-03 22:48 Some ReiserFS failure situations and questions Ryan Underwood
@ 2004-03-04  9:37 ` Hans Reiser
  2004-03-04 22:09   ` Ryan Underwood
  2004-03-05  6:58 ` Sander
  1 sibling, 1 reply; 7+ messages in thread
From: Hans Reiser @ 2004-03-04  9:37 UTC (permalink / raw)
  To: Ryan Underwood; +Cc: reiserfs-list

You can probably get your data back if you ask Vitaly to do it using 
www.namesys.com/support.html, and he can probably explain it as well.;-)

Hans

Ryan Underwood wrote:

>This is on a Debian unstable system running 2.4.25.
>
>I copied a filesystem to another partition and deleted the partition the
>original filesystem was on, all while it was mounted R/W (stupid stupid
>stupid).  Then I ended up with a corrupt copy, as well as a corrupt
>original filesystem even when the partition was restored.  The copy was
>completely unusable.  On the corrupted original, reiserfsck
>--rebuild-tree gave up on one of the root trees and tried to reconstruct
>things from the rest.
>
>Now, I had previously copied this partition from another disk (and
>deleted that partition, forgetting its geometry).  So, I thought perhaps
>I could determine where that old partition began so I could retrieve the
>filesystem from it.  First I used parted's partition search feature.
>That yielded nothing for some reason.  Then I made some guesses and ran
>reiserfsck to check.  I told reiserfsck to scan the entire partition,
>which it did, but for some reason it wasn't able to locate the reiserfs
>superblock which was still there. (I was certain that I placed the new
>start-of-partition before the old reiserfs location, and that the space
>had not been re-used.)  So each time it simply suggested to create a new
>superblock, which was not helpful because it would throw away the old
>filesystem information when doing so.  Would it be a good idea to add an
>option to debugreiserfs to scan for reiserfs superblocks along an entire
>disk device and report their LBA location, so that e.g. parted could be
>used to recreate the partition at that location?
>
>In the end, I ended up going with --rebuild-tree on the filesystem I had
>destroyed.  It was only able to recover an estimated 1/10 of the files
>and directory structure.  Everything else went in lost+found if it was
>recovered at all.  Worse, most of the files it "recovered" ended up to
>be corrupt; executable files with pieces of some other text file in
>them, or full of zeros, etc.  There were also some unexplained phantom
>files that turned up with names full of garbage (control characters and
>such).  These were difficult to delete, but ones in e.g. /usr/lib would
>make ldconfig complain and/or crash.
>
>I'm curious why this chain of events resulted in such a disaster. (mount
>filesystem R/W, then delete its partition via parted -> kernel re-reads
>partition table)  I mean, this should normally never happen, but why
>is a R/W mount with no files currently open (single user mode) corrupted
>so badly by its partition being deleted while the reiserfs driver is
>running?  Wouldn't hitting the reset button be an even worse thing to
>do in any case?
>
>I still have a dd dump of the corrupted partition and a log of
>--rebuild-tree can be found at http://dbz.icequake.net/share/fsck.log.gz
>.  If anyone has any ideas what I can do to better recover this
>partition, I would appreciate it.
>
>Previously I had a ReiserFS partition of which the very beginning of the
>partition was overwritten with garbage.  This resulted in a similar
>disaster.  I wonder if there is a better idea to recover a filesystem
>which has had its beginning overwritten.  Or maybe the filesystem trees
>can be mirrored elsewhere on the disk for better recovery options?
>
>Anyway, I am a complete newbie at recovering a trashed ReiserFS, so any
>general strategy or specific ideas for these two cases would help me
>greatly.
>
>----
>
>I have another problem with reiserfs that happens occasionally.  On
>reiserfs and only reiserfs partitions, when a crash and journal recovery
>happens, occasionally two files that were open will have their contents
>"exchanged".  For example, a piece of the text file I was editing ends
>up in a licq contact file, or an emule download gets a piece of a web
>page from opera in it.  Is this a common problem, and what is the best
>way to keep it from happening?  It is irritating because even though the
>filesystem is made into a consistent state by the journal recovery, I
>usually have to do some manual hacking to figure out what happened to
>program's data that is causing it to misbehave, and something like these
>things usually turns up.  This sort of thing never occurred with ext3,
>or at least I never noticed it.  The stranger thing is that it also
>happens on files open for read only, not just those open for writing or
>for R/W.
>
>Thanks!
>
>  
>


-- 
Hans



* Re: Some ReiserFS failure situations and questions
  2004-03-04  9:37 ` Hans Reiser
@ 2004-03-04 22:09   ` Ryan Underwood
  2004-03-05  6:47     ` Sander
  0 siblings, 1 reply; 7+ messages in thread
From: Ryan Underwood @ 2004-03-04 22:09 UTC (permalink / raw)
  To: reiserfs-list

On Thu, Mar 04, 2004 at 12:37:52PM +0300, Hans Reiser wrote:
> You can probably get your data back if you ask Vitaly to do it using 
> www.namesys.com/support.html, and he can probably explain it as well.;-)

I was tempted, but I don't even have that money really. (I know, neither
do you, but I'm sorry....)  I lost three days of university work while
rebuilding my machine so I could use it again.  I also discovered, at
the end of those three days, an error in my mail-virus scanning script
that had told procmail to happily /dev/null all my mail during those
days. X-(  A bad week-end, all in all...

Anyway, it was just /usr, so not a critical loss (rebuilt mostly
everything using dpkg database and apt) but it took a long time, and I
lost some custom programs/scripts/source in /usr/local since that whole
tree died.  The reason I posted is to find out some general Reiser
recovery strategy.  I had two filesystems corrupted in unusual ways as I
described, and in both cases reiserfsck --rebuild-tree was mostly
useless in recovering the data.  Furthermore the recovered filesystem on
the partition-deleted one ended up in a very strange state, with
"recovered" files being corrupted, and garbage files appearing from
nowhere.  This seemed like something that Reiser guys would want to know
about.

debugreiserfs seems like it should include the option to scan a whole
disk for reiserfs superblocks so that the LBA of a superblock can be
used to recreate a deleted partition.  As it was, I had a perfectly good
copy of the partition "somewhere", but parted could not search it out.

I am also still curious (back to the "normal" problem here) about the
files' contents being "mixed up" when an unclean shutdown occurs.  Is
this because only metadata is journaled by default?  Would it be fixed
if I changed the mount options to full journaling?

-- 
Ryan Underwood, <nemesis@icequake.net>

* Re: Some ReiserFS failure situations and questions
  2004-03-04 22:09   ` Ryan Underwood
@ 2004-03-05  6:47     ` Sander
  0 siblings, 0 replies; 7+ messages in thread
From: Sander @ 2004-03-05  6:47 UTC (permalink / raw)
  To: reiserfs-list

Ryan Underwood wrote (ao):
> The reason I posted is to find out some general Reiser recovery
> strategy.

Backups are a good thing for this. If you don't have money for a
usb/firewire disk, cd burner or old second computer, you can also store
your files in your homedir at the university.

Computer hardware will die for sure. You just never know when, but when
it does, it is a real PITA.
Backups not only protect against user error, but also bad programs,
power failures, broken hardware, etc. You can't do without them.

> I had two filesystems corrupted in unusual ways as I described, and in
> both cases reiserfsck --rebuild-tree was mostly useless in recovering
> the data.

Did you try the newest reiserfsprogs? They are at 3.6.13 now, and seem
to get quite a bit better every release.

-- 
Humilis IT Services and Solutions
http://www.humilis.net

* Re: Some ReiserFS failure situations and questions
  2004-03-03 22:48 Some ReiserFS failure situations and questions Ryan Underwood
  2004-03-04  9:37 ` Hans Reiser
@ 2004-03-05  6:58 ` Sander
  2004-03-05  7:08   ` Ryan Underwood
  1 sibling, 1 reply; 7+ messages in thread
From: Sander @ 2004-03-05  6:58 UTC (permalink / raw)
  To: Ryan Underwood; +Cc: reiserfs-list

Ryan Underwood wrote (ao):
> I have another problem with reiserfs that happens occasionally. On
> reiserfs and only reiserfs partitions, when a crash and journal
> recovery happens, occasionally two files that were open will have
> their contents "exchanged". For example, a piece of the text file I
> was editing ends up in a licq contact file, or an emule download gets
> a piece of a web page from opera in it. Is this a common problem, and
> what is the best way to keep it from happening?

The 'problem' is that reiserfs only does meta-data journaling. This
gives you a consistent filesystem after a crash from the computer's
point of view. Your data is still a bit toast though.

Nowadays (with a current kernel and some patches?) you have a few extra
mount options for reiserfs. The default with the old kernels is
'data=writeback'. In this mode the data can be written after the
metadata is committed, so after a crash a file can point at blocks that
still hold whatever was there before.

With the patched kernel, the default is 'data=ordered'. In this mode the
data hits the disk before the metadata hits the disk. This should give
much more consistent data after a crash.

With the patched kernel you can also choose 'data=journal'. Then both
meta-data and data are journalled, and thus consistency of the file
contents as well as the fs is more or less certain.

In general it is assumed that the first one is the fastest, and the last
one the slowest, but in some situations this differs.

The new default (ordered) is a great improvement IMHO.
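
The difference between the three modes described above can be
illustrated with a toy model.  This is a sketch of the ordering argument
only, not of the actual ReiserFS journal code; the function name, the
single-block file, and the step names are all made up for illustration.
It assumes the block newly assigned to the file was used by some other
file before and still holds that file's old bytes.

```python
STALE = b"old bytes from another file"  # left in the block by its previous owner
FRESH = b"text the user just saved"     # what the application meant to write


def file_contents_after_crash(mode, steps_done):
    """Return what the file reads back as after a crash and journal replay.

    `steps_done` is how many steps of the write completed before the crash.
    An empty result means the file does not reference the block at all.
    """
    if mode == "writeback":
        steps = ["commit_metadata", "write_data"]  # data may land after metadata
    elif mode == "ordered":
        steps = ["write_data", "commit_metadata"]  # data lands before metadata
    elif mode == "journal":
        steps = ["commit_data_and_metadata"]       # one journal transaction
    else:
        raise ValueError("unknown mode: %r" % (mode,))

    block = STALE              # current contents of the disk block
    file_points_at_block = False

    for step in steps[:steps_done]:
        if step in ("write_data", "commit_data_and_metadata"):
            block = FRESH
        if step in ("commit_metadata", "commit_data_and_metadata"):
            file_points_at_block = True

    return block if file_points_at_block else b""
```

In this model only 'writeback' with a crash after the first step leaves
a file that reads back another file's old bytes, which resembles the
"exchanged contents" symptom quoted above.  With 'ordered' or 'journal'
the file is either missing the new block or holds the new data.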

-- 
Humilis IT Services and Solutions
http://www.humilis.net

* Re: Some ReiserFS failure situations and questions
  2004-03-05  6:58 ` Sander
@ 2004-03-05  7:08   ` Ryan Underwood
  2004-03-05 20:37     ` Mike Fedyk
  0 siblings, 1 reply; 7+ messages in thread
From: Ryan Underwood @ 2004-03-05  7:08 UTC (permalink / raw)
  To: reiserfs-list

Thank you Sander, you have been very helpful.  Sounds like I will be
looking into that 'ordered' mount option.

-- 
Ryan Underwood, <nemesis@icequake.net>

* Re: Some ReiserFS failure situations and questions
  2004-03-05  7:08   ` Ryan Underwood
@ 2004-03-05 20:37     ` Mike Fedyk
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Fedyk @ 2004-03-05 20:37 UTC (permalink / raw)
  To: Ryan Underwood; +Cc: reiserfs-list

Ryan Underwood wrote:
> Thank you Sander, you have been very helpful.  Sounds like I will be
> looking into that 'ordered' mount option.

It's been in the suse kernel for ages, and Chris Mason is the guy 
working on it.

It looks like it'll get into the 2.6 kernel eventually.

Though I'd hope suse would submit it for the 2.4 kernel also, since 
that's where it has been working in the suse production kernel.

Mike
