* Scrambled Files
@ 2002-04-23 21:05 Ross Vandegrift
  2002-04-23 21:46 ` Michael Carmack
  0 siblings, 1 reply; 6+ messages in thread
From: Ross Vandegrift @ 2002-04-23 21:05 UTC
  To: reiserfs-list

Hello all,

	A drive crashed in a RAID1 array today, taking down an IDE bus
and locking the box it was in.  ReiserFS is installed on this array.
The box is a Debian woody machine, and was in the middle of running
dpkg.

	Files all over the system have been scrambled - replaced with
garbage text, other executables, sections of data files.  There doesn't
seem to be a pattern to the file corruption.  There are no messages in
the logs.

	Unfortunately, this is a pretty critical box and I need to start
restoring its functionality ASAP.  I don't really have time to debug the
problem.  I've extracted the filesystem metadata and put it up at
http://willow.seitz.com/~ross/metadata.bz2 if anyone thinks there would
be relevant information.

	The machine is an AMD-K6-2/500 with 168M of RAM.  It's running
2.4.17 over RAID1.  The operating system is Debian woody - other than
this disk failure, the machine has been rock solid.

Ross Vandegrift
ross@willow.seitz.com


* Re: Scrambled Files
  2002-04-23 21:05 Scrambled Files Ross Vandegrift
@ 2002-04-23 21:46 ` Michael Carmack
  2002-04-23 23:02   ` Hans Reiser
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Carmack @ 2002-04-23 21:46 UTC
  To: Ross Vandegrift; +Cc: reiserfs-list

On Tue, Apr 23, 2002 at 05:05:01PM -0400, Ross Vandegrift wrote:
> 
> 	Files all over the system have been scrambled - replaced with
> garbage text, other executables, sections of data files.  There doesn't
> seem to be a pattern to the file corruption.  There are no messages in
> the logs.

I've read about this before, and it's something I've been meaning to
ask about but never quite got around to. IIRC this is related to the
tail-packing feature of reiserfs, and there's not a whole lot that can
be done to salvage data once it's corrupted. Here are some questions
that maybe the reiser developers can answer:

1. Is it correct that this corruption is due to tail-packing?

2. Is the corruption in any way deterministic? For example, does
   it only affect files that have been modified since the last
   sync, or perhaps files that are in the process of being modified 
   at the time the system goes down?

3. If a reiserfs partition is mounted with the notail option, will 
   it protect against this kind of corruption? How about mounting
   with sync instead of notail?

4. Is it possible to specify notail in /etc/fstab for the root 
   partition? I seem to recall reading something along the lines
   of: "It is not enough to simply specify notail in /etc/fstab
   for the root partition, because notail is only valid for the
   initial mount (i.e. mount -o remount / will not pick up notail)."
   Is this true? If so, how can one turn on notail for the root 
   partition?

5. Are there any performance hits (other than wasted disk space) 
   as a result of turning on notail? I don't mind trading drive
   space for reliability, but speed is important to me. What exactly
   are the consequences of specifying notail?

These are important questions for me, as I am currently using reiserfs
exclusively on all of my machines. I've been doing network backups pretty
religiously, but there has long been a bit of apprehension in the back of
my mind about losing data if a machine just happens to die
unexpectedly.

Thanks,
m.




* Re: Scrambled Files
  2002-04-23 21:46 ` Michael Carmack
@ 2002-04-23 23:02   ` Hans Reiser
  2002-04-24  3:32     ` Ross Vandegrift
  0 siblings, 1 reply; 6+ messages in thread
From: Hans Reiser @ 2002-04-23 23:02 UTC
  To: Michael Carmack; +Cc: Ross Vandegrift, reiserfs-list

Michael Carmack wrote:

>On Tue, Apr 23, 2002 at 05:05:01PM -0400, Ross Vandegrift wrote:
>
>>	Files all over the system have been scrambled - replaced with
>>garbage text, other executables, sections of data files.  There doesn't
>>seem to be a pattern to the file corruption.  There are no messages in
>>the logs.
>
>I've read about this before, and it's something I've been meaning to
>ask about but never quite got around to. IIRC this is related to the
>tail-packing feature of reiserfs, and there's not a whole lot that can
>be done to salvage data once it's corrupted. Here are some questions
>that maybe the reiser developers can answer:
>
>1. Is it correct that this corruption is due to tail-packing?
>
If you crash while writing, garbage possibly appears in the file being 
written to.  Chris is looking at improving this with ordered writes, but 
there is a performance cost for it.  There are those who say it can be 
done without performance cost in V3 by ordering writes.  They have not 
yet produced a patch for which this is true, but it won't amaze me if 
they do so eventually.
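
The difference that write ordering makes can be shown with a toy model. This is only a sketch: the "disk", the block numbers, the file contents and every function name below are hypothetical, and nothing here reflects real ReiserFS structures or Chris's patch. It assumes a filesystem that journals metadata only, so a metadata update can reach the disk before the data it describes.

```python
# Toy model only: a "disk" is a list of blocks, and metadata maps a file
# name to the block holding its contents. All names here are hypothetical.

def write_file(disk, metadata, name, block, data, ordered, crash_between):
    """Write `data` for `name` into `block`, optionally crashing halfway.

    ordered=False: commit metadata first, then write the data.
    ordered=True:  write the data first, then commit metadata.
    Returns True if the write completed, False if the simulated crash hit.
    """
    if ordered:
        disk[block] = data        # step 1: data reaches the disk
        if crash_between:
            return False          # crash: metadata still names the old block
        metadata[name] = block    # step 2: metadata commit
    else:
        metadata[name] = block    # step 1: metadata commit
        if crash_between:
            return False          # crash: the block still holds stale bytes
        disk[block] = data        # step 2: data reaches the disk
    return True


def read_file(disk, metadata, name):
    """Return whatever the block named by the metadata currently holds."""
    return disk[metadata[name]]


def simulate(ordered):
    """Crash in the middle of rewriting "xfs" and return what a reader sees."""
    # Block 1 still holds leftovers from some previously deleted file.
    disk = ["old xfs binary", "stale German text"]
    metadata = {"xfs": 0}
    write_file(disk, metadata, "xfs", 1, "new xfs binary",
               ordered=ordered, crash_between=True)
    return read_file(disk, metadata, "xfs")
```

In this model `simulate(ordered=False)` returns the stale contents of block 1, while `simulate(ordered=True)` returns the old, still consistent file. The cost of the ordered variant is that the metadata commit has to wait for the data write.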

reiser4 will fix this without much performance cost.

I think tails have nothing to do with it.  Probably the files that are 
all tails have no corruption at all, because they are journaled.  

If you want to be able to crash while doing an OS upgrade, and not have 
a hosed system, you must have transactions (or take a snapshot before 
doing it).  No FS currently supports transactions, but reiser4 will.
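
Short of filesystem transactions, an application can at least protect a single file by writing a temporary copy, syncing it, and renaming it over the original. The sketch below assumes a POSIX system where a rename within one directory is atomic; `atomic_write` is a hypothetical helper name, and this is not what reiser4 does. It does not make a multi-file upgrade atomic.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via temp file, fsync and rename.

    After a crash, `path` should hold either the complete old contents or
    the complete new contents. Note: the new file gets mkstemp's default
    permissions, not those of the file it replaces.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) so
    # that the final rename is a plain rename and not a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())    # push the data to disk before renaming
        os.replace(tmp_path, path)    # swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)       # do not leave a half-written temp file
        raise
    # Sync the directory so the rename itself is on disk (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

How much of this a given filesystem actually honors across a power loss or a failing disk depends on the filesystem and the hardware, so this narrows the window rather than closing it.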

>
>2. Is the corruption in any way deterministic? For example, does
>   it only affect files that have been modified since the last
>   sync, or perhaps files that are in the process of being modified 
>   at the time the system goes down?
>
>3. If a reiserfs partition is mounted with the notail option, will 
>   it protect against this kind of corruption? How about mounting
>   with sync instead of notail?
>
>4. Is it possible to specify notail in /etc/fstab for the root 
>   partition? I seem to recall reading something along the lines
>   of: "It is not enough to simply specify notail in /etc/fstab
>   for the root partition, because notail is only valid for the
>   initial mount (i.e. mount -o remount / will not pick up notail)."
>   Is this true? If so, how can one turn on notail for the root 
>   partition?
>
>5. Are there any performance hits (other than wasted disk space) 
>   as a result of turning on notail? I don't mind trading drive
>   space for reliability, but speed is important to me. What exactly
>   are the consequences of specifying notail?
>
>These are important questions for me, as I am currently using reiserfs
>exclusively on all of my machines. I've been doing network backups pretty
>religiously, but there has long been a bit of apprehension in the back of
>my mind about losing data if a machine just happens to die
>unexpectedly.
>
>Thanks,
>m.

* Re: Scrambled Files
  2002-04-23 23:02   ` Hans Reiser
@ 2002-04-24  3:32     ` Ross Vandegrift
  2002-04-24 12:26       ` Kuba Ober
  0 siblings, 1 reply; 6+ messages in thread
From: Ross Vandegrift @ 2002-04-24  3:32 UTC
  To: Hans Reiser; +Cc: Michael Carmack, reiserfs-list

On Wed, Apr 24, 2002 at 03:02:54AM +0400, Hans Reiser wrote:
> Michael Carmack wrote:
> >On Tue, Apr 23, 2002 at 05:05:01PM -0400, Ross Vandegrift wrote:
> >
> >>	Files all over the system have been scrambled - replaced with
> >>garbage text, other executables, sections of data files.  There doesn't
> >>seem to be a pattern to the file corruption.  There are no messages in
> >>the logs.
> >
> >I've read about this before, and it's something I've been meaning to
> >ask about but never quite got around to. IIRC this is related to the
> >tail-packing feature of reiserfs, and there's not a whole lot that can
> >be done to salvage data once it's corrupted. Here are some questions
> >that maybe the reiser developers can answer:
> >
> >1. Is it correct that this corruption is due to tail-packing?
> >
> If you crash while writing, garbage possibly appears in the file being 
> written to.

I'm curious - does it have anything to do with mmaped files?  I've been
using ReiserFS for a very long time on my home workstation where
I use very few applications that mmap files (that I know of...).  OTOH
dpkg does mmap files.

I'm not sure why I think this could be the cause.  Have there been bug
reports about mmaping files before?

Over the years I've had more than a few crashes during write activity -
the only time it resulted in any data loss was due to the infamous 2.4.5
umount bug.  Perhaps I've just been lucky?

> >2. Is the corruption in any way deterministic? For example, does
> >  it only affect files that have been modified since the last
> >  sync, or perhaps files that are in the process of being modified 
> >  at the time the system goes down?

In my case it seemed to affect files that were either 1) recently written
(probably in cache somewhere) or 2) being written.

For example, apt-get was messing around with xfs when it died.  xfs was
completely hosed, as were a bunch of things related to X.  However, the
files were hosed with completely non-deterministic contents - German
text, what appeared to be UUEncoded binaries, random data, as well as
some kind of regular-looking data.  It wasn't like a bunch of the writes
got crossed and files ended up backwards.  It also wasn't like random
junk was written.
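
Since the damage follows no obvious pattern, one way to list the affected files is to compare them against checksums from a known-good source. The sketch below assumes a manifest in the style of `md5sum` output, one `<digest>  <relative path>` entry per line; the function names and the manifest itself are hypothetical, and the checksums have to come from somewhere other than the damaged filesystem.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged(manifest_lines, root):
    """Return the relative paths that are missing or fail their checksum.

    Each manifest line is assumed to look like
    "<md5 hex digest>  <path relative to root>".
    """
    damaged = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue                              # skip blank lines
        expected, rel_path = line.split(None, 1)
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or file_md5(full_path) != expected:
            damaged.append(rel_path)
    return damaged
```

A file that passes only shows that its contents match the manifest; it says nothing about files the manifest does not cover.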

Ross Vandegrift
ross@willow.seitz.com


* Re: Scrambled Files
  2002-04-24  3:32     ` Ross Vandegrift
@ 2002-04-24 12:26       ` Kuba Ober
  2002-04-24 18:42         ` Ross Vandegrift
  0 siblings, 1 reply; 6+ messages in thread
From: Kuba Ober @ 2002-04-24 12:26 UTC
  To: Ross Vandegrift; +Cc: reiserfs-list

> > >2. Is the corruption in any way deterministic? For example, does
> > >  it only affect files that have been modified since the last
> > >  sync, or perhaps files that are in the process of being modified
> > >  at the time the system goes down?
>
> In my case it seemed to affect files that were either 1) recently written
> (probably in cache somewhere) or 2) being written.
>
> For example, apt-get was messing around with xfs when it died.  xfs was
> completely hosed, as were a bunch of things related to X.  However, the
> files were hosed with completely non-deterministic contents - German
> text, what appeared to be UUEncoded binaries, random data, as well as
> some kind of regular-looking data.  It wasn't like a bunch of the writes
> got crossed and files ended up backwards.  It also wasn't like random
> junk was written.

What you see might be a result of metadata corruption rather than file 
contents corruption. The metadata might be pointing to stray places, which 
are either used or unused-but-previously-written-to places on the drive. It 
may even be that the data from the original files is sitting there intact.

As a side note: did you try fsck'ing the filesystem?

Cheers,
Kuba


* Re: Scrambled Files
  2002-04-24 12:26       ` Kuba Ober
@ 2002-04-24 18:42         ` Ross Vandegrift
  0 siblings, 0 replies; 6+ messages in thread
From: Ross Vandegrift @ 2002-04-24 18:42 UTC
  To: Kuba Ober; +Cc: reiserfs-list

> > (probably in cache somewhere) or 2) being written.
> >
> > For example, apt-get was messing around with xfs when it died.  xfs was
> > completely hosed, as were a bunch of things related to X.  However, the
> > files were hosed with completely non-deterministic contents - German
> > text, what appeared to be UUEncoded binaries, random data, as well as
> > some kind of regular-looking data.  It wasn't like a bunch of the writes
> > got crossed and files ended up backwards.  It also wasn't like random
> > junk was written.
> 
> What you see might be a result of metadata corruption rather than file 
> contents corruption. The metadata might be pointing to stray places, which 
> are either used or unused-but-previously-written-to places on the drive. It 
> may even be that the data from the original files is sitting there intact.
> 
> As a side note: did you try fsck'ing the filesystem?
> 
> Cheers,
> Kuba

No, I figured this would probably fix the problems - unfortunately, the
machine's availability is pretty important.  So instead of gambling on
fsck (which the reiserfs devotee in me wanted to do), I started going
about restoring the system from backups and original media (as the
employee in me knew had to be done :-)

Ross
