Corrupted/unreadable journal: reiser vs. ext3

* Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-11 18:59 Dirk Schenkewitz
  2003-02-11 20:27 ` Hans Reiser
  2003-02-12 10:11 ` trolling Alexander Lyamin
  0 siblings, 2 replies; 39+ messages in thread
From: Dirk Schenkewitz @ 2003-02-11 18:59 UTC (permalink / raw)
  To: Reiserfs List

Hi Guys,

Recently I read about ReiserFS V4, which gave me a reason to take
another look at ReiserFS. But I'm not sure whether it's worth
switching from ext3/ext2 to reiser, because:

More than a year ago, I set up one reiser partition for playing
around. At first there seemed to be nothing special about it.
Then, one day, it suddenly couldn't read its journal anymore,
which prevented the system from booting. (About two weeks later I
discovered why: a bad power supply had caused physical damage to
that area of the hard disk.) For some reason I no longer recall,
I couldn't find a reiserfsck or anything like it, and I found no
way around the corrupted/unreadable journal.

Luckily, the partition was nearly empty, so I put an ext3 filesystem
on it. That went fine for just a few days, then the bad
disk area (which now held the ext3 journal) decided to strike again.
But guess what happened:
While booting the next time, the ext3 code discovered that the
journal was unreadable (watching that, I thought "oh shit, not again"
for less than a second), printed a short message saying so and
that it would continue as ext2. No painful attempts to recover the
journal: it just dropped it and continued, taking only a few seconds.
No data was lost! I sat there for some time, staring at the screen,
hardly believing it.

After that, I removed reiser support from the kernels I used, and
since then I have used only ext3. If I have lost any data since then,
it was only because I accidentally deleted it; there seems to be no
way to recover deleted data from ext3 (unlike ext2).

Because I have large amounts of data, reliability and robustness of
a filesystem matter most to me, then space efficiency, then speed.
Sometimes some of my filesystems get 100% full, with only a few
kilobytes left (of, say, 8 GB) until I clean up. That's my personal
situation and experience.

Now my questions:
From reading the mails on this list, I suspect that ReiserFS:
 - will perform poorly (whatever that means in terms of absolute
   speed) if it gets more than 96% full (*1*),
 - will fall far behind ext3 when it comes to reliability,
   robustness and crash recovery (at least when fsck is involved),
 - and will have even more trouble (which may lead to complete
   failure) if the journal cannot be accessed.
Is any of this still true?

(*1*): What if the filesystem contains rather large files, like
       CD images, MP3s and such, filling it up completely? Will
       it still slow down?

From what I wrote, you may think that I have some prejudice against
ReiserFS. That's true, I have, because I had a bad experience with
it. Anyway, if you (the developers and/or other people reading here)
can say that nowadays ReiserFS is better than ext3, even under my
personal harsh circumstances, I will give it another try. And now,
feel free to flame me. :-)

happy coding
	dirk
-- 
Dirk Schenkewitz 

InterFace AG                 fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16            fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching         
http://www.interface-ag.de   mailto:dirk.schenkewitz@interface-ag.de

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 18:59 Corrupted/unreadable journal: reiser vs. ext3 Dirk Schenkewitz
@ 2003-02-11 20:27 ` Hans Reiser
  2003-02-11 21:30   ` Mike Hodson
  2003-02-12 10:11 ` trolling Alexander Lyamin
  1 sibling, 1 reply; 39+ messages in thread
From: Hans Reiser @ 2003-02-11 20:27 UTC (permalink / raw)
  To: Dirk Schenkewitz; +Cc: Reiserfs List, Vitaly Fertman

I have to tell you that if you aren't willing to run fsck for reiserfs 
in response to disk corruption, but you are willing to switch to a 
filesystem (ext2) that runs fsck at every boot, I don't have a lot of 
sympathy.  Vitaly can comment more.

Hans

-- 
Hans



* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 20:27 ` Hans Reiser
@ 2003-02-11 21:30   ` Mike Hodson
  2003-02-11 21:47     ` Hans Reiser
                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Mike Hodson @ 2003-02-11 21:30 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

> I have to tell you that if you aren't willing to run fsck for reiserfs 
> in response to disk corruption, but you are willing to switch to a 
> filesystem (ext2) that runs fsck at every boot, I don't have a lot of 
> sympathy.  Vitaly can comment more.
> 
> Hans

I've used ReiserFS in the past, but I have also used ext3 for my users'
important data (/home) ever since a good chunk of one drive was converted
to sparse/null files due to a screwup stemming from the lack of
'badblocks' support in reiserfs.  Since then I've used ext3 alongside
Reiser, but I recently dropped reiser completely.  The main reason I
hadn't converted to ext3 sooner was the lack of a free hard drive of
20+ GB to copy all my files to.

Unfortunately the filesystem was turning up errors all over the place,
including random sparse files growing out of nowhere (even in my mail
queue) and files that the filesystem could not access (even ls returned
'Permission Denied').  In the time since my first filesystem was nulled
due to a reiserfsck, I hadn't fsck'd the main drive, as I feared bad
blocks might be to blame for the filesystem inconsistency.  After 4 failed
attempts (kernel panics) to copy files between the old and new drives, I
finally decided to run an fsck.  Fscking with --rebuild-tree found many
errors and corrupt directory entries, and chucked about 150
files into /lost+found/.  Most were from the mail server, owned by
vpopmail.vchkpw, though others were from different websites I run and
some were even from /dev/.  But none of the existing files were made
sparse, and I was able to copy all of the remaining files.

After this I completely re-checked the drive with Maxtor's disk tools
disk, and it showed that the drive was 'certified error-free'.  After
seeing this, I have come to respect reiserfs even less than after my
/home/ drive got converted to nothing but nulls.

In my years of running ext2 and ext3, I can't see any reason why you
would think they require fscks at every reboot.  In the time that I've
run both ext2/ext3 and reiser, the only times I've had to fsck ext2 were
after unclean unmounts.  I've never had to run it on ext3.  As for
reiser, I've had to run it a few times, but each time it either
destroyed data or fixed a very large number of errors that a journalled
filesystem should not have.

I guess my point is: if reiserfs can't keep consistency without
requiring periodic fscks, you have no grounds for claiming that
ext2 is worse because it requires them every boot.  Even if your
statement were not false (I know for a fact that ext2 does -not- perform
fscks on every boot; it only does so after an unclean unmount), you
still would have no basis to say that ext2 is worse due to the
requirement of periodic fscks.

In conclusion, I don't think I'll be using your filesystem even as a
testbed for new features, due to poor reliability under everyday
circumstances.  In addition, check your facts before making a statement
that insults someone and claims something that is completely untrue.
-- 
Mike Hodson  <mike@mystica.cx>  ICQ: 18145059

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 21:30   ` Mike Hodson
@ 2003-02-11 21:47     ` Hans Reiser
  2003-02-11 21:58     ` Hans Reiser
  2003-02-11 23:11     ` Adam Goryachev
  2 siblings, 0 replies; 39+ messages in thread
From: Hans Reiser @ 2003-02-11 21:47 UTC (permalink / raw)
  To: Mike Hodson; +Cc: reiserfs-list

Mike Hodson wrote:

>
>In my years of running ext2 and ext3, I can't see any reason why you
>would think they require fscks at every reboot. 
>

Sorry, my experience with kernel hacking (I used ext2 for a long time 
while debugging reiserfs)  led me to forget that it is possible to 
reboot for reasons other than that the kernel oopsed on my latest 
changes to it.;-)

Now that we can use reiser3 while debugging reiser4, oopses are much 
less painful.

I regret that you had a bad experience with reiserfs.  I have never seen 
reiserfs convert a partition to nothing but nulls.  Are you sure that 
you were not repartitioning at the time you had this experience?

I don't mean to slam ext2, sorry if it sounded like that, I just don't 
think that being unwilling to run fsck after bad blocks occur is a 
reasonable complaint about our design.

Unfortunately, fsck programs take a long time to mature, and ext2's fsck 
is more mature than ours.  It is important that users use the latest fsck.

With reiser4 we have built into the node format a number of features 
that will make reiser4 fsck more effective than reiser3.

-- 
Hans

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 21:30   ` Mike Hodson
  2003-02-11 21:47     ` Hans Reiser
@ 2003-02-11 21:58     ` Hans Reiser
  2003-02-12  6:35       ` Oleg Drokin
  2003-02-11 23:11     ` Adam Goryachev
  2 siblings, 1 reply; 39+ messages in thread
From: Hans Reiser @ 2003-02-11 21:58 UTC (permalink / raw)
  To: Mike Hodson; +Cc: reiserfs-list, vs, Oleg Drokin

Mike Hodson wrote:

>After this I completely re-checked the drive with Maxtor's disk tools
>disk, and it showed that the drive was 'certified error-free'. 
>
This does not mean that there were no bad blocks that were remapped, 
does it? 

If you have data corruption that we can analyze, please contact us. 

We believe that our current release is very stable.  The known issues
are: one bug relating to unlink that we are still working on; fsck, which
still gets bug reports; and write caching, which all the Linux journaling
filesystems have trouble with when it is turned on, because they don't
know how to flush disk caches on commit (this is being fixed in the
latest 2.5 kernel, and some have argued over whether it is a bug or a
lack of a feature).  Other than that, we simply aren't getting bug
reports for V3 in 2.4 (Oleg's write performance improvements have gotten
some bug reports, but those aren't in the stable kernel yet).

Oleg and Vladimir, what is the status of the unlink bug?  I'd prefer to 
say that we have no bugs at all....

-- 
Hans

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 21:58     ` Hans Reiser
@ 2003-02-12  6:35       ` Oleg Drokin
  0 siblings, 0 replies; 39+ messages in thread
From: Oleg Drokin @ 2003-02-12  6:35 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Mike Hodson, reiserfs-list, vs

Hello!

On Wed, Feb 12, 2003 at 12:58:53AM +0300, Hans Reiser wrote:

> Oleg and Vladimir, what is the status of the unlink bug?  I'd prefer to 

Vladimir is investigating it. He does not know the cause yet.
BTW, while a bug is a bug and should of course be fixed, this particular one
does not cause any data corruption or the like. It manifests itself as
directory entries pointing to nowhere (annoying once it happens) under
certain workloads (I believe it starts to happen in low-memory situations).

> say that we have no bugs at all....

Sure, we all would love to be able to say this.

Bye,
    Oleg

* RE: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 21:30   ` Mike Hodson
  2003-02-11 21:47     ` Hans Reiser
  2003-02-11 21:58     ` Hans Reiser
@ 2003-02-11 23:11     ` Adam Goryachev
  2003-02-11 23:17       ` Anders Widman
  2003-02-12  1:02       ` Mike Hodson
  2 siblings, 2 replies; 39+ messages in thread
From: Adam Goryachev @ 2003-02-11 23:11 UTC (permalink / raw)
  To: reiserfs-list

> I've used ReiserFS in the past, but have also used ext3 on my
> user's important
> data (/home) after a good chunk of one drive was converted to
> sparse/null files due to a screwup stemming from no 'badblocks' support
> in reiserfs.  Since then, i've used ext3 as well as Reiser but recently

I can't comment on your experience, but personally if I have a drive with
any number of badblocks (which are showing up to the fs layer, not invisibly
re-mapped by the drive) then I take the drive back and get a replacement, or
bin the drive.

> dropped reiser completely. The main reason I hadn't converted to ext3
> was a lack of a free harddrive 20+gb in size to copy all my files to.

I guess that is also a hardware issue, I had the same issue in trying to
convert from ext2 to reiser on a 80GB RAID5 partition...

> Unfortunately the filesystem was turning up errors all over the place,
> including random sparse files growing out of nowhere, (even my mail
> queue) and files that the filesystem could not access (even ls returned
> 'Permission Denied').  In the time since my first filesystem was nulled

I suppose that if you continue to have a growing number of FS errors, then
you either have faulty hardware, or are using a buggy version of
software.... If you already admit to badblocks, then I would blame
hardware..

> due to a reiserfsck, I hadn't fsck'd the main drive as I feared bad
> blocks may be to blame for the filesystem inconsistency. After 4 failed
> attempts (kernel panicks) to copy files between old and new drives, I
> finally decided to run an fsck.  After fscking with --rebuild-tree, it
> found many errors, and corrupt directory entries, and chucked about 150
> files into /lost+found/.  Most were from the mail server, owned by
> vpopmail.vchkpw, though others were from different websites I run and
> some were even from /dev/. But, none of the existing files were made
> sparse and I was able to completely copy the remaining files.

Hmmm, so fsck found a number of errors, did its best to fix them (putting
the affected files into lost+found), and let you recover all of your other
files even with hardware issues present? The problem here is?

> After this I completely re-checked the drive with Maxtor's disk tools
> disk, and it showed that the drive was 'certified error-free'. After
> seeing this, I have come to respect reiserfs even less than after my
> /home/ drive got converted to nothing but nulls.

If a hard drive is showing bad blocks and the disk vendor's tool then
shows no errors, I think a simple dd over the whole disk or similar would
really show the true story....
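
A minimal sketch of such a whole-disk read pass, written in Python rather
than with dd itself. The function name scan_for_read_errors and the 64 KiB
block size are assumptions made for illustration, not anything from this
thread. It reads a device (or any file) sequentially and records the
offsets at which the kernel reports an I/O error, much as
`dd conv=noerror` keeps going after a failed read.

```python
import os


def scan_for_read_errors(path, block_size=64 * 1024):
    """Read `path` front to back; return (bytes_read, failed_offsets)."""
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                chunk = os.pread(fd, want, offset)
            except OSError:
                # The kernel reported an I/O error for this block:
                # remember where, then skip ahead and keep reading.
                bad_offsets.append(offset)
                offset += want
                continue
            if not chunk:
                break  # unexpected end of data
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Pointed at a raw disk device (which normally needs root), a non-empty
second element of the result would mean the drive is still returning
errors to the OS. An empty list only says that every block was readable
on this pass; sectors the drive has already remapped internally would not
show up.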

> In my years of running ext2 and ext3, I can't see any reason why you
> would think they require fscks at every reboot. In the time that I've
> ran both ext2/ext3 and reiser, the only times ive had to fsck ext2 was
> after unclean unmounting. I've never had to run it on ext3. As for

Well, I had my share of ext2 doing an fsck after reboot, and it wasn't nice
on a 80GB partition... Sure, usually this is after a crash, which generally
is the *worst* time to have to run a fsck (ie, this just drags out the
unscheduled downtime).

> reiser, I've had to run it a few times, but each time it either
> destroyed data or fixed a very large number of errors that a journalled
> filesystem should not have.

I suppose your real issue is that you used faulty hardware. I wouldn't
expect any FS (journalling or otherwise) to be able to work faultlessly with
faulty hardware. Of course, depending on the faulty hardware, it would
probably affect different FSes differently (depending on where the faults
are on the disk and what data that FS tries to store there).

> I guess my point is, If reiserfs can't keep consistency without
> requiring periodic fscks, you have no argument making a statement  that
> says ext2 is worse as it requires them every boot. Even if your
> statement was not false (I know for a fact that ext2 does -not- perform
> fscks on every boot, it only will do it after an unclean unmount), you
> still would have no basis to say that  ext2 is worse due to the
> requirement of periodic fscks.

ext2 requires periodic fscks, which are also rather disconcerting when you
are never quite sure whether you need to allow a few hours of downtime for
a kernel upgrade this time, or perhaps only a few minutes.

> In conclusion, I don't think i'll be using your filesystem even as a
> testbed for new features, due to poor reliability under everyday
> circumstances. In addition, check your facts before making a statement
> that insults someone and claims something that is completely untrue.

I wish you the best of luck with your chosen software and hardware, of
course, my experience is different, and the only place I use ext3 is when I
can't convert it, or I don't really care (ie, it's just the / fs or
something, not data/server that I care about). I haven't had any issues with
reiserfs for a long time...

Finally, I find it interesting to see people who swear they will never use
reiserfs again to be on the reiserfs mailing list :)

Regards,
Adam

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 23:11     ` Adam Goryachev
@ 2003-02-11 23:17       ` Anders Widman
  2003-02-12  0:12         ` Hans Reiser
                           ` (4 more replies)
  2003-02-12  1:02       ` Mike Hodson
  1 sibling, 5 replies; 39+ messages in thread
From: Anders Widman @ 2003-02-11 23:17 UTC (permalink / raw)
  To: reiserfs-list

>> I've used ReiserFS in the past, but have also used ext3 on my
>> user's important
>> data (/home) after a good chunk of one drive was converted to
>> sparse/null files due to a screwup stemming from no 'badblocks' support
>> in reiserfs.  Since then, i've used ext3 as well as Reiser but recently

> I can't comment on your experience, but personally if I have a drive with
> any number of badblocks (which are showing up to the fs layer, not invisibly
> re-mapped by the drive) then I take the drive back and get a replacement, or
> bin the drive.

However, the FS SHOULD support handling of bad blocks/clusters at the
FS layer, even while running in a production system. Bad blocks can
pop up at any given time for no particular reason, and it is at these
times you (we) need a strong and reliable filesystem that can handle
and logically remap broken blocks/sectors.

Sure, a disk with physical errors should be replaced, but until you
find out about the error on the drive the FS HAS TO HANDLE these kinds
of problems.

 - Anders

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 23:17       ` Anders Widman
@ 2003-02-12  0:12         ` Hans Reiser
  2003-02-12 10:23           ` Anders Widman
  2003-02-12  5:12         ` Ross Vandegrift
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 39+ messages in thread
From: Hans Reiser @ 2003-02-12  0:12 UTC (permalink / raw)
  To: Anders Widman; +Cc: reiserfs-list

Anders Widman wrote:

>>>I've used ReiserFS in the past, but have also used ext3 on my
>>>user's important
>>>data (/home) after a good chunk of one drive was converted to
>>>sparse/null files due to a screwup stemming from no 'badblocks' support
>>>in reiserfs.  Since then, i've used ext3 as well as Reiser but recently
>>>
>>I can't comment on your experience, but personally if I have a drive with
>>any number of badblocks (which are showing up to the fs layer, not invisibly
>>re-mapped by the drive) then I take the drive back and get a replacement, or
>>bin the drive.
>>
>However,  the FS SHOULD support handling of bad blocks/clusters at the
>FS  layer,  even  while running in a production system. Bad blocks can
>pop  up  at any give time for no particular reason, and it is at these
>times  you  (we) need a strong and reliable filesystem that can handle
>and logically remap broken blocks/sectors.
>
>Sure,  a  disk  with physical errors should be replaced, but until you
>find out about the error on the drive the FS HAS TO HANDLE these kinds
>of problems.
>
> - Anders
>
We have gotten better at this over time.  There was a point in time when 
some of our guys reviewed all the bad block handling.  We still find 
cases where we could be better though. 

For some users it would be better to boot to a corrupted filesystem 
because running fsck is more of a problem than putting their data at 
higher risk.  For datalogging, it is probably conceivable to just toss 
the journal and lose the more recent updates to it.  For the default 
metadata journaling, this just does not seem prudent.

I really prefer making users understand that they have a problem they 
need to do something about.  This is just my style.  I want them to fail 
to boot, and after some effort learn that there is this thing called 
fsck, and dd_rescue, and that it is time to buy another hard drive and 
chuck their current one.  It would be best though if they were given 
detailed instructions about how they need to do this when the code hits 
that bad block.  Vitaly, please work on that.

If we handle the journal block error without downtime, the user will 
never chuck the hard drive, and that is bad in the longterm.

-- 
Hans

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12  0:12         ` Hans Reiser
@ 2003-02-12 10:23           ` Anders Widman
  2003-02-12 10:47             ` Hans Reiser
  0 siblings, 1 reply; 39+ messages in thread
From: Anders Widman @ 2003-02-12 10:23 UTC (permalink / raw)
  To: reiserfs-list

>>
>>However,  the FS SHOULD support handling of bad blocks/clusters at the
>>FS  layer,  even  while running in a production system. Bad blocks can
>>pop  up  at any give time for no particular reason, and it is at these
>>times  you  (we) need a strong and reliable filesystem that can handle
>>and logically remap broken blocks/sectors.
>>
>>Sure,  a  disk  with physical errors should be replaced, but until you
>>find out about the error on the drive the FS HAS TO HANDLE these kinds
>>of problems.
>>
>> - Anders
>>
> We have gotten better at this over time.  There was a point in time when
> some of our guys reviewed all the bad block handling.  We still find
> cases where we could be better though.

  I never intended to blame ReiserFS, just to make the point in general
  for any FS :)

> For some users it would be better to boot to a corrupted filesystem 
> because running fsck is more of a problem than putting their data at
> higher risk.  For datalogging, it is probably conceivable to just toss
> the journal and lose the more recent updates to it.  For the default
> metadata journaling, this just does not seem prudent.

> I really prefer making users understand that they have a problem they
> need to do something about.  This is just my style.  I want them to fail
> to boot, and after some effort learn that there is this thing called
> fsck, and dd_rescue, and that it is time to buy another hard drive and
> chuck their current one.  It would be best though if they were given
> detailed instructions about how they need to do this when the code hits
> that bad block.  Vitaly, please work on that.

> If we handle the journal block error without downtime, the user will
> never chuck the hard drive, and that is bad in the longterm.

  But a user never knows he has a media error until his system
  crashes, unless he does a surface scan or monitors his logs very
  closely. Not all users do this.

  To me, an FS should be able to handle both read and write errors and
  be able to reallocate the affected blocks to a sane area of the media.
  When this occurs, it should be noted in the kernel log (or similar).

  Then users can run a cron job that watches the log for exactly this
  error message.
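
A minimal sketch of what such a cron-driven log check could look like, in
Python. Everything specific here is an assumption for illustration: the
message format "remapped bad block N on DEV" is hypothetical (no
filesystem discussed in this thread is known to log exactly that), and
the function names are made up.

```python
import re

# Hypothetical message format; adjust to whatever the filesystem really logs.
REMAP_PATTERN = re.compile(r"remapped bad block (\d+) on (\S+)")


def find_remap_events(log_lines):
    """Return a list of (device, block) pairs found in the given log lines."""
    events = []
    for line in log_lines:
        match = REMAP_PATTERN.search(line)
        if match:
            events.append((match.group(2), int(match.group(1))))
    return events


def check_log(log_path):
    """Print one warning per remap event; return 1 if any were found, else 0."""
    with open(log_path, errors="replace") as log:
        events = find_remap_events(log)
    for device, block in events:
        print(f"WARNING: {device}: block {block} was remapped")
    return 1 if events else 0
```

A small wrapper script run from cron could call check_log() on the kernel
log and exit with its return value. Since cron usually mails any output to
the job's owner, the printed warnings would reach the admin without
further plumbing.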

  The whole point is that errors can occur at any time while a
  system is up and running, and it _always_ takes some time for the
  user to react and find out about the problem. In the time between
  the error occurring and the user finding out and being able to
  deal with it, we need a solid and secure FS that can keep the
  system running and protect the data (which is why we choose one FS
  over another).

  To my knowledge, only Windows with NTFS can handle just this:
  relocating bad blocks on the fly and notifying the user that this
  has happened.

  - Anders

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 10:23           ` Anders Widman
@ 2003-02-12 10:47             ` Hans Reiser
  2003-02-12 11:12               ` Adam Goryachev
  0 siblings, 1 reply; 39+ messages in thread
From: Hans Reiser @ 2003-02-12 10:47 UTC (permalink / raw)
  To: Anders Widman; +Cc: reiserfs-list

Anders Widman wrote:

>>If we handle the journal block error without downtime, the user will
>>never chuck the hard drive, and that is bad in the longterm.
>
>  But  a  user  never  knows  he  has  a media error before his system
>  crashes  (or  do  a surface scan), or monitor his logs very closely.
>  Not all users does this.
>
>  To  me  a FS should be able to handle both read and write errors and
>  be  able to reallocate these errors to a sane are of the media. When
>  this  occurs then it should be noted in the kernel log (or similar).
>
>  Then  users can run a cron job to monitor the log after exactly this
>  error message.
>
>  The  whole  point  is  this that errors can occur at any time when a
>  system  is  up  and  running and it _always_ takes some time for the
>  user  to  react  and find out about the problem. In the time between
>  the  error  has  occurred  and  the  when the user finds out and can
>  administer it, then we need a solid and secure FS that can manage to
>  run  the  system and protect the data (which is why we choose one FS
>  over the other).
>
>  To  my  knowledge  only  Windows  with  NTFS  can handle just this -
>  relocating  bad  blocks  on  the fly and notifying the user such has
>  happened.
>
>  - Anders
>
Yes, you are probably right, we should do it for those cases where it is 
feasible. 

-- 
Hans



* RE: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 10:47             ` Hans Reiser
@ 2003-02-12 11:12               ` Adam Goryachev
  2003-02-12 13:42                 ` Anders Widman
  2003-02-12 16:39                 ` Corrupted/unreadable journal: reiser vs. ext3 Sam Vilain
  0 siblings, 2 replies; 39+ messages in thread
From: Adam Goryachev @ 2003-02-12 11:12 UTC (permalink / raw)
  To: reiserfs-list

> Anders Widman wrote:
> >>If we handle the journal block error without downtime, the user will
> >>never chuck the hard drive, and that is bad in the longterm.
> >  But  a  user  never  knows  he  has  a media error before his system
> >  crashes  (or  do  a surface scan), or monitor his logs very closely.
> >  Not all users does this.
> >
> >  To  me  a FS should be able to handle both read and write errors and
> >  be  able to reallocate these errors to a sane are of the media. When
> >  this  occurs then it should be noted in the kernel log (or similar).
> >
> >  Then  users can run a cron job to monitor the log after exactly this
> >  error message.

[SNIP]

While this is all perfectly true, there also remains the question of "If we
*know* the media is faulty in this spot, how do we know the media isn't
faulty in these spots?"

ie, once you are on faulty hardware, you never really know what is the
correct course of action for this specific fault.

I can conceive of a few things that *might* be the right thing in various
circumstances:

A) Immediately re-mount the drive read-only, and wait for the sysadmin to
either re-mount rw or to do some other data recovery/repair

B) Immediately unmount the drive and wait

C) OK, I tried to write to sector 1324, so let's just try each consecutive
available sector until one doesn't return an error (possibly marking the
sectors bad/used as we go)

D) Just return an error to the application

Of course, all of the above would *also* log an error message to the kernel
log.
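
The four options above could be sketched as a configurable policy, here in
Python. This is only an illustration under stated assumptions: the names
ErrorPolicy and handle_write_error, the is_sector_ok callback and the
max_tries limit are all hypothetical, and a real filesystem would do this
inside the kernel rather than in a function like this one.

```python
from enum import Enum


class ErrorPolicy(Enum):
    REMOUNT_READ_ONLY = "A"
    UNMOUNT = "B"
    TRY_NEXT_SECTOR = "C"
    RETURN_ERROR = "D"


def handle_write_error(policy, sector, is_sector_ok, log, max_tries=8):
    """Decide what to do after a failed write to `sector`.

    Returns (action, new_sector); new_sector is None unless relocated.
    """
    # Every option also logs the error, as noted above.
    log(f"I/O error writing sector {sector}, policy={policy.name}")
    if policy is ErrorPolicy.REMOUNT_READ_ONLY:
        return ("remount-ro", None)
    if policy is ErrorPolicy.UNMOUNT:
        return ("unmount", None)
    if policy is ErrorPolicy.TRY_NEXT_SECTOR:
        # Option C: walk forward until a sector accepts the write.
        for candidate in range(sector + 1, sector + 1 + max_tries):
            if is_sector_ok(candidate):
                return ("relocated", candidate)
            log(f"marking sector {candidate} bad")
        # Nothing usable nearby: fall back to reporting the error.
    return ("error", None)
```

Making the policy a per-filesystem setting is one way to avoid picking a
single behaviour for everyone: the same failed write then leads to a
read-only remount on one machine and a relocation attempt on another.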

Now, some options might be suitable if for example the head of the disk has
crashed into the surface and hence the more you attempt to
read/write/seek/etc the more damage you do (as opposed to immediately
stopping all access to the disk and thereby preserving the data).
Other options are more suitable if, for example, you have a system which
absolutely needs to run 24 hours/7 days and it is only writing to disk as an
informational log.

Of course, it all depends on the reason the hardware error has shown its
ugly face, and what your individual circumstances are. Perhaps re-mapping is
just making the corruption worse (for every write the drive gets more
confused and scribbles in the wrong spot)...

Like most things MS do, they just take the common approach and pretend that
it will work for everyone.

Just my 0.02c worth.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 11:12               ` Adam Goryachev
@ 2003-02-12 13:42                 ` Anders Widman
  2003-02-12 14:15                   ` Russell Coker
  2003-02-12 16:39                 ` Corrupted/unreadable journal: reiser vs. ext3 Sam Vilain
  1 sibling, 1 reply; 39+ messages in thread
From: Anders Widman @ 2003-02-12 13:42 UTC (permalink / raw)
  To: reiserfs-list

>> Anders Widman wrote:
>> >>If we handle the journal block error without downtime, the user will
>> >>never chuck the hard drive, and that is bad in the longterm.
>> >  But  a  user  never  knows  he  has  a media error before his system
>> >  crashes  (or  do  a surface scan), or monitor his logs very closely.
>> >  Not all users does this.
>> >
>> >  To  me  a FS should be able to handle both read and write errors and
>> >  be  able to reallocate these errors to a sane are of the media. When
>> >  this  occurs then it should be noted in the kernel log (or similar).
>> >
>> >  Then  users can run a cron job to monitor the log after exactly this
>> >  error message.

> [SNIP]

> While this is all perfectly true, there also remains the question of "If we
> *know* the media is faulty in this spot, how do we know the media isn't
> faulty in these spots?"

> ie, once you are on faulty hardware, you never really know what is the
> correct course of action for this specific fault.

> I can conceive of a few things that *might* be the right thing in various
> circumstances:

> A) Immediately re-mount the drive read-only, and wait for the sysadmin to
> either re-mount rw or to do some other data recovery/repair

This would be devastating for some kinds of servers and cause downtime
and loss of work for many. Consider an application server, or a
document server within a company. If writes were prohibited, then lots
of programs would stop functioning, or saving would fail.

There are millions of other examples I can think of too, that would
make me not choose this option.

Unplanned downtime does cause a lot of harm to any business.

> B) Immediately dis-mount the drive and wait

Urg.. Even worse than the above case.

> C) OK, I tried to write to sector 1324 so lets just try each consecutive
> available sector until it doesn't return an error (possibly marking the
> sectors bad/used as we go)

Yes, or use another algorithm to find safe/free sectors.

> D) Just return an error to the application

This  would  at  least  allow  some  services/applications to continue
running. Still, important data and services may not work.

> Of course, all of the above would *also* log an error message to the kernel
> log.

> Now, some options might be suitable if for example the head of the disk has
> crashed into the surface and hence the more you attempt to
> read/write/seek/etc the more damage you do (as opposed to immediately
> stopping all access to the disk and thereby preserving the data).
> Some options are more suitable for example you have a system which
> absolutely needs to run 20 hours/7days and it is only writing to disk as a
> informational log.

Agreed.  There  are  many  different  cases  that  should  be met with
different approaches.

> Of course, it all depends on the reason the hardware error has showed it's
> ugly face, and what your individual circumstances are. Perhaps re-mapping is
> just making the corruption worse (for every write the drive gets more
> confused and scribbles in the wrong spot)...

I  am  not  sure about that. You certainly do not write on top of user
data,  but  on (marked) free space. If the drive cannot write, then no
harm  would  be  done  to  the  user data anyway. Of course in case of
electronics failure/firmware failure then you would perhaps get random
writes  instead  of  where  you want them. However I believe you would
have more serious problems before then (like not being able to read).

> Like most things MS do, they just take the common approach and pretend that
> it will work for everyone.

Easy  to  blame  MS, but Bad Blocks handling from FAT12 and ever since
has probably saved more data than not having it.

Just because MS does things, it does not mean they are bad and that
the Linux world should refuse to do something similar. That is just
shooting yourself in the foot.

> Just my 0.02c worth.

How about 0.00c? ;)

- Anders

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 13:42                 ` Anders Widman
@ 2003-02-12 14:15                   ` Russell Coker
  2003-02-12 15:26                     ` Anders Widman
  2003-02-13  3:31                     ` Zygo Blaxell
  0 siblings, 2 replies; 39+ messages in thread
From: Russell Coker @ 2003-02-12 14:15 UTC (permalink / raw)
  To: Anders Widman, reiserfs-list

On Wed, 12 Feb 2003 14:42, Anders Widman wrote:
> > A) Immediately re-mount the drive read-only, and wait for the sysadmin to
> > either re-mount rw or to do some other data recovery/repair
>
> This  would be devastating for some form of servers and cause downtime
> and  loss  of  work  for  plenty. Consider an application server, or a
> document  server within a company. If writes were prohibited then lots
> of programs do stop function, or saving fails.

Servers should have RAID.  There is no excuse.  With RAID the regular disk 
errors should not be an issue.

> Unplanned downtime do cause lot of harm to any business.

It's better to stop when there's a serious error than to blindly continue and 
make things worse.

> Easy  to  blame  MS, but Bad Blocks handling from FAT12 and ever since
> has probably saved more data than not having it.
>
> Just  because  MS  does  things it does not mean they are bad and that
> Linux   world  should  refuse  to  do  something similar. That is just
> shooting yourself in the foot.

The FAT bad block handling was developed when RAID was virtually unknown (and 
not available for PCs) and when no commonly available hard drives had the 
ability to relocate bad blocks (the drive controller received the analog 
signal from the disk heads - there was no abstraction).

Now all machines other than laptops are getting RAID, all hard drives support 
re-mapping bad sectors, and the entire situation is different.

Bad block handling is only needed for laptops.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 14:15                   ` Russell Coker
@ 2003-02-12 15:26                     ` Anders Widman
  2003-02-12 16:22                       ` bscott
                                         ` (2 more replies)
  2003-02-13  3:31                     ` Zygo Blaxell
  1 sibling, 3 replies; 39+ messages in thread
From: Anders Widman @ 2003-02-12 15:26 UTC (permalink / raw)
  To: reiserfs-list

> On Wed, 12 Feb 2003 14:42, Anders Widman wrote:
>> > A) Immediately re-mount the drive read-only, and wait for the sysadmin to
>> > either re-mount rw or to do some other data recovery/repair
>>
>> This  would be devastating for some form of servers and cause downtime
>> and  loss  of  work  for  plenty. Consider an application server, or a
>> document  server within a company. If writes were prohibited then lots
>> of programs do stop function, or saving fails.

> Servers should have RAID.  There is no excuse.  With RAID the regular disk
> errors should not be an issue.

Yes, most should use some form of hardware redundancy.

>> Unplanned downtime do cause lot of harm to any business.

> It's better to stop when there's a serious error than to blindly continue and
> make things worse.

Neither I nor, I think, anyone else said to continue blindly. Most
users/workstations do not have RAID and probably never will.

I can take any normal home user as an example. They use single drives,
but still they need to be able to rely on the filesystem to protect
their data and not stop working if the drive has a few bad blocks.
Not all users have spare drives, nor can they find a replacement right
away, so they have to be able to use their computers even though there
are bad blocks.

There  are  two  kinds of people I come across when it comes to Linux:

Those  that  say it is perfect and blame everything on the user or the
hardware.

The others want to make Linux a viable option for "normal" users and
want Linux to be able to replace Windows or Mac OS. The only way I see
that happening is if Linux starts to get more user-friendly and safe.

- Anders

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 15:26                     ` Anders Widman
@ 2003-02-12 16:22                       ` bscott
  2003-02-12 16:28                       ` Russell Coker
  2003-02-13  3:42                       ` Zygo Blaxell
  2 siblings, 0 replies; 39+ messages in thread
From: bscott @ 2003-02-12 16:22 UTC (permalink / raw)
  To: ReiserFS Mailing List; +Cc: Anders Widman

On Wed, 12 Feb 2003, at 4:26pm, andewid@tnonline.net wrote:
> They use single drives, but still they need to be able to rely on the
> filesystem to protect their data and not stop working if the drive has a
> few bad blocks.

  Question: What should the filesystem software do?

  The filesystem stores data in blocks on disk.  It expects to be able to read
and write to them to do its job.  When you take that away, you take away a
fundamental.

> Not all users has spare drives, nor can find a replacement right away so
> they have to be able to use their computers even though there are bad
> blocks.

  "Not all users have UPSes (battery backups), nor can they find a UPS right
away, so they have to be able to use their computers even though the power
is out."

> The only way I see that happen is if Linux starts to get more userfriendly
> and safe.

  A bad disk drive isn't a Linux issue.  It's a hardware issue.  Windows
and/or MacOS cannot magically make a bad disk into a good disk, either.

-- 
Ben Scott <bscott@ntisys.com>
| The opinions expressed in this message are those of the author and do  |
| not represent the views or policy of any other person or organization. |
| All information is provided without warranty of any kind.              |



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 15:26                     ` Anders Widman
  2003-02-12 16:22                       ` bscott
@ 2003-02-12 16:28                       ` Russell Coker
  2003-02-12 16:40                         ` Anders Widman
  2003-02-13  3:42                       ` Zygo Blaxell
  2 siblings, 1 reply; 39+ messages in thread
From: Russell Coker @ 2003-02-12 16:28 UTC (permalink / raw)
  To: Anders Widman, reiserfs-list

On Wed, 12 Feb 2003 16:26, Anders Widman wrote:
> >> Unplanned downtime do cause lot of harm to any business.
> >
> > It's better to stop when there's a serious error than to blindly continue
> > and make things worse.
>
> I  (and  I  think  no  one  else)  never  said  continue blindly. Most
> users/workstations do not have RAID and probably never will.

Hard drive costs are constantly decreasing while the value of data is 
constantly increasing.  I think that the use of RAID will increase steadily.

> The  others  want to make Linux a viable option for "normal" users and
> want Linux to be able to replace Windows or Mac OS. The only way I see
> that happen is if Linux starts to get more userfriendly and safe.

I guess you're not familiar with what NT does then.

NT 3.5x would sometimes get confused about its data and unmount the file 
system in question to avoid the risk of damaging data.

In case of a serious kernel error NT will give a BSOD in situations where 
Linux by default will print an Oops message and continue running.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 16:28                       ` Russell Coker
@ 2003-02-12 16:40                         ` Anders Widman
  0 siblings, 0 replies; 39+ messages in thread
From: Anders Widman @ 2003-02-12 16:40 UTC (permalink / raw)
  To: reiserfs-list

> On Wed, 12 Feb 2003 16:26, Anders Widman wrote:
>> >> Unplanned downtime do cause lot of harm to any business.
>> >
>> > It's better to stop when there's a serious error than to blindly continue
>> > and make things worse.
>>
>> I  (and  I  think  no  one  else)  never  said  continue blindly. Most
>> users/workstations do not have RAID and probably never will.

> Hard drive costs are constantly decreasing while the value of data is
> constantly increasing.  I think that the use of RAID will increase steadily.

>> The  others  want to make Linux a viable option for "normal" users and
>> want Linux to be able to replace Windows or Mac OS. The only way I see
>> that happen is if Linux starts to get more userfriendly and safe.

> I guess you're not familiar with what NT does then.

> NT 3.5x would sometimes get confused about it's data and umount the file
> system in question to avoid the risk of damaging data.

> In case of a serious kernel error NT will give a BSOD in situations where
> Linux by default will print an Oops message and continue running.

NT3.5  is  a little old to compare a modern OS with, is it not? I have
had numerous Linux kernel crashes that were not recoverable also.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 15:26                     ` Anders Widman
  2003-02-12 16:22                       ` bscott
  2003-02-12 16:28                       ` Russell Coker
@ 2003-02-13  3:42                       ` Zygo Blaxell
  2003-02-13 10:13                         ` Anders Widman
  2 siblings, 1 reply; 39+ messages in thread
From: Zygo Blaxell @ 2003-02-13  3:42 UTC (permalink / raw)
  To: reiserfs-list

In article <46110589437.20030212162613@tnonline.net>,
Anders Widman  <andewid@tnonline.net> wrote:
>The  others  want to make Linux a viable option for "normal" users and
>want Linux to be able to replace Windows or Mac OS. The only way I see
>that happen is if Linux starts to get more userfriendly and safe.

Last time I checked, Windows and Mac OS come to a near total halt when
they see a disk error while doing a write on non-removable media, unless
the application goes to extraordinary lengths to handle the error itself.

Frankly, I used to mount my ext3 filesystems on servers with
'errors=panic', causing a reboot at the very first sign of trouble (past
tense as I now use reiserfs which doesn't like that option ;-).
The sooner the server goes out of production and starts running fsck,
the sooner it will finish running fsck and come back into production
(or, in the worst case, the sooner an admin person will start pulling
out backup tapes and ordering replacement disks).

-- 
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-13  3:42                       ` Zygo Blaxell
@ 2003-02-13 10:13                         ` Anders Widman
  2003-02-13 14:44                           ` Rudy Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Anders Widman @ 2003-02-13 10:13 UTC (permalink / raw)
  To: reiserfs-list

> In article <46110589437.20030212162613@tnonline.net>,
> Anders Widman  <andewid@tnonline.net> wrote:
>>The  others  want to make Linux a viable option for "normal" users and
>>want Linux to be able to replace Windows or Mac OS. The only way I see
>>that happen is if Linux starts to get more userfriendly and safe.

> Last time I checked, Windows and Mac OS come to a near total halt when
> they see a disk error while doing a write on non-removable media, unless
> the application goes to extraordinary lengths to handle the error itself.

Actually no. :) Windows continues to run (OK, maybe not Win9x or WinNT,
but these are old anyway). You can just remove a hard drive in Windows
XP and the system continues to run. Or you can add new PCI cards and
Windows will find those too.


> Frankly, I used to mount my ext3 filesystems on servers with
> 'errors=panic', causing a reboot at the very first sign of trouble (past
> tense as I now use reiserfs which doesn't like that option ;-).
> The sooner the server goes out of production and starts running fsck,
> the sooner it will finish running fsck and come back into production
> (or, in the worst case, the sooner an admin person will start pulling
> out backup tapes and ordering replacement disks).


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-13 10:13                         ` Anders Widman
@ 2003-02-13 14:44                           ` Rudy Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Rudy Zijlstra @ 2003-02-13 14:44 UTC (permalink / raw)
  To: Anders Widman; +Cc: reiserfs-list



On Thu, 13 Feb 2003, Anders Widman wrote:

> > In article <46110589437.20030212162613@tnonline.net>,
> > Anders Widman  <andewid@tnonline.net> wrote:
> >>The  others  want to make Linux a viable option for "normal" users and
> >>want Linux to be able to replace Windows or Mac OS. The only way I see
> >>that happen is if Linux starts to get more userfriendly and safe.
>
> > Last time I checked, Windows and Mac OS come to a near total halt when
> > they see a disk error while doing a write on non-removable media, unless
> > the application goes to extraordinary lengths to handle the error itself.
>
> Actually no. :) Windows continue to run (ok, maybe now win9x or WinNT,
> but  these are old anyway). You can just remove a harddrive in Windows
> XP  and  the system continues to run. Or you can add new PCI cards and
> Windows will find those too.
>
>
Provided you first shut it down, then yes. I am not aware of PC hardware
that will allow you to safely do this with power on the board. Disk removal and
addition also worked using Win2K. And by the way, also using Linux -:)

If you get trouble with the system disk under Windows, I do not know
what happens; it is likely to be interesting... And I have had Linux running
with 1 disk disconnected after it was mounted: an unexpected SCSI disconnect.
Everything kept working, except for the partitions that were unreachable, which
happened to be reiserfs and were unharmed.

Cheers

Rudy

P.S. I am getting RAID for that particular system...




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 14:15                   ` Russell Coker
  2003-02-12 15:26                     ` Anders Widman
@ 2003-02-13  3:31                     ` Zygo Blaxell
       [not found]                       ` <20030213113003.7ee7af6e.philippe.gramoulle@mmania.com>
  1 sibling, 1 reply; 39+ messages in thread
From: Zygo Blaxell @ 2003-02-13  3:31 UTC (permalink / raw)
  To: reiserfs-list

In article <200302121515.46953.russell@coker.com.au>,
Russell Coker  <russell@coker.com.au> wrote:
>Now all machines other than laptops are getting RAID, all hard drives support 
>re-mapping bad sectors, and the entire situation is different.

Actually, laptops get RAID too... ;-)

My laptop can have up to 3 2.5" IDE disks simultaneously installed, if
I remove optional equipment such as second batteries and CD-ROM.
"/" is /dev/loop7 (rijndael encryption) on top of /dev/md0 on top of
/dev/hd[ab]2.

-- 
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD

^ permalink raw reply	[flat|nested] 39+ messages in thread

[parent not found: <20030213113003.7ee7af6e.philippe.gramoulle@mmania.com>]

* Re: rijndael loopback encryption was [Re: Corrupted/unreadable journal: reiser vs. ext3]
       [not found]                       ` <20030213113003.7ee7af6e.philippe.gramoulle@mmania.com>
@ 2003-02-13 18:17                         ` Zygo Blaxell
  0 siblings, 0 replies; 39+ messages in thread
From: Zygo Blaxell @ 2003-02-13 18:17 UTC (permalink / raw)
  To: reiserfs-list

In article <20030213113003.7ee7af6e.philippe.gramoulle@mmania.com>,
Philippe Gramoullé  <philippe.gramoulle@mmania.com> wrote:
>Hi Zygo,
>
>This is a little bit OT from the thread on ReiserFS ML, but could you tell me more
>about your laptop setup with rijndael loopback encryption and how you installed it,
>what kernel version/patches (link to a guide or FAQ or tutorial about how to set this up)
>
>I once had a NOC laptop full of critical info stolen in the tube in Paris, France,
>and it was a mess to change all the passwords, etc...
>
>Having every FS encrypted would make my paranoid ego feel much better ;o)

Linux 2.4.20, loop-AES 1.7b (replaces the standard loop.o module).
See http://loop-aes.sourceforge.net for loop-AES.  The package tarball
contains a crypto-ramdisk-boot script and some information on how to
set things up.

Note that you should probably encrypt swap as well, and watch out for
features like suspend-to-disk (aka "hibernate") that save the contents
of RAM without encryption.

-- 
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 11:12               ` Adam Goryachev
  2003-02-12 13:42                 ` Anders Widman
@ 2003-02-12 16:39                 ` Sam Vilain
  1 sibling, 0 replies; 39+ messages in thread
From: Sam Vilain @ 2003-02-12 16:39 UTC (permalink / raw)
  To: Adam Goryachev, reiserfs-list

On Thu, 13 Feb 2003 00:12, Adam Goryachev wrote:
> I can conceive of a few things that *might* be the right thing in
> various circumstances:
>
> A) Immediately re-mount the drive read-only, and wait for the sysadmin
> to either re-mount rw or to do some other data recovery/repair
>
> B) Immediately dis-mount the drive and wait
>
> C) OK, I tried to write to sector 1324 so lets just try each consecutive
> available sector until it doesn't return an error (possibly marking the
> sectors bad/used as we go)
>
> D) Just return an error to the application

Or a mixture...

C) with a max limit of, say 5 attempts, then D).  And then, later if it 
gets `really bad', where most I/O operations are failing, then A).

But I'd consider it acceptable behaviour for bounds check exceptions (ie, 
unreported filesystem corruption) or situations where you have lost a 
large amount of really critical structural information to invoke B).  Much 
better than an Oops.
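
The mixed policy described above (C with a retry limit, then D, and A once
most I/O is failing) can be sketched as a small decision function. This is
only an illustration of the idea, not how any filesystem implements it: the
names, the 50% failure ratio used for "really bad", and the way the counters
are passed in are all assumptions.

```python
from enum import Enum


class Action(Enum):
    RETRY_ELSEWHERE = "C: try another free sector"
    RETURN_ERROR = "D: return an error to the application"
    REMOUNT_READONLY = "A: remount read-only"


# Both thresholds are illustrative assumptions; the thread only suggests
# "say 5 attempts" and leaves "really bad" undefined.
MAX_RELOCATION_ATTEMPTS = 5
FAILURE_RATIO_LIMIT = 0.5


def choose_action(attempts_so_far, recent_failures, recent_operations):
    """Pick a response to a failed write under the mixed C/D/A policy."""
    # If most recent I/O is failing, stop writing altogether (option A).
    if recent_operations > 0:
        if recent_failures / recent_operations > FAILURE_RATIO_LIMIT:
            return Action.REMOUNT_READONLY
    # Otherwise keep looking for a good sector, up to the limit (option C).
    if attempts_so_far < MAX_RELOCATION_ATTEMPTS:
        return Action.RETRY_ELSEWHERE
    # Out of attempts: hand the error to the application (option D).
    return Action.RETURN_ERROR


print(choose_action(attempts_so_far=0, recent_failures=1, recent_operations=100).value)
print(choose_action(attempts_so_far=5, recent_failures=1, recent_operations=100).value)
print(choose_action(attempts_so_far=0, recent_failures=60, recent_operations=100).value)
```

Option B (unmounting on detected structural corruption) is left out of the
sketch because it is triggered by a different kind of event than a failed
write.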

Whoever made that statement about the hard disk head crashing... now that's 
certainly a laughable suggestion; a hard disk continuing after a head 
crash.  If anything, my experience with disks has been that if they start 
failing, you have to sort things out sooner rather than later.
-- 
Sam Vilain, sam@vilain.net

  You can judge your age by the amount of pain you feel when you come
in contact with a new idea.
JOHN NUVEEN

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 23:17       ` Anders Widman
  2003-02-12  0:12         ` Hans Reiser
@ 2003-02-12  5:12         ` Ross Vandegrift
  2003-02-12  7:17         ` Oleg Drokin
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Ross Vandegrift @ 2003-02-12  5:12 UTC (permalink / raw)
  To: Anders Widman; +Cc: reiserfs-list

On Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> Sure,  a  disk  with physical errors should be replaced, but until you
> find out about the error on the drive the FS HAS TO HANDLE these kinds
> of problems.

No, this is *ridiculous*.  A filesystem should tolerate a failed disk when
the VM system can handle bad memory, or the scheduler can handle an
overheated CPU.  In all three cases, hard failure of a machine is
certain, unless hardware is replaced before the failure occurs.

You have to start your software on some kind of foundation.  Working
hardware sounds like a great place to me.

-- 
Ross Vandegrift
ross@willow.seitz.com

A Pope has a Water Cannon.                               It is a Water Cannon.
He fires Holy-Water from it.                        It is a Holy-Water Cannon.
He Blesses it.                                 It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it.          It is a Wholly Holy Holy-Water Cannon.
He has it pierced.                It is a Holey Wholly Holy Holy-Water Cannon.
He makes it official.       It is a Canon Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive.                                       He shoots them.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 23:17       ` Anders Widman
  2003-02-12  0:12         ` Hans Reiser
  2003-02-12  5:12         ` Ross Vandegrift
@ 2003-02-12  7:17         ` Oleg Drokin
  2003-02-12 10:17         ` Alexander Lyamin
  2003-02-12 16:25         ` Vitaly Fertman
  4 siblings, 0 replies; 39+ messages in thread
From: Oleg Drokin @ 2003-02-12  7:17 UTC (permalink / raw)
  To: Anders Widman; +Cc: reiserfs-list

Hello!

On Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> >> I've used ReiserFS in the past, but have also used ext3 on my
> >> user's important
> >> data (/home) after a good chunk of one drive was converted to
> >> sparse/null files due to a screwup stemming from no 'badblocks' support
> >> in reiserfs.  Since then, i've used ext3 as well as Reiser but recently
> > I can't comment on your experience, but personally if I have a drive with
> > any number of badblocks (which are showing up to the fs layer, not invisibly
> > re-mapped by the drive) then I take the drive back and get a replacement, or
> > bin the drive.
> However,  the FS SHOULD support handling of bad blocks/clusters at the

Well, the FS itself supports this. Kind of ;)
Just mark bad blocks as "used".
Of course this does not work when the bad block is in the journal (solved with
a relocated/custom journal) or in a bitmap block.
That said, I know that ext3 does not do very well if there is a bad block
in the journal area.
Another problem is write errors (especially into journal areas). I do not
know about ext3, but reiserfs just fails in such a case, though I know
that SuSE people are working on resolving this problem.
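
The "mark bad blocks as used" trick can be shown with a toy free-space
bitmap. This is a minimal sketch, not reiserfs code: the class and method
names are hypothetical, and it ignores the cases mentioned above where the
bad block sits in the journal or in a bitmap block itself.

```python
class BlockBitmap:
    """Toy free-space bitmap: True means the block is unavailable."""

    def __init__(self, total_blocks):
        self.used = [False] * total_blocks

    def mark_bad(self, block):
        # A bad block is simply flagged as used so it is never handed out.
        self.used[block] = True

    def allocate(self):
        """Return the first free block number, or None if the bitmap is full."""
        for block, in_use in enumerate(self.used):
            if not in_use:
                self.used[block] = True
                return block
        return None


bitmap = BlockBitmap(total_blocks=8)
for bad in (0, 1, 3):
    bitmap.mark_bad(bad)

# The allocator skips the blocks that were marked bad.
print([bitmap.allocate() for _ in range(3)])
```

The allocator needs no special case for bad blocks, which is why this
works without filesystem changes; the cost is that the bad-block list has
to be known before the blocks are handed out.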

> FS  layer,  even  while running in a production system. Bad blocks can
> pop  up  at any give time for no particular reason, and it is at these
> times  you  (we) need a strong and reliable filesystem that can handle
> and logically remap broken blocks/sectors.

Hm. None of the existing filesystems for Linux can do this, to my knowledge.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re:  Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 23:17       ` Anders Widman
                           ` (2 preceding siblings ...)
  2003-02-12  7:17         ` Oleg Drokin
@ 2003-02-12 10:17         ` Alexander Lyamin
  2003-02-12 10:19           ` Alexander Lyamin
  2003-02-12 16:25         ` Vitaly Fertman
  4 siblings, 1 reply; 39+ messages in thread
From: Alexander Lyamin @ 2003-02-12 10:17 UTC (permalink / raw)
  To: Anders Widman; +Cc: reiserfs-list

Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> However,  the FS SHOULD support handling of bad blocks/clusters at the
> FS  layer,  even  while running in a production system. Bad blocks can
> pop  up  at any give time for no particular reason, and it is at these
> times  you  (we) need a strong and reliable filesystem that can handle
> and logically remap broken blocks/sectors.

I once compared disk storage systems to a burger where each of the components -
ham (HDD), cheese (controller) and bread (filesystem) - thinks that it is the
smartest one, and is totally unwilling to cooperate to give the burger a better taste.

The problem with remapping is that modern HDDs already do this, and when you have
bad blocks leaking into the upper layer (FS), chances are that
THINGS ARE REALLY BAD AND UGLY.

You'd better bin this HDD, unless you don't care about the data.

-- 
"Cache remedies via multi-variable logic shorts will leave you crying."(cl)
Lex Lyamin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re:   Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 10:17         ` Alexander Lyamin
@ 2003-02-12 10:19           ` Alexander Lyamin
  0 siblings, 0 replies; 39+ messages in thread
From: Alexander Lyamin @ 2003-02-12 10:19 UTC (permalink / raw)
  To: Alexander Lyamin; +Cc: Anders Widman, reiserfs-list

Wed, Feb 12, 2003 at 01:17:06PM +0300, Alexander Lyamin wrote:
> Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> > However,  the FS SHOULD support handling of bad blocks/clusters at the
> > FS  layer,  even  while running in a production system. Bad blocks can
> > pop  up  at any give time for no particular reason, and it is at these
> > times  you  (we) need a strong and reliable filesystem that can handle
> > and logically remap broken blocks/sectors.
> 
> Once i compared disk storage systems to a burger where each of components -
>  ham (hdd), cheese(controller) and bread (filesystem) think that they are smartest one. and totally unwilling to cooperate to give a burger better taste.
> 
> problem with remapping that modern HDD's do this thing, and when you have
> bad blocks leacked in upper layer (FS) chances are that
> THINGS ARE REALLY BAD AND UGLY.
> 
> you'd better bin this HDD, unless you dont care about data.

But being a different-thinking kind of bread, I could not say we were not trying.
Vitaly could comment on this more :)

-- 
"Cache remedies via multi-variable logic shorts will leave you crying."(cl)
Lex Lyamin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 23:17       ` Anders Widman
                           ` (3 preceding siblings ...)
  2003-02-12 10:17         ` Alexander Lyamin
@ 2003-02-12 16:25         ` Vitaly Fertman
  2003-02-12 16:56           ` Anders Widman
  4 siblings, 1 reply; 39+ messages in thread
From: Vitaly Fertman @ 2003-02-12 16:25 UTC (permalink / raw)
  To: reiserfs-list

On Wednesday 12 February 2003 02:17, Anders Widman wrote:
> >> I've used ReiserFS in the past, but have also used ext3 on my
> >> user's important
> >> data (/home) after a good chunk of one drive was converted to
> >> sparse/null files due to a screwup stemming from no 'badblocks' support
> >> in reiserfs.  Since then, i've used ext3 as well as Reiser but recently
> >
> > I can't comment on your experience, but personally if I have a drive with
> > any number of badblocks (which are showing up to the fs layer, not
> > invisibly re-mapped by the drive) then I take the drive back and get a
> > replacement, or bin the drive.
>
> However,  the FS SHOULD support handling of bad blocks/clusters at the
> FS  layer,  even  while running in a production system. Bad blocks can
> pop  up  at any give time for no particular reason, and it is at these
> times  you  (we) need a strong and reliable filesystem that can handle
> and logically remap broken blocks/sectors.
>
> Sure,  a  disk  with physical errors should be replaced, but until you
> find out about the error on the drive the FS HAS TO HANDLE these kinds
> of problems.

It is difficult to say whether bad blocks should be handled at the fs layer or not.
It would be useful to have this problem solved somehow, but hard drives with
their remapping look like the proper place to do this. The fs layer should
probably just make skilful use of some interface for such remapping. Well,
"remapping" is probably not the correct word here: Xuan Baldauf
<xuan--reiserfs@baldauf.org> once sent us his program, claiming that it recovered
blocks without remapping. His explanation was the following:

> The problem is that often multiple adjacent blocks are bad. You'll have to detect
> them manually. Once you know the bad blocks, just trying to overwrite them usually
> does not succeed because the disk wants to seek to that block exactly (which does
> not work for the same reason the block is bad). But if the whole track is
> rewritten, the bad blocks usually are gone.
>
> I suspect track wandering for this: due to small misalignments at each write, a track (or more
> precisely, an arc of the track which contains the block to be written) slowly wanders. If the
> misalignments do not cancel each other out, they add up to a bias. If an arc of a track has been
> written many times, it will have wandered under these conditions. If the wandering has
> progressed too far, the wandering arc slowly reaches the next neighbouring track.
>
> Now imagine an access to the wandered track: if the head seeks to the original position of the
> wandered track, it may not be able to read the wandered arc because it is too far away (lower
> signal quality). If the head seeks to the new position of the wandered arc, the signal may be
> interfered by the neighbouring track.
>
> Both effects may occur; which one does not really matter, as both make parts of the wandered arc
> inaccessible.
>
> The problem is: the individual wandered arc is no longer accessible, because the disk
> controller cannot sync to the block it is flying over because of the bad
> signal-to-noise-ratio. And if the wandered arc is accessible, another write will make it
> further wander up to inaccessibility.
>
> But if the seek to the track of the arc which should be overwritten occurs before the wandered
> arc, the disk controller actually can sync to the track and then write the whole track,
> effectively creating the track anew and only having the bias of the not-wandered part of the
> track. Thus, the wandered arc has no longer wandered compared to the other arcs of the
> track.

Well, it worked. We had some bad blocks on a drive, writes to them failed, and after
using this program there were no bad blocks anymore.

So it would be possible to take the following actions: 1) get some blocks back in the
described way; 1.1) writes to really bad blocks should have remapped them already at
this point, if there is space in the remap area; 2) save bad blocks to a badblock list
in the fs if they are still bad, i.e. the remap area is exhausted.
It would not be bad to try to recover already remapped blocks in this way too, but I do
not know how to get the list of them.
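
The three steps above can be sketched roughly as follows. This is only an
illustration and not code from reiserfs or from Xuan Baldauf's program: the
function name, the two callbacks and the return strings are all hypothetical,
and the callbacks stand in for whatever would actually rewrite a track or
issue a write to the drive.

```python
def triage_bad_block(block, rewrite_track, plain_write, fs_badblocks):
    """Return which of the three steps dealt with the block.

    rewrite_track and plain_write are hypothetical callbacks that return
    True when the block is usable again afterwards.
    """
    # 1) Try to get the block back by rewriting its whole track.
    if rewrite_track(block):
        return "recovered"
    # 1.1) A plain write lets the drive remap the block, if spare space is left.
    if plain_write(block):
        return "remapped"
    # 2) Still bad: record it in the filesystem's own bad block list.
    fs_badblocks.add(block)
    return "listed"
```

Only blocks that survive both earlier steps end up in the filesystem's list,
which matches the ordering described above.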

OK, but what if the I/O error you got is caused not by a bad block but by a bad cable?
Do you want the fs to work in the described way, trying to fix everything automatically?
I am not sure.
Now about user space. Using badblocks and programs like the one Xuan Baldauf sent us, or
just trying to write to the bad blocks so that they get remapped, is how you can try to
get rid of some number of bad blocks. Should a drive whose number of bad blocks exceeds
the remap area be used at all? It is a really rare case that the number of bad blocks on
such a drive does not keep increasing (the only case where you may want to continue
using the drive), so this is why proper support for bad blocks has not been implemented
in reiserfs yet. And it is probably not the most urgent thing to do.

-- 

Thanks,
Vitaly Fertman


* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 16:25         ` Vitaly Fertman
@ 2003-02-12 16:56           ` Anders Widman
  2003-02-12 17:13             ` Oleg Drokin
  0 siblings, 1 reply; 39+ messages in thread
From: Anders Widman @ 2003-02-12 16:56 UTC (permalink / raw)
  To: reiserfs-list

> On Wednesday 12 February 2003 02:17, Anders Widman wrote:
>> >> I've used ReiserFS in the past, but have also used ext3 on my
>> >> user's important
>> >> data (/home) after a good chunk of one drive was converted to
>> >> sparse/null files due to a screwup stemming from no 'badblocks' support
>> >> in reiserfs.  Since then, i've used ext3 as well as Reiser but recently
>> >
>> > I can't comment on your experience, but personally if I have a drive with
>> > any number of badblocks (which are showing up to the fs layer, not
>> > invisibly re-mapped by the drive) then I take the drive back and get a
>> > replacement, or bin the drive.
>>
>> However,  the FS SHOULD support handling of bad blocks/clusters at the
>> FS  layer,  even  while running in a production system. Bad blocks can
>> pop  up  at any give time for no particular reason, and it is at these
>> times  you  (we) need a strong and reliable filesystem that can handle
>> and logically remap broken blocks/sectors.
>>
>> Sure,  a  disk  with physical errors should be replaced, but until you
>> find out about the error on the drive the FS HAS TO HANDLE these kinds
>> of problems.

> That is difficult to say if bad blocks should be handled at fs layer or not.
> It would be useful to have this problem solved somehow, but harddrives with
> their remappings looks like the proper part of doing this. And probably fs
> layer should just skilfully use some interface for such remapping. Well,
> remapping is probably not correct word here. Thus, Xuan Baldauf 
> <xuan--reiserfs@baldauf.org> sent us his program once claimed that it recovered
> blocks w/out remapping. The explanations were the following:

>> The problem is that often multiple adjacent blocks are bad. You'll have to detect
>> them manually. Once you know the bad blocks, just trying to overwrite them usually
>> does not succeed because the disk wants to seek to that block exactly (which does
>> not work for the same reason the block is bad). But if the whole track is
>> rewritten, the bad blocks usually are gone.
>>
>> I suspect track wandering for this: due to small misalignments at each write, a track (or more
>> precisely, and arc of the track which contains the block to be written) slowly wanders. If the
>> misalignments do not zero out each other, they add up to a bias. If an arc of an has been
>> written many times, it will have wandered under these
>> conditions. If the wandering has
>> progressed too far, the wandering arc slowly reaches the next neighbouring track.
>>
>> Now imagine an access to the wandered track: if the head seeks to the original position of the
>> wandered track, it may not be able to read the wandered arc
>> because it is too far away (lower
>> signal quality). If the head seeks to the new position of the wandered arc, the signal may be
>> interfered by the neighbouring track.
>>
>> Both effects may occur, which one does not really matter, both makes parts of the wandered arc
>> inaccessible
>>
>> The problem is: the individual wandered arc is no longer accessible, because the disk
>> controller cannot sync to the block it is flying over because of the bad
>> signal-to-noise-ratio. And if the wandered arc is accessible, another write will make it
>> further wander up to inaccessibility.
>>
>> But if the seek to the track of the arc which should be
>> overwritten occurs before the wandered
>> arc, the disk controller actually can sync to the track and then write the whole track,
>> effectivily creating the track new and only having the bias of the not-wandered part of the
>> track. Thus, the wandered arc has not wandered anymore compared to the other arcs of the
>> track.

> Well, it worked. We had some bad blocks on a drive, write to them failed, after using
> this program there were no bad blocks anymore. 

> So it would be possible to do some actions to 1) get some blocks back in the described
> way, 1.1) write to really bad blocks should have remaped them already here if there is
> a space in remap area 2) save bad blocks to badblock list in fs if they are still bad -
> out of remap area. 
> Would be not bad to try to recover in this way already remapped blocks - do not know how
> to get the list of them only.

> Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want
> the fs to work in the described way? Trying to fix all automatically? I am not sure.

  How about trial and (then) error? :)

> Now about the user space. Using badblocks and some programs like Xuan Baldauf sent us
> and just trying to write to bad blocks make them being remapped - that is how you can
> try to get rid of some amount of badblocks. Should a drive with amount of bad blocks
> which exceeds the remap area be used? It is a realy rare case that the amount of bad
> blocks of such a drive does not get increased - the case where you may want to continue
> using the drive - so this is why a proper support for bad blocks was not implemented
> in reiserfs yet. And probably it is not the most urgent thing to do.

  No, perhaps bad block handling is not the major improvement we
  need; however, I feel it is still an important part of any FS to
  handle errors gracefully and not throw the user to the ground.


  - Anders





* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12 16:56           ` Anders Widman
@ 2003-02-12 17:13             ` Oleg Drokin
  0 siblings, 0 replies; 39+ messages in thread
From: Oleg Drokin @ 2003-02-12 17:13 UTC (permalink / raw)
  To: Anders Widman; +Cc: reiserfs-list

Hello!

On Wed, Feb 12, 2003 at 05:56:58PM +0100, Anders Widman wrote:
 
> > So it would be possible to do some actions to 1) get some blocks back in the described
> > way, 1.1) write to really bad blocks should have remaped them already here if there is
> > a space in remap area 2) save bad blocks to badblock list in fs if they are still bad -
> > out of remap area. 
> > Would be not bad to try to recover in this way already remapped blocks - do not know how
> > to get the list of them only.
> > Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want
> > the fs to work in the described way? Trying to fix all automatically? I am not sure.
>   How about trial and (then) error? :)

That might be suitable for fsck, but not for the kernel, I am sure.
The kernel should probably just return an error, or try to use a different block (if it
was doing a write) and, if a certain number of attempts fail, return an error too.
It should also remount R/O if the write error is in a system area (journal, superblock,
bitmaps) or if a special mount option was given that demands remounting R/O on I/O errors.
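
As a rough illustration of that policy (a sketch only, not kernel code and not
how reiserfs is actually written), the decision could look like this. All
names here are hypothetical: `write_block` and `pick_other_block` are
callbacks supplied by the caller, `errors_ro` stands in for the special mount
option, and the retry limit of 3 is an arbitrary assumption.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    IO_ERROR = "io_error"
    REMOUNT_RO = "remount_ro"


# Areas named in the message above as warranting a read-only remount.
SYSTEM_AREAS = {"journal", "superblock", "bitmap"}


def handle_write(write_block, pick_other_block, block, area,
                 errors_ro=False, max_attempts=3):
    """Try a write; on failure either retry elsewhere, give up, or go R/O."""
    for _ in range(max_attempts):
        if write_block(block):
            return Outcome.OK, block
        # Failed writes to system areas, or any failure when the (assumed)
        # mount option is set, lead to a read-only remount.
        if area in SYSTEM_AREAS or errors_ro:
            return Outcome.REMOUNT_RO, None
        # Otherwise try a different block for this write.
        block = pick_other_block(block)
    # A certain number of attempts failed: return an error.
    return Outcome.IO_ERROR, None
```

Note that nothing here tries to repair the failing block; that kind of trial
and error is left to fsck, as suggested above.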

Bye,
    Oleg


* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-11 23:11     ` Adam Goryachev
  2003-02-11 23:17       ` Anders Widman
@ 2003-02-12  1:02       ` Mike Hodson
  2003-02-12  7:25         ` Oleg Drokin
                           ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Mike Hodson @ 2003-02-12  1:02 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: reiserfs-list

> I can't comment on your experience, but personally if I have a drive with
> any number of badblocks (which are showing up to the fs layer, not invisibly
> re-mapped by the drive) then I take the drive back and get a replacement, or
> bin the drive.
I'd have liked to do that, but there were two problems. 1, the drive was an OEM
drive, and the reseller I bought it from went out of business; Western Digital
wouldn't accept an RMA directly from an end user on that drive. And 2, I
didn't even notice errors until one day it wouldn't work properly. At
that time I thought it was a reiserfs inconsistency, and I fscked. At that
point I noticed a few {DriveReady -SeekComplete} errors and then most of
the data on the drive nulled itself out.

> I suppose that if you continue to have a growing number of FS errors, then
> you either have faulty hardware, or are using a buggy version of
> software.... If you already admit to badblocks, then I would blame
> hardware..
The case I outlined at the beginning of my message involved 2 specific
drives. The drive that completely failed was a WD 6.4 gig. The 20 gig was
a Maxtor that I've had for 2.5 years, which to my knowledge does not have
media errors, but there have been some FS corruptions. I don't think it's
the media, since I never see any 'DriveReady -SeekComplete' errors that you
usually get with bad sectors.

> Hmmm, so apart from finding a number of errors and doing it's best to fix
> them (putting them into lost+found) and recovering all of your other files
> even with hardware issues present, you recovered all of your data? The
> problem here is?
The fact that the filesystem got so many errors in the first place. And
as I've said, hardware issues are AFAIK not the cause.

> If a harddrive is showing badblocks and then the disk vendors tool shows no
> errors, I think a simple dd over the whole disk or similar would really show
> the true story....
If I haven't made myself clear, the badblocks problem was with a
different drive.  The thought occurred to me that there may have been bad
blocks due to the sporadic corruption, but I don't have any error
messages to back up that thought.

> ext2 requires periodic fsck's which are also rather dis-concerting when you
> are never quite sure whether you need to allow a few hours downtime for a
> kernel upgrade this time, or perhaps it is only a few minutes.

Well, one way of being completely sure is to reset the mount count in the
filesystem before rebooting, or to set the fstab to never automatically
fsck. Then, on some set schedule, fsck along with a kernel upgrade, and
schedule the downtime.

> Finally, I find it interesting to see people who swear they will never use
> reiserfs again to be on the reiserfs mailing list :).
I may use it at some point, when it's as well proven as the second
extended filesystem is currently. It's interesting to see how many people
have errors, and when that number gets lower and more people start
posting good things I may reevaluate it at some point :)

Mike

-- 
Mike Hodson  <mike@mystica.cx>  ICQ: 18145059



* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12  1:02       ` Mike Hodson
@ 2003-02-12  7:25         ` Oleg Drokin
  2003-02-12  9:45         ` Hans Reiser
  2003-02-12 16:09         ` Sam Vilain
  2 siblings, 0 replies; 39+ messages in thread
From: Oleg Drokin @ 2003-02-12  7:25 UTC (permalink / raw)
  To: Mike Hodson; +Cc: Adam Goryachev, reiserfs-list

Hello!

On Tue, Feb 11, 2003 at 07:02:21PM -0600, Mike Hodson wrote:

> > Finally, I find it interesting to see people who swear they will never use
> > reiserfs again to be on the reiserfs mailing list :).
> I may use it at some point, when its as well proven as the second
> extended filesystem is currently. Its interesting to see how many people
> have errors, and when that number gets lower and more people start
> posting good things I may reevaluate it at some point :)

I guess you are not subscribed to the ext3-users mailing list?
The thing is that all the fs lists are filled with error reports (if the fs is
used by someone, of course).
When a user has zero problems, he is usually just busy doing his own things.
Users seem to write to FS mailing lists only when they have problems
(of course there are some exceptions, but the general tendency is like this).

Bye,
    Oleg


* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12  1:02       ` Mike Hodson
  2003-02-12  7:25         ` Oleg Drokin
@ 2003-02-12  9:45         ` Hans Reiser
  2003-02-12 16:09         ` Sam Vilain
  2 siblings, 0 replies; 39+ messages in thread
From: Hans Reiser @ 2003-02-12  9:45 UTC (permalink / raw)
  To: Mike Hodson; +Cc: Adam Goryachev, reiserfs-list

Mike Hodson wrote:

> At
>that time I thought it was a reiserfs inconstancy, and I fscked. at that
>point I noticed a few {DriveReady -SeekComplete} errors and then most of
>the data on the drive nulled itself out.
>
It is only to be expected that using fsck on a bad hard drive is going 
to lead to complete disaster.

Maybe we should ask the user if he'd like us to verify the media first.  
Though if they follow the instructions to use dd_rescue first, then 
they'll know if it has bad sectors.....  Probably a lot of users aren't 
going to use dd_rescue first even if told to, and we should expect that.....

>
>>Hmmm, so apart from finding a number of errors and doing it's best to fix
>>them (putting them into lost+found) and recovering all of your other files
>>even with hardware issues present, you recovered all of your data? The
>>problem here is?
>
>The fact that the filesystem got so many errors in the first place. And
>as ive said, hardware issues are AFAIK not the cause.
>
>I may use it at some point, when its as well proven as the second
>extended filesystem is currently. Its interesting to see how many people
>have errors, and when that number gets lower and more people start
>posting good things I may reevaluate it at some point :)
>
>Mike
>
I think you'll find that it is a lot more stable now. 

-- 
Hans




* Re: Corrupted/unreadable journal: reiser vs. ext3
  2003-02-12  1:02       ` Mike Hodson
  2003-02-12  7:25         ` Oleg Drokin
  2003-02-12  9:45         ` Hans Reiser
@ 2003-02-12 16:09         ` Sam Vilain
  2 siblings, 0 replies; 39+ messages in thread
From: Sam Vilain @ 2003-02-12 16:09 UTC (permalink / raw)
  To: Mike Hodson; +Cc: reiserfs-list

On Wed, 12 Feb 2003 14:02, Mike Hodson wrote:
> Well one way of being completely sure is to reset the mount count in the
> filesystem before rebooting, or to set the fstab to never automatically
> fsck. then on some set  schedule, fsck along with a kernel upgrade, and
> schedule the downtime

Nah.  Set up a mirror, wait for a fairly quiet time, sync, split the 
mirror, fsck the split mirror, and only do something if that fsck fails 
:-).

Solaris does all this very well.  Its equivalent of `md-utils' - Online 
Disk Suite - does journalling for you of all writes (including data) if 
you turn it on; at the block level, ignorant of the FS.  IMHO that's a 
much better place to do the journalling.  It's simple, solid.
-- 
Sam Vilain, sam@vilain.net

Do you have blacks, too?
 - George W. Bush, talking to Fernando Henrique Cardoso (the president
   of Brazil).  Reported in Der Speigel on May 19 2002.  Never
   reported in any US paper or news source.


* trolling
  2003-02-11 18:59 Corrupted/unreadable journal: reiser vs. ext3 Dirk Schenkewitz
  2003-02-11 20:27 ` Hans Reiser
@ 2003-02-12 10:11 ` Alexander Lyamin
  2003-02-12 12:32   ` trolling Dirk Schenkewitz
                     ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Alexander Lyamin @ 2003-02-12 10:11 UTC (permalink / raw)
  To: Dirk Schenkewitz; +Cc: Reiserfs List

Once, back in 1990, I had a customer call and blame me for having
sold them a BAD CRT, where only the LEFT part of the screen was
not displaying everything. Being upset (CRTs were expensive
in those days) I went to his office, only to realise that this
person used the Norton Commander shell and had pushed Ctrl+F1,
effectively turning off the left panel....

He was IGNORANT, but his rage and HATRED were so PURE.

You see, for IGNORANT people it's much easier to blame someone else
than their own ignorance.

P.S. "In addition, check your facts before making a statement
that insults someone and claims something that is completely untrue."

Power supply units physically damaging hard-drive disks (platters!).
Unwillingness to run FSCK when it's known that the filesystem was physically corrupted.
EXT2 never having run fsck in YEARS (what about the mount-count flag? :).

-- 
"Cache remedies via multi-variable logic shorts will leave you crying."(cl)
Lex Lyamin


* Re: trolling
  2003-02-12 10:11 ` trolling Alexander Lyamin
@ 2003-02-12 12:32   ` Dirk Schenkewitz
  2003-02-12 14:48   ` trolling Chris Mason
  2003-02-13 19:54   ` trolling Zygo Blaxell
  2 siblings, 0 replies; 39+ messages in thread
From: Dirk Schenkewitz @ 2003-02-12 12:32 UTC (permalink / raw)
  To: flx; +Cc: Reiserfs List

Hello Mr. Lyamin,

Alexander Lyamin schrieb:
> 
> Once, back in 1990, I had one customer calling and blaming me
> that we sold them BAD crt, where only LEFT part of screen being
> not displaying everything. Being upset (CRT's were expensive
> in thouse days) I went to his office only to realise that this
> person used Norton Commander shell and pushed Ctrl+F1 effectively
> turning off left panel....
> 
> He was IGNORANT, but his rage and HATRED were so PURE.

:-) Well, I can understand both viewpoints, I have been on both
sides - right now I must be on a troll's side.

> You see, for IGNORANT people its much easier to blame someone else
> then their ignorance.

I'm aware of my ignorance, really. At least a bit. But, see, in
that case it was not much loss to just forget the filesystem, and
it was EASIER for me to give ext3 a try than to go to someone else 
and search the internet for a solution - I had no internet access
by myself back then.

> P.S. "In addition, check your facts before making a statement that
> insults someone and claims something that is completely untrue."
>
> Powersupply units physically damaging hard-drive disks (plates!).

Well, I'm not sure what EXACTLY happened. Somehow I got suspicious
about the power supply, put in a new one, and from then on everything
was fine. Then I checked the disk; a few tracks seemed to be
damaged, and they turned out to be unreliable. I modified the partitions
so that these tracks were left out. (The disk is still in use today.)
Again, I'm not sure, but I believed the bad power made the disk slow
down or land the heads or something else while working in that
area. I believed that the journal area was the one which got most of
the traffic, so I thought it was logical that it got most of the
damage. The same was true for the ext3 journal area - both journals
got corrupted.

> Unwillingness of running FSCK when its known that filesystem was 
> physically corrupted.

I was not really unwilling - I was too stupid to find one on my
system and on my CDs. :-/

> EXT2 never runned fsck in YEARS (what about mountcount flag? :).

Er... did I say that? Sorry, that's wrong. In fact, fsck.ext3 ran
every time the system was booted, but normally it only replays the
journal (if needed), unless the actual number of mounts exceeds the
max-mounts, as you say. Admittedly, it is a pain to wait several
minutes to let that finish. I tuned my system (by adjusting the
max-mountcount values and the mountcount values) so that no more
than one partition is "really" fsck'd per boot. (*Sigh*)
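
To illustrate the staggering idea, here is a small simulation. It is only a
sketch under stated assumptions: it assumes a full check is forced once a
partition's mount count reaches its maximum mount count (and never when the
maximum is zero or negative), that a check resets the count, and that each
boot mounts every partition once. The function names and the example
partitions are made up; the two numbers per partition correspond to the
values one would adjust with `tune2fs -C` and `tune2fs -c`.

```python
def check_due(mount_count, max_mount_count):
    # Assumed rule: a full check is forced once the count reaches the
    # maximum; a non-positive maximum disables count-based checks.
    return max_mount_count > 0 and mount_count >= max_mount_count


def full_checks_per_boot(partitions, boots):
    """partitions maps a name to (mount_count, max_mount_count).

    Returns, for each simulated boot, the list of partitions that would
    get a full check on that boot.
    """
    state = {name: list(counts) for name, counts in partitions.items()}
    schedule = []
    for _ in range(boots):
        checked = []
        for name, counts in state.items():
            if check_due(counts[0], counts[1]):
                checked.append(name)
                counts[0] = 0      # a full check resets the mount count
            counts[0] += 1         # mounting increments the mount count
        schedule.append(checked)
    return schedule


# Same maximum, different starting counts: the checks never coincide.
schedule = full_checks_per_boot({"root": (3, 4), "home": (1, 4)}, boots=8)
print(schedule)
```

With equal maxima and different starting counts the full checks land on
different boots, so at most one partition is checked per boot.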

That's one reason why I'm here!

Thanks for answering a troll :-) - happy coding
	dirk
-- 
Dirk Schenkewitz 

InterFace AG                 fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16            fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching         
http://www.interface-ag.de   mailto:dirk.schenkewitz@interface-ag.de


* Re: trolling
  2003-02-12 10:11 ` trolling Alexander Lyamin
  2003-02-12 12:32   ` trolling Dirk Schenkewitz
@ 2003-02-12 14:48   ` Chris Mason
  2003-02-13 19:54   ` trolling Zygo Blaxell
  2 siblings, 0 replies; 39+ messages in thread
From: Chris Mason @ 2003-02-12 14:48 UTC (permalink / raw)
  To: flx; +Cc: Reiserfs List

On Wed, 2003-02-12 at 05:11, Alexander Lyamin wrote:

> You see, for IGNORANT people its much easier to blame someone else
> then their ignorance.

Sometimes people make mistakes, or don't have the benefit of the long
experience it really takes to understand things, and sometimes they just
don't get it.

All of which is fine.  We've chosen to read their questions, and those
of us who get paid to do it should have the good sense to try and be
nice about it.  Senseless flames only discourage valid questions from
getting asked.

-chris


* Re: trolling
  2003-02-12 10:11 ` trolling Alexander Lyamin
  2003-02-12 12:32   ` trolling Dirk Schenkewitz
  2003-02-12 14:48   ` trolling Chris Mason
@ 2003-02-13 19:54   ` Zygo Blaxell
  2 siblings, 0 replies; 39+ messages in thread
From: Zygo Blaxell @ 2003-02-13 19:54 UTC (permalink / raw)
  To: reiserfs-list

In article <20030212131120.A10438@t-raenon.nmd.msu.ru>,
Alexander Lyamin  <flx@msu.ru> wrote:
>P.S. "In addition, check your facts before making a statement
>that insults someone and claims something that is completely untrue."
>
>Powersupply units physically damaging hard-drive disks (plates!).

I wouldn't characterize that as "untrue," more like "technically
incomplete."  There are lots of ways to permanently disable hard disks
without physically damaging them.

Disk head servo controllers operate using some of the data on the disk
platter as a position encoder, which is used in a feedback scheme
to control the motor.  The motor's electrical power is literally a
function of the previous position, the theoretical position, and the
current position as reported by reading data through the disk head.
Assuming that the normal operating rules for the DC servo controller and
write head power are violated (e.g. because the CPU on the disk controller
board is going insane due to invalid power supply feed corrupting its
memory, or due to an external physical shock during a write), it is
possible to corrupt the position data on the platter and permanently lose
the ability to seek to some areas of the platter.  The position data was
initially written using frighteningly expensive precision hardware at
the disk drive factory and cannot be regenerated without said equipment.

Speaking of DC servo controller operating rule violations...those motors
are fairly powerful, and they normally operate at only a few percent of
their full power most of the time.  Very short pulses of significant power
are used during acceleration and deceleration.  Full power sustained for
any significant interval of time on the head motor may damage the heads
as they collide with the side of the drive case, or bend the arm the
heads are mounted on.  On the other hand, when the motors are moving,
the drive must feed their kinetic energy back into the power supply to
make them stop.  A bad power supply can result in position overshoots,
although most disks will correct for these automatically (they'll probably
be a lot slower though).

Consider what happens if the circuit which accelerates the disk from a
standing stop (a fairly high-power circuit, actually the highest power
circuit in the disk drive) does not turn off automatically when the disk
reaches its cruising velocity, and instead simply keeps accelerating the
disk as fast as air resistance and maximum power output would allow.
Either the platters will fly apart, or the electronics will cook in
their own heat.

It's also possible to rewrite the disk controller firmware on the disk.
Usually only a minimal loader program is provided in ROM, and the rest
of the drive's firmware is stored on the disk itself--two copies on
two different tracks, near the track 0 index mark where they can be
located with a very simple motor controller program.  This trades cheap
disk space and fast RAM for expensive and slow flash ROM.

-- 
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD

