* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-12 20:57 Dirk Schenkewitz
0 siblings, 0 replies; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-12 20:57 UTC (permalink / raw)
To: Reiserfs List
Oleg Drokin wrote (in response to Anders Widman):
> > > So it would be possible to do some actions to
> > > 1) get some blocks back in the described way,
> > > 1.1) write to really bad blocks should have remapped them
> > > already here if there is a space in remap area
> > > 2) save bad blocks to badblock list in fs if they are still bad -
> > > out of remap area.
> > > Would be not bad to try to recover in this way already remapped
> > > blocks - do not know how to get the list of them only.
> > > Ok, but what if the IO error you got is not a bad block, but
> > > a bad cable? Do you want the fs to work in the described way?
> > > Trying to fix all automatically? I am not sure.
> >
> > How about trial and (then) error? :)
>
> That might be suitable for fsck, but not for kernel I am sure.
> Kernel should just probably return error or try to use different
> block (if it was doing write) and if certain number of attempts
> failed, return error too.
> Also remount R/O if write error is in system area (journal,
> superblock, bitmaps) or special mount option was given that demands
> remounting R/O on io errors.
I still feel that the system area should be DESIGNED to be extra-
robust against everything, because it is vital for the whole fs.
Btw, might such thoughts be the reason that ext2 has superblock
backups? I agree that a bad block in the system area is a good
reason for all kinds of alarm, but a really good fs should overcome
more than one without (unrecoverable) damage to the fs in whole.
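The write-error policy Oleg describes in the quoted text above can be sketched roughly as follows. This is only an illustration of the decision logic, not kernel code: `handle_write`, `write_block`, `candidates`, `max_attempts` and `ro_on_error` are hypothetical names introduced here, and the retry limit of 3 is an arbitrary assumption.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    ERROR = "error"
    REMOUNT_RO = "remount-ro"


def handle_write(write_block, candidates, in_system_area,
                 max_attempts=3, ro_on_error=False):
    """Apply the quoted policy to one write request.

    write_block(block) -> bool is a hypothetical low-level write.
    candidates is the ordered list of blocks the fs may try.
    """
    for attempt, block in enumerate(candidates):
        if attempt >= max_attempts:
            break  # too many failed attempts: give up
        if write_block(block):
            return Outcome.OK, block
        # A failed write to journal/superblock/bitmaps, or any failure
        # when the mount option demands it, stops all further writes.
        if in_system_area or ro_on_error:
            return Outcome.REMOUNT_RO, None
    return Outcome.ERROR, None
```

With blocks 10 and 11 failing, `handle_write(lambda b: b not in {10, 11}, [10, 11, 12], in_system_area=False)` ends up on block 12; the same failure inside the system area would ask for a read-only remount instead.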
happy coding
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-20 9:55 Dirk Schenkewitz
2003-02-20 10:20 ` Anders Widman
0 siblings, 1 reply; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-20 9:55 UTC (permalink / raw)
To: Juan Quintela; +Cc: Reiserfs List
Hi Juan,
Juan Quintela wrote (in response to me):
>
> dirk> Not at my system - I have (for example) a 16 Gig partition with
> dirk> ext3 on it which is 100% full and has now 6100 kilobytes free space.
> dirk> No Problem (with getting ENOSPC, I mean :-)).
> dirk> Oh - wait a sec: do you usually reserve 5%-10% for the superuser?
> dirk> That might explain why you get ENOSPC at 95%-90%, because that
> dirk> reserved space is not taken into account... I normally tune the
> dirk> fs to reserve 0% for the superuser. I never needed the reserved
> dirk> space anyway.
>
> that 5-10% is there not for the superuser (with today disks, that is
> a lot of space). I was also going to reduce the percentage, but then
> somebody explained to me that this percentage needs to be free at all
> times to maintain the fragmentation low. And that makes a lot of
> sense, the bigger the disk, the more free space you need to have low
> fragmentation.
Thanks - I didn't know that... although it's logical. Hmm... high
fragmentation should result in slower access to (or better: higher response
time from) the fs, right? I haven't noticed any (I mean, it can be there,
but it cannot be very much). Then again, I did not do timing measurements,
and filesystems which get that full are not "working" filesystems, but
"storage" filesystems, so there is not much movement on them. Hmm...
perhaps I should do a defragmentation on one of these filesystems and
compare before/after.
Have fun
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-17 10:04 Dirk Schenkewitz
2003-02-20 1:27 ` Juan Quintela
0 siblings, 1 reply; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-17 10:04 UTC (permalink / raw)
To: Reiserfs List
Sorry for being so late; I didn't see it at first glance.
Sam Vilain wrote:
> On Wed, 12 Feb 2003 08:43, berthiaume_wayne@emc.com wrote:
> > Dirk, I'd be interested in hearing from you your performance
> > experience with ext3 when it reaches 96% full.
>
> No problem, because you get ENOSPC at 95% or 90%.
Not at my system - I have (for example) a 16 Gig partition with
ext3 on it which is 100% full and has now 6100 kilobytes free space.
No Problem (with getting ENOSPC, I mean :-)).
Oh - wait a sec: do you usually reserve 5%-10% for the superuser?
That might explain why you get ENOSPC at 95%-90%, because that
reserved space is not taken into account... I normally tune the
fs to reserve 0% for the superuser. I never needed the reserved
space anyway.
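The gap between "100% full" and "still has free blocks" can be observed from user space. A minimal sketch, assuming a POSIX system where Python's `os.statvfs` is available: `f_bfree` counts all free blocks, while `f_bavail` counts only those an unprivileged user may allocate, so the difference is the still-unused part of the reserve. The function names `reserved_free_bytes` and `space_report` are made up for this example.

```python
import os


def reserved_free_bytes(f_bfree, f_bavail, f_frsize):
    """Free space that only the superuser may still allocate."""
    return (f_bfree - f_bavail) * f_frsize


def space_report(path="."):
    """Summarize total, root-visible and user-visible free space."""
    st = os.statvfs(path)  # POSIX only
    return {
        "total": st.f_blocks * st.f_frsize,
        "free_for_root": st.f_bfree * st.f_frsize,
        "free_for_users": st.f_bavail * st.f_frsize,
        "reserved_free": reserved_free_bytes(
            st.f_bfree, st.f_bavail, st.f_frsize),
    }
```

An ordinary user gets ENOSPC once `free_for_users` reaches zero, even if `free_for_root` is still positive. With the reserve tuned to 0%, as described above, the two figures coincide.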
> Hmm, another feature SysAdmins actually find useful, missing in
> reiserfs.
> Along with quotas (this feature is a lazy case of a quota, really).
>
> On Wed, 12 Feb 2003 18:12, Ross Vandegrift wrote:
> > You have to start your software on some kind of foundation.
> > Working hardware sounds like a great place to me.
>
> Hmm, you've never heard of redundancy or fault tolerance then.
>
> What part fails the most in running systems ? Disk platters.
>
> CPUs might overheat and RAM might suddenly one day get a sticky bit,
Even then I'd like the OS to find out about that and inform me...
> but as you point out there ain't much you can do about it.
> Except buy a Tandem, or use ECC memory.
>
> But with disks, you can. Mirroring aside, modern hard disks use S.M.A.R.T.
> technology which claims to be able to spot failures before they happen.
> Many BIOSes will let you turn this feature on and off. Of course I've
> never actually seen it in action :-).
For me the most important thing is: if something is vital to a fs,
it must be protected even against hardware failure as well as possible.
For example, by making copies, or (perhaps) at least having reserved
space for a copy, and if some access fails, mark the blocks as bad,
give a warning (important) and start using the reserved space.
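A rough sketch of that idea, assuming a vital structure is stored in several copies: try each copy in turn, remember blocks that failed, warn loudly, and fall back to the next copy. All names here (`read_vital`, `read_block`, `bad_blocks`, `warn`) are hypothetical and only illustrate the control flow; no existing filesystem is claimed to work exactly this way.

```python
def read_vital(copies, read_block, bad_blocks, warn=print):
    """Return the first readable copy of a vital structure.

    copies      -- block numbers holding identical copies, primary first
    read_block  -- hypothetical reader: returns bytes, or None on I/O error
    bad_blocks  -- set of block numbers already known to be bad (updated)
    """
    for block in copies:
        if block in bad_blocks:
            continue  # do not touch blocks already marked bad
        data = read_block(block)
        if data is not None:
            return data
        bad_blocks.add(block)  # mark as bad for future accesses
        warn(f"WARNING: block {block} unreadable, trying next copy")
    raise IOError("all copies of the vital structure are unreadable")
```

The warning is the important part: the fallback keeps the system usable, but the administrator still learns that the hardware needs attention.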
Have fun
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-17 10:04 Dirk Schenkewitz
@ 2003-02-20 1:27 ` Juan Quintela
2003-02-20 9:03 ` Anders Widman
0 siblings, 1 reply; 94+ messages in thread
From: Juan Quintela @ 2003-02-20 1:27 UTC (permalink / raw)
To: Dirk Schenkewitz; +Cc: Reiserfs List
>>>>> "dirk" == Dirk Schenkewitz <Dirk.Schenkewitz@interface-ag.com> writes:
Hi
dirk> Not at my system - I have (for example) a 16 Gig partition with
dirk> ext3 on it which is 100% full and has now 6100 kilobytes free space.
dirk> No Problem (with getting ENOSPC, I mean :-)).
dirk> Oh - wait a sec: do you usually reserve 5%-10% for the superuser?
dirk> That might explain why you get ENOSPC at 95%-90%, because that
dirk> reserved space is not taken into account... I normally tune the
dirk> fs to reserve 0% for the superuser. I never needed the reserved
dirk> space anyway.
that 5-10% is there not for the superuser (with today disks, that is
a lot of space). I was also going to reduce the percentage, but then
somebody explained to me that this percentage needs to be free at all
times to maintain the fragmentation low. And that makes a lot of
sense, the bigger the disk, the more free space you need to have low
fragmentation.
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-20 1:27 ` Juan Quintela
@ 2003-02-20 9:03 ` Anders Widman
0 siblings, 0 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-20 9:03 UTC (permalink / raw)
To: reiserfs-list
> that 5-10% is there not for the superuser (with today disks, that is
> a lot of space). I was also going to reduce the percentage, but then
> somebody explained to me that this percentage needs to be free at all
> times to maintain the fragmentation low. And that makes a lot of
> sense, the bigger the disk, the more free space you need to have low
> fragmentation.
That is "free" space that needs to be free to lessen the
fragmentation. All the users won't see this free space - and you end
up with high fragmentation anyway?
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-14 14:30 Dirk Schenkewitz
0 siblings, 0 replies; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-14 14:30 UTC (permalink / raw)
To: Reiserfs List
Sam Vilain <sam@vilain.net> wrote:
> On Fri, 14 Feb 2003 09:08, Zygo Blaxell wrote:
> > ...
> > The M in MTBF is Mean, not Maximum or Minimum. For every disk that
> > lasts 10 years or more, there's an equal and opposite disk that dies
> > within a few minutes.
>
> It actually stands for Meaningless, I'm sure :-) Vendors should be
> required to state this figure in terms of the number of unit failures
> they experienced running X units for T amount of time.
It is not? I thought it was just that: have 100 drives running for a year,
one fails, gives you an MTBF of 100 years per drive. No?
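The arithmetic in that reading is total accumulated running time divided by the number of failures. A small sketch follows; the second function additionally assumes a constant failure rate (an exponential lifetime model), which is an assumption made here for illustration and not something the thread or any vendor states.

```python
import math


def mtbf_years(units, years_each, failures):
    """MTBF as accumulated unit-years per observed failure."""
    if failures <= 0:
        raise ValueError("need at least one observed failure")
    return units * years_each / failures


def annual_failure_probability(mtbf):
    """Chance that one drive fails within a year, assuming a
    constant failure rate of 1/mtbf per year (exponential model)."""
    return 1.0 - math.exp(-1.0 / mtbf)
```

`mtbf_years(100, 1, 1)` gives 100 years, and under the constant-rate assumption that corresponds to just under a 1% chance of any single drive failing in a given year. It says nothing about how long an individual drive will last.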
have fun
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-14 14:20 Dirk Schenkewitz
2003-02-14 20:58 ` Valdis.Kletnieks
0 siblings, 1 reply; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-14 14:20 UTC (permalink / raw)
To: Reiserfs List
eazgwmin wrote:
> In article <3E4AA902.86F15815@interface-ag.com>,
> Dirk Schenkewitz <Dirk.Schenkewitz@interface-ag.com> wrote:
> >For me, it was alarming enough seeing ext3 drop the journal. In fact,
> >THAT was the point where I went to investigate in other directions
> >instead of blaming the filesystem.
>
> The kernel block device messages complaining about I/O errors from the
> device aren't sufficient to tell you that there is a serious problem?
> Or was this device silently corrupting data without reporting errors?
Exactly - the kernel didn't report any problems - At least, I did
not notice any.
> >The only problem is, that putting a bad drive to eternal rest
> >might not solve the problem, as long as the REASON for the drive going
> >bad stays undiscovered. (I had said drive in use for less than 4
> >months (if my memory servers, er, serves me well) - it was like new.)
>
> I've had disks that were DOA (literally--Medium Errors during
> partitioning and mke2fs, followed by mechanical noises and total failure
> in a matter of a few minutes). I've had several disks that failed a
> week or two after first installation.
Forgive me: what is DOA?
> The M in MTBF is Mean, not Maximum or Minimum. For every disk that
> lasts 10 years or more, there's an equal and opposite disk that dies
> within a few minutes.
Yeah, right, I know that. At least, theoretically ;-). I mean,
if a disk drive fails within rather short time, it is most likely
a mechanical defect, and you will hear unusual noises. My drive
produced absolutely normal sounds, not-too-loud spinning noises,
a little stepping noises, you know. And then, it turned out that
no further damage was done after I replaced the power supply...
> >Hans Reiser wrote (in response to Anders Widman):
> >> If we handle the journal block error without downtime, the user will
> >> never chuck the hard drive, and that is bad in the longterm.
> >
> >Not agreed, unless you continue without a warning.
>
> I'd prefer to continue in read-only mode, and refuse further read-write
> mounts with an error until the filesystem is fscked. I really like
> systems that can still boot and let me (attempt to) run diagnostic
> tools even when they're otherwise really unhealthy.
Exactly - that's my humble opinion, too. Just to emphasize it:
if I get told by a program "there is a serious problem here, it will
cost some of my abilities, but I can deal with it for a short time",
then I'm really alarmed. But if I get told, "there is an extremely
serious problem here, and I (*squeak*)...(*silence*)" (you know what
I mean), then I rather think it's a problem of said program.
> I don't care if recently written data is corrupt or missing--I
> probably didn't write to the diagnostic tools within the last journal
> interval, and if the filesystem is read-only I can't make any metadata
> corruption worse.
Right. I configured all my ext3 filesystems to remount-ro in case of
error. A kernel panic would not help that much, because you need to
find out what's wrong. Unless there are nasty noises from somewhere,
in which case you still can decide to switch off the machine immediately,
you most likely need a halfway running system to search for the problem
(IMHO).
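To check which ext filesystems are currently mounted with an explicit error policy, the mount table can be inspected. A minimal sketch, assuming the `errors=` option shows up in the options column of `/proc/mounts`-style text when it was given at mount time; a default stored in the superblock (set with `tune2fs -e`) is not visible this way. The function name `error_policies` is made up for this example.

```python
def error_policies(mounts_text):
    """Map mount point -> value of the errors= option (or None)
    for every ext* filesystem in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        if not fstype.startswith("ext"):
            continue
        policy = None
        for opt in options:
            if opt.startswith("errors="):
                policy = opt.split("=", 1)[1]
        result[mountpoint] = policy
    return result
```

On a live system one would pass in the contents of `/proc/mounts`; any mount point reported as `None` relies on whatever default is recorded in its superblock.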
> I would think that most people notice that something's wrong if they
> can't write to their filesystems any more. I certainly wouldn't want
> the filesystem to be modified if there's something known to be wrong
> with the metadata. But if I can't read any of the data at all because
> some tiny part of it is suspicious, I just get annoyed. :-P
You took the words right out of my mouth! ((C) Meat Loaf, many years ago)
happy coding
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-14 0:18 Sam Vilain
2003-02-23 23:31 ` Zygo Blaxell
0 siblings, 1 reply; 94+ messages in thread
From: Sam Vilain @ 2003-02-14 0:18 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: reiserfs-list
On Thu, 13 Feb 2003 16:42, Zygo Blaxell wrote:
> Last time I checked, Windows and Mac OS come to a near total halt when
> they see a disk error while doing a write on non-removable media, unless
> the application goes to extraordinary lengths to handle the error
> itself.
Perhaps it might, but it sure does a good job of cleaning them up with
ScanDisk. And it almost certainly won't BSOD for a bad sector.
My experience is that it partially hangs the system while it retries the
write, but eventually comes back and the application either has died or
returns an error. I actually think it's comparatively graceful. I mean,
Windows might crash if you sneeze at it for no reason whatsoever, but it
handles disk errors quite well most of the time. Baby's First FileSystem
(FAT) just doesn't have any structure to lose, which sure makes it
resilient. Even if you get a bad block in the FAT it survives, because
there are two copies of it!
Windows doesn't handle resetting the IDE bus when it needs it very well,
I've seen one disk that didn't work in Windows but worked passably in
Linux because of this. Of course it died a few months later :-).
--
Sam Vilain, sam@vilain.net
Real software engineers like C's structured constructs, but they are
suspicious of it because they have heard that it lets you get "close
to the machine."
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 0:18 Sam Vilain
@ 2003-02-23 23:31 ` Zygo Blaxell
2003-02-24 1:14 ` Anders Widman
0 siblings, 1 reply; 94+ messages in thread
From: Zygo Blaxell @ 2003-02-23 23:31 UTC (permalink / raw)
To: reiserfs-list
In article <200302141318.33057.sam@vilain.net>,
Sam Vilain <sam@vilain.net> wrote:
>Perhaps it might, but it sure does a good job of cleaning them up with
>ScanDisk. And it almost certainly won't BSOD for a bad sector.
BSOD, no. Hang, yes. Crash, yes. Kill applications more or less at
random, yes. Corrupt the registry and never successfully boot again, yes.
Corrupt a database somewhere (even on another machine over a network)
and require careful hand-holding to make some application work again, yes.
Take two or more hours to boot, yes.
But BSOD...no.
>Windows doesn't handle resetting the IDE bus when it needs it very well,
>I've seen one disk that didn't work in Windows but worked passably in
>Linux because of this. Of course it died a few months later :-).
I've seen disks that fail in ways that bus resets can't fix. Usually
the problem is heat-related--the drive electronics simply overheat and
the embedded controller crashes. The drive might start working again
if it cools off a bit. Generally if I hear a user say "I get disk errors
that only appear when I'm using the disk heavily", I find a hard drive
with inadequate airflow.
Often there is some data corruption associated with IDE bus resets.
Linux IDE drivers seem to corrupt any I/O requests in progress when it
switches from DMA to PIO mode, which it usually does when the IDE bus
times out on DMA requests. :-P
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-23 23:31 ` Zygo Blaxell
@ 2003-02-24 1:14 ` Anders Widman
0 siblings, 0 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-24 1:14 UTC (permalink / raw)
To: reiserfs-list
> In article <200302141318.33057.sam@vilain.net>,
> Sam Vilain <sam@vilain.net> wrote:
>>Perhaps it might, but it sure does a good job of cleaning them up with
>>ScanDisk. And it almost certainly won't BSOD for a bad sector.
> BSOD, no. Hang, yes. Crash, yes. Kill applications more or less at
> random, yes.
On the other hand, in Linux you most likely CAN'T kill any
application that tries to read from a failed disk, or that even hits a
bad sector.
> Corrupt the registry and never successfully boot again, yes.
> Corrupt a database somewhere (even on another machine over a network)
> and require careful hand-holding to make some application work again, yes.
> Take two or more hours to boot, yes.
With WinNT...yes.
> But BSOD...no.
> Often there is some data corruption associated with IDE bus resets.
> Linux IDE drivers seem to corrupt any I/O requests in progress when it
> switches from DMA to PIO mode, which it usually does when the IDE bus
> times out on DMA requests. :-P
And DMA errors seem to happen quite often with newer type motherboards
and current 'stable' kernels. These often result in multiple smbd
processes that can't be killed and therefore unmountable filesystems.
All OS have their flaws, and there is no point in comparing just to
prove one is better than the other. What we should do, is to try to
improve and evolve things the best we can. ReiserFS is a good way in
this direction, still there are things that can be better. We should
focus on those (at least on this list).
- Anders
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-14 0:17 Sam Vilain
0 siblings, 0 replies; 94+ messages in thread
From: Sam Vilain @ 2003-02-14 0:17 UTC (permalink / raw)
To: bscott; +Cc: reiserfs-list
On Thu, 13 Feb 2003 05:22, bscott@ntisys.com wrote:
> "Not all users have UPSes (battery backups), nor can they find a UPS
> right away, so they have to be able to use their computers even though
> the power is out."
The power supply states 240V. The system still runs when it is at 220V.
Hardware works within given tolerances. It is not perfect. This is a
fundamental. One of those tolerances is a MTBF of disk surfaces.
--
Sam Vilain, sam@vilain.net
By doing just a little every day, I can gradually let the task
completely overwhelm me.
ASHLEIGH BRILLIANT
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-14 0:16 Sam Vilain
2003-02-23 23:10 ` Zygo Blaxell
0 siblings, 1 reply; 94+ messages in thread
From: Sam Vilain @ 2003-02-14 0:16 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: reiserfs-list
On Fri, 14 Feb 2003 09:08, Zygo Blaxell wrote:
> Sam Vilain <sam@vilain.net> wrote:
> >But with disks, you can. Mirroring aside, modern hard disks use
> > S.M.A.R.T. technology which claims to be able to spot failures before
> > they happen. Many BIOSes will let you turn this feature on and off.
> > Of course I've never actually seen it in action :-).
>
> I have seen SMART work. At 11:20:30 I had a disk fail, then smartd put
> this in my logs:
> Nov 6 11:20:30 chlorine smartd: Device: /dev/hdb, Failed attribute: 3
> Oh, wait, you said "before"...no, I've never actually seen that in
> action either.
As you so eloquently point out in your below paragraph, I was missing the
word `some' in my statement.
> SMART does give you statistics on ECC recovery rates, temperature,
> number of remapped sectors, etc. which can give you a hint, if you keep
> track of them over time, when your disk is beginning to have more
> problems than it did have when it was newer. Maybe about 50% of
> failures can be predicted this way (but you have no idea _when_ the
> failure will occur--this afternoon or next summer?) it's little better
> than the MTBF rating. The other 50% of failures are predicted only
> after the fact. :-P
Presumably 50% is a guess rather than a carefully measured statistic. My
inclination would be towards thinking that 90% or more of failures that do
not happen around the time of a power state change would be noticeable by
the ECC corrections first. The failures that happen around the time of
power state change (including power spikes) would make your statistic more
or less correct.
> The position data was
> initially written using frighteningly expensive precision hardware at
> the disk drive factory and cannot be regenerated without said equipment.
Interesting; does this happen before the platter is inserted into the
disk? I have heard that vendors each have specific low level format
utilities, which perform the job of remapping failed sectors and I would
have thought, writing this timing information. Chickens and Eggs spring
to mind, though.
> The M in MTBF is Mean, not Maximum or Minimum. For every disk that
> lasts 10 years or more, there's an equal and opposite disk that dies
> within a few minutes.
It actually stands for Meaningless, I'm sure :-) Vendors should be
required to state this figure in terms of the number of unit failures they
experienced running X units for T amount of time.
--
Sam Vilain, sam@vilain.net
Real software engineers write in languages that have not actually been
implemented for any machine, and for which only the formal spec (in
BNF) is available. This keeps them from having to take any machine
dependencies into account. Machine dependencies make real software
engineers very uneasy.
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 0:16 Sam Vilain
@ 2003-02-23 23:10 ` Zygo Blaxell
0 siblings, 0 replies; 94+ messages in thread
From: Zygo Blaxell @ 2003-02-23 23:10 UTC (permalink / raw)
To: reiserfs-list
In article <200302141316.41198.sam@vilain.net>,
Sam Vilain <sam@vilain.net> wrote:
>> SMART does give you statistics on ECC recovery rates, temperature,
>> number of remapped sectors, etc. which can give you a hint, if you keep
>> track of them over time, when your disk is beginning to have more
>> problems than it did have when it was newer. Maybe about 50% of
>> failures can be predicted this way (but you have no idea _when_ the
>> failure will occur--this afternoon or next summer?) it's little better
>> than the MTBF rating. The other 50% of failures are predicted only
>> after the fact. :-P
>
>Presumably 50% is a guess rather than a carefully measured statistic.
The precise figure I have is 12
showed_signs_of_problems_through_smart_first / 23 total_failed_disks,
or 52%. That counts only disks that I was able to observe with SMART data
prior to failure and that have since failed, so it really only counts
disks from the last 2-3 years. 50% was just a guess when I quoted it,
but it was a good guess. ;-)
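With only 23 failed disks, the 52% figure carries a wide uncertainty. As an illustration (this calculation is not from the thread), the sketch below computes a 95% Wilson score interval for 12 successes out of 23; the choice of the Wilson interval and the function name `wilson_interval` are assumptions made here.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion
    (z=1.96 corresponds to roughly 95% confidence)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(
        p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half
```

For 12 of 23 the interval runs from roughly one third to a bit over two thirds, so "about half" is a fair summary, but the sample is too small to pin the rate down much further.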
Prior to SMART, I used to just listen to disks and measure their
read throughput across the disk. If there was any change in the sound
signature, or new non-linearities in the read speed of various areas of
the disk, the disk would fail in the near future roughly half of the time.
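The "non-linearities in read speed" check can be approximated with a simple heuristic: on a healthy disk, sequential throughput changes smoothly from one region to the next, so a region that is much slower than its two neighbours stands out. This is a sketch under that assumption; `slow_zones`, the 0.7 tolerance and the sample figures are all invented for illustration, and collecting the per-region rates is left out.

```python
def slow_zones(rates, tolerance=0.7):
    """Return indices of regions whose read rate is well below the
    average of their immediate neighbours.

    rates     -- throughput per disk region (e.g. MB/s), in disk order
    tolerance -- flag a region if rate < tolerance * neighbour average
    """
    flagged = []
    for i in range(1, len(rates) - 1):
        expected = (rates[i - 1] + rates[i + 1]) / 2
        if rates[i] < tolerance * expected:
            flagged.append(i)
    return flagged
```

For example, `slow_zones([40, 39, 38, 20, 36, 35])` flags only index 3, the region that reads at about half the speed of its surroundings. Comparing such results over time is what matters; a single dip can also come from other activity on the disk.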
One major problem with using SMART data for failure prediction is that
the failure prediction data is all vendor-specific, so its meaning is
not always well-defined, and it is all optional. The "standard" SMART
failure data only reports fatal failure of the disk, and only after it
has failed (e.g. the electrical system fails the power-on self-test, or
a long self-test found a medium error). Some disks report ECC rates,
some report reallocated sectors, some have error logs...others don't.
Some claim to have these features but bugs in the firmware or limited (I
would say "braindead") implementation prevent them from actually working.
>My
>inclination would be towards thinking that 90% or more of failures that do
>not happen around the time of a power state change would be noticeable by
>the ECC corrections first.
I have not observed any correlation between ECC correction rates
and drive failure. Most of the drives that I have observed either didn't
report ECC rates at all, did not have significant ECC rates
(a few dozen/hour if not zero), or did not have a significant change
in ECC rates before failure. Also, no drive in my collection with ECC
reporting capability has survived with the ability to report ECC rates
after failure, so I can't even try to read a known bad sector to see if
the ECC count actually goes up.
I have drives that have been running with
millions-of-ECC-detected-per-minute for the past two years. They're
_very_ slow--peak sequential I/O rate of 5MBytes/sec when seeking,
where an identical model drive under identical conditions would normally
give 20-40MBytes/sec. They haven't lost any data yet, and they've never
reported an I/O error, and the vendor won't take them back until one of
those two events happens, so I'm stuck with them. :-P
I've found that the "remapped sector count" is a good indicator of
future fatal disk failure, in that most disks will fatally fail soon
after detecting new remapped sectors; however, "soon" has a range of
a few minutes to a year or more, and the no-remapped-sectors condition
does not mean that the drive will not fail in the near future.
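Watching for *new* remapped sectors means comparing snapshots of the counter over time. A minimal sketch, assuming the counts have already been collected by some SMART tool and stored as (timestamp, count) pairs; the function name `new_remaps` and the data layout are hypothetical.

```python
def new_remaps(history):
    """Return (timestamp, increase) for every snapshot in which the
    remapped-sector count grew since the previous snapshot.

    history -- list of (timestamp, remapped_count), oldest first
    """
    events = []
    for (_, before), (when, after) in zip(history, history[1:]):
        if after > before:
            events.append((when, after - before))
    return events
```

Any non-empty result is a reason to plan a replacement; as noted above, an empty result does not mean the drive is safe.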
>The failures that happen around the time of
>power state change (including power spikes) would make your statistic more
>or less correct.
I have observed exactly two disk failures in the field around a power
state change, one in 1989 and one in 1991. In both cases the parts
of the drive electronics that are active during initial spin-up had
failed (with a big burned spot on the circuit board).
Since Energy Star, drive vendors have been designing disks to survive
thousands of spin-up/spin-down cycles, and that particular failure mode
has mostly gone away. I've observed thousands of power state changes
(including a dozen power supply failures, such as power supplies that
put 48VAC across the nominal 12VDC hard disk power supply) and hundreds
of disk failures. They don't tend to happen at or near the same
time--certainly not anything near 50%.
Of course I can get more interesting failure modes if I start poking
around the drive electronics or components directly, or if I modify
the firmware. What's statistically likely and what's possible are
different things.
>> The position data was
>> initially written using frighteningly expensive precision hardware at
>> the disk drive factory and cannot be regenerated without said equipment.
>
>Interesting; does this happen before the platter is inserted into the
> disk?
The head position data is usually on one platter surface, and it consumes
all of that surface. All of the disk heads are attached to the same arm
assembly, so they only need one surface--tracks are defined on other
platters in terms of wherever the head happens to be when the head with
the control data is in the correct position. The data must be available
at all times during a seek, so it can't coexist with user data easily.
Tricks used by other devices to solve this problem don't work on hard
disks. CD-ROMs encode position data in a subchannel of the data stream,
which works as long as you don't have to write the data in a device that
costs less than a luxury car. CD-Rs have a separate position data track
elsewhere in the disk, and drives with dual-focus lenses that can see
the user data area and the position data at the same time--all of which
only works for optical systems. DVDs use the multi-layer technique to
store half of their data capacity, which causes problems for DVD writers.
Floppy disks have position control inside the drive instead of on the
media, but they must tolerate position errors in the hundreds of microns.
> I have heard that vendors each have specific low level format
> utilities, which perform the job of remapping failed sectors and I would
> have thought, writing this timing information. Chickens and Eggs spring
> to mind, though.
It is possible for the drive to write position data within a single track;
indeed, it has to, since the positions of the tracks on the platters
cannot be determined until the drive is mostly assembled. This data
specifies sector boundaries and such, and of course all the actual user
data. If the bits of data that indicate "sector 4 begins here" were lost,
a low-level format could get them back, but so could a full track write.
A hard disk by itself couldn't write the seek data accurately enough
even if it had the sensors required--the environment inside a PC has
too much vibration and thermal variation, and the error tolerances for
user data are higher than for the head position data. About the best
you could expect would be the capacity of a Jaz or Zip disk.
Now you'd *think* that all drive vendors would try to prevent writes to
that special platter in hardware, e.g. by not providing a write current
level amplifier for that platter, or even a current limiter to prevent
power surges from other parts of the electronics. You really would.
>> The M in MTBF is Mean, not Maximum or Minimum. [...]
>It actually stands for Meaningless, I'm sure :-) Vendors should be
>required to state this figure in terms of the number of unit failures they
>experienced running X units for T amount of time.
Usually they also specify things like duty cycle, which make the MTBF
even more meaningless. I could claim a 5-year MTBF at 20% duty cycle
for a drive that dies within two hours of 100% duty cycle ("oh, yes, but
I only _specified_ the MTBF at 20% duty..."). If I didn't know better,
I'd swear some vendors actually do this. :-P
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-12 20:05 Dirk Schenkewitz
2003-02-13 22:49 ` Zygo Blaxell
0 siblings, 1 reply; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-12 20:05 UTC (permalink / raw)
To: Reiserfs List
Looks like I'm under a curse...
I can post, but I don't get any messages from the mailing list,
only those that are explicitly addressed to me. I got to know
about all these other messages only because I requested the last
100 messages again :-/ Hmpf.
Back to the point:
Hans Reiser wrote (in response to Anders Widman):
> For some users it would be better to boot to a corrupted filesystem
> because running fsck is more of a problem than putting their data at
> higher risk. For datalogging, it is probably conceivable to just toss
> the journal and lose the more recent updates to it. For the default
> metadata journaling, this just does not seem prudent.
But I think that not everybody will know whether it's better to toss
the journal or to keep it. I wouldn't, and I know some people who are
much less interested in filesystems and the stuff around them than me.
Even SysOps.
> I really prefer making users understand that they have a problem they
> need to do something about. This is just my style. I want them to fail
> to boot, and after some effort learn that there is this thing called
> fsck, and dd_rescue, and that it is time to buy another hard drive and
> chuck their current one.
For me, it was alarming enough seeing ext3 drop the journal. In fact,
THAT was the point where I went to investigate in other directions
instead of blaming the filesystem.
> It would be best though if they were given detailed instructions about
> how they need to do this when the code hits that bad block.
Agreed! The only problem is that putting a bad drive to eternal rest
might not solve the problem, as long as the REASON the drive went
bad stays undiscovered. (I had said drive in use for less than 4
months (if my memory servers, er, serves me well) - it was like new.)
> If we handle the journal block error without downtime, the user will
> never chuck the hard drive, and that is bad in the longterm.
Not agreed, unless you continue without a warning.
happy coding
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 20:05 Dirk Schenkewitz
@ 2003-02-13 22:49 ` Zygo Blaxell
2003-02-14 0:32 ` Hans Reiser
0 siblings, 1 reply; 94+ messages in thread
From: Zygo Blaxell @ 2003-02-13 22:49 UTC (permalink / raw)
To: reiserfs-list
In article <3E4AA902.86F15815@interface-ag.com>,
Dirk Schenkewitz <Dirk.Schenkewitz@interface-ag.com> wrote:
>For me, it was alarming enough seeing ext3 drop the journal. In fact,
>THAT was the point where I went to investigate in other directions
>instead of blaming the filesystem.
The kernel block device messages complaining about I/O errors from the
device aren't sufficient to tell you that there is a serious problem?
Or was this device silently corrupting data without reporting errors?
>The only problem is that putting a bad drive to eternal rest
>might not solve the problem, as long as the REASON the drive went
>bad stays undiscovered. (I had said drive in use for less than 4
>months (if my memory servers, er, serves me well) - it was like new.)
I've had disks that were DOA (literally--Medium Errors during
partitioning and mke2fs, followed by mechanical noises and total failure
in a matter of a few minutes). I've had several disks that failed a
week or two after first installation.
The M in MTBF is Mean, not Maximum or Minimum. For every disk that
lasts 10 years or more, there's an equal and opposite disk that dies
within a few minutes.
>Hans Reiser wrote (in response to Anders Widman):
>> If we handle the journal block error without downtime, the user will
>> never chuck the hard drive, and that is bad in the longterm.
>
>Not agreed, unless you continue without a warning.
I'd prefer to continue in read-only mode, and refuse further read-write
mounts with an error until the filesystem is fscked. I really like
systems that can still boot and let me (attempt to) run diagnostic
tools even when they're otherwise really unhealthy. I don't care if
recently written data is corrupt or missing--I probably didn't write to
the diagnostic tools within the last journal interval, and if the
filesystem is read-only I can't make any metadata corruption worse.
I would think that most people notice that something's wrong if they
can't write to their filesystems any more. I certainly wouldn't want
the filesystem to be modified if there's something known to be wrong
with the metadata. But if I can't read any of the data at all because
some tiny part of it is suspicious, I just get annoyed. :-P
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-13 22:49 ` Zygo Blaxell
@ 2003-02-14 0:32 ` Hans Reiser
2003-02-14 8:18 ` Oleg Drokin
0 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-14 0:32 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: reiserfs-list
Zygo Blaxell wrote:
>>Hans Reiser wrote (in response to Anders Widman):
>>
>>
>>>If we handle the journal block error without downtime, the user will
>>>never chuck the hard drive, and that is bad in the longterm.
>>>
>>>
>>Not agreed, unless you continue without a warning.
>>
>>
>
>I'd prefer to continue in read-only mode, and refuse further read-write
>mounts with an error until the filesystem is fscked.
>
Yes, I agree with that.
> I really like
>systems that can still boot and let me (attempt to) run diagnostic
>tools even when they're otherwise really unhealthy. I don't care if
>recently written data is corrupt or missing--I probably didn't write to
>the diagnostic tools within the last journal interval, and if the
>filesystem is read-only I can't make any metadata corruption worse.
>
>I would think that most people notice that something's wrong if they
>can't write to their filesystems any more. I certainly wouldn't want
>the filesystem to be modified if there's something known to be wrong
>with the metadata. But if I can't read any of the data at all because
>some tiny part of it is suspicious, I just get annoyed. :-P
>
>
>
Agreed. I think that this is actually what we do currently. Oleg, can
you check that?
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 0:32 ` Hans Reiser
@ 2003-02-14 8:18 ` Oleg Drokin
2003-02-14 10:13 ` Andreas Dilger
0 siblings, 1 reply; 94+ messages in thread
From: Oleg Drokin @ 2003-02-14 8:18 UTC (permalink / raw)
To: Hans Reiser; +Cc: Zygo Blaxell, reiserfs-list
Hello!
On Fri, Feb 14, 2003 at 03:32:42AM +0300, Hans Reiser wrote:
> > I really like
> >systems that can still boot and let me (attempt to) run diagnostic
> >tools even when they're otherwise really unhealthy. I don't care if
> >recently written data is corrupt or missing--I probably didn't write to
> >the diagnostic tools within the last journal interval, and if the
> >filesystem is read-only I can't make any metadata corruption worse.
> >I would think that most people notice that something's wrong if they
> >can't write to their filesystems any more. I certainly wouldn't want
> >the filesystem to be modified if there's something known to be wrong
> >with the metadata. But if I can't read any of the data at all because
> >some tiny part of it is suspicious, I just get annoyed. :-P
> Agreed. I think that this is actually what we do currently. Oleg, can
> you check that?
Currently we panic if write to journal area fails. We report IO error to
userspace if non-journaled write fails it seems (I will check it again).
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 8:18 ` Oleg Drokin
@ 2003-02-14 10:13 ` Andreas Dilger
2003-02-14 10:17 ` Oleg Drokin
0 siblings, 1 reply; 94+ messages in thread
From: Andreas Dilger @ 2003-02-14 10:13 UTC (permalink / raw)
To: Oleg Drokin; +Cc: Hans Reiser, Zygo Blaxell, reiserfs-list
On Feb 14, 2003 11:18 +0300, Oleg Drokin wrote:
> Currently we panic if write to journal area fails. We report IO error to
> userspace if non-journaled write fails it seems (I will check it again).
I'm thinking "panic" isn't going to help the user's data any more than
not committing the change... How about remount-ro, or have a mount option
like ext3 "errors={panic,remount-ro,warning}"? If you marked the filesystem
and/or journal in error and mount read-only, and force a full fsck at
reboot time at least the user has a chance - otherwise the node might
just panic in a loop.
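To make the mount-option idea a bit more concrete, here is a minimal sketch
of how such a policy could be parsed and applied. All names in it (demo_fs,
err_policy, parse_errors_opt, handle_fs_error) are made up for illustration,
and the option strings are simply the three listed above, not checked against
the ext3 source. It is not reiserfs or ext3 code, only an outline of the idea:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical error policies, named after the options listed above. */
enum err_policy { ERR_WARNING, ERR_REMOUNT_RO, ERR_PANIC, ERR_INVALID };

/* Hypothetical in-memory filesystem state, for illustration only. */
struct demo_fs {
    int read_only;   /* 1 once the filesystem refuses further writes */
    int error_flag;  /* 1 once an error is recorded for a later fsck */
    int halted;      /* 1 if the policy asked to stop the machine */
};

/* Map the value of an "errors=" style option to a policy. */
enum err_policy parse_errors_opt(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "warning") == 0)
        return ERR_WARNING;
    if (strcmp(value, "remount-ro") == 0)
        return ERR_REMOUNT_RO;
    if (strcmp(value, "panic") == 0)
        return ERR_PANIC;
    return ERR_INVALID;
}

/* Apply the policy after an I/O or consistency error is detected. */
void handle_fs_error(struct demo_fs *fs, enum err_policy policy)
{
    assert(fs != NULL);
    fs->error_flag = 1;  /* always record it, whatever the policy */
    switch (policy) {
    case ERR_REMOUNT_RO:
        fs->read_only = 1;  /* stop modifying possibly bad metadata */
        break;
    case ERR_PANIC:
        fs->halted = 1;     /* stand-in for stopping the machine */
        break;
    case ERR_WARNING:
    default:
        break;              /* log only; keep running read-write */
    }
}
```

The point of recording the error flag in every branch is that a later boot
can still see that something went wrong, independent of which policy was
chosen at mount time.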
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 10:13 ` Andreas Dilger
@ 2003-02-14 10:17 ` Oleg Drokin
2003-02-14 10:50 ` Andreas Dilger
0 siblings, 1 reply; 94+ messages in thread
From: Oleg Drokin @ 2003-02-14 10:17 UTC (permalink / raw)
To: Hans Reiser, Zygo Blaxell, reiserfs-list
Hello!
On Fri, Feb 14, 2003 at 03:13:16AM -0700, Andreas Dilger wrote:
> > Currently we panic if write to journal area fails. We report IO error to
> > userspace if non-journaled write fails it seems (I will check it again).
> I'm thinking "panic" isn't going to help the user's data any more than
> not committing the change... How about remount-ro, or have a mount option
SuSE people work on this.
> like ext3 "errors={panic,remount-ro,warning}"? If you marked the filesystem
> and/or journal in error and mount read-only, and force a full fsck at
> reboot time at least the user has a chance - otherwise the node might
> just panic in a loop.
It hangs on panic ;) (because it does BUG() ), so no cyclical reboot.
There is even big chance that everything not touching problematic fs will
survive and continue to work.
Given that nobody runs reiserfsck at boot, "full fsck" approach won't work.
Ah, and reiserfsck ignores -a command line switch because
"we do not trust our fsck yet" (c) Hans.
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 10:17 ` Oleg Drokin
@ 2003-02-14 10:50 ` Andreas Dilger
2003-02-14 10:59 ` Oleg Drokin
2003-02-14 13:34 ` Hans Reiser
0 siblings, 2 replies; 94+ messages in thread
From: Andreas Dilger @ 2003-02-14 10:50 UTC (permalink / raw)
To: Oleg Drokin; +Cc: Hans Reiser, Zygo Blaxell, reiserfs-list
On Feb 14, 2003 13:17 +0300, Oleg Drokin wrote:
> On Fri, Feb 14, 2003 at 03:13:16AM -0700, Andreas Dilger wrote:
> > > Currently we panic if write to journal area fails. We report IO error to
> > > userspace if non-journaled write fails it seems (I will check it again).
> > I'm thinking "panic" isn't going to help the user's data any more than
> > not committing the change... How about remount-ro, or have a mount option
>
> SuSE people work on this.
>
> > like ext3 "errors={panic,remount-ro,warning}"? If you marked the filesystem
> > and/or journal in error and mount read-only, and force a full fsck at
> > reboot time at least the user has a chance - otherwise the node might
> > just panic in a loop.
>
> It hangs on panic ;) (because it does BUG() ), so no cyclical reboot.
Ah, you said panic, but panic != BUG... There is a "reboot-on-panic" flag
that is often set on servers so they don't sit stupidly when they could
reboot and start working again.
> There is even big chance that everything not touching problematic fs will
> survive and continue to work.
> Given that nobody runs reiserfsck at boot, "full fsck" approach won't work.
> Ah, and reiserfsck ignores -a command line switch because
> "we do not trust our fsck yet" (c) Hans.
Yeah, I keep giving him good reasons to change his mind, even a little,
like "have 'reiserfsck -a' just check the superblock and return with a
code > 1 if there is an error" so that an admin can at least do something
about it if the filesystem is broken, before it gets mounted/written to
again and the brokenness multiplies unknown to the user...
Next, add journal replay to reiserfsck if it isn't already there, and
_then_ do the same check as above, keeping a field in the journal header
to synchronously write an error to in fatal cases, instead of into the
superblock and where it is overwritten by journal replay.
That is all e2fsck does for ext3 filesystems, and it only takes a fraction
of a second to complete (no longer than it takes in-kernel journal replay
to complete at mount time, really) but the user wins by being able to fix
the filesystem before the whole system has booted and possibly corrupted
more data.
Regardless of whether reiserfsck is trusted or not to check/fix the
whole filesystem automatically, the above is not a risky change. If you
wanted to go for more reliability, you could start adding quick "read
only" checks at periodic intervals like ext2 even if you never fix the
filesystem without user intervention. The most common error we see on
the ext3 these days is due to memory/disk corruption that is caught by
the kernel or with a periodic check, which no amount of journaling can
fix or prevent.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 10:50 ` Andreas Dilger
@ 2003-02-14 10:59 ` Oleg Drokin
2003-02-14 13:34 ` Hans Reiser
1 sibling, 0 replies; 94+ messages in thread
From: Oleg Drokin @ 2003-02-14 10:59 UTC (permalink / raw)
To: Zygo Blaxell, reiserfs-list
Hello!
On Fri, Feb 14, 2003 at 03:50:34AM -0700, Andreas Dilger wrote:
> > > like ext3 "errors={panic,remount-ro,warning}"? If you marked the filesystem
> > > and/or journal in error and mount read-only, and force a full fsck at
> > > reboot time at least the user has a chance - otherwise the node might
> > > just panic in a loop.
> > It hangs on panic ;) (because it does BUG() ), so no cyclical reboot.
> Ah, you said panic, but panic != BUG... There is a "reboot-on-panic" flag
Yes, I know. I just used the wrong word. There is reiserfs_panic, but it does
BUG(), hence the confusion. (btw panic() in 2.5 seems not to halt the machine.
I was able to continue work on 2.5 after panic() was called. Though that was
during 2.5.20-something I think, so maybe it is back to normal behaviour).
> that is often set on servers so they don't sit stupidly when they could
> reboot and start working again.
Yeah, I use that on my servers too.
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 10:50 ` Andreas Dilger
2003-02-14 10:59 ` Oleg Drokin
@ 2003-02-14 13:34 ` Hans Reiser
2003-02-14 16:04 ` Rudy Zijlstra
2003-02-14 19:06 ` Andreas Dilger
1 sibling, 2 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-14 13:34 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Oleg Drokin, Zygo Blaxell, reiserfs-list
Andreas Dilger wrote:
>On Feb 14, 2003 13:17 +0300, Oleg Drokin wrote:
>
>
>>On Fri, Feb 14, 2003 at 03:13:16AM -0700, Andreas Dilger wrote:
>>
>>
>>>>Currently we panic if write to journal area fails. We report IO error to
>>>>userspace if non-journaled write fails it seems (I will check it again).
>>>>
>>>>
>>>I'm thinking "panic" isn't going to help the user's data any more than
>>>not committing the change... How about remount-ro, or have a mount option
>>>
>>>
>>SuSE people work on this.
>>
>>
>>
>>>like ext3 "errors={panic,remount-ro,warning}"? If you marked the filesystem
>>>and/or journal in error and mount read-only, and force a full fsck at
>>>reboot time at least the user has a chance - otherwise the node might
>>>just panic in a loop.
>>>
>>>
>>It hangs on panic ;) (because it does BUG() ), so no cyclical reboot.
>>
>>
>
>Ah, you said panic, but panic != BUG... There is a "reboot-on-panic" flag
>that is often set on servers so they don't sit stupidly when they could
>reboot and start working again.
>
>
>
>>There is even big chance that everything not touching problematic fs will
>>survive and continue to work.
>>Given that nobody runs reiserfsck at boot, "full fsck" approach won't work.
>>Ah, and reiserfsck ignores -a command line switch because
>>"we do not trust our fsck yet" (c) Hans.
>>
>>
>
>Yeah, I keep giving him good reasons to change his mind, even a little,
>like "have 'reiserfsck -a' just check the superblock and return with a
>code > 1 if there is an error" so that an admin can at least do something
>about it if the filesystem is broken, before it gets mounted/written to
>again and the brokenness multiplies unknown to the user...
>
I don't understand you.
>
>Next, add journal replay to reiserfsck if it isn't already there,
>
Why, when it is in the kernel?
> and
>_then_ do the same check as above, keeping a field in the journal header
>to synchronously write an error to in fatal cases, instead of into the
>superblock and where it is overwritten by journal replay.
>
>That is all e2fsck does for ext3 filesystems, and it only takes a fraction
>of a second to complete (no longer than it takes in-kernel journal replay
>to complete at mount time, really) but the user wins by being able to fix
>the filesystem before the whole system has booted and possibly corrupted
>more data.
>
>Regardless of whether reiserfsck is trusted or not to check/fix the
>whole filesystem automatically, the above is not a risky change. If you
>wanted to go for more reliability, you could start adding quick "read
>only" checks at periodic intervals like ext2 even if you never fix the
>filesystem without user intervention. The most common error we see on
>the ext3 these days is due to memory/disk corruption that is caught by
>the kernel or with a periodic check, which no amount of journaling can
>fix or prevent.
>
I hate it when booting causes me to get stuck waiting for an fsck.
Probably fsck is stable enough now that we should encourage people to
run it readonly regularly, but it should not be forced on them.
Maybe having some code to check whether fsck was run in the last 3
months, and if not then if the user types y in the next 30 seconds
during boot it will be run, would make sense.
The ext2 tradition of checking the number of mounts since the last fsck
is simply counting the wrong thing.
>
>Cheers, Andreas
>--
>Andreas Dilger
>http://sourceforge.net/projects/ext2resize/
>http://www-mddsp.enel.ucalgary.ca/People/adilger/
>
>
>
>
>
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 13:34 ` Hans Reiser
@ 2003-02-14 16:04 ` Rudy Zijlstra
2003-02-14 19:06 ` Andreas Dilger
1 sibling, 0 replies; 94+ messages in thread
From: Rudy Zijlstra @ 2003-02-14 16:04 UTC (permalink / raw)
To: Hans Reiser; +Cc: Andreas Dilger, Oleg Drokin, Zygo Blaxell, reiserfs-list
On Fri, 14 Feb 2003, Hans Reiser wrote:
> Andreas Dilger wrote:
>
> >On Feb 14, 2003 13:17 +0300, Oleg Drokin wrote:
> >
> >
> >>On Fri, Feb 14, 2003 at 03:13:16AM -0700, Andreas Dilger wrote:
> >>
> >>
> >>>>Currently we panic if write to journal area fails. We report IO error to
> >>>>userspace if non-journaled write fails it seems (I will check it again).
> >>>>
> >>>>
> >>>I'm thinking "panic" isn't going to help the user's data any more than
> >>>not committing the change... How about remount-ro, or have a mount option
> >>>
> >>>
> >>SuSE people work on this.
> >>
> >>
> >>
> >>>like ext3 "errors={panic,remount-ro,warning}"? If you marked the filesystem
> >>>and/or journal in error and mount read-only, and force a full fsck at
> >>>reboot time at least the user has a chance - otherwise the node might
> >>>just panic in a loop.
> >>>
> >>>
> >>It hangs on panic ;) (because it does BUG() ), so no cyclical reboot.
> >>
> >>
> >
> >Ah, you said panic, but panic != BUG... There is a "reboot-on-panic" flag
> >that is often set on servers so they don't sit stupidly when they could
> >reboot and start working again.
> >
> >
> >
> >>There is even big chance that everything not touching problematic fs will
> >>survive and continue to work.
> >>Given that nobody runs reiserfsck at boot, "full fsck" approach won't work.
> >>Ah, and reiserfsck ignores -a command line switch because
> >>"we do not trust our fsck yet" (c) Hans.
> >>
> >>
> >
> >Yeah, I keep giving him good reasons to change his mind, even a little,
> >like "have 'reiserfsck -a' just check the superblock and return with a
> >code > 1 if there is an error" so that an admin can at least do something
> >about it if the filesystem is broken, before it gets mounted/written to
> >again and the brokenness multiplies unknown to the user...
> >
> I don't understand you.
>
> >
> >Next, add journal replay to reiserfsck if it isn't already there,
> >
> Why, when it is in the kernel?
Just to be certain, on read-only mount the journal replay is also done?
This would then solve the above question.
This would need support in the kernel to force read-only mount in case an
error flag as proposed above is set. For redundancy this would need to be
present in 2 places. (one place might be on the bad block..)
>
> > and
> >_then_ do the same check as above, keeping a field in the journal header
> >to synchronously write an error to in fatal cases, instead of into the
> >superblock and where it is overwritten by journal replay.
> >
> >That is all e2fsck does for ext3 filesystems, and it only takes a fraction
> >of a second to complete (no longer than it takes in-kernel journal replay
> >to complete at mount time, really) but the user wins by being able to fix
> >the filesystem before the whole system has booted and possibly corrupted
> >more data.
> >
> >Regardless of whether reiserfsck is trusted or not to check/fix the
> >whole filesystem automatically, the above is not a risky change. If you
> >wanted to go for more reliability, you could start adding quick "read
> >only" checks at periodic intervals like ext2 even if you never fix the
> >filesystem without user intervention. The most common error we see on
> >the ext3 these days is due to memory/disk corruption that is caught by
> >the kernel or with a periodic check, which no amount of journaling can
> >fix or prevent.
> >
> I hate it when booting causes me to get stuck waiting for an fsck.
>
So do I, it's why I use Reiserfs...
> Probably fsck is stable enough now that we should encourage people to
> run it readonly regularly, but it should not be forced on them.
>
Is it then possible to give the -a switch a sensible meaning?
> Maybe having some code to check whether fsck was run in the last 3
> months, and if not then if the user types y in the next 30 seconds
> during boot it will be run, would make sense.
>
Regards,
Rudy
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 13:34 ` Hans Reiser
2003-02-14 16:04 ` Rudy Zijlstra
@ 2003-02-14 19:06 ` Andreas Dilger
2003-02-14 19:19 ` Hans Reiser
1 sibling, 1 reply; 94+ messages in thread
From: Andreas Dilger @ 2003-02-14 19:06 UTC (permalink / raw)
To: Hans Reiser; +Cc: Oleg Drokin, Zygo Blaxell, reiserfs-list
On Feb 14, 2003 16:34 +0300, Hans Reiser wrote:
> Andreas Dilger wrote:
> >Yeah, I keep giving him good reasons to change his mind, even a little,
> >like "have 'reiserfsck -a' just check the superblock and return with a
> >code > 1 if there is an error" so that an admin can at least do something
> >about it if the filesystem is broken, before it gets mounted/written to
> >again and the brokenness multiplies unknown to the user...
>
> I don't understand you.
Ok, so the reiserfs kernel code detects an error on disk, what does it
do? Print out an error message, maybe BUG? There is an "error" field
in the reiserfs superblock, I hope it is set when the kernel detects
something bad.
So, now what happens? Maybe the user doesn't read their syslog and
doesn't see the error, or the error is just a prelude to memory corruption
which causes the system to crash. When the system boots again, it goes
on its merry way, mounting the reiserfs filesystem with _known_ errors
on it, using bad allocation bitmaps, directories btrees, etc and maybe
double allocating blocks or overwriting blocks from other files causing
them to become corrupt, etc, etc, etc. Until finally the filesystem is
totally corrupt, the system crashes miserably, the user emails this list
and reiserfsck has an impossible job trying to fix the filesystem.
Instead, what I propose is to have "reiserfsck -a" AS A STARTING POINT
simply check for a valid reiserfs superblock and the absence of the
"error" flag before declaring the filesystem clean and allowing the
system to boot.
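As a rough illustration of how small that starting point is, here is a
sketch in C. The structure, field names and constants (demo_super,
DEMO_MAGIC, DEMO_VALID_FS, DEMO_ERROR_FS, quick_check) are assumptions made
up for this example and do not reproduce the real on-disk reiserfs layout.
The return values follow the usual fsck(8) convention, where 0 means no
errors and 4 means errors were left uncorrected:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified superblock; NOT the real reiserfs layout. */
struct demo_super {
    char     magic[10];  /* filesystem signature */
    uint16_t state;      /* DEMO_VALID_FS or DEMO_ERROR_FS */
};

#define DEMO_MAGIC    "ReIsErFs"  /* assumed signature for this sketch */
#define DEMO_VALID_FS 1
#define DEMO_ERROR_FS 2

/* Exit codes in the style of fsck(8). */
#define FSCK_OK          0
#define FSCK_UNCORRECTED 4

/*
 * The whole "quick" check: a recognizable superblock and no error flag.
 * Nothing is repaired; a non-zero result only tells the boot scripts
 * that somebody should look at the filesystem before it is mounted.
 */
int quick_check(const struct demo_super *sb)
{
    assert(sb != NULL);
    if (memcmp(sb->magic, DEMO_MAGIC, strlen(DEMO_MAGIC)) != 0)
        return FSCK_UNCORRECTED;  /* not a superblock we recognize */
    if (sb->state != DEMO_VALID_FS)
        return FSCK_UNCORRECTED;  /* kernel recorded an error earlier */
    return FSCK_OK;
}
```

A check like this reads one block and compares two fields, so it would not
add a noticeable delay to booting.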
What's even worse, the reiserfs_read_super (at least 2.4.18 RH kernel)
code OVERWRITES the superblock error status at mount time, making it
worse than useless, since each mount hides any errors that were detected
before the crash:
s->u.reiserfs_sb.s_mount_state = SB_REISERFS_STATE(s);
s->u.reiserfs_sb.s_mount_state = REISERFS_VALID_FS ;
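The effect of those two assignments can be shown in isolation. The sketch
below is not kernel code: demo_sb_info, mount_overwriting, mount_preserving
and had_errors are hypothetical names, and the two state constants are
assumed values chosen for the example. It only contrasts overwriting the
remembered state with keeping it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VALID_FS 1  /* assumed value, for illustration */
#define DEMO_ERROR_FS 2  /* assumed value, for illustration */

/* Hypothetical stand-in for the in-memory superblock info. */
struct demo_sb_info {
    uint16_t on_disk_state;  /* what the superblock said at mount time */
    uint16_t s_mount_state;  /* what the mounted filesystem remembers */
};

/* Mirrors the pattern quoted above: save the state, then clobber it. */
void mount_overwriting(struct demo_sb_info *sbi)
{
    assert(sbi != NULL);
    sbi->s_mount_state = sbi->on_disk_state;
    sbi->s_mount_state = DEMO_VALID_FS;  /* earlier error is now lost */
}

/* Alternative: keep whatever state was found on disk. */
void mount_preserving(struct demo_sb_info *sbi)
{
    assert(sbi != NULL);
    sbi->s_mount_state = sbi->on_disk_state;
}

/* Returns 1 if an earlier error is still visible after mounting. */
int had_errors(const struct demo_sb_info *sbi)
{
    assert(sbi != NULL);
    return sbi->s_mount_state != DEMO_VALID_FS;
}
```

With the first variant, a filesystem that was flagged before the crash looks
clean after the next mount; with the second, the flag survives until
something deliberately clears it.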
> >Next, add journal replay to reiserfsck if it isn't already there,
> >
> Why, when it is in the kernel?
Because that is the next stage to allowing reiserfsck do checks on the
filesystem after a crash. Do you tell me you would rather (and you
must, because it obviously currently does) have reiserfsck just throw
away everything in the journal, leaving possibly inconsistent data in
the filesystem for it to check? Or maybe make the user mount the
filesystem (which obviously has problems or they wouldn't be running
reiserfsck to do a full check) just to clear out the journal and maybe
risk crashing or corruption if the filesystem is strangely corrupted?
> > and
> >_then_ do the same check as above, keeping a field in the journal header
> >to synchronously write an error to in fatal cases, instead of into the
> >superblock and where it is overwritten by journal replay.
> >
> >That is all e2fsck does for ext3 filesystems, and it only takes a fraction
> >of a second to complete (no longer than it takes in-kernel journal replay
> >to complete at mount time, really) but the user wins by being able to fix
> >the filesystem before the whole system has booted and possibly corrupted
> >more data.
>
> >Regardless of whether reiserfsck is trusted or not to check/fix the
> >whole filesystem automatically, the above is not a risky change. If you
> >wanted to go for more reliability, you could start adding quick "read
> >only" checks at periodic intervals like ext2 even if you never fix the
> >filesystem without user intervention. The most common error we see on
> >the ext3 these days is due to memory/disk corruption that is caught by
> >the kernel or with a periodic check, which no amount of journaling can
> >fix or prevent.
>
> I hate it when booting causes me to get stuck waiting for an fsck.
You don't hate it more when you lose data? You are a strange man then.
> Probably fsck is stable enough now that we should encourage people to
> run it readonly regularly, but it should not be forced on them.
I'm nowhere suggesting that reiserfsck _has_ to implement the additional
periodic read-only checks at boot time, mostly just that it do _very_simple_
checks that take .01 seconds at boot time before the filesystem is mounted,
so that the admin/user at least _notices_ that there is an error instead of
not finding out until their filesystem is totally corrupted.
> Maybe having some code to check whether fsck was run in the last 3
> months, and if not then if the user types y in the next 30 seconds
> during boot it will be run, would make sense.
Sure, that would be great, given the prevalence of memory errors and
IDE DMA errors that show up these days, which the filesystem and the
journal can do nothing about.
> The ext2 tradition of checking the number of mounts since the last fsck
> is simply counting the wrong thing.
It's only a matter of defaults safe vs. fast... e2fsck defaults to safe,
checking occasionally for possible corruption, vs. reiserfs waiting for
fatal corruption before forcing the user to run reiserfsck (which is so
heavily discouraged (on the list, documentation, when run), that nobody
runs it for fear of damaging their filesystem further). You are well aware
that the e2fsck check intervals can be tuned per-filesystem and even
disabled if desired (it prints options for how to do this at mke2fs time
and is clearly documented for the experienced user). For a boot-once-a-day
machine, the default is to check about once a month (at most 6 months for
the time check), and if machines are crashing more often, then they should
probably be checked more often because _something_ has to be causing crashes.
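For reference, the decision being argued about is tiny. The following is a
sketch of a "check is due" test that combines a mount counter with a time
interval, either of which can be switched off. The structure and names
(demo_check_state, check_due) are invented for this example and are not
taken from e2fsprogs; treating 0 as "disabled" is likewise an assumption of
the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-filesystem check settings, for illustration only. */
struct demo_check_state {
    uint32_t mounts_since_check;  /* mounts since the last full check */
    uint32_t max_mounts;          /* 0 = mount-count check disabled */
    int64_t  last_check;          /* time of last check, in seconds */
    int64_t  check_interval;      /* seconds; 0 = time check disabled */
};

/* Returns 1 if a periodic check should be run now, 0 otherwise. */
int check_due(const struct demo_check_state *st, int64_t now)
{
    assert(st != NULL);
    /* Too many mounts since the last check? */
    if (st->max_mounts != 0 && st->mounts_since_check >= st->max_mounts)
        return 1;
    /* Too much time since the last check? */
    if (st->check_interval != 0 &&
        now - st->last_check >= st->check_interval)
        return 1;
    return 0;
}
```

Setting max_mounts to 0 leaves only the time-based criterion, and setting
both limits to 0 disables periodic checks entirely.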
Having reiserfsck just do read-only checks shouldn't force you to type
"yes" (and we mean "yes" because this is so scary, mere mortals shouldn't
be doing this). Hans, you've always talked about making things easy for
the average user (error messages and such), don't you think that means making
a data consistency check a little less intimidating for the user too?
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 19:06 ` Andreas Dilger
@ 2003-02-14 19:19 ` Hans Reiser
2003-02-15 12:51 ` Vitaly Fertman
2003-02-15 22:37 ` Andreas Dilger
0 siblings, 2 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-14 19:19 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Oleg Drokin, Zygo Blaxell, reiserfs-list
Andreas Dilger wrote:
>On Feb 14, 2003 16:34 +0300, Hans Reiser wrote:
>
>
>>Andreas Dilger wrote:
>>
>>
>>>Yeah, I keep giving him good reasons to change his mind, even a little,
>>>like "have 'reiserfsck -a' just check the superblock and return with a
>>>code > 1 if there is an error" so that an admin can at least do something
>>>about it if the filesystem is broken, before it gets mounted/written to
>>>again and the brokenness multiplies unknown to the user...
>>>
>>>
>>I don't understand you.
>>
>>
>
>Ok, so the reiserfs kernel code detects an error on disk, what does it
>do? Print out an error message, maybe BUG? There is an "error" field
>in the reiserfs superblock, I hope it is set when the kernel detects
>something bad.
>
>So, now what happens? Maybe the user doesn't read their syslog and
>doesn't see the error, or the error is just a prelude to memory corruption
>which causes the system to crash. When the system boots again, it goes
>on its merry way, mounting the reiserfs filesystem with _known_ errors
>on it, using bad allocation bitmaps, directories btrees, etc and maybe
>double allocating blocks or overwriting blocks from other files causing
>them to become corrupt, etc, etc, etc. Until finally the filesystem is
>totally corrupt, the system crashes miserably, the user emails this list
>and reiserfsck has an impossible job trying to fix the filesystem.
>
>Instead, what I propose is to have "reiserfsck -a" AS A STARTING POINT
>simply check for a valid reiserfs superblock and the absence of the
>"error" flag before declaring the filesystem clean and allowing the
>system to boot.
>
>What's even worse, the reiserfs_read_super (at least 2.4.18 RH kernel)
>code OVERWRITES the superblock error status at mount time, making it
>worse than useless, since each mount hides any errors that were detected
>before the crash:
>
> s->u.reiserfs_sb.s_mount_state = SB_REISERFS_STATE(s);
> s->u.reiserfs_sb.s_mount_state = REISERFS_VALID_FS ;
>
Andreas seems reasonable, Vitaly, what are your thoughts?
>
>
>
>>>Next, add journal replay to reiserfsck if it isn't already there,
>>>
>>>
>>>
>>Why, when it is in the kernel?
>>
>>
>
>Because that is the next stage to allowing reiserfsck do checks on the
>filesystem after a crash. Do you tell me you would rather (and you
>must, because it obviously currently does) have reiserfsck just throw
>away everything in the journal, leaving possibly inconsistent data in
>the filesystem for it to check? Or maybe make the user mount the
>filesystem (which obviously has problems or they wouldn't be running
>reiserfsck to do a full check) just to clear out the journal and maybe
>risk crashing or corruption if the filesystem is strangely corrupted?
>
Vitaly, answer this.
>
>
>
>>Maybe having some code to check whether fsck was run in the last 3
>>months, and if not then if the user types y in the next 30 seconds
>>during boot it will be run, would make sense.
>>
>>
>
>Sure, that would be great, given the prevalence of memory errors and
>IDE DMA errors that show up these days, which the filesystem and the
>journal can do nothing about.
>
>
>
>>The ext2 tradition of checking the number of mounts since the last fsck
>>is simply counting the wrong thing.
>>
>>
>
>It's only a matter of defaults safe vs. fast... e2fsck defaults to safe,
>checking occasionally for possible corruption, vs. reiserfs waiting for
>fatal corruption before forcing the user to run reiserfsck (which is so
>heavily discouraged (on the list, in the documentation, when run) that nobody
>runs it for fear of damaging their filesystem further).
>
It is probably not so dangerous anymore.
> You are well aware
>that the e2fsck check intervals can be tuned per-filesystem and even
>disabled if desired (it prints options for how to do this at mke2fs time
>and is clearly documented for the experienced user). For a boot-once-a-day
>machine, the default is to check about once a month (at most 6 months for
>the time check), and if machines are crashing more often, then they should
>probably be checked more often because _something_ has to be causing crashes.
>
The idea that how often you boot determines how often it checks is just
silly, sorry.
>
>Having reiserfsck just do read-only checks shouldn't force you to type
>"yes" (and we mean "yes" because this is so scary, mere mortals shouldn't
>be doing this). Hans, you've always talked about making things easy for
>the average user (error messages and such), don't you think that making
>a data consistency check for the user a little less intimidating too?
>
>
>
I think that you should have to confirm that you have time to wait for
fsck before you get stuck with a one-day fsck of a large server.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 19:19 ` Hans Reiser
@ 2003-02-15 12:51 ` Vitaly Fertman
2003-02-15 13:00 ` Vitaly Fertman
` (2 more replies)
2003-02-15 22:37 ` Andreas Dilger
1 sibling, 3 replies; 94+ messages in thread
From: Vitaly Fertman @ 2003-02-15 12:51 UTC (permalink / raw)
To: Hans Reiser, Andreas Dilger; +Cc: Oleg Drokin, Zygo Blaxell, reiserfs-list
> >Ok, so the reiserfs kernel code detects an error on disk, what does it
> >do? Print out an error message, maybe BUG? There is an "error" field
> >in the reiserfs superblock, I hope it is set when the kernel detects
> >something bad.
> >
> >So, now what happens? Maybe the user doesn't read their syslog and
> >doesn't see the error, or the error is just a prelude to memory corruption
> >which causes the system to crash. When the system boots again, it goes
> >on its merry way, mounting the reiserfs filesystem with _known_ errors
> >on it, using bad allocation bitmaps, directories btrees, etc and maybe
> >double allocating blocks or overwriting blocks from other files causing
> >them to become corrupt, etc, etc, etc. Until finally the filesystem is
> >totally corrupt, the system crashes miserably, the user emails this list
> >and reiserfsck has an impossible job trying to fix the filesystem.
> >
> >Instead, what I propose is to have "reiserfsck -a" AS A STARTING POINT
> >simply check for a valid reiserfs superblock and the absence of the
> >"error" flag before declaring the filesystem clean and allowing the
> >system to boot.
> >
> >What's even worse, the reiserfs_read_super (at least 2.4.18 RH kernel)
> >code OVERWRITES the superblock error status at mount time, making it
> >worse than useless, since each mount hides any errors that were detected
> >before the crash:
> >
> > s->u.reiserfs_sb.s_mount_state = SB_REISERFS_STATE(s);
> > s->u.reiserfs_sb.s_mount_state = REISERFS_VALID_FS ;
>
> Andreas seems reasonable, Vitaly, what are your thoughts?
>
> >>>Next, add journal replay to reiserfsck if it isn't already there,
> >>
> >>Why, when it is in the kernel?
> >
> >Because that is the next stage to allowing reiserfsck do checks on the
> >filesystem after a crash. Do you tell me you would rather (and you
> >must, because it obviously currently does) have reiserfsck just throw
> >away everything in the journal, leaving possibly inconsistent data in
> >the filesystem for it to check? Or maybe make the user mount the
> >filesystem (which obviously has problems or they wouldn't be running
> >reiserfsck to do a full check) just to clear out the journal and maybe
> >risk crashing or corruption if the filesystem is strangely corrupted?
>
> Vitaly, answer this.
Ok, so probably we should make the following changes. The kernel sets IO_ERROR
and FS_ERROR flags.
In the case of IO_ERROR, reiserfsck prints a message about hardware problems
and returns an error, so the fs does not get mounted at boot. On an attempt to
mount the fs with the IO_ERROR flag set, it is mounted ro with a message about
hardware problems. When you are sure that the problems have disappeared, you can
mount it with a special option that clears this flag, and reiserfstune will
probably have an option to clear these flags as well.
In the case of FS_ERROR (search_by_key failed, access beyond the end of the
device, or similar), reiserfsck gets the -a option at boot, replays the journal
if needed, and checks for the flag. If the flag is not set, it returns OK.
Otherwise it runs fix-fixable. If errors are left, it returns 'errors left
uncorrected' and the fs does not get mounted at boot. On an attempt to mount the
fs with the flag set, the kernel just prints a message about mounting the fs
with errors and mounts it. It is not mounted ro here, as the kernel will not do
a deep analysis of the errors and it could be just a small insignificant error.
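The policy described above can be written down as a minimal C sketch. Everything named here is hypothetical: the `SKETCH_*` flag bits, both function names, and the `errors_left_after_fix` parameter (a stand-in for the result of a real fix-fixable pass) are assumptions for illustration, not existing reiserfs or reiserfsprogs code. The return values borrow the fsck(8) exit code convention (0 = no errors, 4 = errors left uncorrected, 8 = operational error); mapping IO_ERROR to 8 is also an assumption.

```c
#include <stdio.h>

/* Hypothetical flag bits; the real on-disk encoding is not given in this thread. */
enum {
    SKETCH_IO_ERROR = 1 << 0,  /* the kernel saw a hardware-level failure */
    SKETCH_FS_ERROR = 1 << 1   /* e.g. search_by_key failed, access beyond end of device */
};

/* Exit codes modelled on the fsck(8) convention. */
enum {
    SKETCH_FSCK_OK          = 0,
    SKETCH_FSCK_UNCORRECTED = 4,
    SKETCH_FSCK_OPERATIONAL = 8
};

/*
 * Boot-time "reiserfsck -a" decision.
 * errors_left_after_fix stands in for the outcome of a fix-fixable pass:
 * nonzero means some errors could not be repaired.
 */
int sketch_fsck_auto(unsigned flags, int errors_left_after_fix)
{
    if (flags & SKETCH_IO_ERROR) {
        fprintf(stderr, "hardware problems were reported; refusing to continue\n");
        return SKETCH_FSCK_OPERATIONAL;
    }
    /* Journal replay would happen here, before the flag is examined. */
    if (!(flags & SKETCH_FS_ERROR))
        return SKETCH_FSCK_OK;
    /* The flag is set: run fix-fixable and report what is left. */
    return errors_left_after_fix ? SKETCH_FSCK_UNCORRECTED : SKETCH_FSCK_OK;
}

/* Mount-time decision: returns 1 if the fs should be mounted read-only. */
int sketch_mount_readonly(unsigned flags)
{
    if (flags & SKETCH_IO_ERROR) {
        fprintf(stderr, "mounting read-only: hardware problems were reported\n");
        return 1;
    }
    if (flags & SKETCH_FS_ERROR)
        fprintf(stderr, "warning: mounting a filesystem with known errors\n");
    return 0;  /* FS_ERROR alone does not force a read-only mount */
}
```

For example, `sketch_fsck_auto(SKETCH_FS_ERROR, 1)` yields 4, so a boot script following the usual fsck convention would stop before mounting, while `sketch_mount_readonly(SKETCH_FS_ERROR)` yields 0 and only prints a warning.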
--
Thanks,
Vitaly Fertman
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-15 12:51 ` Vitaly Fertman
@ 2003-02-15 13:00 ` Vitaly Fertman
2003-02-18 19:50 ` Hans Reiser
2003-02-15 13:04 ` Anders Widman
2003-02-17 19:43 ` Hans Reiser
2 siblings, 1 reply; 94+ messages in thread
From: Vitaly Fertman @ 2003-02-15 13:00 UTC (permalink / raw)
To: Hans Reiser, Andreas Dilger; +Cc: Oleg Drokin, Zygo Blaxell, reiserfs-list
On Saturday 15 February 2003 15:51, Vitaly Fertman wrote:
> > >Ok, so the reiserfs kernel code detects an error on disk, what does it
> > >do? Print out an error message, maybe BUG? There is an "error" field
> > >in the reiserfs superblock, I hope it is set when the kernel detects
> > >something bad.
> > >
> > >So, now what happens? Maybe the user doesn't read their syslog and
> > >doesn't see the error, or the error is just a prelude to memory
> > > corruption which causes the system to crash. When the system boots
> > > again, it goes on its merry way, mounting the reiserfs filesystem with
> > > _known_ errors on it, using bad allocation bitmaps, directories btrees,
> > > etc and maybe double allocating blocks or overwriting blocks from other
> > > files causing them to become corrupt, etc, etc, etc. Until finally the
> > > filesystem is totally corrupt, the system crashes miserably, the user
> > > emails this list and reiserfsck has an impossible job trying to fix the
> > > filesystem.
> > >
> > >Instead, what I propose is to have "reiserfsck -a" AS A STARTING POINT
> > >simply check for a valid reiserfs superblock and the absence of the
> > >"error" flag before declaring the filesystem clean and allowing the
> > >system to boot.
> > >
> > >What's even worse, the reiserfs_read_super (at least 2.4.18 RH kernel)
> > >code OVERWRITES the superblock error status at mount time, making it
> > >worse than useless, since each mount hides any errors that were detected
> > >before the crash:
> > >
> > > s->u.reiserfs_sb.s_mount_state = SB_REISERFS_STATE(s);
> > > s->u.reiserfs_sb.s_mount_state = REISERFS_VALID_FS ;
> >
> > Andreas seems reasonable, Vitaly, what are your thoughts?
> >
> > >>>Next, add journal replay to reiserfsck if it isn't already there,
> > >>
> > >>Why, when it is in the kernel?
> > >
> > >Because that is the next stage to allowing reiserfsck do checks on the
> > >filesystem after a crash. Do you tell me you would rather (and you
> > >must, because it obviously currently does) have reiserfsck just throw
> > >away everything in the journal, leaving possibly inconsistent data in
> > >the filesystem for it to check? Or maybe make the user mount the
> > >filesystem (which obviously has problems or they wouldn't be running
> > >reiserfsck to do a full check) just to clear out the journal and maybe
> > >risk crashing or corruption if the filesystem is strangely corrupted?
> >
> > Vitaly, answer this.
>
> Ok, so probably we should make the following changes. The kernel set
> IO_ERROR and FS_ERROR flags.
> In the case of IO_ERROR reiserfsck prints the message about hardware
> problems and returns error, so the fs does not get mounted at boot. On
> attempt mounting the fs with IO_ERROR flag set it is mounted ro with some
> message about hardware problems. When you are sure that problems
> disappeared you can mount it with a special option clearing this flag and
> probably reiserfstune will have some option clearing these flags also.
> In the case of FS_ERROR - search_by_key failed or beyond end of device
> access or similar - reiserfsck gets -a option at boot, replays the journal
> if needed and checks for the flag. No flag - returns OK. Else - run
> fix-fixable. Errors left - returns 'errors left uncorrected' and the fs
> does not get mounted at boot. On attempt mounting the fs with the flag just
> print the message about mounting the fs with errors and mount it. Not ro
> here as kernel will not do deep analysis of errors and it could be just a
> small insignificant error.
If fix-fixable finds fatal corruptions it sets FS_FATAL, which means that the
second run of fix-fixable will not help to avoid another fsck run at the next
boot.
--
Thanks,
Vitaly Fertman
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-15 13:00 ` Vitaly Fertman
@ 2003-02-18 19:50 ` Hans Reiser
2003-02-18 20:05 ` Vitaly Fertman
0 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-18 19:50 UTC (permalink / raw)
To: Vitaly Fertman; +Cc: Andreas Dilger, Oleg Drokin, Zygo Blaxell, reiserfs-list
Vitaly Fertman wrote:
>On Saturday 15 February 2003 15:51, Vitaly Fertman wrote:
>
>
>>>>Ok, so the reiserfs kernel code detects an error on disk, what does it
>>>>do? Print out an error message, maybe BUG? There is an "error" field
>>>>in the reiserfs superblock, I hope it is set when the kernel detects
>>>>something bad.
>>>>
>>>>So, now what happens? Maybe the user doesn't read their syslog and
>>>>doesn't see the error, or the error is just a prelude to memory
>>>>corruption which causes the system to crash. When the system boots
>>>>again, it goes on its merry way, mounting the reiserfs filesystem with
>>>>_known_ errors on it, using bad allocation bitmaps, directories btrees,
>>>>etc and maybe double allocating blocks or overwriting blocks from other
>>>>files causing them to become corrupt, etc, etc, etc. Until finally the
>>>>filesystem is totally corrupt, the system crashes miserably, the user
>>>>emails this list and reiserfsck has an impossible job trying to fix the
>>>>filesystem.
>>>>
>>>>Instead, what I propose is to have "reiserfsck -a" AS A STARTING POINT
>>>>simply check for a valid reiserfs superblock and the absence of the
>>>>"error" flag before declaring the filesystem clean and allowing the
>>>>system to boot.
>>>>
>>>>What's even worse, the reiserfs_read_super (at least 2.4.18 RH kernel)
>>>>code OVERWRITES the superblock error status at mount time, making it
>>>>worse than useless, since each mount hides any errors that were detected
>>>>before the crash:
>>>>
>>>> s->u.reiserfs_sb.s_mount_state = SB_REISERFS_STATE(s);
>>>> s->u.reiserfs_sb.s_mount_state = REISERFS_VALID_FS ;
>>>>
>>>>
>>>Andreas seems reasonable, Vitaly, what are your thoughts?
>>>
>>>
>>>
>>>>>>Next, add journal replay to reiserfsck if it isn't already there,
>>>>>>
>>>>>>
>>>>>Why, when it is in the kernel?
>>>>>
>>>>>
>>>>Because that is the next stage to allowing reiserfsck do checks on the
>>>>filesystem after a crash. Do you tell me you would rather (and you
>>>>must, because it obviously currently does) have reiserfsck just throw
>>>>away everything in the journal, leaving possibly inconsistent data in
>>>>the filesystem for it to check? Or maybe make the user mount the
>>>>filesystem (which obviously has problems or they wouldn't be running
>>>>reiserfsck to do a full check) just to clear out the journal and maybe
>>>>risk crashing or corruption if the filesystem is strangely corrupted?
>>>>
>>>>
>>>Vitaly, answer this.
>>>
>>>
>>Ok, so probably we should make the following changes. The kernel set
>>IO_ERROR and FS_ERROR flags.
>>In the case of IO_ERROR reiserfsck prints the message about hardware
>>problems and returns error, so the fs does not get mounted at boot. On
>>attempt mounting the fs with IO_ERROR flag set it is mounted ro with some
>>message about hardware problems. When you are sure that problems
>>disappeared you can mount it with a special option clearing this flag and
>>probably reiserfstune will have some option clearing these flags also.
>>In the case of FS_ERROR - search_by_key failed or beyond end of device
>>access or similar - reiserfsck gets -a option at boot, replays the journal
>>if needed and checks for the flag. No flag - returns OK. Else - run
>>fix-fixable. Errors left - returns 'errors left uncorrected' and the fs
>>does not get mounted at boot. On attempt mounting the fs with the flag just
>>print the message about mounting the fs with errors and mount it. Not ro
>>here as kernel will not do deep analysis of errors and it could be just a
>>small insignificant error.
>>
>>
>
>If fix-fixable finds fatal corruptions it sets FS_FATAL, which means that the
>second run of fix-fixable will not help to avoid another fsck run at the next
>boot.
>
>
>
Can you repeat what you said (more clearly).
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-18 19:50 ` Hans Reiser
@ 2003-02-18 20:05 ` Vitaly Fertman
2003-02-18 22:18 ` Hans Reiser
0 siblings, 1 reply; 94+ messages in thread
From: Vitaly Fertman @ 2003-02-18 20:05 UTC (permalink / raw)
To: Hans Reiser; +Cc: Andreas Dilger, Oleg Drokin, Zygo Blaxell, reiserfs-list
> >If fix-fixable finds fatal corruptions it sets FS_FATAL, which means that
> > the second run of fix-fixable will not help to avoid another fsck run at
> > the next boot.
>
> Can you repeat what you said (more clearly).
If the fs was not fixed, you do not want another run of fsck at
the next boot. You already know that there are fatal corruptions and
that you need to run rebuild-tree manually. So a special flag saying that
would be helpful.
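A minimal C sketch of how such a flag could steer the boot-time check. The `DEMO_*` names and both functions are hypothetical, and `fatal_found` stands in for the result of a real fix-fixable pass; none of this is existing reiserfsprogs code.

```c
#include <stdio.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    DEMO_FS_ERROR = 1 << 1,  /* the kernel noticed an inconsistency */
    DEMO_FS_FATAL = 1 << 2   /* fix-fixable already found damage it cannot repair */
};

enum demo_boot_action {
    DEMO_BOOT_CLEAN,         /* nothing to do */
    DEMO_RUN_FIX_FIXABLE,    /* try the automatic repair */
    DEMO_NEED_REBUILD_TREE   /* skip the automatic repair, tell the user */
};

/* Decide what the automatic boot-time check should do. */
enum demo_boot_action demo_boot_check(unsigned flags)
{
    if (flags & DEMO_FS_FATAL) {
        /* Repeating fix-fixable would only waste boot time. */
        fprintf(stderr, "fatal corruption was recorded earlier: "
                        "run rebuild-tree manually\n");
        return DEMO_NEED_REBUILD_TREE;
    }
    return (flags & DEMO_FS_ERROR) ? DEMO_RUN_FIX_FIXABLE : DEMO_BOOT_CLEAN;
}

/* Update the flags once a fix-fixable pass has finished. */
unsigned demo_after_fix_fixable(unsigned flags, int fatal_found)
{
    if (fatal_found)
        return flags | DEMO_FS_FATAL;        /* remember it for the next boot */
    return flags & ~(unsigned)DEMO_FS_ERROR; /* repaired: clear the error flag */
}
```

The message is printed on every boot for as long as the flag stays set, so the user keeps being reminded until rebuild-tree has been run.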
--
Thanks,
Vitaly Fertman
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-18 20:05 ` Vitaly Fertman
@ 2003-02-18 22:18 ` Hans Reiser
0 siblings, 0 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-18 22:18 UTC (permalink / raw)
To: Vitaly Fertman; +Cc: Andreas Dilger, Oleg Drokin, Zygo Blaxell, reiserfs-list
Vitaly Fertman wrote:
>>>If fix-fixable finds fatal corruptions it sets FS_FATAL, which means that
>>>the second run of fix-fixable will not help to avoid another fsck run at
>>>the next boot.
>>>
>>>
>>Can you repeat what you said (more clearly).
>>
>>
>
>If fs was not fixed, you do not want to have another run of fsck at
>the next boot. you already know that there are fatal corruptions and
>you need to run rebuild-tree manually. So a special flag saying that
>would be helpful.
>
>
>
ok, along with special messages clearly explaining it to the user at
each boot, yes?
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-15 12:51 ` Vitaly Fertman
2003-02-15 13:00 ` Vitaly Fertman
@ 2003-02-15 13:04 ` Anders Widman
2003-02-15 13:23 ` Oleg Drokin
2003-02-17 19:43 ` Hans Reiser
2 siblings, 1 reply; 94+ messages in thread
From: Anders Widman @ 2003-02-15 13:04 UTC (permalink / raw)
To: reiserfs-list
> Ok, so probably we should make the following changes. The kernel set IO_ERROR
> and FS_ERROR flags.
> In the case of IO_ERROR reiserfsck prints the message about hardware problems
> and returns error, so the fs does not get mounted at boot. On attempt mounting
> the fs with IO_ERROR flag set it is mounted ro with some message about hardware
> problems. When you are sure that problems disappeared you can mount it with a
> special option clearing this flag and probably reiserfstune will have some
> option clearing these flags also.
> In the case of FS_ERROR - search_by_key failed or beyond end of device access
> or similar - reiserfsck gets -a option at boot, replays the journal if needed
> and checks for the flag. No flag - returns OK. Else - run fix-fixable. Errors
> left - returns 'errors left uncorrected' and the fs does not get mounted at
> boot. On attempt mounting the fs with the flag just print the message about
> mounting the fs with errors and mount it. Not ro here as kernel will not do
> deep analysis of errors and it could be just a small insignificant error.
What happens if there are FS errors on the boot device? Not mounting ro
will make the system inaccessible unless the user has some other boot or
rescue media.
Of course, this is where admins should have another disk or
partition to boot and make FS repairs. :)
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-15 13:04 ` Anders Widman
@ 2003-02-15 13:23 ` Oleg Drokin
0 siblings, 0 replies; 94+ messages in thread
From: Oleg Drokin @ 2003-02-15 13:23 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
Hello!
On Sat, Feb 15, 2003 at 02:04:54PM +0100, Anders Widman wrote:
> > Ok, so probably we should make the following changes. The kernel set IO_ERROR
> > and FS_ERROR flags.
> > In the case of IO_ERROR reiserfsck prints the message about hardware problems
> > and returns error, so the fs does not get mounted at boot. On attempt mounting
> > the fs with IO_ERROR flag set it is mounted ro with some message about hardware
> > problems. When you are sure that problems disappeared you can mount it with a
> > special option clearing this flag and probably reiserfstune will have some
> > option clearing these flags also.
> > In the case of FS_ERROR - search_by_key failed or beyond end of device access
> > or similar - reiserfsck gets -a option at boot, replays the journal if needed
> > and checks for the flag. No flag - returns OK. Else - run fix-fixable. Errors
> > left - returns 'errors left uncorrected' and the fs does not get mounted at
> > boot. On attempt mounting the fs with the flag just print the message about
> > mounting the fs with errors and mount it. Not ro here as kernel will not do
> > deep analysis of errors and it could be just a small insignificant error.
> What happens if there are FS errors on boot device? Not mounting ro
It gets mounted read-only first, then fsck will exit with the appropriate error code,
then the system will refuse to continue the boot process, I presume ;)
> will make system inaccessible unless users has some other boot or
> rescue media.
Sure, just like with any other filesystem.
If you want to avoid that, don't run fsck on your root device during boot,
or ignore what it has said to you ;)
> Of course, this is where admins should have another disk or
> partition to boot and make FS repairs. :)
Actually it is possible to run reiserfsck on a (read-only) mounted filesystem. The only
prerequisite I found is that the reiserfsck binary (built statically, it seems, or
with all the libs) should be located on another fs (tmpfs works just fine for me).
And of course a little bit of reiserfsck patching is needed ;)
That's how I repair broken root filesystems on our test boxes when the rootfs gets
corrupted, because some moron\bclear guy did not create a rescue partition there
and I hate booting from CDs (and CDs usually have an outdated reiserfsck anyway).
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-15 12:51 ` Vitaly Fertman
2003-02-15 13:00 ` Vitaly Fertman
2003-02-15 13:04 ` Anders Widman
@ 2003-02-17 19:43 ` Hans Reiser
2 siblings, 0 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-17 19:43 UTC (permalink / raw)
To: Vitaly Fertman; +Cc: Andreas Dilger, Oleg Drokin, Zygo Blaxell, reiserfs-list
Vitaly Fertman wrote:
>>>Ok, so the reiserfs kernel code detects an error on disk, what does it
>>>do? Print out an error message, maybe BUG? There is an "error" field
>>>in the reiserfs superblock, I hope it is set when the kernel detects
>>>something bad.
>>>
>>>So, now what happens? Maybe the user doesn't read their syslog and
>>>doesn't see the error, or the error is just a prelude to memory corruption
>>>which causes the system to crash. When the system boots again, it goes
>>>on its merry way, mounting the reiserfs filesystem with _known_ errors
>>>on it, using bad allocation bitmaps, directories btrees, etc and maybe
>>>double allocating blocks or overwriting blocks from other files causing
>>>them to become corrupt, etc, etc, etc. Until finally the filesystem is
>>>totally corrupt, the system crashes miserably, the user emails this list
>>>and reiserfsck has an impossible job trying to fix the filesystem.
>>>
>>>Instead, what I propose is to have "reiserfsck -a" AS A STARTING POINT
>>>simply check for a valid reiserfs superblock and the absence of the
>>>"error" flag before declaring the filesystem clean and allowing the
>>>system to boot.
>>>
>>>What's even worse, the reiserfs_read_super (at least 2.4.18 RH kernel)
>>>code OVERWRITES the superblock error status at mount time, making it
>>>worse than useless, since each mount hides any errors that were detected
>>>before the crash:
>>>
>>> s->u.reiserfs_sb.s_mount_state = SB_REISERFS_STATE(s);
>>> s->u.reiserfs_sb.s_mount_state = REISERFS_VALID_FS ;
>>>
>>>
>>Andreas seems reasonable, Vitaly, what are your thoughts?
>>
>>
>>
>>>>>Next, add journal replay to reiserfsck if it isn't already there,
>>>>>
>>>>>
>>>>Why, when it is in the kernel?
>>>>
>>>>
>>>Because that is the next stage to allowing reiserfsck do checks on the
>>>filesystem after a crash. Do you tell me you would rather (and you
>>>must, because it obviously currently does) have reiserfsck just throw
>>>away everything in the journal, leaving possibly inconsistent data in
>>>the filesystem for it to check? Or maybe make the user mount the
>>>filesystem (which obviously has problems or they wouldn't be running
>>>reiserfsck to do a full check) just to clear out the journal and maybe
>>>risk crashing or corruption if the filesystem is strangely corrupted?
>>>
>>>
>>Vitaly, answer this.
>>
>>
>
>Ok, so probably we should make the following changes. The kernel set IO_ERROR
>and FS_ERROR flags.
>In the case of IO_ERROR reiserfsck prints the message about hardware problems
>and returns error, so the fs does not get mounted at boot. On attempt mounting
>the fs with IO_ERROR flag set it is mounted ro with some message about hardware
>problems. When you are sure that problems disappeared you can mount it with a
>special option clearing this flag and probably reiserfstune will have some
>option clearing these flags also.
>In the case of FS_ERROR - search_by_key failed or beyond end of device access
>or similar - reiserfsck gets -a option at boot, replays the journal if needed
>and checks for the flag. No flag - returns OK. Else - run fix-fixable. Errors
>left - returns 'errors left uncorrected' and the fs does not get mounted at
>boot. On attempt mounting the fs with the flag just print the message about
>mounting the fs with errors and mount it. Not ro here as kernel will not do
>deep analysis of errors and it could be just a small insignificant error.
>
>
>
Sounds good to me. Do it. Reiser4 also.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-14 19:19 ` Hans Reiser
2003-02-15 12:51 ` Vitaly Fertman
@ 2003-02-15 22:37 ` Andreas Dilger
2003-02-18 18:21 ` Hans Reiser
2003-02-18 21:17 ` Valdis.Kletnieks
1 sibling, 2 replies; 94+ messages in thread
From: Andreas Dilger @ 2003-02-15 22:37 UTC (permalink / raw)
To: Hans Reiser; +Cc: Oleg Drokin, Zygo Blaxell, reiserfs-list
On Feb 14, 2003 22:19 +0300, Hans Reiser wrote:
> Andreas Dilger wrote:
> > You are well aware
> >that the e2fsck check intervals can be tuned per-filesystem and even
> >disabled if desired (it prints options for how to do this at mke2fs time
> >and is clearly documented for the experienced user). For a boot-once-a-day
> >machine, the default is to check about once a month (at most 6 months for
> >the time check), and if machines are crashing more often, then they should
> >probably be checked more often because _something_ has to be causing crashes.
> >
> The idea that how often you boot determines how often it checks is just
> silly, sorry.
I guess the shortcoming in the ext2 case is that it counts mounts and
not crashes. If it were counting the number of times the filesystem
was uncleanly shut down instead of normal shutdowns, would that be more
acceptable? The reason I'm still interested in crashes, even if they
are not filesystem-related crashes, is because there had to be _something_
which caused a crash (bad code, bad hardware, whatever), and once you have
any driver corrupting memory the chance that it is also corrupting filesystem
memory exists.
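To make the difference concrete, here is a minimal C sketch of counting unclean shutdowns rather than mounts. The `toy_sb` structure, its fields, and the three functions are hypothetical and do not correspond to any real ext2 or reiserfs superblock layout.

```c
/* Hypothetical in-memory model of the relevant superblock fields. */
struct toy_sb {
    unsigned mounted_dirty; /* set at mount, cleared only by a clean unmount */
    unsigned crash_count;   /* unclean shutdowns since the last full check */
    unsigned max_crashes;   /* force a check once this many have accumulated */
};

/* Called at mount time; returns 1 if a full check should be forced first. */
int toy_mount(struct toy_sb *sb)
{
    /* Still marked dirty means the previous session never unmounted cleanly. */
    if (sb->mounted_dirty)
        sb->crash_count++;
    sb->mounted_dirty = 1;
    return sb->crash_count >= sb->max_crashes;
}

/* Called on a clean unmount; ordinary reboots never raise the counter. */
void toy_clean_unmount(struct toy_sb *sb)
{
    sb->mounted_dirty = 0;
}

/* Called after a successful full check. */
void toy_fsck_done(struct toy_sb *sb)
{
    sb->crash_count = 0;
}
```

With `max_crashes` set to 2, any number of clean mount and unmount cycles never forces a check, while two mounts in a row without a clean unmount in between do.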
> >Having reiserfsck just do read-only checks shouldn't force you to type
> >"yes" (and we mean "yes" because this is so scary, mere mortals shouldn't
> >be doing this). Hans, you've always talked about making things easy for
> >the average user (error messages and such), don't you think that making
> >a data consistency check for the user a little less intimidating too?
>
> I think that you should have to agree that you have time to wait for
> fsck before you get stuck with a 1 day large server fsck.
That is definitely true. However, my assumption would be that if someone
is running a system with terabytes of data they will read the man page
after waiting a day for fsck to complete, or lose their job. It is entirely
possible for administrators to disable the per-mount e2fsck checking, and
the time-based (6 months by default) checking too, and do fsck themselves.
My experience would be that, like backups, people don't do that, so leaving
the 6 month check in protects users from themselves.
The other thing to keep in mind is that you can have different "levels" of
automated fsck at boot time, depending on how long they take. You never
necessarily have to try and fix anything with "fsck -a", just detect errors
and leave it up to the user to decide what to do if you find a problem:
- always recover journal, validate superblock, error flag (< 1s)
Don't know how long it takes these things to run, so it is up to you to
trade off checks vs. speed, and you could even round-robin them (storing
the last checked item in the superblock or something):
- check block allocation bitmaps match superblock counts
- walk directory structure from root, checking for directory corruption
- check btree validity on inodes for up to 10 seconds (or whatever, storing
last checked inode in superblock for restarting this test at next one)
By all means, don't do checks for an hour, or allow users to set the maximum
boot check duration in the superblock. I'm sure users don't mind waiting
5s at boot time if it means they don't lose data.
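A minimal C sketch of the round-robin idea. The budget is counted in number of checks rather than in seconds to keep the example deterministic; the `rr_*` names are hypothetical, and the three stub checks only mimic the items listed above (the third pretends to find a problem).

```c
#include <stddef.h>

typedef int (*rr_check_fn)(void);  /* a check returns 0 if it passed */

/* Would be stored in the superblock so the next boot continues from here. */
struct rr_state {
    size_t next_check;
};

/* Stub checks for illustration; real ones would read the disk. */
int rr_stub_bitmaps(void) { return 0; }  /* bitmaps match superblock counts */
int rr_stub_dirs(void)    { return 0; }  /* directory walk from the root */
int rr_stub_btree(void)   { return 1; }  /* pretend the btree check failed */

/*
 * Run at most `budget` checks, starting where the previous boot stopped.
 * Returns the number of checks that reported a problem.
 */
int rr_run_checks(struct rr_state *st, const rr_check_fn *checks,
                  size_t n, size_t budget)
{
    int failures = 0;

    if (n == 0)
        return 0;
    if (budget > n)
        budget = n;  /* never run the same check twice in one boot */

    for (size_t i = 0; i < budget; i++) {
        if (checks[st->next_check % n]() != 0)
            failures++;
        st->next_check = (st->next_check + 1) % n;
    }
    return failures;
}
```

With the three stubs and a budget of two, the first boot runs the bitmap and directory checks, and the second boot runs the btree check followed by the bitmap check again, so every check is eventually reached without any single boot paying for all of them.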
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-15 22:37 ` Andreas Dilger
@ 2003-02-18 18:21 ` Hans Reiser
2003-02-18 19:22 ` Oleg Drokin
2003-02-18 21:17 ` Valdis.Kletnieks
1 sibling, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-18 18:21 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Oleg Drokin, Zygo Blaxell, reiserfs-list
Andreas Dilger wrote:
>On Feb 14, 2003 22:19 +0300, Hans Reiser wrote:
>
>
>>Andreas Dilger wrote:
>>
>>
>>>You are well aware
>>>that the e2fsck check intervals can be tuned per-filesystem and even
>>>disabled if desired (it prints options for how to do this at mke2fs time
>>>and is clearly documented for the experienced user). For a boot-once-a-day
>>>machine, the default is to check about once a month (at most 6 months for
>>>the time check), and if machines are crashing more often, then they should
>>>probably be checked more often because _something_ has to be causing crashes.
>>>
>>>
>>>
>>The idea that how often you boot determines how often it checks is just
>>silly, sorry.
>>
>>
>
>I guess the shortcoming in the ext2 case is that it counts mounts and
>not crashes. If it were counting the number of times the filesystem
>was uncleanly shut down instead of normal shutdowns, would that be more
>acceptable? The reason I'm still interested in crashes, even if they
>are not filesystem-related crashes, is because there had to be _something_
>which caused a crash (bad code, bad hardware, whatever), and once you have
>any driver corrupting memory the chance that it is also corrupting filesystem
>memory exists.
>
This is at least arguably legitimate;-)....
>
>
>
>>>Having reiserfsck just do read-only checks shouldn't force you to type
>>>"yes" (and we mean "yes" because this is so scary, mere mortals shouldn't
>>>be doing this). Hans, you've always talked about making things easy for
>>>the average user (error messages and such), don't you think that making
>>>a data consistency check for the user a little less intimidating too?
>>>
>>>
>>I think that you should have to agree that you have time to wait for
>>fsck before you get stuck with a 1 day large server fsck.
>>
>>
>
>That is definitely true. However, my assumption would be that if someone
>is running a system with terabytes of data they will read the man page
>after waiting a day for fsck to complete, or lose their job.
>
How much does a terabyte of disk cost? A thousand dollars? How much
does a qualified sysadmin cost? $100-200k in Silicon Valley (but
rapidly reducing).
Yet this is still the wrong attitude.... our job is to make the
software so that it works without hassle. They don't need more items
on their checklists, they need software that manages the checklists for
them.
Also, whether a sysadmin is willing to wait a day for fsck might depend
on the day you ask him.
So I completely reject the argument you make.
> It is entirely
>possible for administrators to disable the per-mount e2fsck checking, and
>the time-based (6 months by default) checking too, and do fsck themselves.
>My experience would be that, like backups, people don't do that, so leaving
>the 6 month check in protects users from themselves.
>
Most users don't know that they can do it, and those that do don't need
us giving them more things they need to set when installing the OS.
>
>The other thing to keep in mind is that you can have different "levels" of
>automated fsck at boot time, depending on how long they take. You never
>necessarily have to try and fix anything with "fsck -a", just detect errors
>and leave it up to the user to decide what to do if you find a problem:
>- always recover journal, validate superblock, error flag (< 1s)
>
>Don't know how long it takes these things to run, so it is up to you to
>trade off checks vs. speed, and you could even round-robin them (storing
>the last checked item in the superblock or something):
>- check block allocation bitmaps match superblock counts
>- walk directory structure from root, checking for directory corruption
>- check btree validity on inodes for up to 10 seconds (or whatever, storing
> last checked inode in superblock for restarting this test at next one)
>
>By all means, don't do checks for an hour, or allow users to set the maximum
>boot check duration in the superblock. I'm sure users don't mind waiting
>5s at boot time if it means they don't lose data.
>
I doubt that there is a lot we can check in 5 seconds on a filesystem
with lots of small files, but I could be wrong.
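
A minimal sketch of the round-robin idea quoted above, assuming the per-boot budget is expressed as a number of objects rather than seconds, and that the resume position would be persisted somewhere such as the superblock. The names `check_slice`, `resume_at` and `is_valid` are hypothetical and are not taken from any fsck implementation.

```python
def check_slice(items, resume_at, max_items, is_valid):
    """Check at most max_items entries, starting where the last boot stopped.

    Returns (next_resume_at, bad_items). The caller would persist
    next_resume_at (for example in the superblock) for the next boot.
    """
    total = len(items)
    if total == 0:
        return 0, []
    bad = []
    pos = resume_at % total
    for _ in range(min(max_items, total)):
        if not is_valid(items[pos]):
            bad.append(items[pos])
        pos = (pos + 1) % total  # wrap around so every object is visited eventually
    return pos, bad


if __name__ == "__main__":
    inodes = list(range(10))            # stand-in for on-disk objects
    looks_ok = lambda ino: ino != 7     # pretend object 7 is corrupted
    resume = 0
    for boot in range(3):
        resume, bad = check_slice(inodes, resume, 4, looks_ok)
        print(f"boot {boot}: resume at {resume}, problems: {bad}")
```

How much real work fits into a fixed budget on a filesystem with many small files is exactly the open question raised here; the sketch only shows the bookkeeping.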
>
>Cheers, Andreas
>--
>Andreas Dilger
>http://sourceforge.net/projects/ext2resize/
>http://www-mddsp.enel.ucalgary.ca/People/adilger/
>
>
>
>
>
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-18 18:21 ` Hans Reiser
@ 2003-02-18 19:22 ` Oleg Drokin
2003-02-18 19:28 ` Hans Reiser
0 siblings, 1 reply; 94+ messages in thread
From: Oleg Drokin @ 2003-02-18 19:22 UTC (permalink / raw)
To: Hans Reiser; +Cc: Andreas Dilger, Zygo Blaxell, reiserfs-list
Hello!
On Tue, Feb 18, 2003 at 09:21:54PM +0300, Hans Reiser wrote:
> >By all means, don't do checks for an hour, or allow users to set the
> >maximum
> >boot check duration in the superblock. I'm sure users don't mind waiting
> >5s at boot time if it means they don't lose data.
> I doubt that there is a lot we can check in 5 seconds on a filesystem
> with lots of small files, but I could be wrong.
We discussed this with Vitaly today.
We can do some simple things, and if any of these show signs of trouble,
we can issue a warning and the sysadmin can schedule downtime for
fsck.
Simple things are: read the bitmaps and compare used block counts with
what is stored in the superblock.
Check consistency of the tree root (we have a pointer to it in the superblock,
and also its supposed level).
Maybe something else that is equally simple.
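
The two cheap checks described above could look roughly like the following sketch. It assumes a single in-memory bitmap with one bit per block (least significant bit first) and a handful of superblock values passed in as plain arguments. All names, and the `max_height` default, are illustrative assumptions and do not reflect the actual reiserfs on-disk layout.

```python
def block_in_use(bitmap, block):
    """Return True if the bit for `block` is set (LSB-first within each byte)."""
    return bool(bitmap[block // 8] & (1 << (block % 8)))


def count_used_blocks(bitmap, total_blocks):
    """Count set bits among the first total_blocks bits of the bitmap."""
    return sum(1 for block in range(total_blocks) if block_in_use(bitmap, block))


def quick_mount_check(bitmap, total_blocks, sb_free_blocks,
                      root_block, tree_height, max_height=5):
    """Return a list of warnings; an empty list means nothing looked wrong."""
    warnings = []
    # Check 1: bitmap contents must agree with the superblock's free count.
    used = count_used_blocks(bitmap, total_blocks)
    if total_blocks - used != sb_free_blocks:
        warnings.append("free block count mismatch")
    # Check 2: the root pointer must be in range and point at an allocated block.
    if not 0 <= root_block < total_blocks:
        warnings.append("root pointer out of range")
    elif not block_in_use(bitmap, root_block):
        warnings.append("root block not marked used")
    # Check 3: the recorded tree level must be plausible (bound is an assumption).
    if not 1 <= tree_height <= max_height:
        warnings.append("implausible tree height")
    return warnings
```

A non-empty result would only trigger a warning, leaving the decision to schedule a full fsck to the administrator.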
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-18 19:22 ` Oleg Drokin
@ 2003-02-18 19:28 ` Hans Reiser
0 siblings, 0 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-18 19:28 UTC (permalink / raw)
To: Oleg Drokin; +Cc: Andreas Dilger, Zygo Blaxell, reiserfs-list
Oleg Drokin wrote:
>Hello!
>
>On Tue, Feb 18, 2003 at 09:21:54PM +0300, Hans Reiser wrote:
>
>
>
>>>By all means, don't do checks for an hour, or allow users to set the
>>>maximum
>>>boot check duration in the superblock. I'm sure users don't mind waiting
>>>5s at boot time if it means they don't lose data.
>>>
>>>
>>I doubt that there is a lot we can check in 5 seconds on a filesystem
>>with lots of small files, but I could be wrong.
>>
>>
>
>We discussed this with Vitaly today.
>We can do some simple things, and if any of these show signs of trouble,
>we can issue a warning and the sysadmin can schedule downtime for
>fsck.
>Simple things are: read the bitmaps and compare used block counts with
>what is stored in the superblock.
>Check consistency of the tree root (we have a pointer to it in the superblock,
>and also its supposed level).
>Maybe something else that is equally simple.
>
>Bye,
> Oleg
>
>
>
>
Ok.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-15 22:37 ` Andreas Dilger
2003-02-18 18:21 ` Hans Reiser
@ 2003-02-18 21:17 ` Valdis.Kletnieks
2003-02-18 22:02 ` Matthias Andree
2003-02-18 22:23 ` Hans Reiser
1 sibling, 2 replies; 94+ messages in thread
From: Valdis.Kletnieks @ 2003-02-18 21:17 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Hans Reiser, Oleg Drokin, Zygo Blaxell, reiserfs-list
On Sat, 15 Feb 2003 15:37:10 MST, Andreas Dilger said:
> I guess the shortcoming in the ext2 case is that it counts mounts and
> not crashes. If it were counting the number of times the filesystem
> was uncleanly shut down instead of normal shutdowns, would that be more
> acceptable? The reason I'm still interested in crashes, even if they
> are not filesystem-related crashes, is because there had to be _something_
> which caused a crash (bad code, bad hardware, whatever), and once you have
> any driver corrupting memory the chance that it is also corrupting filesystem
> memory exists.
ext2/3 *intentionally* counts mounts rather than crashes.
It's possible for a filesystem to get unnoticeably corrupted without
a crash (remember the Linux 2.4.11 mangle-on-shutdown bug?) - it's stuff
like that (and slowly failing media) that it tries to catch by counting
mounts.
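
As a rough illustration of the policy being discussed (force a full check after an unclean shutdown, after a maximum number of mounts, or after a time interval such as the 6-month default mentioned earlier in the thread), here is a small sketch. The function and parameter names are hypothetical and are not the actual ext2 superblock field names or e2fsck logic.

```python
SIX_MONTHS = 180 * 24 * 3600  # seconds; approximation used only for the example


def needs_full_check(clean, mount_count, max_mount_count,
                     last_check, now, check_interval=SIX_MONTHS):
    """Decide whether a full fsck should be forced at mount time."""
    if not clean:
        return True  # unclean shutdown or error flag set
    if max_mount_count > 0 and mount_count >= max_mount_count:
        return True  # mount counter exceeded; catches silent corruption
    if check_interval > 0 and now - last_check >= check_interval:
        return True  # too long since the last check; catches failing media
    return False
```

Setting `max_mount_count` or `check_interval` to 0 disables that trigger in this sketch, which mirrors the option of turning the periodic checks off.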
--
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-18 21:17 ` Valdis.Kletnieks
@ 2003-02-18 22:02 ` Matthias Andree
2003-02-19 6:26 ` Oleg Drokin
2003-02-18 22:23 ` Hans Reiser
1 sibling, 1 reply; 94+ messages in thread
From: Matthias Andree @ 2003-02-18 22:02 UTC (permalink / raw)
To: reiserfs-list
Valdis.Kletnieks@vt.edu writes:
> ext2/3 *intentionally* counts mounts rather than crashes.
>
> It's possible for a filesystem to get unnoticeably corrupted without
> a crash (remember the Linux 2.4.11 mangle-on-shutdown bug?) - it's stuff
> like that (and slowly failing media) that it tries to catch by counting
> mounts.
Leaving that aside, reordered writes (write cache, enabled by default on
most drives) are another prominent reason for creeping
doom... the write barrier code hasn't yet been synced into the 2.4
mainstream and I doubt it ever will be. It might make 2.6 though.
--
Matthias Andree
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-18 22:02 ` Matthias Andree
@ 2003-02-19 6:26 ` Oleg Drokin
0 siblings, 0 replies; 94+ messages in thread
From: Oleg Drokin @ 2003-02-19 6:26 UTC (permalink / raw)
To: Matthias Andree; +Cc: reiserfs-list
Hello!
On Tue, Feb 18, 2003 at 11:02:42PM +0100, Matthias Andree wrote:
> > ext2/3 *intentionally* counts mounts rather than crashes.
> > It's possible for a filesystem to get unnoticeably corrupted without
> > a crash (remember the Linux 2.4.11 mangle-on-shutdown bug?) - it's stuff
> > like that (and slowly failing media) that it tries to catch by counting
> > mounts.
> Leaving that aside, reordered writes (write cache, enabled by default on
> most drives) are another prominent reason for creeping
> doom... the write barrier code hasn't yet been synced into the 2.4
> mainstream and I doubt it ever will be. It might make 2.6 though.
I saw Jens has submitted his code for review on lkml once again. So
I hope there is still hope ;)
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-18 21:17 ` Valdis.Kletnieks
2003-02-18 22:02 ` Matthias Andree
@ 2003-02-18 22:23 ` Hans Reiser
1 sibling, 0 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-18 22:23 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Andreas Dilger, Oleg Drokin, Zygo Blaxell, reiserfs-list
Valdis.Kletnieks@vt.edu wrote:
>On Sat, 15 Feb 2003 15:37:10 MST, Andreas Dilger said:
>
>
>
>>I guess the shortcoming in the ext2 case is that it counts mounts and
>>not crashes. If it were counting the number of times the filesystem
>>was uncleanly shut down instead of normal shutdowns, would that be more
>>acceptable? The reason I'm still interested in crashes, even if they
>>are not filesystem-related crashes, is because there had to be _something_
>>which caused a crash (bad code, bad hardware, whatever), and once you have
>>any driver corrupting memory the chance that it is also corrupting filesystem
>>memory exists.
>>
>>
>
>ext2/3 *intentionally* counts mounts rather than crashes.
>
>It's possible for a filesystem to get unnoticeably corrupted without
>a crash (remember the Linux 2.4.11 mangle-on-shutdown bug?) - it's stuff
>like that (and slowly failing media) that it tries to catch by counting
>mounts.
>
>
Making users who often reboot to Windows run fsck more often because of
it is not very clever. In fact, it is just annoying, and I remember it
well from my old ext2 days.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-12 18:27 Anders Widman
0 siblings, 0 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-12 18:27 UTC (permalink / raw)
To: reiserfs-list
> On Wed, Feb 12, 2003 at 06:40:04PM +0100, Anders Widman wrote:
>>
>> > Every resource we have is going to go into getting V4 done and stable so
>> > that we can sell it in the summer. Hopefully we will make it.
>>
>> Just a question. (I know lots of people will shout at me for asking,
>> but please don't :) Will V3/4 be ported to Windows, or are we doomed
>> to use the new MS database with integrated Palladium software?
> Have you supplied namesys with funding for a port?
Nope, I do not have the cash for that. I do have cash to buy myself
a licence to use ReiserFS though, if it were sold.
>>
>> Linux is a great OS, but there are tools that I (and probably many
>> others) use every day that I need. One example is Adobe Photoshop,
>> colour management and lots of other things - not to mention people
>> who want to use games ;).
> Does Photoshop no longer run on a Macintosh? Does colour management no longer
> run on a Macintosh? As for games, have you considered a subscription to
> WineX or a game console.
No, I do not use Mac because they are simply too slow :). WineX and
similar are not fast enough, or stable enough, to run most modern games.
But it is not all about the games, rather it is about all the software
that exists in the Windows world and has not yet been ported.
> I apologize, but I have a habit of hounding Windows users into admitting
> that the main reason they need Windows is because 1) their employer
> requires it (and my response is "The employer can supply the hardware and
> technical support.") or 2) They haven't really looked to see if it can be done
> elsewhere. or 3) a software vendor (like Autodesk) only supports Windows.
And what should we (Windows users) do when software vendors do not
support anything but Windows?
>>
>> As of now I can not completely go over to Linux. Therefore I would
>> pay to use ReiserFS on my Windows machines. Maybe I am the only one
>> who would, but perhaps not.
> Out of curiosity, what do you think that reiserfs would buy you on windows?
> Would reiserfs be more of a benefit than a separate linux box running
> samba or nfsd?
No, Samba and NFS would defeat some of the benefit (speed) of
ReiserFS. Though I do use ReiserFS over Samba for backup/storage of my
data.
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* RE: Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-11 19:43 berthiaume_wayne
2003-02-12 10:48 ` Dirk Schenkewitz
2003-02-12 16:22 ` Sam Vilain
0 siblings, 2 replies; 94+ messages in thread
From: berthiaume_wayne @ 2003-02-11 19:43 UTC (permalink / raw)
To: Dirk.Schenkewitz; +Cc: reiserfs-list
Dirk, I'd be interested in hearing from you your performance
experience with ext3 when it reaches 96% full.
Regards,
Wayne.
-----Original Message-----
From: Dirk Schenkewitz [mailto:Dirk.Schenkewitz@interface-ag.com]
Sent: Tuesday, February 11, 2003 1:59 PM
To: Reiserfs List
Subject: Corrupted/unreadable journal: reiser vs. ext3
Hi Guys,
Recently I read about ReiserFS V4, taking that as a reason to take
a look at ReiserFS again. But I'm not sure if it's worth switching
from ext3/ext2 to reiser. Because:
More than a year ago, I made up one reiser-partition for playing
around. Well, first there seemed to be nothing special about it.
Then, one day, it suddenly couldn't read its journal anymore,
which prevented the system from booting. (about 2 weeks later I
discovered why: a bad power supply had caused physical damage to
that area of the hard disk) For some reason I don't recall anymore,
I couldn't find a reiserfsck or such. I found no way to get around
the case of a corrupted/unreadable journal.
Luckily, the partition was nearly empty, so I put an ext3 filesystem
on that partition. That went fine for just a few days, then the bad
disk area (which now held the ext3-journal) decided to strike again.
But guess what happened:
While booting the next time, the ext3 code discovered that the journal
was unreadable (watching that, I thought "oh shit, not again" -
for less than a second), put out a short message stating that and
that it would continue as ext2. No painful attempts to recover the
journal - it just dropped it and continued, taking only a few seconds
for that.
No data was lost! I sat there for some time, staring at the screen,
hardly believing it.
After that, I removed reiser-support from the kernels I used and
since then I only used ext3. If I lost some data since then, it was
only because I accidentally deleted it - there seems to be no way
to recover anything from ext3 (unlike ext2).
Because I have large amounts of data, reliability and solidness of
a filesystem are the most important things to me, then comes space-
efficiency, then speed. Sometimes some of my filesystems get 100%
full, having only some kilobytes left (of, say, 8Gig) until I clean
up. That's my personal situation & experiences.
Now my questions:
From reading the mails from this list, I suspect that a ReiserFS:
- will sport poor performance (whatever that means, in terms of
absolute speed) if it gets more than 96% full. (*1*)
- will fall far behind ext3 when it comes to reliability, robustness
and crash recovery (at least when fsck is involved),
- and will have even more trouble (which may lead to complete failure)
if the journal cannot be accessed.
Is any of this still true?
(*1*): What if the filesystem contains rather large files, like
CD-images, MP3s and such, filling it up completely ? Will
it still slow down?
From what I wrote, you may think that I have some prejudice against
ReiserFS. That's true, I have, because I had a bad experience with
it. Anyway, if you (the developers and/or other people reading here)
can say that nowadays ReiserFS is better than ext3, even under my
personal harsh circumstances, I will give it another try. And now,
feel free to flame me. :-)
happy coding
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 19:43 berthiaume_wayne
@ 2003-02-12 10:48 ` Dirk Schenkewitz
2003-02-12 10:59 ` Hans Reiser
2003-02-12 16:22 ` Sam Vilain
1 sibling, 1 reply; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-12 10:48 UTC (permalink / raw)
To: Reiserfs List
Wayne,
berthiaume_wayne@emc.com schrieb:
>
> Dirk, I'd be interested in hearing from you your performance
> experience with ext3 when it reaches 96% full.
Well (*shrug*), there seems to be nothing special about it. I did
not do any timing test when such a filesystem went full. In fact,
becoming 100% full is not "normal", it happens when I put stuff
on it just to have it out of the way for some time. The filesystem
is then used as some kind of "storage cellar".
Aside from that - speed becomes noticeable (I believe, at least)
when using 'xv' on a directory with lots of pictures, say, between
2000 and 3000, and the thumbnails are loaded during the first access.
This takes more than a minute (estimated, I did not look at the
clock).
Another thing is when 'xv' creates the thumbnails. A few times it
happened that a filesystem which was rather full ran out of space
when creating the thumbnails. (That's not critical, all you "lose"
are some of the thumbnails, which can be recreated any time later.)
But I don't know how/when 'xv' stores the thumbnails, I only know
that they are kept in memory as long as they are in use. Then linux
itself does some buffering, so only the first access to a directory
tells you anything. That said, I can only talk of my subjective
impressions, and I have not noticed any slowdown until there
are 0 bytes left. But it is hard to tell, because the difference
between 96% and 100% is only 320 MB on an 8 GB partition, and that
space fills up rather fast.
While you ask - what are the "amounts" of slowdown if a reiserfs
gets more than 96% full?
- Less than 4%? (I might not notice that.)
- between 4% and 8%? (I might notice, but I can live with that
easily. Then again, ext3 doesn't seem to have such problems.)
- more than 8%, maybe much more? (That might become annoying.
In that case I believe that ext3 is better for my purposes.)
You see, I'm not an expert, I'm "just using filesystems". Please
take the mentioned percentages as guesses - depending on the
situation, I might not even notice 10% slowdown...
Hope that answers your question - does it?
Happy coding
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 10:48 ` Dirk Schenkewitz
@ 2003-02-12 10:59 ` Hans Reiser
2003-02-12 11:24 ` Frank Baumgart
2003-02-12 11:54 ` Dirk Schenkewitz
0 siblings, 2 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-12 10:59 UTC (permalink / raw)
To: Dirk Schenkewitz; +Cc: Reiserfs List
Dirk Schenkewitz wrote:
>Wayne,
>
>berthiaume_wayne@emc.com schrieb:
>
>
>> Dirk, I'd be interested in hearing from you your performance
>>experience with ext3 when it reaches 96% full.
>>
>>
>
>Well (*shrug*), there seems to be nothing special about it. I did
>not do any timing test when such a filesystem went full. In fact,
>becoming 100% full is not "normal", it happens when I put stuff
>on it just to have it out of the way for some time. The filesystem
>is then used as some kind of "storage cellar".
>
>Aside from that - speed becomes noticeable (I believe, at least)
>when using 'xv' on a directory with lots of pictures, say, between
>2000 and 3000, and the thumbnails are loaded during the first access.
>This takes more than a minute (estimated, I did not look at the
>clock).
>
>Another thing is when 'xv' creates the thumbnails. A few times it
>happened that a filesystem which was rather full ran out of space
>when creating the thumbnails. (That's not critical, all you "lose"
>are some of the thumbnails, which can be recreated any time later.)
>But I don't know how/when 'xv' stores the thumbnails, I only know
>that they are kept in memory as long as they are in use. Then linux
>itself does some buffering, so only the first access to a directory
>tells you anything. That said, I can only talk of my subjective
>impressions, and I have not noticed any slowdown until there
>are 0 bytes left. But it is hard to tell, because the difference
>between 96% and 100% is only 320 MB on an 8 GB partition, and that
>space fills up rather fast.
>
>While you ask - what are the "amounts" of slowdown if a reiserfs
>gets more than 96% full?
> - Less than 4%? (I might not notice that.)
> - between 4% and 8%? (I might notice, but I can live with that
> easily. Then again, ext3 doesn't seem to have such problems.)
> - more than 8%, maybe much more? (That might become annoying.
> In that case I believe that ext3 is better for my purposes.)
>
>You see, I'm not an expert, I'm "just using filesystems". Please
>take the mentioned percentages as guesses - depending on the
>situation, I might not even notice 10% slowdown...
>
>Hope that answers your question - does it?
>
>Happy coding
> dirk
>
>
and if you can fit more data onto reiserfs partitions than onto ext3
partitions? Is it a fair comparison to compare at equal percents full?
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 10:59 ` Hans Reiser
@ 2003-02-12 11:24 ` Frank Baumgart
2003-02-12 11:35 ` Stefan Traby
2003-02-12 11:54 ` Dirk Schenkewitz
1 sibling, 1 reply; 94+ messages in thread
From: Frank Baumgart @ 2003-02-12 11:24 UTC (permalink / raw)
To: Hans Reiser; +Cc: Reiserfs List
> and if you can fit more data onto reiserfs partitions than onto ext3
> partitions? Is it a fair comparison to compare at equal percents full?
Beware of the trolls.
Frank
P.S.: reiserfs obviously needs some -o porn mount option :)
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 11:24 ` Frank Baumgart
@ 2003-02-12 11:35 ` Stefan Traby
0 siblings, 0 replies; 94+ messages in thread
From: Stefan Traby @ 2003-02-12 11:35 UTC (permalink / raw)
To: Frank Baumgart; +Cc: Hans Reiser, Reiserfs List
On Wed, Feb 12, 2003 at 12:24:24PM +0100, Frank Baumgart wrote:
> P.S.: reiserfs obviously needs some -o porn mount option :)
this is default since 3.5 on-disk format.
--
ciao -
Stefan
" rms, you are using Linux. Does it hurd? "
Stefan Traby Linux/ia32 office: +49-721-3523165
Mathystr. 18-20 V/8 Linux/alpha cell: +49-163-7030572
76133 Karlsruhe Linux/sparc http://graz03.kwc.at
Germany Linux/arm mailto:stefan@nethype.de
Europe Linux/mips mailto:stefan@hello-penguin.com
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 10:59 ` Hans Reiser
2003-02-12 11:24 ` Frank Baumgart
@ 2003-02-12 11:54 ` Dirk Schenkewitz
2003-02-12 12:42 ` Hans Reiser
1 sibling, 1 reply; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-12 11:54 UTC (permalink / raw)
To: Reiserfs List
Hi Hans,
Hans Reiser schrieb:
>
> Dirk Schenkewitz wrote:
>
> >Wayne,
> >
> >berthiaume_wayne@emc.com schrieb:
> >
> >
> >> Dirk, I'd be interested in hearing from you your performance
> >>experience with ext3 when it reaches 96% full.
> >>
> >>
> >
> >Well (*shrug*), there seems to be nothing special about it. I did
> >not do any timing test when such a filesystem went full. In fact,
> >becoming 100% full is not "normal", it happens when I put stuff
> >on it just to have it out of the way for some time. The filesystem
> >is then used as some kind of "storage cellar".
> >
> >Aside from that - speed becomes noticeable (I believe, at least)
> >when using 'xv' on a directory with lots of pictures, say, between
> >2000 and 3000, and the thumbnails are loaded during the first access.
> >This takes more than a minute (estimated, I did not look at the
> >clock).
> >
> >Another thing is when 'xv' creates the thumbnails. A few times it
> >happened that a filesystem which was rather full ran out of space
> >when creating the thumbnails. (That's not critical, all you "lose"
> >are some of the thumbnails, which can be recreated any time later.)
> >But I don't know how/when 'xv' stores the thumbnails, I only know
> >that they are kept in memory as long as they are in use. Then linux
> >itself does some buffering, so only the first access to a directory
> >tells you anything. That said, I can only talk of my subjective
> >impressions, and I have not noticed any slowdown until there
> >are 0 bytes left. But it is hard to tell, because the difference
> >between 96% and 100% is only 320 MB on an 8 GB partition, and that
> >space fills up rather fast.
> >
> >While you ask - what are the "amounts" of slowdown if a reiserfs
> >gets more than 96% full?
> > - Less than 4%? (I might not notice that.)
> > - between 4% and 8%? (I might notice, but I can live with that
> > easily. Then again, ext3 doesn't seem to have such problems.)
> > - more than 8%, maybe much more? (That might become annoying.
> > In that case I believe that ext3 is better for my purposes.)
> >
> >You see, I'm not an expert, I'm "just using filesystems". Please
> >take the mentioned percentages as guesses - depending on the
> >situation, I might not even notice 10% slowdown...
> >
> >Hope that answers your question - does it?
> >
> >Happy coding
> > dirk
> >
> >
> and if you can fit more data onto reiserfs partitions than onto
> ext3 partitions?
That's what I'm looking for!
> Is it a fair comparison to compare at equal percents full?
Considered that way: No. Comparison should be between absolute
bytes I can put on a reiser-filesystem against an ext3-filesystem
while the partition sizes are equal. Hm. Please excuse me if this
is a FAQ: Can you give me a hint where to find such a comparison?
To put it in other words: How many more bytes can I put on a
reiser-fs compared to an ext3-fs when the partition sizes are
equal?
But even then: If I have more space available, I will happily
use it, so even if I can put 500 MB more on it, I will manage
to fill it up - what will happen then?
Thanks for reading & answering (also in advance :-))
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 11:54 ` Dirk Schenkewitz
@ 2003-02-12 12:42 ` Hans Reiser
2003-02-12 13:25 ` Dirk Schenkewitz
0 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-12 12:42 UTC (permalink / raw)
To: Dirk Schenkewitz; +Cc: Reiserfs List, Jeff Mahoney, grev
Dirk Schenkewitz wrote:
>Hi Hans,
>
>Hans Reiser schrieb:
>
>
>>Dirk Schenkewitz wrote:
>>
>>
>>
>>>Wayne,
>>>
>>>berthiaume_wayne@emc.com schrieb:
>>>
>>>
>>>
>>>
>>>> Dirk, I'd be interested in hearing from you your performance
>>>>experience with ext3 when it reaches 96% full.
>>>>
>>>>
>>>>
>>>>
>>>Well (*shrug*), there seems to be nothing special about it. I did
>>>not do any timing test when such a filesystem went full. In fact,
>>>becoming 100% full is not "normal", it happens when I put stuff
>>>on it just to have it out of the way for some time. The filesystem
>>>is then used as some kind of "storage cellar".
>>>
>>>Aside from that - speed becomes noticeable (I believe, at least)
>>>when using 'xv' on a directory with lots of pictures, say, between
>>>2000 and 3000, and the thumbnails are loaded during the first access.
>>>This takes more than a minute (estimated, I did not look at the
>>>clock).
>>>
>>>Another thing is when 'xv' creates the thumbnails. A few times it
>>>happened that a filesystem which was rather full ran out of space
>>>when creating the thumbnails. (That's not critical, all you "lose"
>>>are some of the thumbnails, which can be recreated any time later.)
>>>But I don't know how/when 'xv' stores the thumbnails, I only know
>>>that they are kept in memory as long as they are in use. Then linux
>>>itself does some buffering, so only the first access to a directory
>>>tells you anything. That said, I can only talk of my subjective
>>>impressions, and I have not noticed any slowdown until there
>>>are 0 bytes left. But it is hard to tell, because the difference
>>>between 96% and 100% is only 320 MB on an 8 GB partition, and that
>>>space fills up rather fast.
>>>
>>>While you ask - what are the "amounts" of slowdown if a reiserfs
>>>gets more than 96% full?
>>>- Less than 4%? (I might not notice that.)
>>>- between 4% and 8%? (I might notice, but I can live with that
>>> easily. Then again, ext3 doesn't seem to have such problems.)
>>>- more than 8%, maybe much more? (That might become annoying.
>>> In that case I believe that ext3 is better for my purposes.)
>>>
>>>You see, I'm not an expert, I'm "just using filesystems". Please
>>>take the mentioned percentages as guesses - depending on the
>>>situation, I might not even notice 10% slowdown...
>>>
>>>Hope that answers your question - does it?
>>>
>>>Happy coding
>>> dirk
>>>
>>>
>>>
>>>
>>and if you can fit more data onto reiserfs partitions than onto
>>ext3 partitions?
>>
>>
>
>That's what I'm looking for!
>
>
>
>>Is it a fair comparison to compare at equal percents full?
>>
>>
>
>Considered that way: No. Comparison should be between absolute
>bytes I can put on a reiser-filesystem against an ext3-filesystem
>while the partition sizes are equal. Hm. Please excuse me if this
>is a FAQ: Can you give me a hint where to find such a comparison?
>To put it in other words: How many more bytes can I put on a
>reiser-fs compared to an ext3-fs when the partition sizes are
>equal?
>
That depends on your file size distribution, and on whether you use the
tails mount option. Probably ~10% more. Forgive me, it has been a
while since it was measured. Elena, can you copy the kernel source code
onto a partition until it fills up, make it a large partition please,
and compare ext3 to reiserfs V3/V4?
>But even then: If I have more space available, I will happily
>use it, so even if I can put 500 MB more on it, I will manage
>to fill it up - what will happen then?
>
I am not answering this because Jeff improved things a lot, and I don't
know how good/bad they are currently....
>
>Thanks for reading & answering (also in advance :-))
> dirk
>
>
>
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 12:42 ` Hans Reiser
@ 2003-02-12 13:25 ` Dirk Schenkewitz
0 siblings, 0 replies; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-12 13:25 UTC (permalink / raw)
To: Hans Reiser; +Cc: Reiserfs List
Hi Hans,
I should be more careful about what I optimize out from an email...
Hans Reiser schrieb:
>
>Dirk Schenkewitz wrote:
>
> >Hans Reiser schrieb:
> > >
> > >and if you can fit more data onto reiserfs partitions than onto
> > >ext3 partitions?
> > >
> >
> >That's what I'm looking for!
> >
> > >Is it a fair comparison to compare at equal percents full?
> >
> >Considered that way: No. Comparison should be between absolute
> >bytes I can put on a reiser-filesystem against an ext3-filesystem
> >while the partition sizes are equal. Hm. Please excuse me if this
> >is a FAQ: Can you give me a hint where to find such a comparison?
> >To put it in other words: How many more bytes can I put on a
> >reiser-fs compared to an ext3-fs when the partition sizes are
> >equal?
> >
> That depends on your file size distribution, and on whether you
> use the tails mount option. Probably ~10% more.
~10%! Not bad!!
> Forgive me, it has been a while since it was measured.
> Elena, can you copy the kernel source code onto a partition until
> it fills up, make it a large partition please, and compare ext3
> to reiserfs V3/V4?
Uhh - I didn't want to disturb something, even less to cause extra
work - now I feel guilty...
> >But even then: If I have more space available, I will happily
> >use it, so even if I can put 500 MB more on it, I will manage
> >to fill it up - what will happen then?
>
> I am not answering this because Jeff improved things a lot,
> and I don't know how good/bad they are currently....
I see. Thank you VERRRY MUCH - I stay tuned 8-)
Happy coding & best wishes
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 19:43 berthiaume_wayne
2003-02-12 10:48 ` Dirk Schenkewitz
@ 2003-02-12 16:22 ` Sam Vilain
2003-02-12 16:53 ` Anders Widman
2003-02-13 20:08 ` Zygo Blaxell
1 sibling, 2 replies; 94+ messages in thread
From: Sam Vilain @ 2003-02-12 16:22 UTC (permalink / raw)
To: reiserfs-list
On Wed, 12 Feb 2003 08:43, berthiaume_wayne@emc.com wrote:
> Dirk, I'd be interested in hearing from you your performance
> experience with ext3 when it reaches 96% full.
No problem, because you get ENOSPC at 95% or 90%.
Hmm, another feature SysAdmins actually find useful, missing in reiserfs.
Along with quotas (this feature is a lazy case of a quota, really).
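
A small Python sketch of how the reserve shows up from user space, assuming only the standard `os.statvfs` fields `f_blocks`, `f_bfree` (free for root) and `f_bavail` (free for everyone else). The helper names are made up for illustration.

```python
import os


def reserved_fraction(total_blocks, free_blocks, avail_blocks):
    """Fraction of the filesystem held back from unprivileged users.

    free_blocks  -> blocks free for root           (statvfs f_bfree)
    avail_blocks -> blocks free for ordinary users (statvfs f_bavail)
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return (free_blocks - avail_blocks) / total_blocks


def report(path="/"):
    """Describe the reserve on the filesystem that holds `path`."""
    st = os.statvfs(path)
    frac = reserved_fraction(st.f_blocks, st.f_bfree, st.f_bavail)
    return f"{path}: {frac:.1%} of blocks reserved for root"


if __name__ == "__main__":
    print(report("/"))
```

When `f_bavail` reaches zero, ordinary users get ENOSPC although `f_bfree` still shows free blocks.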
On Wed, 12 Feb 2003 18:12, Ross Vandegrift wrote:
> You have to start your software on some kind of foundation. Working
> hardware sounds like a great place to me.
Hmm, you've never heard of redundancy or fault tolerance then.
What part fails the most in running systems ? Disk platters.
CPUs might overheat and RAM might suddenly one day get a sticky bit, but as
you point out there ain't much you can do about it. Except buy a Tandem,
or use ECC memory.
But with disks, you can. Mirroring aside, modern hard disks use S.M.A.R.T.
technology which claims to be able to spot failures before they happen.
Many BIOSes will let you turn this feature on and off. Of course I've
never actually seen it in action :-).
Not only that, but re-attempting a failed read might just work. In that
case, you need to freshen the data (hopefully the disk will re-map the
block once it sees a write), and if that fails, re-map the block. I don't
know if any of the other filesystems do that (I seriously doubt it), but
it's what Norton 4.5 on DOS used to do to `repair' faulty disks :-).
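
The retry-then-freshen idea can be sketched in a few lines of Python. This is a toy model, not what any filesystem or Norton actually does: `FlakyDisk` is a hypothetical in-memory stand-in for a drive, and the retry count of 3 is an arbitrary assumption.

```python
import errno


class FlakyDisk:
    """Toy in-memory block device (hypothetical stand-in for hardware)."""

    def __init__(self, blocks, failures):
        self.blocks = list(blocks)      # block number -> bytes
        self.failures = dict(failures)  # block number -> reads that fail first
        self.rewritten = []             # blocks that were freshened

    def read(self, n):
        if self.failures.get(n, 0) > 0:
            self.failures[n] -= 1
            raise OSError(errno.EIO, "simulated read error")
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data
        self.rewritten.append(n)


def read_with_retry(disk, n, attempts=3):
    """Retry a failed read; if a retry succeeds, rewrite the block.

    Returns the data, or re-raises the last OSError so the caller
    sees EIO instead of garbage.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            data = disk.read(n)
        except OSError as exc:
            last_error = exc
            continue
        if attempt > 0:
            # The block was marginal: write it back so the drive gets
            # a chance to re-map the sector.
            disk.write(n, data)
        return data
    raise last_error
```

The important property is the last line: when all retries fail, the error is passed up unchanged rather than hidden.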
But doing disk repair is entirely irrelevant for a filesystem. What's
important is that you don't get an Oops, a kernel Segfault or worse random
data corruption or file structure mangling, that the calling process gets
EIO instead.
Stopping random corruption from violating your assumptions is extremely
difficult; a software engineer's nightmare :-). However, modern disks are
pretty good at keeping their own CRCs, so you should expect that you can
always get an error code back from the OS if the data didn't come back in the
same state you wrote it.
You (the reiserfs team) need to wire up reiserfs on a custom loopback
device, and selectively flick blocks to faulty and see what happens. It's
just a part of stress testing.
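
A user-space Python sketch of such a fault-injection harness, under stated assumptions: `FaultInjectingDevice` and `sweep` are invented names, the 4096-byte block size is arbitrary, and a real test would sit below the kernel's block layer rather than wrap a Python file object.

```python
import errno
import io

BLOCK_SIZE = 4096  # assumed block size for the sketch


class FaultInjectingDevice:
    """Wraps a file-like object and fails I/O on blocks marked faulty."""

    def __init__(self, backing, block_size=BLOCK_SIZE):
        self.backing = backing
        self.block_size = block_size
        self.bad_blocks = set()

    def mark_bad(self, n):
        self.bad_blocks.add(n)

    def mark_good(self, n):
        self.bad_blocks.discard(n)

    def read_block(self, n):
        if n in self.bad_blocks:
            raise OSError(errno.EIO, f"injected fault on block {n}")
        self.backing.seek(n * self.block_size)
        return self.backing.read(self.block_size)


def sweep(device, total_blocks, exercise):
    """Fail each block in turn and record how `exercise(device)` reacts.

    A normal return or a clean EIO is acceptable; any other exception
    is the kind of misbehaviour the test is hunting for.
    """
    results = {}
    for n in range(total_blocks):
        device.mark_bad(n)
        try:
            exercise(device)
            results[n] = "ok"
        except OSError as exc:
            results[n] = "EIO" if exc.errno == errno.EIO else f"errno {exc.errno}"
        except Exception as exc:
            results[n] = f"crash: {type(exc).__name__}"
        finally:
            device.mark_good(n)
    return results


if __name__ == "__main__":
    dev = FaultInjectingDevice(io.BytesIO(bytes(BLOCK_SIZE * 3)))
    print(sweep(dev, 3, lambda d: d.read_block(1)))
```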
And there is no excuse - reiserfsck should do the right thing when it
encounters a filesystem with bad blocks and recover what is possible,
marking the bad blocks as bad. It needs dd_rescue built into its
operation :-).
It must suck having a free project get only slight funding. All of a
sudden a whole load of geeks get very angry and demanding. I wish I could
help, but hey it's more fun to troll.^H^H^H I've got better things to do.
--
Sam Vilain, sam@vilain.net
The reason we start a war is to fight a war, win a war, thereby
causing no more war!
- George W. Bush during the first Presidential debate
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 16:22 ` Sam Vilain
@ 2003-02-12 16:53 ` Anders Widman
2003-02-12 17:19 ` Hans Reiser
2003-02-13 20:08 ` Zygo Blaxell
1 sibling, 1 reply; 94+ messages in thread
From: Anders Widman @ 2003-02-12 16:53 UTC (permalink / raw)
To: reiserfs-list
> On Wed, 12 Feb 2003 08:43, berthiaume_wayne@emc.com wrote:
>> Dirk, I'd be interested in hearing from you your performance
>> experience with ext3 when it reaches 96% full.
> No problem, because you get ENOSPC at 95% or 90%.
> Hmm, another feature SysAdmins actually find useful, missing in reiserfs.
> Along with quotas (this feature is a lazy case of a quota, really).
> On Wed, 12 Feb 2003 18:12, Ross Vandegrift wrote:
>> You have to start your software on some kind of foundation. Working
>> hardware sounds like a great place to me.
> Hmm, you've never heard of redundancy or fault tolerance then.
> What part fails the most in running systems ? Disk platters.
> CPUs might overheat and RAM might suddenly one day get a sticky bit, but as
> you point out there ain't much you can do about it. Except buy a Tandem,
> or use ECC memory.
> But with disks, you can. Mirroring aside, modern hard disks use S.M.A.R.T.
> technology which claims to be able to spot failures before they happen.
> Many BIOSes will let you turn this feature on and off. Of course I've
> never actually seen it in action :-).
In my experience working professionally on the hardware side, I have
very rarely seen SMART work, and I have handled probably
thousands of bad drives.
SMART sounds good on paper, but in reality it does not work. The
reason is that manufacturers do not want RMAs, and if SMART worked
as it should they would have much higher costs with bad drives (most
users run bad drives without knowing about them, or without knowing
they are covered by warranty).
> Not only that, but re-attempting a failed read might just work. In that
> case, you need to freshen the data (hopefully the disk will re-map the
> block once it sees a write), and if that fails, re-map the block. I don't
> know if any of the other filesystems do that (I seriously doubt it), but
> it's what Norton 4.5 on DOS used to do to `repair' faulty disks :-).
:) Yes, and many other programs too, e.g. Scandisk (which I do not see as
a role model ;).
> But doing disk repair is entirely irrelevant for a filesystem. What's
> important is that you don't get an Oops, a kernel Segfault or worse random
> data corruption or file structure mangling, that the calling process gets
> EIO instead.
Agreed, the FS should not repair the drive, though it should keep
working when problems occur.
> Stopping random corruption from violating your assumptions is extremely
> difficult; a software engineer's nightmare :-). However, modern disks are
> pretty good at keeping their own CRCs, so you should expect that you can
> always get an error code back from the OS if the data didn't come back the
> same state you wrote it.
> You (the reiserfs team) need to wire up reiserfs on a custom loopback
> device, and selectively flick blocks to faulty and see what happens. It's
> just a part of stress testing.
> And there is no excuse - reiserfsck should do the right thing when it
> encounters a filesystem with bad blocks and recover what is possible,
> marking the bad blocks as bad. It needs dd_rescue built into its
> operation :-).
Ok, let's not jump on the ReiserFS team. They are doing a wonderful job
with their filesystem. This is a concern for all filesystems, not only
ReiserFS.
> It must suck having a free project get only slight funding. All of a
> sudden a whole load of geeks get very angry and demanding. I wish I could
> help, but hey it's more fun to troll.^H^H^H I've got better things to do.
You are free to donate your hard earned money to them.
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 16:53 ` Anders Widman
@ 2003-02-12 17:19 ` Hans Reiser
2003-02-12 17:40 ` Anders Widman
0 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-12 17:19 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
We are doing ok financially until summer, during which I need to come up
with more money from somewhere.
There is a lot of fiscal uncertainty in a project like ours. We get
money in big chunks with no knowledge of how long we need to make it
last. There is this nagging worry that I could get unlucky for the
wrong 6 month stretch, and have to lay off everyone. This worry is
especially acute during the midst of a tech bust (sponsors dry up even
for successful projects) which is also during the debugging phase for
reiser4 (will it be debugged by June, or will it take as long as V3, or
longer....).
For V3 we are going to fix any bugs that come in (one known bug remains
that seems to be elusive and will distract our lead programmer from V4
this month), put in Oleg's write patch and chris's patches, fix fsck
when it fails, and that is it. V3 will be our feature frozen FS for
mission critical servers.
Every resource we have is going to go into getting V4 done and stable so
that we can sell it in the summer. Hopefully we will make it. One
worry is that while V4 will be much more stable by design (transactions,
fsck friendly node format with mkfsids and transaction ids (that make it
easier to figure out when two versions of a file collide which one to
keep), etc.), V3 will be more stable in implementation for quite some time.
>>It must suck having a free project get only slight funding. All of a
>>sudden a whole load of geeks get very angry and demanding. I wish I could
>>help, but hey it's more fun to troll.^H^H^H I've got better things to do.
>>
>>
>
>You are free to donate your hard earned money to them.
>
> - Anders
>
>
>
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 17:19 ` Hans Reiser
@ 2003-02-12 17:40 ` Anders Widman
2003-02-12 18:15 ` Dirk Mueller
2003-02-12 18:20 ` Chris Dukes
0 siblings, 2 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-12 17:40 UTC (permalink / raw)
To: reiserfs-list
> Every resource we have is going to go into getting V4 done and stable so
> that we can sell it in the summer. Hopefully we will make it.
Just a question. (I know lots of people will shout at me for asking,
but please don't :) Will V3/4 be ported to Windows, or are we doomed
to use the new MS database with integrated Palladium software?
Linux is a great OS, but there are tools that I (and probably many
others) use every day that I need. One example is Adobe Photoshop,
colour management and lots of other things - not to mention people
who want to use games ;).
As of now I can not completely go over to Linux. Therefore I would
pay to use ReiserFS on my Windows machines. Maybe I am the only one
who would, but perhaps not.
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 17:40 ` Anders Widman
@ 2003-02-12 18:15 ` Dirk Mueller
2003-02-12 18:20 ` Anders Widman
2003-02-12 18:20 ` Chris Dukes
1 sibling, 1 reply; 94+ messages in thread
From: Dirk Mueller @ 2003-02-12 18:15 UTC (permalink / raw)
To: reiserfs-list
On Wed, 12 Feb 2003, Anders Widman wrote:
> Just a question. (I know lots of people will shout at me for asking,
> but please don't :) Will V3/4 be ported to Windows, or are we doomed
> to use the new MS database with integrated Palladium software?
very unlikely. porting a filesystem is about the same work as writing it
from scratch.
Dirk
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 18:15 ` Dirk Mueller
@ 2003-02-12 18:20 ` Anders Widman
0 siblings, 0 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-12 18:20 UTC (permalink / raw)
To: reiserfs-list
> On Wed, 12 Feb 2003, Anders Widman wrote:
>> Just a question. (I know lots of people will shout at me for asking,
>> but please don't :) Will V3/4 be ported to Windows, or are we doomed
>> to use the new MS database with integrated Palladium software?
> very unlikely. porting a filesystem is about the same work as writing it
> from scratch.
Depends on which part is the most difficult: developing a good system
and algorithms, or writing the code. :)
Anyway, I see your point and I know my request was far fetched. It
is more likely that Adobe port their programs to Linux than the
other way around.
- Anders
> Dirk
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 17:40 ` Anders Widman
2003-02-12 18:15 ` Dirk Mueller
@ 2003-02-12 18:20 ` Chris Dukes
1 sibling, 0 replies; 94+ messages in thread
From: Chris Dukes @ 2003-02-12 18:20 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
On Wed, Feb 12, 2003 at 06:40:04PM +0100, Anders Widman wrote:
>
> > Every resource we have is going to go into getting V4 done and stable so
> > that we can sell it in the summer. Hopefully we will make it.
>
> Just a question. (I know lots of people will shout at me for asking,
> but please don't :) Will V3/4 be ported to Windows, or are we doomed
> to use the new MS database with integrated Palladium software?
Have you supplied namesys with funding for a port?
>
> Linux is a great OS, but there are tools that I (and probably many
> other) use every day that I need. One example is Adobe Photoshop,
> colour management and lots of other things - not to mention people
> who want to use games ;).
Does Photoshop no longer run on a Macintosh? Does colour management no longer
run on a Macintosh? As for games, have you considered a subscription to
WineX or a game console?
I apologize, but I have a habit of hounding Windows users into admitting
that the main reason they need Windows is because 1) their employer
requires it (and my response is "The employer can supply the hardware and
technical support."), or 2) they haven't really looked to see if it can be done
elsewhere, or 3) a software vendor (like Autodesk) only supports Windows.
>
> As of now I can not completely go over to Linux. Therefore I would
> pay to use ReiserFS on my Windows machines. Maybe I am the only one
> who would, but perhaps not.
Out of curiosity, what do you think that reiserfs would buy you on Windows?
Would reiserfs be more of a benefit than a separate linux box running
samba or nfsd?
--
Chris Dukes
I tried being reasonable once--I didn't like it.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 16:22 ` Sam Vilain
2003-02-12 16:53 ` Anders Widman
@ 2003-02-13 20:08 ` Zygo Blaxell
1 sibling, 0 replies; 94+ messages in thread
From: Zygo Blaxell @ 2003-02-13 20:08 UTC (permalink / raw)
To: reiserfs-list
In article <200302130522.35829.sam@vilain.net>,
Sam Vilain <sam@vilain.net> wrote:
>But with disks, you can. Mirroring aside, modern hard disks use S.M.A.R.T.
>technology which claims to be able to spot failures before they happen.
>Many BIOSes will let you turn this feature on and off. Of course I've
>never actually seen it in action :-).
I have seen SMART work. At 11:20:30 I had a disk fail, then smartd put this
in my logs:
Nov 6 11:20:30 chlorine smartd: Device: /dev/hdb, Failed attribute: 3
Oh, wait, you said "before"...no, I've never actually seen that in
action either.
SMART does give you statistics on ECC recovery rates, temperature, number of
remapped sectors, etc. which can give you a hint, if you keep track of them
over time, when your disk is beginning to have more problems than it did
when it was newer. Maybe about 50% of failures can be predicted this way
(but you have no idea _when_ the failure will occur--this afternoon or next
summer?), so it's little better than the MTBF rating. The other 50% of
failures are predicted only after the fact. :-P
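
Keeping track of a counter over time can be as simple as the Python sketch below. It assumes you already log (day, remapped-sector count) pairs by some means; how you collect them is not shown, the function names are made up, and the 0.5 sectors/day threshold is an arbitrary illustration, not a vendor figure.

```python
def remap_growth(samples):
    """Average remapped sectors per day over the logged window.

    samples: list of (day_number, remapped_sector_count), sorted by day.
    """
    if len(samples) < 2:
        return 0.0
    (first_day, first_count) = samples[0]
    (last_day, last_count) = samples[-1]
    if last_day == first_day:
        return 0.0
    return (last_count - first_count) / (last_day - first_day)


def looks_worrying(samples, max_rate=0.5):
    """True if the count grows faster than max_rate sectors per day."""
    return remap_growth(samples) > max_rate


if __name__ == "__main__":
    history = [(0, 2), (5, 4), (10, 12)]  # invented sample data
    print(remap_growth(history), looks_worrying(history))
```

As said above, a rising rate is only a hint that the disk is getting worse; it says nothing about when it will actually die.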
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
^ permalink raw reply [flat|nested] 94+ messages in thread
* Corrupted/unreadable journal: reiser vs. ext3
@ 2003-02-11 18:59 Dirk Schenkewitz
2003-02-11 20:27 ` Hans Reiser
0 siblings, 1 reply; 94+ messages in thread
From: Dirk Schenkewitz @ 2003-02-11 18:59 UTC (permalink / raw)
To: Reiserfs List
Hi Guys,
Recently I read about ReiserFS V4, taking that as a reason to take
a look at ReiserFS again. But I'm not sure if it's worth switching
from ext3/ext2 to reiser. Because:
More than a year ago, I made up one reiser-partition for playing
around. Well, first there seemed to be nothing special about it.
Then, one day, it suddenly couldn't read its journal anymore,
which prevented the system from booting. (about 2 weeks later I
discovered why: a bad power supply had caused physical damage to
that area of the hard disk) For some reason I don't recall anymore,
I couldn't find a reiserfsck or such. I found no way to get around
the case of a corrupted/unreadable journal.
Luckily, the partition was nearly empty, so I put on an ext3 system
on that partition. That went fine for just a few days, then the bad
disk area (which now held the ext3-journal) decided to strike again.
But guess what happened:
While booting the next time, the ext3 code discovered that the journal
was unreadable (watching that, I thought "oh shit, not again" -
for less than a second), put out a short message stating that and
that it would continue as ext2. No painful attempts to recover the
journal - it just dropped it and continued, taking only a few seconds
for that.
No data was lost! I sat there for some time, staring at the screen,
hardly believing it.
After that, I removed reiser-support from the kernels I used and
since then I only used ext3. If I lost some data since then, it was
only because I accidentally deleted it - there seems to be no way
to recover anything from ext3 (unlike ext2).
Because I have large amounts of data, reliability and solidness of
a filesystem are the most important things to me, then comes space-
efficiency, then speed. Sometimes some of my filesystems get 100%
full, having only some kilobytes left (of, say, 8Gig) until I clean
up. That's my personal situation & experiences.
Now my questions:
From reading the mails from this list, I suspect that a ReiserFS:
- will sport poor performance (whatever that means, in terms of
absolute speed) if it gets more than 96% full. (*1*)
- will fall far behind ext3 when it comes to reliability, robustness
and crash recovery (at least when fsck is involved),
- and will have even more trouble (which may lead to complete
failure) if the journal cannot be accessed.
Is any of this still true?
(*1*): What if the filesystem contains rather large files, like
CD-images, MP3s and such, filling it up completely ? Will
it still slow down?
From what I wrote, you may think that I have some prejudice against
ReiserFS. That's true, I have, because I had a bad experience with
it. Anyway, if you (the developers and/or other people reading here)
can say that nowadays ReiserFS is better than ext3, even under my
personal harsh circumstances, I will give it another try. And now,
feel free to flame me. :-)
happy coding
dirk
--
Dirk Schenkewitz
InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching
http://www.interface-ag.de mailto:dirk.schenkewitz@interface-ag.de
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 18:59 Dirk Schenkewitz
@ 2003-02-11 20:27 ` Hans Reiser
2003-02-11 21:30 ` Mike Hodson
0 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-11 20:27 UTC (permalink / raw)
To: Dirk Schenkewitz; +Cc: Reiserfs List, Vitaly Fertman
I have to tell you that if you aren't willing to run fsck for reiserfs
in response to disk corruption, but you are willing to switch to a
filesystem (ext2) that runs fsck at every boot, I don't have a lot of
sympathy. Vitaly can comment more.
Hans
Dirk Schenkewitz wrote:
>Hi Guys,
>
>Recently I read about ReiserFS V4, taking that as a reason to take
>a look at ReiserFS again. But I'm not sure if it's worth to switch
>from ext3/ext2 to reiser. Because:
>
>More than a year ago, I made up one reiser-partition for playing
>around. Well, first there seemed to be nothing special about it.
>Then, one day, it suddenly couldn't read its journal anymore,
>which prevented the system from booting. (about 2 weeks later I
>discovered why: a bad power supply had caused physical damage to
>that area of the hard disk) For some reason I don't recall anymore,
>I couldn't find a reiserfsck or such. I found no way to get around
>the case of a corrupted/unreadable journal.
>
>Luckily, the partition was nearly empty, so I put on an ext3 system
>on that partition. That went fine for just a few days, then the bad
>disk area (which now held the ext3-journal) decided to strike again.
>But guess what happened:
>While booting the next time, the ext3 code discovered that the journal
>was unreadable (watching that, I thought "oh shit, not again" -
>for less than a second), put out a short message stating that and
>that it would continue as ext2. No painful attempts to recover the
>journal - it just dropped it and continued, taking only a few seconds
>for that.
>No data was lost! I sat there for some time, staring at the screen,
>hardly believing it.
>
>After that, I removed reiser-support from the kernels I used and
>since then I only used ext3. If I lost some data since then, it was
>only because I accidentally deleted it - there seems to be no way
>to recover anything from ext3 (unlike ext2).
>
>Because I have large amounts of data, reliability and solidness of
>a filesystem are the most important things to me, then comes space-
>efficiency, then speed. Sometimes some of my filesystems get 100%
>full, having only some kilobytes left (of, say, 8Gig) until I clean
>up. That's my personal situation & experiences.
>
>Now my questions:
>From reading the mails from this list, I suspect that a ReiserFS:
> - will sport poor performance (whatever that means, in terms of
> absolute speed) if it gets more than 96% full. (*1*)
> - will fall far behind ext3 when it comes to reliability, robustness
> and crash recovery (at least when fsck is involved),
> - and will have even more trouble (which may lead to complete
> failure) if the journal cannot be accessed.
>Is any of this still true?
>
>(*1*): What if the filesystem contains rather large files, like
> CD-images, MP3s and such, filling it up completely ? Will
> it still slow down?
>
>From what I wrote, you may think that I have some prejudice against
>ReiserFS. That's true, I have, because I had a bad experience with
>it. Anyway, if you (the developers and/or other people reading here)
>can say that nowadays ReiserFS is better than ext3, even under my
>personal harsh circumstances, I will give it another try. And now,
>feel free to flame me. :-)
>
>happy coding
> dirk
>
>
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 20:27 ` Hans Reiser
@ 2003-02-11 21:30 ` Mike Hodson
2003-02-11 21:47 ` Hans Reiser
` (2 more replies)
0 siblings, 3 replies; 94+ messages in thread
From: Mike Hodson @ 2003-02-11 21:30 UTC (permalink / raw)
To: Hans Reiser; +Cc: reiserfs-list
> I have to tell you that if you aren't willing to run fsck for reiserfs
> in response to disk corruption, but you are willing to switch to a
> filesystem (ext2) that runs fsck at every boot, I don't have a lot of
> sympathy. Vitaly can comment more.
>
> Hans
I've used ReiserFS in the past, but have also used ext3 on my user's important
data (/home) after a good chunk of one drive was converted to
sparse/null files due to a screwup stemming from no 'badblocks' support
in reiserfs. Since then, I've used ext3 as well as Reiser but recently
dropped reiser completely. The main reason I hadn't converted to ext3
was a lack of a free hard drive 20+ GB in size to copy all my files to.
Unfortunately the filesystem was turning up errors all over the place,
including random sparse files growing out of nowhere, (even my mail
queue) and files that the filesystem could not access (even ls returned
'Permission Denied'). In the time since my first filesystem was nulled
due to a reiserfsck, I hadn't fsck'd the main drive as I feared bad
blocks may be to blame for the filesystem inconsistency. After 4 failed
attempts (kernel panics) to copy files between old and new drives, I
finally decided to run an fsck. After fscking with --rebuild-tree, it
found many errors, and corrupt directory entries, and chucked about 150
files into /lost+found/. Most were from the mail server, owned by
vpopmail.vchkpw, though others were from different websites I run and
some were even from /dev/. But, none of the existing files were made
sparse and I was able to completely copy the remaining files.
After this I completely re-checked the drive with Maxtor's disk tools
disk, and it showed that the drive was 'certified error-free'. After
seeing this, I have come to respect reiserfs even less than after my
/home/ drive got converted to nothing but nulls.
In my years of running ext2 and ext3, I can't see any reason why you
would think they require fscks at every reboot. In the time that I've
run both ext2/ext3 and reiser, the only times I've had to fsck ext2 were
after unclean unmounting. I've never had to run it on ext3. As for
reiser, I've had to run it a few times, but each time it either
destroyed data or fixed a very large number of errors that a journalled
filesystem should not have.
I guess my point is, if reiserfs can't keep consistency without
requiring periodic fscks, you have no argument making a statement that
says ext2 is worse as it requires them every boot. Even if your
statement was not false (I know for a fact that ext2 does -not- perform
fscks on every boot, it only will do it after an unclean unmount), you
still would have no basis to say that ext2 is worse due to the
requirement of periodic fscks.
In conclusion, I don't think I'll be using your filesystem even as a
testbed for new features, due to poor reliability under everyday
circumstances. In addition, check your facts before making a statement
that insults someone and claims something that is completely untrue.
--
Mike Hodson <mike@mystica.cx> ICQ: 18145059
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 21:30 ` Mike Hodson
@ 2003-02-11 21:47 ` Hans Reiser
2003-02-11 21:58 ` Hans Reiser
2003-02-11 23:11 ` Adam Goryachev
2 siblings, 0 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-11 21:47 UTC (permalink / raw)
To: Mike Hodson; +Cc: reiserfs-list
Mike Hodson wrote:
>
>In my years of running ext2 and ext3, I can't see any reason why you
>would think they require fscks at every reboot.
>
Sorry, my experience with kernel hacking (I used ext2 for a long time
while debugging reiserfs) led me to forget that it is possible to
reboot for reasons other than that the kernel oopsed on my latest
changes to it.;-)
Now that we can use reiser3 while debugging reiser4, oopses are much
less painful.
I regret that you had a bad experience with reiserfs. I have never seen
reiserfs convert a partition to nothing but nulls. Are you sure that
you were not repartitioning at the time you had this experience?
I don't mean to slam ext2, sorry if it sounded like that, I just don't
think that being unwilling to run fsck after bad blocks occur is a
reasonable complaint about our design.
Unfortunately, fsck programs take a long time to mature, and ext2's fsck
is more mature than ours. It is important that users use the latest fsck.
With reiser4 we have built into the node format a number of features
that will make reiser4 fsck more effective than reiser3.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 21:30 ` Mike Hodson
2003-02-11 21:47 ` Hans Reiser
@ 2003-02-11 21:58 ` Hans Reiser
2003-02-12 6:35 ` Oleg Drokin
2003-02-11 23:11 ` Adam Goryachev
2 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-11 21:58 UTC (permalink / raw)
To: Mike Hodson; +Cc: reiserfs-list, vs, Oleg Drokin
Mike Hodson wrote:
>After this I completely re-checked the drive with Maxtor's disk tools
>disk, and it showed that the drive was 'certified error-free'.
>
This does not mean that there were no bad blocks that were remapped,
does it?
If you have data corruption that we can analyze, please contact us.
We believe that our current release is very stable. We have one known
bug relating to unlink we are still working on, fsck still gets bug
reports, all the linux journaling filesystems have trouble with write
caching being turned on (this is being fixed in the latest 2.5 kernel,
and some have argued over whether it is a bug or a lack of a feature)
because they don't know how to flush disk caches on commit, and other
than that we simply aren't getting bug reports for V3 in 2.4 (oleg's
write performance improvements have gotten some bug reports, but those
aren't in the stable kernel yet).
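
The ordering problem behind "flush on commit" can be illustrated from user space with a short Python sketch. This is only an analogy, not reiserfs journal code: `COMMIT_MARK` and the function names are invented, and `os.fsync` merely asks the OS to push data to the device. Whether the drive's own write cache is flushed as well depends on kernel and hardware, which is exactly the issue described above.

```python
import os

COMMIT_MARK = b"COMMIT\n"  # hypothetical commit marker for this sketch


def write_all(fd, data):
    """os.write may write less than asked; loop until everything is out."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit(fd, payload):
    """Write a journal record so the commit mark never precedes its data."""
    write_all(fd, payload)
    os.fsync(fd)  # payload must be stable before the commit mark
    write_all(fd, COMMIT_MARK)
    os.fsync(fd)  # commit mark must be stable before we report success
```

If the drive reorders the two writes inside its cache and power fails in between, recovery can find a commit mark whose data never reached the platter.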
Oleg and Vladimir, what is the status of the unlink bug? I'd prefer to
say that we have no bugs at all....
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 21:58 ` Hans Reiser
@ 2003-02-12 6:35 ` Oleg Drokin
0 siblings, 0 replies; 94+ messages in thread
From: Oleg Drokin @ 2003-02-12 6:35 UTC (permalink / raw)
To: Hans Reiser; +Cc: Mike Hodson, reiserfs-list, vs
Hello!
On Wed, Feb 12, 2003 at 12:58:53AM +0300, Hans Reiser wrote:
> Oleg and Vladimir, what is the status of the unlink bug? I'd prefer to
Vladimir is investigating it. He does not know the cause of it yet.
BTW, while a bug is a bug and should of course be fixed, that particular one
is not causing any data corruption or similar stuff. It manifests itself as
directory entries pointing to nowhere (annoying once it happens) under
certain workloads (I believe it starts to happen in low memory situations).
> say that we have no bugs at all....
Sure, we all would love to be able to say this.
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* RE: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 21:30 ` Mike Hodson
2003-02-11 21:47 ` Hans Reiser
2003-02-11 21:58 ` Hans Reiser
@ 2003-02-11 23:11 ` Adam Goryachev
2003-02-11 23:17 ` Anders Widman
2003-02-12 1:02 ` Mike Hodson
2 siblings, 2 replies; 94+ messages in thread
From: Adam Goryachev @ 2003-02-11 23:11 UTC (permalink / raw)
To: reiserfs-list
> I've used ReiserFS in the past, but have also used ext3 on my
> user's important
> data (/home) after a good chunk of one drive was converted to
> sparse/null files due to a screwup stemming from no 'badblocks' support
> in reiserfs. Since then, I've used ext3 as well as Reiser but recently
I can't comment on your experience, but personally if I have a drive with
any number of badblocks (which are showing up to the fs layer, not invisibly
re-mapped by the drive) then I take the drive back and get a replacement, or
bin the drive.
> dropped reiser completely. The main reason I hadn't converted to ext3
> was a lack of a free hard drive 20+ GB in size to copy all my files to.
I guess that is also a hardware issue, I had the same issue in trying to
convert from ext2 to reiser on a 80GB RAID5 partition...
> Unfortunately the filesystem was turning up errors all over the place,
> including random sparse files growing out of nowhere, (even my mail
> queue) and files that the filesystem could not access (even ls returned
> 'Permission Denied'). In the time since my first filesystem was nulled
I suppose that if you continue to have a growing number of FS errors, then
you either have faulty hardware, or are using a buggy version of
software.... If you already admit to badblocks, then I would blame
hardware..
> due to a reiserfsck, I hadn't fsck'd the main drive as I feared bad
> blocks may be to blame for the filesystem inconsistency. After 4 failed
> attempts (kernel panics) to copy files between old and new drives, I
> finally decided to run an fsck. After fscking with --rebuild-tree, it
> found many errors, and corrupt directory entries, and chucked about 150
> files into /lost+found/. Most were from the mail server, owned by
> vpopmail.vchkpw, though others were from different websites I run and
> some were even from /dev/. But, none of the existing files were made
> sparse and I was able to completely copy the remaining files.
Hmmm, so apart from finding a number of errors and doing its best to fix
them (putting them into lost+found) and recovering all of your other files
even with hardware issues present, you recovered all of your data? The
problem here is?
> After this I completely re-checked the drive with Maxtor's disk tools
> disk, and it showed that the drive was 'certified error-free'. After
> seeing this, I have come to respect reiserfs even less than after my
> /home/ drive got converted to nothing but nulls.
If a hard drive is showing bad blocks and then the disk vendor's tool shows no
errors, I think a simple dd over the whole disk or similar would really show
the true story....
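As a rough illustration of that kind of full-surface read test, here is a minimal Python sketch. It is not dd itself; the function name scan_for_read_errors and the 4096-byte block size are assumptions made for the example. Reads that are served from the page cache may hide media errors, so treat this as a sketch rather than a real diagnostic.

```python
import os


def scan_for_read_errors(path, block_size=4096):
    """Read `path` block by block; return byte offsets of blocks that failed to read.

    `path` may be a regular file or (with sufficient privileges) a block device.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                os.pread(fd, min(block_size, size - offset), offset)
            except OSError:
                # The read failed: remember where, then carry on scanning.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return bad_offsets
```

An empty result only says that every block could be read at that moment; it says nothing about whether the data in those blocks is correct.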
> In my years of running ext2 and ext3, I can't see any reason why you
> would think they require fscks at every reboot. In the time that I've
> ran both ext2/ext3 and reiser, the only times ive had to fsck ext2 was
> after unclean unmounting. I've never had to run it on ext3. As for
Well, I had my share of ext2 doing an fsck after reboot, and it wasn't nice
on a 80GB partition... Sure, usually this is after a crash, which generally
is the *worst* time to have to run a fsck (ie, this just drags out the
unscheduled downtime).
> reiser, I've had to run it a few times, but each time it either
> destroyed data or fixed a very large number of errors that a journalled
> filesystem should not have.
I suppose your real issue is that you used faulty hardware. I wouldn't
expect any FS (journalling or otherwise) to be able to work faultlessly with
faulty hardware. Of course, depending on the faulty hardware, it would
probably affect different FS'es differently (depending on where the faults
are on the disk and what data that FS tries to store on it).
> I guess my point is, If reiserfs can't keep consistency without
> requiring periodic fscks, you have no argument making a statement that
> says ext2 is worse as it requires them every boot. Even if your
> statement was not false (I know for a fact that ext2 does -not- perform
> fscks on every boot, it only will do it after an unclean unmount), you
> still would have no basis to say that ext2 is worse due to the
> requirement of periodic fscks.
ext2 requires periodic fscks, which are also rather disconcerting when you
are never quite sure whether you need to allow a few hours downtime for a
kernel upgrade this time, or perhaps it is only a few minutes.
> In conclusion, I don't think i'll be using your filesystem even as a
> testbed for new features, due to poor reliability under everyday
> circumstances. In addition, check your facts before making a statement
> that insults someone and claims something that is completely untrue.
I wish you the best of luck with your chosen software and hardware, of
course, my experience is different, and the only place I use ext3 is when I
can't convert it, or I don't really care (ie, it's just the / fs or
something, not data/server that I care about). I haven't had any issues with
reiserfs for a long time...
Finally, I find it interesting to see people who swear they will never use
reiserfs again to be on the reiserfs mailing list :)
Regards,
Adam
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 23:11 ` Adam Goryachev
@ 2003-02-11 23:17 ` Anders Widman
2003-02-12 0:12 ` Hans Reiser
` (4 more replies)
2003-02-12 1:02 ` Mike Hodson
1 sibling, 5 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-11 23:17 UTC (permalink / raw)
To: reiserfs-list
>> I've used ReiserFS in the past, but have also used ext3 on my
>> user's important
>> data (/home) after a good chunk of one drive was converted to
>> sparse/null files due to a screwup stemming from no 'badblocks' support
>> in reiserfs. Since then, i've used ext3 as well as Reiser but recently
> I can't comment on your experience, but personally if I have a drive with
> any number of badblocks (which are showing up to the fs layer, not invisibly
> re-mapped by the drive) then I take the drive back and get a replacement, or
> bin the drive.
However, the FS SHOULD support handling of bad blocks/clusters at the
FS layer, even while running in a production system. Bad blocks can
pop up at any given time for no particular reason, and it is at these
times you (we) need a strong and reliable filesystem that can handle
and logically remap broken blocks/sectors.
Sure, a disk with physical errors should be replaced, but until you
find out about the error on the drive the FS HAS TO HANDLE these kinds
of problems.
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 23:17 ` Anders Widman
@ 2003-02-12 0:12 ` Hans Reiser
2003-02-12 10:23 ` Anders Widman
2003-02-12 5:12 ` Ross Vandegrift
` (3 subsequent siblings)
4 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-12 0:12 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
Anders Widman wrote:
>>>I've used ReiserFS in the past, but have also used ext3 on my
>>>user's important
>>>data (/home) after a good chunk of one drive was converted to
>>>sparse/null files due to a screwup stemming from no 'badblocks' support
>>>in reiserfs. Since then, i've used ext3 as well as Reiser but recently
>
>>I can't comment on your experience, but personally if I have a drive with
>>any number of badblocks (which are showing up to the fs layer, not invisibly
>>re-mapped by the drive) then I take the drive back and get a replacement, or
>>bin the drive.
>
>However, the FS SHOULD support handling of bad blocks/clusters at the
>FS layer, even while running in a production system. Bad blocks can
>pop up at any given time for no particular reason, and it is at these
>times you (we) need a strong and reliable filesystem that can handle
>and logically remap broken blocks/sectors.
>
>Sure, a disk with physical errors should be replaced, but until you
>find out about the error on the drive the FS HAS TO HANDLE these kinds
>of problems.
>
> - Anders
We have gotten better at this over time. There was a point in time when
some of our guys reviewed all the bad block handling. We still find
cases where we could be better though.
For some users it would be better to boot to a corrupted filesystem
because running fsck is more of a problem than putting their data at
higher risk. For datalogging, it is probably conceivable to just toss
the journal and lose the more recent updates to it. For the default
metadata journaling, this just does not seem prudent.
I really prefer making users understand that they have a problem they
need to do something about. This is just my style. I want them to fail
to boot, and after some effort learn that there is this thing called
fsck, and dd_rescue, and that it is time to buy another hard drive and
chuck their current one. It would be best though if they were given
detailed instructions about how they need to do this when the code hits
that bad block. Vitaly, please work on that.
If we handle the journal block error without downtime, the user will
never chuck the hard drive, and that is bad in the longterm.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 0:12 ` Hans Reiser
@ 2003-02-12 10:23 ` Anders Widman
2003-02-12 10:47 ` Hans Reiser
0 siblings, 1 reply; 94+ messages in thread
From: Anders Widman @ 2003-02-12 10:23 UTC (permalink / raw)
To: reiserfs-list
>>
>>However, the FS SHOULD support handling of bad blocks/clusters at the
>>FS layer, even while running in a production system. Bad blocks can
>>pop up at any given time for no particular reason, and it is at these
>>times you (we) need a strong and reliable filesystem that can handle
>>and logically remap broken blocks/sectors.
>>
>>Sure, a disk with physical errors should be replaced, but until you
>>find out about the error on the drive the FS HAS TO HANDLE these kinds
>>of problems.
>>
>> - Anders
>>
> We have gotten better at this over time. There was a point in time when
> some of our guys reviewed all the bad block handling. We still find
> cases where we could be better though.
I never intended to blame ReiserFS, just make the point in general
for any FS :)
> For some users it would be better to boot to a corrupted filesystem
> because running fsck is more of a problem than putting their data at
> higher risk. For datalogging, it is probably conceivable to just toss
> the journal and lose the more recent updates to it. For the default
> metadata journaling, this just does not seem prudent.
> I really prefer making users understand that they have a problem they
> need to do something about. This is just my style. I want them to fail
> to boot, and after some effort learn that there is this thing called
> fsck, and dd_rescue, and that it is time to buy another hard drive and
> chuck their current one. It would be best though if they were given
> detailed instructions about how they need to do this when the code hits
> that bad block. Vitaly, please work on that.
> If we handle the journal block error without downtime, the user will
> never chuck the hard drive, and that is bad in the longterm.
But a user never knows he has a media error before his system
crashes, unless he does a surface scan or monitors his logs very closely.
Not all users do this.
To me a FS should be able to handle both read and write errors and
be able to reallocate these errors to a sane area of the media. When
this occurs then it should be noted in the kernel log (or similar).
Then users can run a cron job to monitor the log after exactly this
error message.
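A minimal sketch of what such a cron-driven check could look like, in Python. The regular expression is purely an assumption for illustration: the actual wording of kernel and filesystem error messages varies by driver and kernel version, and no particular ReiserFS message format is implied.

```python
import re

# Hypothetical pattern for illustration only; adjust it to the messages
# your kernel and filesystem actually emit.
IO_ERROR_PATTERN = re.compile(r"I/O error|bad block|medium error", re.IGNORECASE)


def find_disk_errors(log_lines):
    """Return the log lines that look like disk or media errors."""
    return [line.rstrip("\n") for line in log_lines if IO_ERROR_PATTERN.search(line)]
```

A cron job could feed the lines of the kernel log into find_disk_errors and mail the administrator whenever the result is non-empty; where that log lives depends on the syslog setup.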
The whole point is that errors can occur at any time when a
system is up and running, and it _always_ takes some time for the
user to react and find out about the problem. In the time between
when the error occurs and when the user finds out and can
deal with it, we need a solid and secure FS that can manage to
run the system and protect the data (which is why we choose one FS
over the other).
To my knowledge only Windows with NTFS can handle just this:
relocating bad blocks on the fly and notifying the user that this has
happened.
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 10:23 ` Anders Widman
@ 2003-02-12 10:47 ` Hans Reiser
2003-02-12 11:12 ` Adam Goryachev
0 siblings, 1 reply; 94+ messages in thread
From: Hans Reiser @ 2003-02-12 10:47 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
Anders Widman wrote:
>>If we handle the journal block error without downtime, the user will
>>never chuck the hard drive, and that is bad in the longterm.
>
> But a user never knows he has a media error before his system
> crashes (or do a surface scan), or monitor his logs very closely.
> Not all users does this.
>
> To me a FS should be able to handle both read and write errors and
> be able to reallocate these errors to a sane area of the media. When
> this occurs then it should be noted in the kernel log (or similar).
>
> Then users can run a cron job to monitor the log after exactly this
> error message.
>
> The whole point is this that errors can occur at any time when a
> system is up and running and it _always_ takes some time for the
> user to react and find out about the problem. In the time between
> the error has occurred and the when the user finds out and can
> administer it, then we need a solid and secure FS that can manage to
> run the system and protect the data (which is why we choose one FS
> over the other).
>
> To my knowledge only Windows with NTFS can handle just this -
> relocating bad blocks on the fly and notifying the user such has
> happened.
>
> - Anders
Yes, you are probably right, we should do it for those cases where it is
feasible.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* RE: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 10:47 ` Hans Reiser
@ 2003-02-12 11:12 ` Adam Goryachev
2003-02-12 13:42 ` Anders Widman
2003-02-12 16:39 ` Sam Vilain
0 siblings, 2 replies; 94+ messages in thread
From: Adam Goryachev @ 2003-02-12 11:12 UTC (permalink / raw)
To: reiserfs-list
> Anders Widman wrote:
> >>If we handle the journal block error without downtime, the user will
> >>never chuck the hard drive, and that is bad in the longterm.
> > But a user never knows he has a media error before his system
> > crashes (or do a surface scan), or monitor his logs very closely.
> > Not all users does this.
> >
> > To me a FS should be able to handle both read and write errors and
> > be able to reallocate these errors to a sane area of the media. When
> > this occurs then it should be noted in the kernel log (or similar).
> >
> > Then users can run a cron job to monitor the log after exactly this
> > error message.
[SNIP]
While this is all perfectly true, there also remains the question of "If we
*know* the media is faulty in this spot, how do we know the media isn't
faulty in these spots?"
ie, once you are on faulty hardware, you never really know what is the
correct course of action for this specific fault.
I can conceive of a few things that *might* be the right thing in various
circumstances:
A) Immediately re-mount the drive read-only, and wait for the sysadmin to
either re-mount rw or to do some other data recovery/repair
B) Immediately dis-mount the drive and wait
C) OK, I tried to write to sector 1324 so lets just try each consecutive
available sector until it doesn't return an error (possibly marking the
sectors bad/used as we go)
D) Just return an error to the application
Of course, all of the above would *also* log an error message to the kernel
log.
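To make option C concrete, here is a small Python sketch against a simulated disk. Everything in it (SimulatedDisk, write_with_relocation, the free and bad sets) is hypothetical and invented for the example; it says nothing about how any real filesystem allocates blocks.

```python
class SimulatedDisk:
    """Toy stand-in for a block device; sectors listed in `faulty` fail on write."""

    def __init__(self, sector_count, faulty=()):
        self.sectors = [None] * sector_count
        self.faulty = set(faulty)

    def write(self, sector, data):
        if sector in self.faulty:
            raise OSError(f"write error at sector {sector}")
        self.sectors[sector] = data


def write_with_relocation(disk, start, data, free, bad):
    """Option C: try `start`, then each following free sector, until a write succeeds.

    Sectors that fail are moved from `free` to `bad` as we go.
    Returns the sector that finally took the data.
    """
    for sector in range(start, len(disk.sectors)):
        if sector not in free or sector in bad:
            continue  # in use or already known to be bad
        try:
            disk.write(sector, data)
        except OSError:
            bad.add(sector)       # remember the failure ...
            free.discard(sector)  # ... and never hand this sector out again
            continue
        free.discard(sector)      # sector is now in use
        return sector
    raise OSError(f"no usable free sector at or after {start}")
```

The caller still has to record where the data actually ended up, and a real implementation would log each sector added to the bad list.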
Now, some options might be suitable if for example the head of the disk has
crashed into the surface and hence the more you attempt to
read/write/seek/etc the more damage you do (as opposed to immediately
stopping all access to the disk and thereby preserving the data).
Some options are more suitable if, for example, you have a system which
absolutely needs to run 24 hours/7 days and it is only writing to disk as an
informational log.
Of course, it all depends on the reason the hardware error has shown its
ugly face, and what your individual circumstances are. Perhaps re-mapping is
just making the corruption worse (for every write the drive gets more
confused and scribbles in the wrong spot)...
Like most things MS do, they just take the common approach and pretend that
it will work for everyone.
Just my 0.02c worth.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 11:12 ` Adam Goryachev
@ 2003-02-12 13:42 ` Anders Widman
2003-02-12 14:15 ` Russell Coker
2003-02-12 16:39 ` Sam Vilain
1 sibling, 1 reply; 94+ messages in thread
From: Anders Widman @ 2003-02-12 13:42 UTC (permalink / raw)
To: reiserfs-list
>> Anders Widman wrote:
>> >>If we handle the journal block error without downtime, the user will
>> >>never chuck the hard drive, and that is bad in the longterm.
>> > But a user never knows he has a media error before his system
>> > crashes (or do a surface scan), or monitor his logs very closely.
>> > Not all users does this.
>> >
>> > To me a FS should be able to handle both read and write errors and
>> > be able to reallocate these errors to a sane area of the media. When
>> > this occurs then it should be noted in the kernel log (or similar).
>> >
>> > Then users can run a cron job to monitor the log after exactly this
>> > error message.
> [SNIP]
> While this is all perfectly true, there also remains the question of "If we
> *know* the media is faulty in this spot, how do we know the media isn't
> faulty in these spots?"
> ie, once you are on faulty hardware, you never really know what is the
> correct course of action for this specific fault.
> I can conceive of a few things that *might* be the right thing in various
> circumstances:
> A) Immediately re-mount the drive read-only, and wait for the sysadmin to
> either re-mount rw or to do some other data recovery/repair
This would be devastating for some kinds of servers and would cause downtime
and loss of work for many people. Consider an application server, or a
document server within a company. If writes were prohibited, lots
of programs would stop functioning, or saving would fail.
There are millions of other examples I can think of too, that would
make me not choose this option.
Unplanned downtime does cause a lot of harm to any business.
> B) Immediately dis-mount the drive and wait
Urg.. Even worse than the above case.
> C) OK, I tried to write to sector 1324 so lets just try each consecutive
> available sector until it doesn't return an error (possibly marking the
> sectors bad/used as we go)
Yes, or use another algorithm to find safe/free sectors.
> D) Just return an error to the application
This would at least allow some services/applications to continue
running. Still, important data and services may not work.
> Of course, all of the above would *also* log an error message to the kernel
> log.
> Now, some options might be suitable if for example the head of the disk has
> crashed into the surface and hence the more you attempt to
> read/write/seek/etc the more damage you do (as opposed to immediately
> stopping all access to the disk and thereby preserving the data).
> Some options are more suitable for example you have a system which
> absolutely needs to run 20 hours/7days and it is only writing to disk as a
> informational log.
Agreed. There are many different cases that should be met with
different approaches.
> Of course, it all depends on the reason the hardware error has showed it's
> ugly face, and what your individual circumstances are. Perhaps re-mapping is
> just making the corruption worse (for every write the drive gets more
> confused and scribbles in the wrong spot)...
I am not sure about that. You certainly do not write on top of user
data, but on (marked) free space. If the drive cannot write, then no
harm would be done to the user data anyway. Of course in case of
electronics failure/firmware failure then you would perhaps get random
writes instead of where you want them. However I believe you would
have more serious problems before then (like not being able to read).
> Like most things MS do, they just take the common approach and pretend that
> it will work for everyone.
Easy to blame MS, but Bad Blocks handling from FAT12 and ever since
has probably saved more data than not having it.
Just because MS does things it does not mean they are bad and that
Linux world should refuse to do something similar. That is just
shooting yourself in the foot.
> Just my 0.02c worth.
How about 0.00c? ;)
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 13:42 ` Anders Widman
@ 2003-02-12 14:15 ` Russell Coker
2003-02-12 15:26 ` Anders Widman
2003-02-13 3:31 ` Zygo Blaxell
0 siblings, 2 replies; 94+ messages in thread
From: Russell Coker @ 2003-02-12 14:15 UTC (permalink / raw)
To: Anders Widman, reiserfs-list
On Wed, 12 Feb 2003 14:42, Anders Widman wrote:
> > A) Immediately re-mount the drive read-only, and wait for the sysadmin to
> > either re-mount rw or to do some other data recovery/repair
>
> This would be devastating for some form of servers and cause downtime
> and loss of work for plenty. Consider an application server, or a
> document server within a company. If writes were prohibited then lots
> of programs do stop function, or saving fails.
Servers should have RAID. There is no excuse. With RAID the regular disk
errors should not be an issue.
> Unplanned downtime do cause lot of harm to any business.
It's better to stop when there's a serious error than to blindly continue and
make things worse.
> Easy to blame MS, but Bad Blocks handling from FAT12 and ever since
> has probably saved more data than not having it.
>
> Just because MS does things it does not mean they are bad and that
> Linux world should refuse to do something similar. That is just
> shooting yourself in the foot.
The FAT bad block handling was developed when RAID was virtually unknown (and
not available for PCs) and when no commonly available hard drives had the
ability to relocate bad blocks (the drive controller received the analog
signal from the disk heads - there was no abstraction).
Now all machines other than laptops are getting RAID, all hard drives support
re-mapping bad sectors, and the entire situation is different.
Bad block handling is only needed for laptops.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 14:15 ` Russell Coker
@ 2003-02-12 15:26 ` Anders Widman
2003-02-12 16:22 ` bscott
` (2 more replies)
2003-02-13 3:31 ` Zygo Blaxell
1 sibling, 3 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-12 15:26 UTC (permalink / raw)
To: reiserfs-list
> On Wed, 12 Feb 2003 14:42, Anders Widman wrote:
>> > A) Immediately re-mount the drive read-only, and wait for the sysadmin to
>> > either re-mount rw or to do some other data recovery/repair
>>
>> This would be devastating for some form of servers and cause downtime
>> and loss of work for plenty. Consider an application server, or a
>> document server within a company. If writes were prohibited then lots
>> of programs do stop function, or saving fails.
> Servers should have RAID. There is no excuse. With RAID the regular disk
> errors should not be an issue.
Yes, most should use some form of hardware redundancy.
>> Unplanned downtime do cause lot of harm to any business.
> It's better to stop when there's a serious error than to blindly continue and
> make things worse.
Neither I nor, I think, anyone else ever said to continue blindly. Most
users/workstations do not have RAID and probably never will.
I can take any normal home-user as an example. They use single drives,
but still they need to be able to rely on the filesystem to protect
their data and not stop working if the drive has a few bad blocks.
Not all users have spare drives, nor can they find a replacement right away,
so they have to be able to use their computers even though there are
bad blocks.
There are two kinds of people I come across when it comes to Linux:
Those that say it is perfect and blame everything on the user or the
hardware.
The others want to make Linux a viable option for "normal" users and
want Linux to be able to replace Windows or Mac OS. The only way I see
that happen is if Linux starts to get more userfriendly and safe.
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 15:26 ` Anders Widman
@ 2003-02-12 16:22 ` bscott
2003-02-12 16:28 ` Russell Coker
2003-02-13 3:42 ` Zygo Blaxell
2 siblings, 0 replies; 94+ messages in thread
From: bscott @ 2003-02-12 16:22 UTC (permalink / raw)
To: ReiserFS Mailing List; +Cc: Anders Widman
On Wed, 12 Feb 2003, at 4:26pm, andewid@tnonline.net wrote:
> They use single drives, but still they need to be able to rely on the
> filesystem to protect their data and not stop working if the drive has a
> few bad blocks.
Question: What should the filesystem software do?
The filesystem stores data in blocks on disk. It expects to be able to read
and write to them to do its job. When you take that away, you take away a
fundamental.
> Not all users has spare drives, nor can find a replacement right away so
> they have to be able to use their computers even though there are bad
> blocks.
"Not all users have UPSes (battery backups), nor can they find a UPS right
away, so they have to be able to use their computers even though the power
is out."
> The only way I see that happen is if Linux starts to get more userfriendly
> and safe.
A bad disk drive isn't a Linux issue. It's a hardware issue. Windows
and/or MacOS cannot magically make a bad disk into a good disk, either.
--
Ben Scott <bscott@ntisys.com>
| The opinions expressed in this message are those of the author and do |
| not represent the views or policy of any other person or organization. |
| All information is provided without warranty of any kind. |
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 15:26 ` Anders Widman
2003-02-12 16:22 ` bscott
@ 2003-02-12 16:28 ` Russell Coker
2003-02-12 16:40 ` Anders Widman
2003-02-13 3:42 ` Zygo Blaxell
2 siblings, 1 reply; 94+ messages in thread
From: Russell Coker @ 2003-02-12 16:28 UTC (permalink / raw)
To: Anders Widman, reiserfs-list
On Wed, 12 Feb 2003 16:26, Anders Widman wrote:
> >> Unplanned downtime do cause lot of harm to any business.
> >
> > It's better to stop when there's a serious error than to blindly continue
> > and make things worse.
>
> I (and I think no one else) never said continue blindly. Most
> users/workstations do not have RAID and probably never will.
Hard drive costs are constantly decreasing while the value of data is
constantly increasing. I think that the use of RAID will increase steadily.
> The others want to make Linux a viable option for "normal" users and
> want Linux to be able to replace Windows or Mac OS. The only way I see
> that happen is if Linux starts to get more userfriendly and safe.
I guess you're not familiar with what NT does then.
NT 3.5x would sometimes get confused about its data and umount the file
system in question to avoid the risk of damaging data.
In case of a serious kernel error NT will give a BSOD in situations where
Linux by default will print an Oops message and continue running.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 16:28 ` Russell Coker
@ 2003-02-12 16:40 ` Anders Widman
0 siblings, 0 replies; 94+ messages in thread
From: Anders Widman @ 2003-02-12 16:40 UTC (permalink / raw)
To: reiserfs-list
> On Wed, 12 Feb 2003 16:26, Anders Widman wrote:
>> >> Unplanned downtime do cause lot of harm to any business.
>> >
>> > It's better to stop when there's a serious error than to blindly continue
>> > and make things worse.
>>
>> I (and I think no one else) never said continue blindly. Most
>> users/workstations do not have RAID and probably never will.
> Hard drive costs are constantly decreasing while the value of data is
> constantly increasing. I think that the use of RAID will increase steadily.
>> The others want to make Linux a viable option for "normal" users and
>> want Linux to be able to replace Windows or Mac OS. The only way I see
>> that happen is if Linux starts to get more userfriendly and safe.
> I guess you're not familiar with what NT does then.
> NT 3.5x would sometimes get confused about it's data and umount the file
> system in question to avoid the risk of damaging data.
> In case of a serious kernel error NT will give a BSOD in situations where
> Linux by default will print an Oops message and continue running.
NT3.5 is a little old to compare a modern OS with, is it not? I have
had numerous Linux kernel crashes that were not recoverable also.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 15:26 ` Anders Widman
2003-02-12 16:22 ` bscott
2003-02-12 16:28 ` Russell Coker
@ 2003-02-13 3:42 ` Zygo Blaxell
2003-02-13 10:13 ` Anders Widman
2 siblings, 1 reply; 94+ messages in thread
From: Zygo Blaxell @ 2003-02-13 3:42 UTC (permalink / raw)
To: reiserfs-list
In article <46110589437.20030212162613@tnonline.net>,
Anders Widman <andewid@tnonline.net> wrote:
>The others want to make Linux a viable option for "normal" users and
>want Linux to be able to replace Windows or Mac OS. The only way I see
>that happen is if Linux starts to get more userfriendly and safe.
Last time I checked, Windows and Mac OS come to a near total halt when
they see a disk error while doing a write on non-removable media, unless
the application goes to extraordinary lengths to handle the error itself.
Frankly, I used to mount my ext3 filesystems on servers with
'errors=panic', causing a reboot at the very first sign of trouble (past
tense as I now use reiserfs which doesn't like that option ;-).
The sooner the server goes out of production and starts running fsck,
the sooner it will finish running fsck and come back into production
(or, in the worst case, the sooner an admin person will start pulling
out backup tapes and ordering replacement disks).
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-13 3:42 ` Zygo Blaxell
@ 2003-02-13 10:13 ` Anders Widman
2003-02-13 14:44 ` Rudy Zijlstra
0 siblings, 1 reply; 94+ messages in thread
From: Anders Widman @ 2003-02-13 10:13 UTC (permalink / raw)
To: reiserfs-list
> In article <46110589437.20030212162613@tnonline.net>,
> Anders Widman <andewid@tnonline.net> wrote:
>>The others want to make Linux a viable option for "normal" users and
>>want Linux to be able to replace Windows or Mac OS. The only way I see
>>that happen is if Linux starts to get more userfriendly and safe.
> Last time I checked, Windows and Mac OS come to a near total halt when
> they see a disk error while doing a write on non-removable media, unless
> the application goes to extraordinary lengths to handle the error itself.
Actually no. :) Windows continues to run (ok, maybe not win9x or WinNT,
but these are old anyway). You can just remove a harddrive in Windows
XP and the system continues to run. Or you can add new PCI cards and
Windows will find those too.
> Frankly, I used to mount my ext3 filesystems on servers with
> 'errors=panic', causing a reboot at the very first sign of trouble (past
> tense as I now use reiserfs which doesn't like that option ;-).
> The sooner the server goes out of production and starts running fsck,
> the sooner it will finish running fsck and come back into production
> (or, in the worst case, the sooner an admin person will start pulling
> out backup tapes and ordering replacement disks).
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-13 10:13 ` Anders Widman
@ 2003-02-13 14:44 ` Rudy Zijlstra
0 siblings, 0 replies; 94+ messages in thread
From: Rudy Zijlstra @ 2003-02-13 14:44 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
On Thu, 13 Feb 2003, Anders Widman wrote:
> > In article <46110589437.20030212162613@tnonline.net>,
> > Anders Widman <andewid@tnonline.net> wrote:
> >>The others want to make Linux a viable option for "normal" users and
> >>want Linux to be able to replace Windows or Mac OS. The only way I see
> >>that happen is if Linux starts to get more userfriendly and safe.
>
> > Last time I checked, Windows and Mac OS come to a near total halt when
> > they see a disk error while doing a write on non-removable media, unless
> > the application goes to extraordinary lengths to handle the error itself.
>
> Actually no. :) Windows continue to run (ok, maybe now win9x or WinNT,
> but these are old anyway). You can just remove a harddrive in Windows
> XP and the system continues to run. Or you can add new PCI cards and
> Windows will find those too.
>
>
Provided you first shut it down, then yes. I am not aware of PC hardware
that will allow you to safely do this with power on the board. Disk removal and
addition also worked using Win2K. And by the way, also using Linux -:)
If you get troubles with the system disk under Windows, I do not know
what happens, likely to be interesting... And I have had Linux running
with 1 disk disconnected after it was mounted: an unexpected SCSI disconnect.
All kept working, except for the partitions that were unreachable. Which
happened to be reiserfs and were unharmed.
Cheers
Rudy
P.S. I am getting RAID for that particular system...
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 14:15 ` Russell Coker
2003-02-12 15:26 ` Anders Widman
@ 2003-02-13 3:31 ` Zygo Blaxell
1 sibling, 0 replies; 94+ messages in thread
From: Zygo Blaxell @ 2003-02-13 3:31 UTC (permalink / raw)
To: reiserfs-list
In article <200302121515.46953.russell@coker.com.au>,
Russell Coker <russell@coker.com.au> wrote:
>Now all machines other than laptops are getting RAID, all hard drives support
>re-mapping bad sectors, and the entire situation is different.
Actually, laptops get RAID too... ;-)
My laptop can have up to 3 2.5" IDE disks simultaneously installed, if
I remove optional equipment such as second batteries and CD-ROM.
"/" is /dev/loop7 (rijndael encryption) on top of /dev/md0 on top of
/dev/hd[ab]2.
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 11:12 ` Adam Goryachev
2003-02-12 13:42 ` Anders Widman
@ 2003-02-12 16:39 ` Sam Vilain
1 sibling, 0 replies; 94+ messages in thread
From: Sam Vilain @ 2003-02-12 16:39 UTC (permalink / raw)
To: Adam Goryachev, reiserfs-list
On Thu, 13 Feb 2003 00:12, Adam Goryachev wrote:
> I can conceive of a few things that *might* be the right thing in
> various circumstances:
>
> A) Immediately re-mount the drive read-only, and wait for the sysadmin
> to either re-mount rw or to do some other data recovery/repair
>
> B) Immediately dis-mount the drive and wait
>
> C) OK, I tried to write to sector 1324 so lets just try each consecutive
> available sector until it doesn't return an error (possibly marking the
> sectors bad/used as we go)
>
> D) Just return an error to the application
Or a mixture...
C) with a max limit of, say 5 attempts, then D). And then, later if it
gets `really bad', where most I/O operations are failing, then A).
But I'd consider it acceptable behaviour for bounds check exceptions (ie,
unreported filesystem corruption) or situations where you have lost a
large amount of really critical structural information to invoke B). Much
better than an Oops.
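The mixture described above can be sketched as a small decision function.
This is only an illustration of the policy, not kernel code: the names,
the failure-ratio threshold and the sample window are assumptions; the
only number taken from the message is the limit of "say 5 attempts".

```python
from enum import Enum


class Action(Enum):
    RETRY_NEXT_SECTOR = "C"   # try the next available sector
    RETURN_ERROR = "D"        # give up on this write, report to the application
    REMOUNT_READ_ONLY = "A"   # most I/O is failing: stop writing
    UNMOUNT = "B"             # structural damage: take the filesystem offline


# Hypothetical tunables; the thread only suggests "say 5 attempts".
MAX_ATTEMPTS = 5
FAILURE_RATIO_FOR_READ_ONLY = 0.5   # stands in for "most I/O operations are failing"
MIN_SAMPLES = 20                    # do not judge the ratio on a handful of I/Os


def choose_action(attempts_so_far, recent_failures, recent_ios,
                  structural_damage=False):
    """Pick one of the options A-D for a failed write."""
    if structural_damage:
        return Action.UNMOUNT
    if (recent_ios >= MIN_SAMPLES
            and recent_failures / recent_ios > FAILURE_RATIO_FOR_READ_ONLY):
        return Action.REMOUNT_READ_ONLY
    if attempts_so_far < MAX_ATTEMPTS:
        return Action.RETRY_NEXT_SECTOR
    return Action.RETURN_ERROR
```

The ordering is a design choice: the drastic options are checked first, so
that a disk which is failing everywhere does not keep burning through
spare sectors one retry at a time.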
Whoever made that statement about the hard disk head crashing... now that's
certainly a laughable suggestion; a hard disk continuing after a head
crash. If anything, my experience with disks has been that if they start
failing, you have to sort things out sooner rather than later.
--
Sam Vilain, sam@vilain.net
You can judge your age by the amount of pain you feel when you come
in contact with a new idea.
JOHN NUVEEN
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 23:17 ` Anders Widman
2003-02-12 0:12 ` Hans Reiser
@ 2003-02-12 5:12 ` Ross Vandegrift
2003-02-12 7:17 ` Oleg Drokin
` (2 subsequent siblings)
4 siblings, 0 replies; 94+ messages in thread
From: Ross Vandegrift @ 2003-02-12 5:12 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
On Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> Sure, a disk with physical errors should be replaced, but until you
> find out about the error on the drive the FS HAS TO HANDLE these kinds
> of problems.
No, this is *ridiculous*. A filesystem should tolerate a failed disk when
the VM system can handle bad memory, or the scheduler can handle an
overheated CPU. In all three cases, hard failure of a machine is
certain, unless hardware is replaced before the failure occurs.
You have to start your software on some kind of foundation. Working
hardware sounds like a great place to me.
--
Ross Vandegrift
ross@willow.seitz.com
A Pope has a Water Cannon. It is a Water Cannon.
He fires Holy-Water from it. It is a Holy-Water Cannon.
He Blesses it. It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it. It is a Wholly Holy Holy-Water Cannon.
He has it pierced. It is a Holey Wholly Holy Holy-Water Cannon.
He makes it official. It is a Canon Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive. He shoots them.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 23:17 ` Anders Widman
2003-02-12 0:12 ` Hans Reiser
2003-02-12 5:12 ` Ross Vandegrift
@ 2003-02-12 7:17 ` Oleg Drokin
2003-02-12 10:17 ` Alexander Lyamin
2003-02-12 16:25 ` Vitaly Fertman
4 siblings, 0 replies; 94+ messages in thread
From: Oleg Drokin @ 2003-02-12 7:17 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
Hello!
On Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> >> I've used ReiserFS in the past, but have also used ext3 on my
> >> user's important
> >> data (/home) after a good chunk of one drive was converted to
> >> sparse/null files due to a screwup stemming from no 'badblocks' support
> >> in reiserfs. Since then, i've used ext3 as well as Reiser but recently
> > I can't comment on your experience, but personally if I have a drive with
> > any number of badblocks (which are showing up to the fs layer, not invisibly
> > re-mapped by the drive) then I take the drive back and get a replacement, or
> > bin the drive.
> However, the FS SHOULD support handling of bad blocks/clusters at the
Well, the FS itself supports this. Kind of ;)
Just mark bad blocks as "used".
Of course this does not work when the bad block is in the journal (solved with
a relocated/custom journal) or in a bitmap block.
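As a rough illustration of the "mark bad blocks as used" trick, here is a
toy allocation bitmap. It is a sketch only: the class and method names are
made up and have nothing to do with the real reiserfs on-disk format.

```python
class BlockBitmap:
    """Toy allocation bitmap: True means 'in use', False means 'free'."""

    def __init__(self, nblocks):
        self.used = [False] * nblocks

    def mark_bad(self, block):
        # A bad block is simply recorded as used, so it is never handed out.
        self.used[block] = True

    def allocate(self):
        # First-fit allocation: bad blocks are skipped like any used block.
        for block, in_use in enumerate(self.used):
            if not in_use:
                self.used[block] = True
                return block
        raise OSError("no free blocks")


bitmap = BlockBitmap(8)
for bad in (0, 1, 3):
    bitmap.mark_bad(bad)
first = bitmap.allocate()    # block 2: the first block that is not marked
second = bitmap.allocate()   # block 4: block 3 is skipped because it is bad
```

The sketch also shows why the trick cannot cover the journal or the bitmap
blocks: it only steers the allocator, and those areas are not allocated
through it.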
That said, I know that ext3 does not do very well if there is a bad block
in the journal area.
Another problem is write errors (especially in journal areas). I do not
know about ext3, but reiserfs just fails in such a case, though I know
that SuSE people are working on resolving this problem.
> FS layer, even while running in a production system. Bad blocks can
> pop up at any give time for no particular reason, and it is at these
> times you (we) need a strong and reliable filesystem that can handle
> and logically remap broken blocks/sectors.
Hm. None of the existing filesystems for Linux can do this, to my knowledge.
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 23:17 ` Anders Widman
` (2 preceding siblings ...)
2003-02-12 7:17 ` Oleg Drokin
@ 2003-02-12 10:17 ` Alexander Lyamin
2003-02-12 10:19 ` Alexander Lyamin
2003-02-12 16:25 ` Vitaly Fertman
4 siblings, 1 reply; 94+ messages in thread
From: Alexander Lyamin @ 2003-02-12 10:17 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> However, the FS SHOULD support handling of bad blocks/clusters at the
> FS layer, even while running in a production system. Bad blocks can
> pop up at any give time for no particular reason, and it is at these
> times you (we) need a strong and reliable filesystem that can handle
> and logically remap broken blocks/sectors.
Once I compared disk storage systems to a burger where each of the components -
ham (HDD), cheese (controller) and bread (filesystem) - thinks that it is the
smartest one, and is totally unwilling to cooperate to give the burger a
better taste.
The problem with remapping is that modern HDDs do this themselves, and when
you have bad blocks leaked into the upper layer (FS), chances are that
THINGS ARE REALLY BAD AND UGLY.
You'd better bin this HDD, unless you don't care about the data.
--
"Cache remedies via multi-variable logic shorts will leave you crying."(cl)
Lex Lyamin
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 10:17 ` Alexander Lyamin
@ 2003-02-12 10:19 ` Alexander Lyamin
0 siblings, 0 replies; 94+ messages in thread
From: Alexander Lyamin @ 2003-02-12 10:19 UTC (permalink / raw)
To: Alexander Lyamin; +Cc: Anders Widman, reiserfs-list
Wed, Feb 12, 2003 at 01:17:06PM +0300, Alexander Lyamin wrote:
> Wed, Feb 12, 2003 at 12:17:47AM +0100, Anders Widman wrote:
> > However, the FS SHOULD support handling of bad blocks/clusters at the
> > FS layer, even while running in a production system. Bad blocks can
> > pop up at any give time for no particular reason, and it is at these
> > times you (we) need a strong and reliable filesystem that can handle
> > and logically remap broken blocks/sectors.
>
> Once i compared disk storage systems to a burger where each of components -
> ham (hdd), cheese(controller) and bread (filesystem) think that they are smartest one. and totally unwilling to cooperate to give a burger better taste.
>
> problem with remapping that modern HDD's do this thing, and when you have
> bad blocks leacked in upper layer (FS) chances are that
> THINGS ARE REALLY BAD AND UGLY.
>
> you'd better bin this HDD, unless you dont care about data.
But being a different-thinking kind of bread, I could not say we were not trying.
Vitaly could comment on this more :)
--
"Cache remedies via multi-variable logic shorts will leave you crying."(cl)
Lex Lyamin
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 23:17 ` Anders Widman
` (3 preceding siblings ...)
2003-02-12 10:17 ` Alexander Lyamin
@ 2003-02-12 16:25 ` Vitaly Fertman
2003-02-12 16:56 ` Anders Widman
4 siblings, 1 reply; 94+ messages in thread
From: Vitaly Fertman @ 2003-02-12 16:25 UTC (permalink / raw)
To: reiserfs-list
On Wednesday 12 February 2003 02:17, Anders Widman wrote:
> >> I've used ReiserFS in the past, but have also used ext3 on my
> >> user's important
> >> data (/home) after a good chunk of one drive was converted to
> >> sparse/null files due to a screwup stemming from no 'badblocks' support
> >> in reiserfs. Since then, i've used ext3 as well as Reiser but recently
> >
> > I can't comment on your experience, but personally if I have a drive with
> > any number of badblocks (which are showing up to the fs layer, not
> > invisibly re-mapped by the drive) then I take the drive back and get a
> > replacement, or bin the drive.
>
> However, the FS SHOULD support handling of bad blocks/clusters at the
> FS layer, even while running in a production system. Bad blocks can
> pop up at any give time for no particular reason, and it is at these
> times you (we) need a strong and reliable filesystem that can handle
> and logically remap broken blocks/sectors.
>
> Sure, a disk with physical errors should be replaced, but until you
> find out about the error on the drive the FS HAS TO HANDLE these kinds
> of problems.
It is difficult to say whether bad blocks should be handled at the fs layer or not.
It would be useful to have this problem solved somehow, but hard drives with
their remapping look like the proper place for doing this. And probably the fs
layer should just skilfully use some interface for such remapping. Well,
remapping is probably not the correct word here. Xuan Baldauf
<xuan--reiserfs@baldauf.org> once sent us his program and claimed that it recovered
blocks without remapping. The explanation was the following:
> The problem is that often multiple adjacent blocks are bad. You'll have to detect
> them manually. Once you know the bad blocks, just trying to overwrite them usually
> does not succeed because the disk wants to seek to that block exactly (which does
> not work for the same reason the block is bad). But if the whole track is
> rewritten, the bad blocks usually are gone.
>
> I suspect track wandering for this: due to small misalignments at each write, a track (or more
> precisely, an arc of the track which contains the block to be written) slowly wanders. If the
> misalignments do not zero out each other, they add up to a bias. If an arc of a track has been
> written many times, it will have wandered under these conditions. If the wandering has
> progressed too far, the wandering arc slowly reaches the next neighbouring track.
>
> Now imagine an access to the wandered track: if the head seeks to the original position of the
> wandered track, it may not be able to read the wandered arc because it is too far away (lower
> signal quality). If the head seeks to the new position of the wandered arc, the signal may be
> interfered by the neighbouring track.
>
> Both effects may occur; which one does not really matter, both make parts of the wandered arc
> inaccessible.
>
> The problem is: the individual wandered arc is no longer accessible, because the disk
> controller cannot sync to the block it is flying over because of the bad
> signal-to-noise-ratio. And if the wandered arc is accessible, another write will make it
> further wander up to inaccessibility.
>
> But if the seek to the track of the arc which should be overwritten occurs before the wandered
> arc, the disk controller actually can sync to the track and then write the whole track,
> effectively creating the track anew and only having the bias of the not-wandered part of the
> track. Thus, the wandered arc has no longer wandered compared to the other arcs of the
> track.
Well, it worked. We had some bad blocks on a drive and writes to them failed; after using
this program there were no bad blocks anymore.
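Xuan Baldauf's program itself is not shown in this thread, so the following
is only a guess at the general idea: read whatever is readable on the track
that holds the bad sector, then write the whole track back in one go.
Everything here is an assumption, in particular SECTORS_PER_TRACK: drives
addressed by logical block number do not expose their real track layout, so
a real tool would have to guess or over-cover. Note also that sectors which
cannot be read come back zero-filled, i.e. their old contents are lost.

```python
import io

SECTOR_SIZE = 512
SECTORS_PER_TRACK = 63   # hypothetical; real drives do not expose their true geometry


def track_span(sector):
    """Return (first_sector, sector_count) of the assumed track holding `sector`."""
    first = (sector // SECTORS_PER_TRACK) * SECTORS_PER_TRACK
    return first, SECTORS_PER_TRACK


def rewrite_track(dev, bad_sector):
    """Read what is readable on the track, then write the whole track back at once."""
    first, count = track_span(bad_sector)
    buf = bytearray(count * SECTOR_SIZE)      # unreadable sectors stay zero-filled
    for i in range(count):
        dev.seek((first + i) * SECTOR_SIZE)
        try:
            data = dev.read(SECTOR_SIZE)
        except OSError:
            continue                          # sector could not be read: keep zeros
        buf[i * SECTOR_SIZE:i * SECTOR_SIZE + len(data)] = data
    dev.seek(first * SECTOR_SIZE)
    dev.write(bytes(buf))                     # one write covering the whole track
    return first, count
```

Here `dev` is any seekable binary file object, for example an io.BytesIO
for testing; pointing this at a real block device is at your own risk.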
So it would be possible to take some actions:
1) get some blocks back in the described way;
1.1) a write to really bad blocks should already have remapped them at this
point, if there is space in the remap area;
2) save bad blocks to the badblock list in the fs if they are still bad,
i.e. the drive is out of remap area.
It would not be bad to try to recover already remapped blocks in this way too,
only I do not know how to get the list of them.
Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want
the fs to work in the described way? Trying to fix all automatically? I am not sure.
Now about user space. Using badblocks and programs like the one Xuan Baldauf sent us,
or just trying to write to bad blocks so that they get remapped - that is how you can
try to get rid of some number of bad blocks. Should a drive whose number of bad blocks
exceeds the remap area be used? It is a really rare case that the number of bad
blocks on such a drive does not keep growing - the case where you may want to continue
using the drive - so this is why proper support for bad blocks has not been implemented
in reiserfs yet. And probably it is not the most urgent thing to do.
--
Thanks,
Vitaly Fertman
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 16:25 ` Vitaly Fertman
@ 2003-02-12 16:56 ` Anders Widman
2003-02-12 17:13 ` Oleg Drokin
0 siblings, 1 reply; 94+ messages in thread
From: Anders Widman @ 2003-02-12 16:56 UTC (permalink / raw)
To: reiserfs-list
> On Wednesday 12 February 2003 02:17, Anders Widman wrote:
>> >> I've used ReiserFS in the past, but have also used ext3 on my
>> >> user's important
>> >> data (/home) after a good chunk of one drive was converted to
>> >> sparse/null files due to a screwup stemming from no 'badblocks' support
>> >> in reiserfs. Since then, i've used ext3 as well as Reiser but recently
>> >
>> > I can't comment on your experience, but personally if I have a drive with
>> > any number of badblocks (which are showing up to the fs layer, not
>> > invisibly re-mapped by the drive) then I take the drive back and get a
>> > replacement, or bin the drive.
>>
>> However, the FS SHOULD support handling of bad blocks/clusters at the
>> FS layer, even while running in a production system. Bad blocks can
>> pop up at any give time for no particular reason, and it is at these
>> times you (we) need a strong and reliable filesystem that can handle
>> and logically remap broken blocks/sectors.
>>
>> Sure, a disk with physical errors should be replaced, but until you
>> find out about the error on the drive the FS HAS TO HANDLE these kinds
>> of problems.
> That is difficult to say if bad blocks should be handled at fs layer or not.
> It would be useful to have this problem solved somehow, but harddrives with
> their remappings looks like the proper part of doing this. And probably fs
> layer should just skilfully use some interface for such remapping. Well,
> remapping is probably not correct word here. Thus, Xuan Baldauf
> <xuan--reiserfs@baldauf.org> sent us his program once claimed that it recovered
> blocks w/out remapping. The explanations were the following:
>> The problem is that often multiple adjacent blocks are bad. You'll have to detect
>> them manually. Once you know the bad blocks, just trying to overwrite them usually
>> does not succeed because the disk wants to seek to that block exactly (which does
>> not work for the same reason the block is bad). But if the whole track is
>> rewritten, the bad blocks usually are gone.
>>
>> I suspect track wandering for this: due to small misalignments at each write, a track (or more
>> precisely, and arc of the track which contains the block to be written) slowly wanders. If the
>> misalignments do not zero out each other, they add up to a bias. If an arc of an has been
>> written many times, it will have wandered under these
>> conditions. If the wandering has
>> progressed too far, the wandering arc slowly reaches the next neighbouring track.
>>
>> Now imagine an access to the wandered track: if the head seeks to the original position of the
>> wandered track, it may not be able to read the wandered arc
>> because it is too far away (lower
>> signal quality). If the head seeks to the new position of the wandered arc, the signal may be
>> interfered by the neighbouring track.
>>
>> Both effects may occur, which one does not really matter, both makes parts of the wandered arc
>> inaccessible
>>
>> The problem is: the individual wandered arc is no longer accessible, because the disk
>> controller cannot sync to the block it is flying over because of the bad
>> signal-to-noise-ratio. And if the wandered arc is accessible, another write will make it
>> further wander up to inaccessibility.
>>
>> But if the seek to the track of the arc which should be
>> overwritten occurs before the wandered
>> arc, the disk controller actually can sync to the track and then write the whole track,
>> effectivily creating the track new and only having the bias of the not-wandered part of the
>> track. Thus, the wandered arc has not wandered anymore compared to the other arcs of the
>> track.
> Well, it worked. We had some bad blocks on a drive, write to them failed, after using
> this program there were no bad blocks anymore.
> So it would be possible to do some actions to 1) get some blocks back in the described
> way, 1.1) write to really bad blocks should have remaped them already here if there is
> a space in remap area 2) save bad blocks to badblock list in fs if they are still bad -
> out of remap area.
> Would be not bad to try to recover in this way already remapped blocks - do not know how
> to get the list of them only.
> Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want
> the fs to work in the described way? Trying to fix all automatically? I am not sure.
How about trial and (then) error? :)
> Now about the user space. Using badblocks and some programs like Xuan Baldauf sent us
> and just trying to write to bad blocks make them being remapped - that is how you can
> try to get rid of some amount of badblocks. Should a drive with amount of bad blocks
> which exceeds the remap area be used? It is a realy rare case that the amount of bad
> blocks of such a drive does not get increased - the case where you may want to continue
> using the drive - so this is why a proper support for bad blocks was not implemented
> in reiserfs yet. And probably it is not the most urgent thing to do.
No, perhaps bad block handling is not the major improvement we
need, however I feel it is still an important part of any FS - to
handle errors gracefully and not throw the user to the ground.
- Anders
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 16:56 ` Anders Widman
@ 2003-02-12 17:13 ` Oleg Drokin
0 siblings, 0 replies; 94+ messages in thread
From: Oleg Drokin @ 2003-02-12 17:13 UTC (permalink / raw)
To: Anders Widman; +Cc: reiserfs-list
Hello!
On Wed, Feb 12, 2003 at 05:56:58PM +0100, Anders Widman wrote:
> > So it would be possible to do some actions to 1) get some blocks back in the described
> > way, 1.1) write to really bad blocks should have remaped them already here if there is
> > a space in remap area 2) save bad blocks to badblock list in fs if they are still bad -
> > out of remap area.
> > Would be not bad to try to recover in this way already remapped blocks - do not know how
> > to get the list of them only.
> > Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want
> > the fs to work in the described way? Trying to fix all automatically? I am not sure.
> How about trial and (then) error? :)
That might be suitable for fsck, but not for the kernel, I am sure.
The kernel should probably just return an error or try to use a different block (if it
was doing a write), and if a certain number of attempts fail, return an error too.
Also remount R/O if the write error is in a system area (journal, superblock, bitmaps)
or a special mount option was given that demands remounting R/O on I/O errors.
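A minimal sketch of that rule, assuming a made-up layout: the block ranges,
the function names, the attempt limit and the `errors_remount_ro` flag are
all hypothetical and only stand in for "system area" and "special mount
option". It is not how the reiserfs code is actually structured.

```python
# Hypothetical layout: block ranges for illustration only.
SYSTEM_AREAS = {
    "superblock": range(16, 17),
    "bitmap": range(17, 18),
    "journal": range(18, 8211),
}


def in_system_area(block):
    """True if the block belongs to the journal, superblock or bitmaps."""
    return any(block in blocks for blocks in SYSTEM_AREAS.values())


def on_write_error(block, attempts, max_attempts=3, errors_remount_ro=False):
    """Return 'remount-ro', 'retry-other-block' or 'return-error'."""
    if errors_remount_ro or in_system_area(block):
        return "remount-ro"
    if attempts < max_attempts:
        return "retry-other-block"
    return "return-error"
```

Retrying on a different block only makes sense for writes, where the fs is
free to choose the location; a failed read has nowhere else to go and would
simply return the error.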
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-11 23:11 ` Adam Goryachev
2003-02-11 23:17 ` Anders Widman
@ 2003-02-12 1:02 ` Mike Hodson
2003-02-12 7:25 ` Oleg Drokin
` (2 more replies)
1 sibling, 3 replies; 94+ messages in thread
From: Mike Hodson @ 2003-02-12 1:02 UTC (permalink / raw)
To: Adam Goryachev; +Cc: reiserfs-list
> I can't comment on your experience, but personally if I have a drive with
> any number of badblocks (which are showing up to the fs layer, not invisibly
> re-mapped by the drive) then I take the drive back and get a replacement, or
> bin the drive.
I'd have liked to do that, but there were 2 problems. 1, the drive was an OEM drive,
and the reseller I bought it from went out of business - Western Digital
wouldn't accept an RMA directly from an end-user on that drive. And 2, I
didn't even notice errors until one day it wouldn't work properly. At
that time I thought it was a reiserfs inconsistency, and I fscked. At that
point I noticed a few {DriveReady -SeekComplete} errors and then most of
the data on the drive nulled itself out.
> I suppose that if you continue to have a growing number of FS errors, then
> you either have faulty hardware, or are using a buggy version of
> software.... If you already admit to badblocks, then I would blame
> hardware..
The case I outlined at the beginning of my message involved 2 specific
drives. The drive that completely failed was a WD 6.4 gig. The 20 gig was
a Maxtor that I've had for 2.5 years, that to my knowledge does not have
media errors, but there have been some FS corruptions. I don't think it's
the media since I never see any 'DriveReady-seekcomplete' errors that you
usually get with bad sectors.
> Hmmm, so apart from finding a number of errors and doing it's best to fix
> them (putting them into lost+found) and recovering all of your other files
> even with hardware issues present, you recovered all of your data? The
> problem here is?
The fact that the filesystem got so many errors in the first place. And
as I've said, hardware issues are AFAIK not the cause.
> If a harddrive is showing badblocks and then the disk vendors tool shows no
> errors, I think a simple dd over the whole disk or similar would really show
> the true story....
If I haven't made myself clear, the badblocks problem was with a
different drive. The thought occurred to me that there may have been bad
blocks due to the sporadic corruption, but I don't have any error
messages to back up that thought.
> ext2 requires periodic fsck's which are also rather dis-concerting when you
> are never quite sure whether you need to allow a few hours downtime for a
> kernel upgrade this time, or perhaps it is only a few minutes.
Well, one way of being completely sure is to reset the mount count in the
filesystem before rebooting, or to set the fstab to never automatically
fsck. Then, on some set schedule, fsck along with a kernel upgrade, and
schedule the downtime.
> Finally, I find it interesting to see people who swear they will never use
> reiserfs again to be on the reiserfs mailing list :).
I may use it at some point, when it's as well proven as the second
extended filesystem is currently. It's interesting to see how many people
have errors, and when that number gets lower and more people start
posting good things I may reevaluate it at some point :)
Mike
--
Mike Hodson <mike@mystica.cx> ICQ: 18145059
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 1:02 ` Mike Hodson
@ 2003-02-12 7:25 ` Oleg Drokin
2003-02-12 9:45 ` Hans Reiser
2003-02-12 16:09 ` Sam Vilain
2 siblings, 0 replies; 94+ messages in thread
From: Oleg Drokin @ 2003-02-12 7:25 UTC (permalink / raw)
To: Mike Hodson; +Cc: Adam Goryachev, reiserfs-list
Hello!
On Tue, Feb 11, 2003 at 07:02:21PM -0600, Mike Hodson wrote:
> > Finally, I find it interesting to see people who swear they will never use
> > reiserfs again to be on the reiserfs mailing list :).
> I may use it at some point, when its as well proven as the second
> extended filesystem is currently. Its interesting to see how many people
> have errors, and when that number gets lower and more people start
> posting good things I may reevaluate it at some point :)
I guess you are not subscribed to the ext3-users mailing list?
The thing is that all the fs lists are filled with error reports (if the fs is used
by someone, of course).
When a user has zero problems, he is usually just busy doing his own things.
Users seem to write to FS mailing lists only when they have problems
(of course there are some exceptions, but the general tendency is like this).
Bye,
Oleg
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 1:02 ` Mike Hodson
2003-02-12 7:25 ` Oleg Drokin
@ 2003-02-12 9:45 ` Hans Reiser
2003-02-12 16:09 ` Sam Vilain
2 siblings, 0 replies; 94+ messages in thread
From: Hans Reiser @ 2003-02-12 9:45 UTC (permalink / raw)
To: Mike Hodson; +Cc: Adam Goryachev, reiserfs-list
Mike Hodson wrote:
> At
>that time I thought it was a reiserfs inconstancy, and I fscked. at that
>point I noticed a few {DriveReady -SeekComplete} errors and then most of
>the data on the drive nulled itself out.
>
It is only to be expected that using fsck on a bad hard drive is going
to lead to complete disaster.
Maybe we should ask the user if he'd like us to verify the media first.
Though if they follow the instructions to use dd_rescue first, then
they'll know if it has bad sectors..... Probably a lot of users aren't
going to use dd_rescue first even if told to, and we should expect that.....
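For illustration, a read-only "verify the media first" pass could look
roughly like the sketch below. It is not dd_rescue and copies nothing; the
function name and the injectable `read` parameter are made up for this
example. A real check would probably also have to bypass the page cache,
otherwise cached blocks may hide errors.

```python
import os


def scan_readable(fd, size, block_size=4096, read=os.pread):
    """Return the byte offsets of blocks that could not be read."""
    bad = []
    for offset in range(0, size, block_size):
        try:
            # Read at most one block; the last block may be shorter.
            read(fd, min(block_size, size - offset), offset)
        except OSError:
            bad.append(offset)    # remember the failing block and carry on
    return bad
```

If the returned list is not empty, the sensible order is the one given
above: rescue the data to a good disk first, and run fsck on the copy.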
>
>
>>Hmmm, so apart from finding a number of errors and doing it's best to fix
>>them (putting them into lost+found) and recovering all of your other files
>>even with hardware issues present, you recovered all of your data? The
>>problem here is?
>>
>>
>The fact that the filesystem got so many errors in the first place. And
>as ive said, hardware issues are AFAIK not the cause.
>
>
>
>I may use it at some point, when its as well proven as the second
>extended filesystem is currently. Its interesting to see how many people
>have errors, and when that number gets lower and more people start
>posting good things I may reevaluate it at some point :)
>
>Mike
>
>
>
I think you'll find that it is a lot more stable now.
--
Hans
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Corrupted/unreadable journal: reiser vs. ext3
2003-02-12 1:02 ` Mike Hodson
2003-02-12 7:25 ` Oleg Drokin
2003-02-12 9:45 ` Hans Reiser
@ 2003-02-12 16:09 ` Sam Vilain
2 siblings, 0 replies; 94+ messages in thread
From: Sam Vilain @ 2003-02-12 16:09 UTC (permalink / raw)
To: Mike Hodson; +Cc: reiserfs-list
On Wed, 12 Feb 2003 14:02, Mike Hodson wrote:
> Well one way of being completely sure is to reset the mount count in the
> filesystem before rebooting, or to set the fstab to never automatically
> fsck. then on some set schedule, fsck along with a kernel upgrade, and
> schedule the downtime
Nah. Set up a mirror, wait for a fairly quiet time, sync, split the
mirror, fsck the split mirror, and only do something if that fsck fails
:-).
Solaris does all this very well. Its equivalent of `md-utils' - Online
Disk Suite - does journalling for you of all writes (including data) if
you turn it on; at the block level, ignorant of the FS. IMHO that's a
much better place to do the journalling. It's simple, solid.
--
Sam Vilain, sam@vilain.net
Do you have blacks, too?
- George W. Bush, talking to Fernando Henrique Cardoso (the president
of Brazil). Reported in Der Speigel on May 19 2002. Never
reported in any US paper or news source.
^ permalink raw reply [flat|nested] 94+ messages in thread
Thread overview: 94+ messages
2003-02-12 20:57 Corrupted/unreadable journal: reiser vs. ext3 Dirk Schenkewitz
-- strict thread matches above, loose matches on Subject: below --
2003-02-20 9:55 Dirk Schenkewitz
2003-02-20 10:20 ` Anders Widman
2003-02-17 10:04 Dirk Schenkewitz
2003-02-20 1:27 ` Juan Quintela
2003-02-20 9:03 ` Anders Widman
2003-02-14 14:30 Dirk Schenkewitz
2003-02-14 14:20 Dirk Schenkewitz
2003-02-14 20:58 ` Valdis.Kletnieks
2003-02-14 0:18 Sam Vilain
2003-02-23 23:31 ` Zygo Blaxell
2003-02-24 1:14 ` Anders Widman
2003-02-14 0:17 Sam Vilain
2003-02-14 0:16 Sam Vilain
2003-02-23 23:10 ` Zygo Blaxell
2003-02-12 20:05 Dirk Schenkewitz
2003-02-13 22:49 ` Zygo Blaxell
2003-02-14 0:32 ` Hans Reiser
2003-02-14 8:18 ` Oleg Drokin
2003-02-14 10:13 ` Andreas Dilger
2003-02-14 10:17 ` Oleg Drokin
2003-02-14 10:50 ` Andreas Dilger
2003-02-14 10:59 ` Oleg Drokin
2003-02-14 13:34 ` Hans Reiser
2003-02-14 16:04 ` Rudy Zijlstra
2003-02-14 19:06 ` Andreas Dilger
2003-02-14 19:19 ` Hans Reiser
2003-02-15 12:51 ` Vitaly Fertman
2003-02-15 13:00 ` Vitaly Fertman
2003-02-18 19:50 ` Hans Reiser
2003-02-18 20:05 ` Vitaly Fertman
2003-02-18 22:18 ` Hans Reiser
2003-02-15 13:04 ` Anders Widman
2003-02-15 13:23 ` Oleg Drokin
2003-02-17 19:43 ` Hans Reiser
2003-02-15 22:37 ` Andreas Dilger
2003-02-18 18:21 ` Hans Reiser
2003-02-18 19:22 ` Oleg Drokin
2003-02-18 19:28 ` Hans Reiser
2003-02-18 21:17 ` Valdis.Kletnieks
2003-02-18 22:02 ` Matthias Andree
2003-02-19 6:26 ` Oleg Drokin
2003-02-18 22:23 ` Hans Reiser
2003-02-12 18:27 Anders Widman
2003-02-11 19:43 berthiaume_wayne
2003-02-12 10:48 ` Dirk Schenkewitz
2003-02-12 10:59 ` Hans Reiser
2003-02-12 11:24 ` Frank Baumgart
2003-02-12 11:35 ` Stefan Traby
2003-02-12 11:54 ` Dirk Schenkewitz
2003-02-12 12:42 ` Hans Reiser
2003-02-12 13:25 ` Dirk Schenkewitz
2003-02-12 16:22 ` Sam Vilain
2003-02-12 16:53 ` Anders Widman
2003-02-12 17:19 ` Hans Reiser
2003-02-12 17:40 ` Anders Widman
2003-02-12 18:15 ` Dirk Mueller
2003-02-12 18:20 ` Anders Widman
2003-02-12 18:20 ` Chris Dukes
2003-02-13 20:08 ` Zygo Blaxell
2003-02-11 18:59 Dirk Schenkewitz
2003-02-11 20:27 ` Hans Reiser
2003-02-11 21:30 ` Mike Hodson
2003-02-11 21:47 ` Hans Reiser
2003-02-11 21:58 ` Hans Reiser
2003-02-12 6:35 ` Oleg Drokin
2003-02-11 23:11 ` Adam Goryachev
2003-02-11 23:17 ` Anders Widman
2003-02-12 0:12 ` Hans Reiser
2003-02-12 10:23 ` Anders Widman
2003-02-12 10:47 ` Hans Reiser
2003-02-12 11:12 ` Adam Goryachev
2003-02-12 13:42 ` Anders Widman
2003-02-12 14:15 ` Russell Coker
2003-02-12 15:26 ` Anders Widman
2003-02-12 16:22 ` bscott
2003-02-12 16:28 ` Russell Coker
2003-02-12 16:40 ` Anders Widman
2003-02-13 3:42 ` Zygo Blaxell
2003-02-13 10:13 ` Anders Widman
2003-02-13 14:44 ` Rudy Zijlstra
2003-02-13 3:31 ` Zygo Blaxell
2003-02-12 16:39 ` Sam Vilain
2003-02-12 5:12 ` Ross Vandegrift
2003-02-12 7:17 ` Oleg Drokin
2003-02-12 10:17 ` Alexander Lyamin
2003-02-12 10:19 ` Alexander Lyamin
2003-02-12 16:25 ` Vitaly Fertman
2003-02-12 16:56 ` Anders Widman
2003-02-12 17:13 ` Oleg Drokin
2003-02-12 1:02 ` Mike Hodson
2003-02-12 7:25 ` Oleg Drokin
2003-02-12 9:45 ` Hans Reiser
2003-02-12 16:09 ` Sam Vilain