From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hans Reiser
Subject: Re: Corrupted/unreadable journal: reiser vs. ext3
Date: Wed, 12 Feb 2003 03:12:33 +0300
Message-ID: <3E499171.8080201@namesys.com>
References: <3452483515.20030212001747@tnonline.net>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Errors-To: flx@namesys.com
In-Reply-To: <3452483515.20030212001747@tnonline.net>
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Anders Widman
Cc: reiserfs-list@namesys.com

Anders Widman wrote:

>>>I've used ReiserFS in the past, but have also used ext3 on my
>>>user's important
>>>data (/home) after a good chunk of one drive was converted to
>>>sparse/null files due to a screwup stemming from no 'badblocks' support
>>>in reiserfs. Since then, i've used ext3 as well as Reiser but recently
>>>
>
>>I can't comment on your experience, but personally if I have a drive with
>>any number of badblocks (which are showing up to the fs layer, not invisibly
>>re-mapped by the drive) then I take the drive back and get a replacement, or
>>bin the drive.
>>
>
>However, the FS SHOULD support handling of bad blocks/clusters at the
>FS layer, even while running in a production system. Bad blocks can
>pop up at any give time for no particular reason, and it is at these
>times you (we) need a strong and reliable filesystem that can handle
>and logically remap broken blocks/sectors.
>
>Sure, a disk with physical errors should be replaced, but until you
>find out about the error on the drive the FS HAS TO HANDLE these kinds
>of problems.
>
> - Anders
>

We have gotten better at this over time. There was a point in time when some of our guys reviewed all the bad block handling. We still find cases where we could be better though.

For some users it would be better to boot to a corrupted filesystem because running fsck is more of a problem than putting their data at higher risk.
For data logging, it is probably acceptable to just toss the journal and lose the more recent updates. For the default metadata journaling, this just does not seem prudent.

I really prefer making users understand that they have a problem they need to do something about. This is just my style. I want them to fail to boot, and after some effort learn that there is this thing called fsck, and dd_rescue, and that it is time to buy another hard drive and chuck their current one. It would be best, though, if they were given detailed instructions about what they need to do when the code hits that bad block. Vitaly, please work on that.

If we handle the journal block error without downtime, the user will never chuck the hard drive, and that is bad in the long term.

--
Hans