From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S263426AbTDSS3X (ORCPT ); Sat, 19 Apr 2003 14:29:23 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S263427AbTDSS3X (ORCPT ); Sat, 19 Apr 2003 14:29:23 -0400
Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:27552
	"EHLO www.linux.org.uk") by vger.kernel.org with ESMTP
	id S263426AbTDSS3W (ORCPT ); Sat, 19 Apr 2003 14:29:22 -0400
Date: Sat, 19 Apr 2003 19:41:20 +0100
From: "Dr. David Alan Gilbert"
To: linux-kernel@vger.kernel.org
Subject: Re: Are linux-fs's drive-fault-tolerant by concept?
Message-ID: <20030419184120.GH669@gallifrey>
References: <20030419180421.0f59e75b.skraw@ithnet.com>
	<87lly6flrz.fsf@deneb.enyo.de>
	<20030419200712.3c48a791.skraw@ithnet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030419200712.3c48a791.skraw@ithnet.com>
X-Chocolate: 70 percent or better cocoa solids preferably
X-Operating-System: Linux/2.5.66 (i686)
X-Uptime: 19:28:58 up 1 day, 7:29, 1 user, load average: 0.15, 0.25, 0.21
User-Agent: Mutt/1.5.4i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,
  Besides the problem that most drive manufacturers now seem to use
cheese as the data storage surface, I think there are some other
problems:

  1) I don't trust drive firmware.
  2) I don't think all drives are set to remap sectors by default.
  3) I don't believe that all drivers recover neatly from a drive error.
  4) It is OK saying return the drive and get a new one - but many of
     us can't do this in a commercial environment where the contents
     of the drive are confidential - leading to stacks of dead drives
     (often many inside their now short warranty periods).

To be fair I'm not sure if it is only the drive firmware I don't trust -
it could be the controllers and the IDE drivers as well - I don't know.

While RAID works well for drives that just go pop and die, for drives
with dodgy firmware we just sit there and watch the filesystems decay.
I don't think the kernel can do much about that - but it is a sad state.

I'd find two things useful in this respect:

  1) A tool to check the consistency of a RAID - presuming I shut my
     RAID down safely I should actually be able to use the redundant
     information to test it; this should reveal corruption early.
     (Perhaps the kernel could check a few sectors a second in the
     background)

  2) A disc exerciser - something that I can use to see if this drive,
     connected to this controller, on this motherboard on this kernel
     actually works and keeps its data safe before I put it into live
     service.

Dave
  (After a few weeks of fighting pissy IDE hard drives)
 ---------------- Have a happy GNU millennium! ----------------------
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
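
To make item 1 (the RAID consistency check) a little more concrete, here
is a minimal sketch of the idea. It assumes a RAID-5 style layout in
which the parity block of each stripe is the byte-wise XOR of that
stripe's data blocks. The function names and the in-memory
(data_blocks, parity_block) representation are hypothetical and chosen
only for illustration; a real tool would have to read the blocks from
the member devices of a cleanly shut down array.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks in a stripe must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte offset.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def find_inconsistent_stripes(stripes):
    """Return the indices of stripes whose stored parity does not match.

    Each stripe is a (data_blocks, parity_block) pair. This layout is an
    assumption for illustration, not the on-disk format of any real RAID.
    """
    bad = []
    for index, (data_blocks, parity) in enumerate(stripes):
        if xor_blocks(data_blocks) != parity:
            bad.append(index)
    return bad
```

Note that with a single parity block a mismatch only says that something
in the stripe is wrong; it cannot say which block is the corrupt one.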