From mboxrd@z Thu Jan 1 00:00:00 1970
From: Phil Turmel
Subject: Re: need a little help rebuilding a raid 10
Date: Tue, 06 Dec 2011 09:52:11 -0500
Message-ID: <4EDE2C1B.8000008@turmel.org>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Greg Freemyer
Cc: Linux RAID
List-Id: linux-raid.ids

Hi Greg,

On 12/06/2011 09:11 AM, Greg Freemyer wrote:
> Hmm...
>
> My rebuild failed. At first glance I had both a failed drive and a failed slot?
>
> What I don't understand is I have I/O errors in /var/log/messages from
> when the rebuild failed over night.

Something in your system is untrustworthy.

> But this morning, hdparm --read-sector is reading the "bad" sectors fine.

What does smartctl say about your drives (all of them)?

> I already tried replacing the drive and the replacement drive also
> reported media errors during the rebuild, that's why I came to believe
> I had a bad slot.
>
> Now I have non-repeatable media errors.
>
> fyi: I have the problem drive connected via eSata now, so it's a
> different controller totally than where it was when the failure first
> occurred.

Are the errors in /var/log/messages only from that drive? If so, then
that drive is probably toast.

> Any thoughts?

Your prior e-mail said that you re-created the array. I didn't see that
you had definitively nailed down the problem at that point, so it
probably wasn't a good idea. In particular, it destroys all prior
metadata on the array members. If you didn't keep the output of
"mdadm -E" for each drive, that information is now lost.

In general, "--create" is a last resort, and only to be used for
recovery when you have absolute confidence you understand the layout
(mdadm -E printouts of the original array). "--assemble --force" is the
proper step after "--assemble" fails.

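As a rough illustration of that order of operations (a sketch, not a
tested recovery recipe): record "mdadm -E" for every member first, then
try a plain "--assemble", and fall back to "--assemble --force" only if
that fails. The device names, the /dev/md0 array name and the output
directory are placeholders, and the two function names are made up for
this example.

```shell
#!/bin/sh
# Sketch only. save_examine and assemble_array are hypothetical helper
# names; every device path shown in the comments is a placeholder.

# Save the superblock report of each member BEFORE touching the array,
# so the original layout stays on record.
save_examine() {
    outdir=$1; shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        # e.g. /dev/sdX1 -> $outdir/sdX1.txt
        mdadm -E "$dev" > "$outdir/$(basename "$dev").txt" 2>&1
    done
}

# Try a normal assemble first; use --force only when that fails.
# "--create" is deliberately not part of this sketch.
# If the first attempt leaves an inactive array behind, it may need
# "mdadm --stop" before the retry.
assemble_array() {
    md=$1; shift
    mdadm --assemble "$md" "$@" && return 0
    echo "plain --assemble failed, retrying with --force" >&2
    mdadm --assemble --force "$md" "$@"
}

# Example invocation (placeholders, run as root):
#   save_examine /root/md-examine /dev/sdW1 /dev/sdX1 /dev/sdY1 /dev/sdZ1
#   assemble_array /dev/md0 /dev/sdW1 /dev/sdX1 /dev/sdY1 /dev/sdZ1
```

Keeping the saved "mdadm -E" files somewhere off the array means they
are still readable if the array never comes back.
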
I would completely scrub the questionable drive with random data, run a
long smartctl test on it, and replace it if it reports any re-allocated
sectors at that point.

I would also run long smartctl tests on the other drives, looking for
pending sectors or re-allocated sectors. If any, I would plan on
replacements for them as well, and would try to validate the content of
your files. You do have a backup to compare against, after all.

If you are running a Debian-based distro, and the array contains your
rootfs, you might find "debsums" useful.

HTH,

Phil