From: Bill Davidsen <davidsen@tmr.com>
To: Martin Drab <drab@kepler.fjfi.cvut.cz>
Cc: Cynbe ru Taren <cynbe@muq.org>, linux-kernel@vger.kernel.org
Subject: Re: FYI: RAID5 unusably unstable through 2.6.14
Date: Thu, 02 Feb 2006 15:33:58 -0500
Message-ID: <43E26CB6.7030808@tmr.com>
In-Reply-To: <Pine.LNX.4.60.0601172047560.25680@kepler.fjfi.cvut.cz>

Martin Drab wrote:

> Well, I had a similar experience lately with the Adaptec AAC-2410SA RAID 
> 5 array. Due to the CPU overheating the whole box was suddenly shut down 
> by the CPU damage protection mechanism. While there is no battery backup 
> on this particular RAID controller, the sudden poweroff caused some very 
> localized inconsistency of one disk in the RAID. The configuration was 
> 1x160 GB and 3x120GB, with the 160 GB being split into 120 GB part within 
> the RAID 5 and a 40 GB part as a separate volume. The inconsistency 
> happened in the 40 GB part of the 160 GB HDD (as reported by the Adaptec 
> BIOS media check). In particular the problem was in the /dev/sda2 (with 
> /dev/sda being the 40 GB Volume, /dev/sda1 being an NTFS Windows system, 
> and /dev/sda2 being ext3 Linux system).
> 
> Now, what is interesting, is that Linux completely refused any possible 
> access to every byte within /dev/sda, not even dd(1) reading from any 
> position within /dev/sda, not even "fdisk /dev/sda", nothing. Everything 
> ended up with lots of following messages:
> 
>         sd 0:0:0:0: SCSI error: return code = 0x8000002
>         sda: Current: sense key: Hardware Error
>             Additional sense: Internal target failure
>         Info fld=0x0
>         end_request: I/O error, dev sda, sector <some sector number>

But /dev/sda is not a Linux filesystem, so running fsck on it makes no 
sense. You wanted to run it on /dev/sda2.
> 
> I've consulted this with Mark Salyzyn, because I thought it was a problem 
> of the AACRAID driver. But I was told, that there is nothing that AACRAID 
> can possibly do about it, and that it is a problem of the upper Linux 
> layers (block device layer?) that are strictly fault intolerant, and 
> though the problem was just an inconsistency of one particular localized 
> region inside /dev/sda2, Linux was COMPLETELY UNABLE (!!!!!) to read a 
> single byte from the ENTIRE VOLUME (/dev/sda)!

The obvious test of this "it's not us" statement is to connect that one 
drive to another type of controller and see if the upper-level code 
recovers. I'm assuming that "sda" is a real drive and not some 
pseudo-drive which exists only in the firmware of the RAID controller. 
That message is curious. Did you cat /proc/scsi/scsi to see what the 
system thought was there? Or use the infamous "cdrecord -scanbus" command?
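
For what it's worth, here is a rough sketch of pulling the device list 
out of /proc/scsi/scsi so it can be compared with what the controller 
BIOS reports. The function name parse_proc_scsi and the SAMPLE text are 
made up for illustration; SAMPLE only mimics the usual layout of that 
file and is not output from the machine discussed here.

```python
import os
import re

# Hypothetical sample that mimics the usual /proc/scsi/scsi layout.
# It is NOT real output from the system discussed in this thread.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: EXAMPLE  Model: Volume0          Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 02
"""

HOST_RE = re.compile(
    r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
MODEL_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S*)")
TYPE_RE = re.compile(r"Type:\s*(.*?)\s+ANSI")


def parse_proc_scsi(text):
    """Return one dict per attached device found in the given text."""
    devices = []
    for line in text.splitlines():
        m = HOST_RE.search(line)
        if m:
            # A "Host:" line starts a new device record.
            devices.append({
                "host": m.group(1),
                "channel": int(m.group(2)),
                "id": int(m.group(3)),
                "lun": int(m.group(4)),
            })
            continue
        if not devices:
            continue  # skip the "Attached devices:" header
        m = MODEL_RE.search(line)
        if m:
            devices[-1].update(vendor=m.group(1), model=m.group(2),
                               rev=m.group(3))
            continue
        m = TYPE_RE.search(line)
        if m:
            devices[-1]["type"] = m.group(1)
    return devices


if __name__ == "__main__":
    path = "/proc/scsi/scsi"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE  # fall back to the illustrative sample
    for dev in parse_proc_scsi(text):
        print(dev)
```

If the model string shown there is a volume name invented by the 
controller rather than a disk model, then "sda" is a firmware 
pseudo-drive and not a physical disk.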

> 
> And now for the best part: From Windows, I was able to access the ENTIRE 
> VOLUME without the slightest problem. Not only did Windows boot entirely 
> from the /dev/sda1, but using Total Commander's ext3 plugin I was also 
> able to access the ENTIRE /dev/sda2 and at least extract the most 
> important data and configurations, before I did the complete low-level 
> formatting of the drive, which fixed the inconsistency problem.
> 
> I call this "AN IRONY" to be forced to use Windows to extract information 
> from Linux partition, wouldn't you? ;)
> 
> (Besides, even GRUB (using BIOS) accessed the /dev/sda without 
> complications - as it was the bootable volume. Only Linux failed here a 
> 100%. :()

From the way you say sda when you presumably mean sda1 or sda2, it's not 
clear whether you don't understand the difference between drive and 
partition access or are just so pissed off that you are not taking the 
time to state the distinction clearly.

There was a problem with recovery from errors in RAID-5, which is 
addressed by recent changes to fail a sector, try rewriting it, etc. I 
would have to read the linux-raid archives to explain it in detail, so 
I'll stop with the overview. I don't think that's the issue here: you're 
using a RAID controller rather than software RAID, so it should not apply.
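
To make the "fail a sector, try rewriting it" idea a bit more concrete, 
here is a toy sketch of the general RAID-5 principle as I understand it: 
a block that cannot be read is recomputed as the XOR of the remaining 
blocks in its stripe (data plus parity), and the result can then be 
written back over the bad sector. This is not the md driver code; the 
function names and the four-byte blocks are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


def recover_block(stripe, bad_index):
    """Recompute the block at bad_index from every other block.

    `stripe` holds all data blocks plus the parity block of one stripe.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != bad_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # toy data blocks
    parity = xor_blocks(data)            # parity block for the stripe
    stripe = data + [parity]
    # Pretend block 1 returned a read error, then rebuild it.
    rebuilt = recover_block(stripe, 1)
    print(rebuilt == data[1])
```

A real implementation would write the rebuilt block back to the failing 
disk and only drop that disk from the array if the rewrite also fails.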

I assume that the problem is gone, so we can't do any more analysis 
after the fact.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me
