From: Robert L Mathews <lists@tigertech.com>
To: linux-raid@vger.kernel.org
Subject: RAID 1 failure on single disk causes disk subsystem to lock up
Date: Sun, 30 Mar 2008 16:22:46 -0700
Message-ID: <47F020C6.1060809@tigertech.com>

I'm using a two-disk SATA RAID 1 array on a number of identical servers, 
currently running kernel 2.6.8 (I know that's outdated; we use security 
backports and will soon be upgrading to 2.6.18).

Over the last year, a disk has failed on three different servers (with 
different brands of disks).

What I'd expect in such situations is for the bad disk to be dropped
from the RAID array automatically, and for the machine to continue
running with a degraded array.
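
For reference, a degraded array normally shows up in /proc/mdstat as a
member count such as "[2/1]". The following is a minimal Python sketch
that picks such arrays out of mdstat text. The function name and the
SAMPLE text are invented for illustration; they are not output from the
servers described here.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries "[expected/active]", e.g. "[2/1]".
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result

# Hypothetical mdstat contents: md0 has lost sdb1, md1 is healthy.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      104320 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      78043648 blocks [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # {'md0': (2, 1)}
```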

However, in all three cases, that's not what happened. Instead, 
something like the following is printed to dmesg:

  ata2: command 0x35 timeout, stat 0xd0 host_stat 0x20
  scsi1: ERROR on channel 0, id 0, lun 0, CDB: Write (10) 00 07 b2 c7 80 00 00 10 00
  Current sdb: sense key Medium Error
  Additional sense: Write error - auto reallocation failed
  end_request: I/O error, dev sdb, sector 129156992
  ATA: abnormal status 0xD0 on port 0xE407
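
As a cross-check on the log above, the bytes printed after "Write (10)"
can be decoded by hand. This is a small Python sketch, assuming the
kernel prints the nine CDB bytes that follow the opcode and that they
follow the standard SCSI WRITE(10) layout (flags, 4-byte big-endian LBA,
group, 2-byte transfer length, control). The function name is made up
for illustration.

```python
def decode_write10(cdb_hex):
    """Decode the 9 bytes that follow the WRITE(10) opcode."""
    b = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(b[1:5], "big")     # logical block address
    blocks = int.from_bytes(b[6:8], "big")  # transfer length in blocks
    return lba, blocks

lba, blocks = decode_write10("00 07 b2 c7 80 00 00 10 00")
print(lba, blocks)  # 129156992 16
```

The decoded LBA, 129156992, is the same sector that the end_request
line reports, so both messages refer to a single failed 16-block write
to sdb.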

Once this happens, all disk reads and writes fail to complete. "top" and 
"ps" show many processes stuck in the "D" state, from which they never 
recover. Using "kill -9" on them has no effect.
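
The "D" (uninterruptible sleep) state that top and ps display is the
state field of /proc/<pid>/stat. The following Python sketch lists such
processes by reading procfs directly. The function names are
hypothetical, and on a machine that is already wedged, starting a new
interpreter may itself hang on disk access, so this would have to be
running beforehand to be of any use.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]

def uninterruptible_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)

# Invented sample line, not taken from the affected servers.
print(parse_stat("1234 (pdflush) D 1 0 0"))  # (1234, 'pdflush', 'D')
```

On a Linux host, uninterruptible_processes() returns the current list.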

If I run a new program that requires disk access, that program hangs the 
terminal and can't be killed.

Using "iostat" shows no reads or writes occurring either at the md layer 
or on the underlying /dev/sda and /dev/sdb devices, although the "%util" 
column, oddly, shows 100% usage for the failed disk.
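
One possible explanation for the 100% figure: as far as I understand,
iostat derives %util from the "time spent doing I/O" counter in
/proc/diskstats, which keeps advancing for as long as at least one
request is in flight, whether or not anything completes. The Python
sketch below shows that arithmetic. The field positions follow the
kernel's documented diskstats layout, and the two sample lines are
invented, not captured from the failed machine.

```python
def parse_diskstats_line(line):
    """Pick the counters of interest out of one /proc/diskstats line."""
    f = line.split()
    return {
        "name": f[2],
        "reads": int(f[3]),         # reads completed
        "writes": int(f[7]),        # writes completed
        "in_flight": int(f[11]),    # requests currently outstanding
        "io_ticks_ms": int(f[12]),  # ms during which the device was busy
    }

def percent_util(before, after, interval_ms):
    """Share of the interval during which the device had I/O pending."""
    return 100.0 * (after["io_ticks_ms"] - before["io_ticks_ms"]) / interval_ms

# Hypothetical samples taken 1000 ms apart: one request stuck in flight,
# no reads or writes completing in between.
T0 = "8 16 sdb 5000 100 90000 4000 7000 200 120000 9000 1 50000 63000"
T1 = "8 16 sdb 5000 100 90000 4000 7000 200 120000 9000 1 51000 64000"

a, b = parse_diskstats_line(T0), parse_diskstats_line(T1)
print(b["reads"] - a["reads"], b["writes"] - a["writes"])  # 0 0
print(percent_util(a, b, 1000))                            # 100.0
```

Under that reading, a single request that never completes would be
enough to produce zero throughput alongside 100% utilization.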

No mdadm command works, either. Nothing is printed and the terminal
hangs, presumably because mdadm attempts disk access and gets stuck in
the "D" state, too.

I've waited several minutes to see if the machine will recover, and it 
doesn't. I eventually have to power cycle it.

Shouldn't the write error cause the bad disk to be gracefully removed 
from the array? Is this something that's likely to work better when we 
upgrade to a newer kernel version?

-- 
Robert L Mathews

  "In the beginning, the universe was created. This has made a lot of
   people very angry and has been widely regarded as a bad move."
                                                    -- Douglas Adams
