Logging-Loop when a drive in a raid1 fails.

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Michael Renner <michael.renner@geizhals.at>
To: linux-raid@vger.kernel.org
Subject: Logging-Loop when a drive in a raid1 fails.
Date: Mon, 13 Dec 2004 06:52:02 +0100	[thread overview]
Message-ID: <41BD2E02.8020100@geizhals.at> (raw)

[-- Attachment #1: Type: text/plain, Size: 1167 bytes --]

Hi,

one of the drives in a software raid1 failed, on a machine running 
2.6.9-rc2, leading to this "logging-spree" (see attachment).

Sorry if this has been fixed in the meanwhile; it's not that easy to 
test codepaths for failing drives with various kernels without having 
access to special block devices which support on-demand-failing.

Furthermore I'm a bit concerned about the overall quality of the md 
support in 2.6, this is not the first bug I see with codepaths which 
don't get used during normal operation. When all drives are fine the 
driver is rock solid, but as soon as a block device gets funky, hell 
tends to break loose. I'm a bit surprised because I've never had any 
problems in 2.4. Were there major api/code changes which could be the 
cause for that?

Another odd behaviour, for which I don't have exact information anymore, 
was when md tried to do a resync of a degraded raid5, hit a bad block on 
one of the (supposedly) "good" drives, and entered a tight loop of 
resync processes starting/aborting. This was with 2.6.9, unfortunately I 
wasn't able to take records because of time constraints and arising panic.

best regards,
michael

[-- Attachment #2: md-log.txt --]
[-- Type: text/plain, Size: 2996 bytes --]

Dec 13 02:03:13 stuff kernel: scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 01 5b ff 3c 00 00 08 00
Dec 13 02:03:13 stuff kernel: Info fld=0x15bff3e, Current sda: sense key Medium Error
Dec 13 02:03:13 stuff kernel: Additional sense: Unrecovered read error
Dec 13 02:03:13 stuff kernel: end_request: I/O error, dev sda, sector 22806332
Dec 13 02:03:13 stuff kernel: raid1: Disk failure on sda2, disabling device.
Dec 13 02:03:13 stuff kernel: ^IOperation continuing on 1 devices
Dec 13 02:03:13 stuff kernel: printk: 77 messages suppressed.
Dec 13 02:03:13 stuff kernel: raid1: sda2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: RAID1 conf printout:
Dec 13 02:03:13 stuff kernel:  --- wd:1 rd:2
Dec 13 02:03:13 stuff kernel:  disk 0, wo:1, o:0, dev:sda2
Dec 13 02:03:13 stuff kernel:  disk 1, wo:0, o:1, dev:sdb2
Dec 13 02:03:13 stuff kernel: RAID1 conf printout:
Dec 13 02:03:13 stuff kernel:  --- wd:1 rd:2
Dec 13 02:03:13 stuff kernel:  disk 1, wo:0, o:1, dev:sdb2
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:18 stuff kernel: printk: 119408 messages suppressed.
Dec 13 02:03:18 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:23 stuff kernel: printk: 119581 messages suppressed.
Dec 13 02:03:23 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:28 stuff kernel: printk: 125937 messages suppressed.
Dec 13 02:03:28 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:33 stuff kernel: printk: 132411 messages suppressed.
Dec 13 02:03:34 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:38 stuff kernel: printk: 128773 messages suppressed.
Dec 13 02:03:39 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:43 stuff kernel: printk: 129557 messages suppressed.
Dec 13 02:03:44 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:48 stuff kernel: printk: 130365 messages suppressed.
Dec 13 02:03:49 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:53 stuff kernel: printk: 129959 messages suppressed.
Dec 13 02:03:54 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:58 stuff kernel: printk: 125235 messages suppressed.
Dec 13 02:03:59 stuff kernel: raid1: sdb2: rescheduling sector 18886472
[.. Continue ad nauseam ..]

next             reply	other threads:[~2004-12-13  5:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-12-13  5:52 Michael Renner [this message]
2004-12-13 15:07 ` Logging-Loop when a drive in a raid1 fails Paul Clements
2004-12-14 10:19   ` Michael Renner
2004-12-14 15:19     ` Paul Clements

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41BD2E02.8020100@geizhals.at \
    --to=michael.renner@geizhals.at \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).