Logging-Loop when a drive in a raid1 fails.

From: Michael Renner <michael.renner@geizhals.at>
To: linux-raid@vger.kernel.org
Subject: Logging-Loop when a drive in a raid1 fails.
Date: Mon, 13 Dec 2004 06:52:02 +0100
Message-ID: <41BD2E02.8020100@geizhals.at>

[-- Attachment #1: Type: text/plain, Size: 1167 bytes --]

Hi,

One of the drives in a software raid1 failed on a machine running
2.6.9-rc2, leading to this "logging spree" (see attachment).

Sorry if this has been fixed in the meantime; it's not that easy to
test codepaths for failing drives with various kernels without having
access to special block devices which support failing on demand.

Furthermore, I'm a bit concerned about the overall quality of the md
support in 2.6. This is not the first bug I have seen in codepaths which
don't get used during normal operation. When all drives are fine the
driver is rock solid, but as soon as a block device gets funky, all hell
tends to break loose. I'm a bit surprised because I never had any
problems in 2.4. Were there major API/code changes which could be the
cause of that?

Another odd behaviour, for which I no longer have exact information,
occurred when md tried to resync a degraded raid5, hit a bad block on
one of the (supposedly) "good" drives, and entered a tight loop of
resync processes starting/aborting. This was with 2.6.9; unfortunately, I
wasn't able to take records because of time constraints and arising panic.

best regards,
michael

[-- Attachment #2: md-log.txt --]
[-- Type: text/plain, Size: 2996 bytes --]

Dec 13 02:03:13 stuff kernel: scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 01 5b ff 3c 00 00 08 00
Dec 13 02:03:13 stuff kernel: Info fld=0x15bff3e, Current sda: sense key Medium Error
Dec 13 02:03:13 stuff kernel: Additional sense: Unrecovered read error
Dec 13 02:03:13 stuff kernel: end_request: I/O error, dev sda, sector 22806332
Dec 13 02:03:13 stuff kernel: raid1: Disk failure on sda2, disabling device.
Dec 13 02:03:13 stuff kernel: ^IOperation continuing on 1 devices
Dec 13 02:03:13 stuff kernel: printk: 77 messages suppressed.
Dec 13 02:03:13 stuff kernel: raid1: sda2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: RAID1 conf printout:
Dec 13 02:03:13 stuff kernel:  --- wd:1 rd:2
Dec 13 02:03:13 stuff kernel:  disk 0, wo:1, o:0, dev:sda2
Dec 13 02:03:13 stuff kernel:  disk 1, wo:0, o:1, dev:sdb2
Dec 13 02:03:13 stuff kernel: RAID1 conf printout:
Dec 13 02:03:13 stuff kernel:  --- wd:1 rd:2
Dec 13 02:03:13 stuff kernel:  disk 1, wo:0, o:1, dev:sdb2
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:13 stuff kernel: raid1: sdb2: redirecting sector 18886472 to another mirror
Dec 13 02:03:18 stuff kernel: printk: 119408 messages suppressed.
Dec 13 02:03:18 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:23 stuff kernel: printk: 119581 messages suppressed.
Dec 13 02:03:23 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:28 stuff kernel: printk: 125937 messages suppressed.
Dec 13 02:03:28 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:33 stuff kernel: printk: 132411 messages suppressed.
Dec 13 02:03:34 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:38 stuff kernel: printk: 128773 messages suppressed.
Dec 13 02:03:39 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:43 stuff kernel: printk: 129557 messages suppressed.
Dec 13 02:03:44 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:48 stuff kernel: printk: 130365 messages suppressed.
Dec 13 02:03:49 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:53 stuff kernel: printk: 129959 messages suppressed.
Dec 13 02:03:54 stuff kernel: raid1: sdb2: rescheduling sector 18886472
Dec 13 02:03:58 stuff kernel: printk: 125235 messages suppressed.
Dec 13 02:03:59 stuff kernel: raid1: sdb2: rescheduling sector 18886472
[.. Continue ad nauseam ..]
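
To put a number on a loop like the one in the attachment, the log can be summarized with a short script. What follows is only a sketch, not part of the original report: the function name `summarize` and the two regular expressions are assumptions based on the message formats visible above. It adds up the "messages suppressed" counters and counts how often each (device, sector) pair is rescheduled.

```python
import re
from collections import Counter

# Patterns derived from the log lines in the attachment above (assumed formats).
SUPPRESSED = re.compile(r"printk: (\d+) messages suppressed")
RESCHEDULE = re.compile(r"raid1: (\S+): rescheduling sector (\d+)")


def summarize(lines):
    """Return (total suppressed messages, Counter of (device, sector) retries)."""
    suppressed = 0
    retries = Counter()
    for line in lines:
        match = SUPPRESSED.search(line)
        if match:
            suppressed += int(match.group(1))
            continue
        match = RESCHEDULE.search(line)
        if match:
            retries[(match.group(1), int(match.group(2)))] += 1
    return suppressed, retries


# A few lines copied from the attachment, used as sample input.
sample = [
    "Dec 13 02:03:13 stuff kernel: printk: 77 messages suppressed.",
    "Dec 13 02:03:13 stuff kernel: raid1: sda2: rescheduling sector 18886472",
    "Dec 13 02:03:13 stuff kernel: raid1: sdb2: rescheduling sector 18886472",
    "Dec 13 02:03:18 stuff kernel: printk: 119408 messages suppressed.",
    "Dec 13 02:03:18 stuff kernel: raid1: sdb2: rescheduling sector 18886472",
]

total_suppressed, retry_counts = summarize(sample)
print("suppressed messages:", total_suppressed)
for (device, sector), count in retry_counts.most_common():
    print(f"{device} sector {sector}: rescheduled {count} times (visible lines only)")
```

A single (device, sector) pair dominating the retry counts, together with a suppressed total that grows by a six-figure amount every five seconds, is the signature of the loop shown in the attachment. The counts only cover lines that reached the log; the suppressed messages themselves are not recoverable.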

