linux-raid.vger.kernel.org archive mirror
From: wangkaird <wangkaird@gmail.com>
To: linux-raid@vger.kernel.org
Subject: raid5: a problem of repeat recovery
Date: Sun, 18 Aug 2013 21:28:21 +0800
Message-ID: <1376832501.8512.5.camel@bogon>

Hello,
I'm currently fighting a RAID problem that produces a weird log, shown
below (md19 is a 3-device RAID5):

......
Aug 13 09:40:06 node-0 kernel: md19: detected capacity change from
2147483648 to 0
Aug 13 09:40:06 node-0 kernel: md: md19 stopped.
Aug 13 09:40:06 node-0 kernel: md: unbind<dm-117>
Aug 13 09:40:06 node-0 kernel: md: export_rdev(dm-117)
Aug 13 09:40:06 node-0 kernel: md: unbind<dm-420>
Aug 13 09:40:06 node-0 kernel: md: bind<dm-420>
Aug 13 09:40:06 node-0 kernel: raid5: device dm-420 operational as raid
disk 1
Aug 13 09:40:06 node-0 kernel: raid5: device dm-304 operational as raid
disk 0
Aug 13 09:40:06 node-0 kernel: raid5: allocated 3230kB for md19
Aug 13 09:40:06 node-0 kernel: 1: w=1 pa=0 pr=3 m=1 a=0 r=3 op1=0 op2=0
Aug 13 09:40:06 node-0 kernel: 0: w=2 pa=0 pr=3 m=1 a=0 r=3 op1=0 op2=0
Aug 13 09:40:06 node-0 kernel: raid5: raid level 4 set md19 active with
2 out of 3 devices, algorithm 0
Aug 13 09:40:06 node-0 kernel: md19: bitmap initialized from disk: read
1/1 pages, set 0 bits
Aug 13 09:40:06 node-0 kernel: created bitmap (1 pages) for device md19
Aug 13 09:40:06 node-0 kernel: md19: detected capacity change from 0 to
2147483648
Aug 13 09:40:06 node-0 kernel: md19: detected capacity change from 0 to
2147483648
Aug 13 09:40:06 node-0 kernel: md19: unknown partition table
Aug 13 09:40:06 node-0 kernel: md: bind<dm-117>
Aug 13 09:40:06 node-0 kernel: RAID5 conf printout:
Aug 13 09:40:06 node-0 kernel: --- rd:3 wd:2
Aug 13 09:40:06 node-0 kernel: disk 0, o:1, dev:dm-304
Aug 13 09:40:06 node-0 kernel: disk 1, o:1, dev:dm-420
Aug 13 09:40:06 node-0 kernel: disk 2, o:1, dev:dm-117
Aug 13 09:40:06 node-0 kernel: md: recovery of RAID array md19
Aug 13 09:40:06 node-0 kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Aug 13 09:40:06 node-0 kernel: md: using maximum available idle IO
bandwidth (but not more than 400000 KB/sec) for recovery.
Aug 13 09:40:06 node-0 kernel: md: using 128k window, over a total of
1048576 blocks.
Aug 13 09:40:06 node-0 kernel: md: md19: recovery done.
Aug 13 09:40:06 node-0 kernel: RAID5 conf printout:
Aug 13 09:40:06 node-0 kernel: --- rd:3 wd:3
Aug 13 09:40:06 node-0 kernel: disk 0, o:1, dev:dm-304
Aug 13 09:40:06 node-0 kernel: disk 1, o:1, dev:dm-420
Aug 13 09:40:06 node-0 kernel: disk 2, o:1, dev:dm-117

A little over an hour later, something went wrong with dm-304:

Aug 13 10:57:29 node-0 kernel: raid5: Disk failure on dm-304, disabling
device.
Aug 13 10:57:29 node-0 kernel: raid5: Operation continuing on 2 devices.
Aug 13 10:57:29 node-0 kernel: md: recovery of RAID array md19
Aug 13 10:57:29 node-0 kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Aug 13 10:57:29 node-0 kernel: md: using maximum available idle IO
bandwidth (but not more than 400000 KB/sec) for recovery.
Aug 13 10:57:29 node-0 kernel: md: using 128k window, over a total of
1048576 blocks.
Aug 13 10:57:29 node-0 kernel: md: resuming recovery of md19 from
checkpoint.
Aug 13 10:57:29 node-0 kernel: md: md19: recovery done.
Aug 13 10:57:29 node-0 kernel: RAID5 conf printout:
Aug 13 10:57:29 node-0 kernel: --- rd:3 wd:2
Aug 13 10:57:29 node-0 kernel: disk 0, o:0, dev:dm-304
Aug 13 10:57:29 node-0 kernel: disk 1, o:1, dev:dm-420
Aug 13 10:57:29 node-0 kernel: disk 2, o:1, dev:dm-117

Then the weird part begins.
The following block of log messages was printed thousands of times:

Aug 13 10:57:29 node-0 kernel: md: recovery of RAID array md19
Aug 13 10:57:29 node-0 kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Aug 13 10:57:29 node-0 kernel: md: using maximum available idle IO
bandwidth (but not more than 400000 KB/sec) for recovery.
Aug 13 10:57:29 node-0 kernel: md: using 128k window, over a total of
1048576 blocks.
Aug 13 10:57:29 node-0 kernel: md: resuming recovery of md19 from
checkpoint.
Aug 13 10:57:29 node-0 kernel: md: md19: recovery done.
Aug 13 10:57:29 node-0 kernel: RAID5 conf printout:
Aug 13 10:57:29 node-0 kernel: --- rd:3 wd:2
Aug 13 10:57:29 node-0 kernel: disk 0, o:0, dev:dm-304
Aug 13 10:57:29 node-0 kernel: disk 1, o:1, dev:dm-420
Aug 13 10:57:29 node-0 kernel: disk 2, o:1, dev:dm-117
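
To put a number on "thousands of times", a minimal sketch like the one
below could count how often a recovery pass was started per second and
per array. It assumes the syslog has one message per line (not wrapped
the way this mail wraps it); count_recovery_starts is a hypothetical
helper name, not part of mdadm or any other tool.

```python
import re
from collections import Counter

# Matches the syslog line md prints each time a recovery pass starts,
# e.g. "Aug 13 10:57:29 node-0 kernel: md: recovery of RAID array md19".
RECOVERY_START = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"md: recovery of RAID array (?P<array>md\d+)"
)


def count_recovery_starts(lines):
    """Count recovery starts, keyed by (timestamp, array name).

    `lines` is any iterable of unwrapped syslog lines (assumption:
    one kernel message per line).
    """
    counts = Counter()
    for line in lines:
        match = RECOVERY_START.match(line)
        if match:
            counts[(match.group("stamp"), match.group("array"))] += 1
    return counts
```

It could be fed an open log file, for example /var/log/messages (an
assumed location), and the resulting counter printed to see how many
recovery starts fall within 10:57:29 alone.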

Finally, it finished:

Aug 13 10:57:30 node-0 kernel: md: recovery of RAID array md19
Aug 13 10:57:30 node-0 kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Aug 13 10:57:30 node-0 kernel: md: using maximum available idle IO
bandwidth (but not more than 400000 KB/sec) for recovery.
Aug 13 10:57:30 node-0 kernel: md: using 128k window, over a total of
1048576 blocks.
Aug 13 10:57:30 node-0 kernel: md: resuming recovery of md19 from
checkpoint.
Aug 13 10:57:30 node-0 kernel: md: md19: recovery done.
Aug 13 10:57:30 node-0 kernel: end_request: I/O error, dev dm-297,
sector 2048
Aug 13 10:57:30 node-0 kernel: end_request: I/O error, dev dm-297,
sector 1152
Aug 13 10:57:30 node-0 kernel: end_request: I/O error, dev dm-297,
sector 512
Aug 13 10:57:30 node-0 kernel: end_request: I/O error, dev dm-297,
sector 0
Aug 13 10:57:30 node-0 kernel: RAID5 conf printout:
Aug 13 10:57:30 node-0 kernel: --- rd:3 wd:2
Aug 13 10:57:30 node-0 kernel: disk 0, o:0, dev:dm-304
Aug 13 10:57:30 node-0 kernel: disk 1, o:1, dev:dm-420
Aug 13 10:57:30 node-0 kernel: disk 2, o:1, dev:dm-117
Aug 13 10:57:30 node-0 kernel: RAID5 conf printout:
Aug 13 10:57:30 node-0 kernel: --- rd:3 wd:2
Aug 13 10:57:30 node-0 kernel: disk 0, o:0, dev:dm-304
Aug 13 10:57:30 node-0 kernel: disk 1, o:1, dev:dm-420
Aug 13 10:57:30 node-0 kernel: disk 2, o:1, dev:dm-117
Aug 13 10:57:30 node-0 kernel: RAID5 conf printout:
Aug 13 10:57:30 node-0 kernel: --- rd:3 wd:2
Aug 13 10:57:30 node-0 kernel: disk 1, o:1, dev:dm-420
Aug 13 10:57:30 node-0 kernel: disk 2, o:1, dev:dm-117
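
The "RAID5 conf printout" blocks are easier to compare once they are
collected into snapshots. The sketch below is one way to do that, again
assuming unwrapped syslog lines. parse_conf_printouts and the dictionary
layout are hypothetical names chosen for illustration; the field
meanings are not interpreted here, the values are only extracted as
printed.

```python
import re

# Patterns for the three kinds of lines in a "RAID5 conf printout" block.
HEADER = re.compile(r"kernel:\s*RAID5 conf printout:")
SUMMARY = re.compile(r"kernel:\s*--- rd:(\d+) wd:(\d+)")
DISK = re.compile(r"kernel:\s*disk (\d+), o:(\d+), dev:(\S+)")


def parse_conf_printouts(lines):
    """Return a list of snapshots, one per conf printout block.

    Each snapshot is {"rd": int, "wd": int, "disks": {slot: (o, dev)}},
    with the values copied from the log as printed.
    """
    snapshots = []
    current = None
    for line in lines:
        if HEADER.search(line):
            # A new printout block starts here.
            current = {"rd": None, "wd": None, "disks": {}}
            snapshots.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first printout
        summary = SUMMARY.search(line)
        if summary:
            current["rd"] = int(summary.group(1))
            current["wd"] = int(summary.group(2))
            continue
        disk = DISK.search(line)
        if disk:
            slot = int(disk.group(1))
            current["disks"][slot] = (int(disk.group(2)), disk.group(3))
    return snapshots
```

Comparing consecutive snapshots would show, for instance, the point at
which dm-304 goes from o:1 to o:0 and the point at which it no longer
appears in the list at all.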

After that, dm-304, dm-420, and dm-117 were unbound and exported.
My system is CentOS 6.0 and the mdadm version is 3.2.2.
This problem has confused me for a long time. Can anyone tell me what
happened?


