Re: Lose two disks during Raid 10 rebuild

From: NeilBrown @ 2012-08-23 21:07 UTC (permalink / raw)
  To: Steven La; +Cc: linux-raid@vger.kernel.org

On Thu, 23 Aug 2012 19:28:27 +0000 Steven La <Steven.La@riverbed.com> wrote:

> Hello all,
> 
> Got the following messages from syslog during Raid 10 rebuild cycle.
> 
> Aug  3 01:48:11 oak-sh283 kernel: sd 0:0:0:0: [sda] Unhandled sense code
> Aug  3 01:48:11 oak-sh283 kernel: sd 0:0:0:0: [sda] Result: hostbyte=invalid
> driverbyte=DRIVER_SENSE
> Aug  3 01:48:11 oak-sh283 kernel: sd 0:0:0:0: [sda] Sense Key : Medium Error
> [current]


"Medium Error" normally means that the recording medium (magnetic regions) is
corrupt in some way and a valid data block cannot be extracted.



> Aug  3 01:48:11 oak-sh283 kernel: Info fld=0x3ae0f43c
> Aug  3 01:48:11 oak-sh283 kernel: sd 0:0:0:0: [sda] Add. Sense: Unrecovered
> read error
> Aug  3 01:48:11 oak-sh283 kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 3a e0
> f3 ab 00 01 00 00
> Aug  3 01:48:11 oak-sh283 kernel: end_request: I/O error, dev sda, sector
> 987821116
> Aug  3 01:48:11 oak-sh283 kernel: md/raid10:md7: Disk failure on sda8,
> disabling device.
> Aug  3 01:48:11 oak-sh283 kernel: md/raid10:md7: Operation continuing on 2
> devices.
> Aug  3 01:48:11 oak-sh283 kernel: md: md7: recovery done.
> Aug  3 01:48:11 oak-sh283 kernel: md/raid10:md7: Disk failure on sdc8,
> disabling device.

Presumably md7 was trying to recover sdc8 from sda8.  It got a read error on
sda8 and marked it as failed; with its source gone, the recovery of sdc8
could not complete, so sdc8 was marked as failed too.
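
To tie the log lines together, the Read(10) CDB quoted above can be decoded by hand. The following is a minimal sketch, assuming the standard Read(10) layout (byte 0 is the opcode, bytes 2-5 the starting LBA, bytes 7-8 the transfer length); the function name decode_read10 is made up for illustration. It shows that the "Info fld" value falls inside the range the command asked for, and matches the sector in the end_request line.

```shell
#!/bin/sh
# Decode a Read(10) CDB given as ten two-digit hex bytes.
# Prints "<starting LBA> <transfer length in blocks>" in decimal.
# decode_read10 is a hypothetical helper, not part of any tool.
decode_read10() {
    lba=$(( 0x$3$4$5$6 ))   # bytes 2-5: starting LBA (big-endian)
    len=$(( 0x$8$9 ))       # bytes 7-8: number of blocks requested
    echo "$lba $len"
}

# CDB from the syslog line: 28 00 3a e0 f3 ab 00 01 00 00
set -- $(decode_read10 28 00 3a e0 f3 ab 00 01 00 00)
start=$1
count=$2

# "Info fld=0x3ae0f43c" is the LBA the drive reported as unreadable.
bad=$(( 0x3ae0f43c ))

echo "read starts at LBA $start for $count blocks"
echo "failing LBA $bad is $(( bad - start )) blocks into the request"
```

The decimal value of the failing LBA is the same number the kernel prints as "sector" in the end_request message, so the two log lines agree with each other.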

> Aug  3 01:48:11 oak-sh283 kernel: md/raid10:md7: Operation continuing on 2
> devices.
> Aug  3 01:48:14 oak-sh283 kernel: md: unbind<sdc8>
> Aug  3 01:48:14 oak-sh283 kernel: md: export_rdev(sdc8)
> Aug  3 01:48:14 oak-sh283 kernel: md: unbind<sda8>
> Aug  3 01:48:14 oak-sh283 kernel: md: export_rdev(sda8)
> Aug  3 01:48:16 oak-sh283 raid_rebuild: Sending sighup to hald[22152] for event
> RebuildFinished for /dev/md7
> 
> 
> [admin@oak-sh283 ~]# cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10]
> md5 : active raid10 sdc9[1] sde9[2] sdg9[3] sda9[0]
>       562997760 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md7 : active raid10 sde8[2] sdg8[3]
>       562997760 blocks 64K chunks 2 near-copies [4/2] [__UU]
> 
> md6 : active raid10 sdc7[1] sde7[2] sdg7[3] sda7[0]
>       562997760 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md3 : active raid10 sdc6[1] sde6[2] sdg6[3] sda6[0]
>       52435968 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md0 : active raid10 sdc2[1] sde2[2] sdg2[3] sda2[0]
>       10490240 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md4 : active raid10 sdb3[0] sdh3[3] sdf3[2] sdd3[1]
>       19518720 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md2 : active raid10 sdc3[1] sde3[2] sdg3[3] sda3[0]
>       67119360 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md1 : active raid10 sdc5[1] sde5[2] sdg5[3] sda5[0]
>       134222848 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> From the error message below (also shown above), the block that cannot be read from sda
> has lba=0x3ae0f3ab.
> 
> Aug  3 01:48:11 oak-sh283 kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 3a e0
> f3 ab 00 01 00 00
> 
> [admin@oak-sh283 ~]# fdisk -s /dev/sda
> 976762584

This number is in kilobytes (1K blocks), not sectors.  The device is 1 TB.

> 
> 
> 
> The last block on the drive is 0x3a3836d8

That only holds if 976762584 is read as a sector number.  Sector 976762584
is 500102443008 bytes into the device, about half way, so the LBA in the
Read(10) command is well within the drive.
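
The unit mix-up is easy to check with a little arithmetic. This is a rough sketch, assuming 512-byte logical sectors and that `fdisk -s` reports 1 KiB blocks; the helper names kib_to_sectors and sector_in_device are invented for illustration.

```shell
#!/bin/sh
# Assumption: 512-byte logical sectors; `fdisk -s` reports 1 KiB blocks.

# Convert a size in 1 KiB blocks to a count of 512-byte sectors.
kib_to_sectors() {
    echo $(( $1 * 2 ))
}

# Succeed if sector $1 lies inside a device that is $2 KiB in size.
sector_in_device() {
    [ "$1" -lt "$(kib_to_sectors "$2")" ]
}

size_kib=976762584     # from `fdisk -s /dev/sda`
bad_sector=987821116   # from the end_request line in syslog

echo "device size in sectors: $(kib_to_sectors "$size_kib")"
echo "byte offset of sector $size_kib: $(( size_kib * 512 ))"
if sector_in_device "$bad_sector" "$size_kib"; then
    echo "sector $bad_sector is inside the device"
fi
```

Comparing like with like, the drive has roughly twice as many sectors as the `fdisk -s` figure suggests, so the failing sector sits a little past the middle of the disk.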


You can probably correct the bad sector (987821116, from the end_request
line above) by overwriting it:
 dd if=/dev/zero of=/dev/sda seek=987821116 count=1 oflag=direct

I would try to read from the address first to make sure it is in error:

 dd of=/dev/null if=/dev/sda skip=987821116 count=1 iflag=direct

Then read the entire device to ensure there are no other media errors.
Then stop the array and re-assemble with --force.
Then try the recovery again.
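
The last three steps might look roughly like the following. This is only a dry-run sketch that prints the commands rather than running them. The member list is an assumption taken from the slot layout the other arrays show in /proc/mdstat (sda8 as slot 0, sdc8 as the device being rebuilt), and using `--add` to restart the recovery is one plausible reading of "try the recovery again", not something stated in this thread. The function name plan_recovery is made up.

```shell
#!/bin/sh
# Dry run only: print the commands instead of executing them.
# Device names below are assumptions based on the quoted /proc/mdstat.
md=/dev/md7
disk=/dev/sda

plan_recovery() {
    # 1. Read the whole disk to look for further media errors.
    echo "dd if=$disk of=/dev/null bs=1M iflag=direct"
    # 2. Stop the array.
    echo "mdadm --stop $md"
    # 3. Re-assemble, forcing the previously failed sda8 back in.
    echo "mdadm --assemble --force $md /dev/sda8 /dev/sde8 /dev/sdg8"
    # 4. Add the unrecovered member back so the rebuild starts again.
    echo "mdadm $md --add /dev/sdc8"
}

plan_recovery
```

Review the printed commands against the real device names before running any of them by hand.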

NeilBrown





> 
> (gdb) p/x 976762584
> $1 = 0x3a3836d8
> (gdb) p 0x3ae0f3ab
> $2 = 987820971
> 
> So, it seems like the lba number used in the Read(10) command has exceeded the last block of the drive.
> Has anyone had this problem before? What else can I look at?
> 
> Relevant info are shown below,
> 
> [admin@oak-sh283 ~]# mdadm -V
> mdadm - v2.6.4 - 19th October 2007
> 
> [admin@oak-sh283 ~]# uname -a
> Linux oak-sh283 2.6.32 #1 SMP Wed Aug 1 01:38:35 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux
> 
> Thanks and regards,
> --Steven
> 
> 
> 
> 
> 

