An oddity: UNC error while re-adding/resyncing

linux-raid.vger.kernel.org archive mirror

* An oddity: UNC error while re-adding/resyncing
@ 2010-03-26  0:43 John Robinson
From: John Robinson @ 2010-03-26  0:43 UTC
  To: Linux RAID

I did `mdadm --add /dev/md1 /dev/sdd2` and got the following in my 
kernel log:

Mar 25 23:56:21 beast kernel: md: bind<sdd2>
Mar 25 23:56:21 beast kernel: RAID5 conf printout:
Mar 25 23:56:21 beast kernel:  --- rd:3 wd:2 fd:1
Mar 25 23:56:21 beast kernel:  disk 0, o:1, dev:sda2
Mar 25 23:56:21 beast kernel:  disk 1, o:1, dev:sdb2
Mar 25 23:56:21 beast kernel:  disk 2, o:1, dev:sdd2
Mar 25 23:56:21 beast kernel: md: syncing RAID array md1
Mar 25 23:56:21 beast kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Mar 25 23:56:21 beast kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Mar 25 23:56:21 beast kernel: md: using 128k window, over a total of 976655360 blocks.
Mar 25 23:56:22 beast kernel: ata3.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
Mar 25 23:56:22 beast kernel: ata3.00: irq_stat 0x40000008
Mar 25 23:56:22 beast kernel: ata3.00: cmd 60/00:00:a5:3f:03/04:00:00:00:00/40 tag 0 ncq 524288 in
Mar 25 23:56:25 beast kernel:          res 41/40:00:a0:41:03/8c:00:00:00:00/40 Emask 0x409 (media error) <F>
Mar 25 23:56:25 beast kernel: ata3.00: status: { DRDY ERR }
Mar 25 23:56:26 beast kernel: ata3.00: error: { UNC }
Mar 25 23:56:26 beast kernel: ata3.00: configured for UDMA/133
Mar 25 23:56:26 beast kernel: ata3: EH complete
Mar 25 23:56:26 beast kernel: SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
Mar 25 23:56:26 beast kernel: sda: Write Protect is off
Mar 25 23:56:27 beast kernel: SCSI device sda: drive cache: write back
Mar 25 23:56:27 beast kernel: ata3.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
Mar 25 23:56:28 beast kernel: ata3.00: irq_stat 0x40000008
Mar 25 23:56:28 beast kernel: ata3.00: cmd 60/00:08:a5:3f:03/04:00:00:00:00/40 tag 1 ncq 524288 in
Mar 25 23:56:28 beast kernel:          res 41/40:00:a2:41:03/8c:00:00:00:00/40 Emask 0x409 (media error) <F>
Mar 25 23:56:28 beast kernel: ata3.00: status: { DRDY ERR }
Mar 25 23:56:28 beast kernel: ata3.00: error: { UNC }
Mar 25 23:56:29 beast kernel: ata3.00: configured for UDMA/133
Mar 25 23:56:29 beast kernel: ata3: EH complete
Mar 25 23:56:29 beast kernel: SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
Mar 25 23:56:29 beast kernel: sda: Write Protect is off
Mar 25 23:56:29 beast kernel: SCSI device sda: drive cache: write back
Mar 25 23:56:34 beast kernel: md: md1: sync done.
Mar 25 23:56:34 beast kernel: RAID5 conf printout:
Mar 25 23:56:34 beast kernel:  --- rd:3 wd:3 fd:0
Mar 25 23:56:34 beast kernel:  disk 0, o:1, dev:sda2
Mar 25 23:56:34 beast kernel:  disk 1, o:1, dev:sdb2
Mar 25 23:56:34 beast kernel:  disk 2, o:1, dev:sdd2

i.e. a brief whinge about another of the discs in the RAID, while doing 
the resync. And this is repeatable. Now, is this simply a sign that I 
need a new disc, or is there something else funny going on? It's not as 
if either of the discs (the one I was re-adding or the one that had the 
UNC during the resync) is getting dropped from the array. But the one 
with the UNC does have one offline uncorrectable and two current pending 
sectors, according to smartctl.
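The `res` taskfile in each error report records where the failed read stopped. As a minimal sketch, assuming the LBA48 field order libata uses when printing these reports (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), the failing sector and the flag names can be decoded like this. The names `decode_res` and `flags` are illustrative helpers, not part of any existing tool.

```python
# Decode the "res" taskfile from a libata error report into status/error
# flags and the LBA where the failing command stopped.
# ASSUMPTION: LBA48 field order as printed by libata's EH report:
#   status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
from typing import List, NamedTuple, Tuple

# ATA status and error register bits (ATA command set).
STATUS_BITS: List[Tuple[int, str]] = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]
ERROR_BITS: List[Tuple[int, str]] = [
    (0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT"),
]


class Taskfile(NamedTuple):
    status: int
    error: int
    lba: int


def decode_res(taskfile: str) -> Taskfile:
    """Parse e.g. '41/40:00:a0:41:03/8c:00:00:00:00/40' (hypothetical helper)."""
    parts = taskfile.split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected taskfile format: {taskfile!r}")
    low, high = parts[1].split(":"), parts[2].split(":")
    if len(low) != 5 or len(high) != 5:
        raise ValueError(f"unexpected taskfile format: {taskfile!r}")
    status, error, _nsect, lbal, lbam, lbah = (int(x, 16) for x in [parts[0], *low])
    _hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in high)
    # Assemble the 48-bit LBA from its six byte-wide registers.
    lba = (lbal | (lbam << 8) | (lbah << 16)
           | (hob_lbal << 24) | (hob_lbam << 32) | (hob_lbah << 40))
    return Taskfile(status, error, lba)


def flags(value: int, table: List[Tuple[int, str]]) -> List[str]:
    """Names of the bits set in a register value."""
    return [name for bit, name in table if value & bit]


if __name__ == "__main__":
    for res in ("41/40:00:a0:41:03/8c:00:00:00:00/40",
                "41/40:00:a2:41:03/8c:00:00:00:00/40"):
        tf = decode_res(res)
        print(tf.lba, flags(tf.status, STATUS_BITS), flags(tf.error, ERROR_BITS))
    # 213408 ['DRDY', 'ERR'] ['UNC']
    # 213410 ['DRDY', 'ERR'] ['UNC']
```

Read this way, the two failures on ata3.00 (which, going by the revalidation messages after each report, appears to be sda) land two sectors apart, which would fit the two current pending sectors smartctl reports. These are whole-disk LBAs, not offsets within sda2.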

NB CentOS 5, 2.6.18-128.4.1.el5 kernel, mdadm 2.6.4. Probably time I 
updated a few packages.

Cheers,

John.


* Re: An oddity: UNC error while re-adding/resyncing
@ 2010-03-26  3:50 Michael Evans
From: Michael Evans @ 2010-03-26  3:50 UTC
  To: John Robinson; +Cc: Linux RAID

Neil, I'm not sure whether this is good advice, since the data is the
same and may be cached. However, I propose:

(My first thought was to resync the device to validate that the reads
are good, but scratch that: it's RAID 5, which doesn't know to assign
less trust to slower drives.)

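1) Unmount the filesystem in question (boot from a recovery CD or USB
drive if necessary).
2) Determine your data stripe size (`mdadm --detail /dev/md1` reports
the chunk size). In this case it appears to be either 128K per drive,
i.e. 256K of data per stripe, since a three-disc RAID 5 has two data
chunks per stripe, or 128K per stripe. (The "128k window" in the
resync log is md's resync window, not necessarily the chunk size.)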
3) badblocks -b $((256*1024)) -n /dev/whatever
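Steps 2 and 3 amount to a small calculation: a RAID 5 stripe holds one chunk per member, one of which is parity, so the data in a stripe is chunk size × (members − 1). A minimal sketch follows; the chunk size and member count are inputs you would read from `mdadm --detail`, `/dev/md1` stands in for `/dev/whatever`, and the function names are hypothetical. It only builds the command and does not run it.

```python
# Compute the data-stripe size for an md RAID 5 array and build the
# badblocks command from step 3.
# ASSUMPTIONS: chunk size and member count come from `mdadm --detail`;
# /dev/md1 is the array from this thread.
from typing import List


def raid5_data_stripe_bytes(chunk_kib: int, n_disks: int) -> int:
    """Bytes of data in one full RAID 5 stripe.

    Each stripe has one chunk per member, and one of those chunks holds
    parity, so only n_disks - 1 chunks carry data.
    """
    if chunk_kib <= 0 or n_disks < 2:
        raise ValueError("need a positive chunk size and at least two members")
    return chunk_kib * 1024 * (n_disks - 1)


def badblocks_nondestructive_cmd(device: str, block_bytes: int) -> List[str]:
    """argv for a non-destructive read-write pass with stripe-sized blocks."""
    return ["badblocks", "-b", str(block_bytes), "-n", device]


if __name__ == "__main__":
    # ASSUMPTION: the "128K per drive" reading of step 2, on 3 members.
    stripe = raid5_data_stripe_bytes(chunk_kib=128, n_disks=3)
    print(" ".join(badblocks_nondestructive_cmd("/dev/md1", stripe)))
    # badblocks -b 262144 -n /dev/md1
```

This reproduces the `$((256*1024))` above. Under the other reading (128K per stripe), pass `128 * 1024` directly as `block_bytes`.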

-n selects non-destructive read-write mode, which should cause the
entire device's contents to be read and safely rewritten to the
drives. This should force any pending sectors to be rewritten or
remapped.

This is less efficient than performing the operation only on the
affected segment, but a LOT safer, since with these tools it takes
real effort to make a mistake.
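For reference, the "segment in question" can be located. md stores stripe N at the same offset on every RAID 5 member, so the stripe number depends only on the member-relative sector. With badblocks run on the array using one data stripe per block, badblocks block N is exactly stripe N. A minimal sketch, assuming 512-byte sectors, a 128K chunk, and md data starting at the beginning of the partition (as with a 0.90 superblock, mdadm 2.6.4's default, which lives at the end of the member). The partition start below is HYPOTHETICAL, as is the helper name; the real start can be read with `fdisk -lu /dev/sda`.

```python
# Find which stripe (and hence which stripe-sized badblocks block on the md
# device) contains a failing whole-disk LBA, so that only that segment needs
# to be exercised.
# ASSUMPTIONS: RAID 5 member partitions, md data starting `data_offset`
# sectors into the partition, 512-byte sectors. The partition start used
# below is HYPOTHETICAL.

SECTOR_BYTES = 512


def stripe_for_disk_lba(disk_lba: int, partition_start: int,
                        data_offset: int, chunk_kib: int) -> int:
    """Stripe number containing `disk_lba` on one RAID 5 member.

    md places stripe N at the same offset (N * chunk) on every member, so the
    stripe number depends only on the member-relative sector, not on which
    member holds data or parity for that stripe.
    """
    member_sector = disk_lba - partition_start - data_offset
    if member_sector < 0:
        raise ValueError("LBA lies before this member's data area")
    chunk_sectors = chunk_kib * 1024 // SECTOR_BYTES
    return member_sector // chunk_sectors


if __name__ == "__main__":
    chunk_kib, n_disks = 128, 3                      # ASSUMPTION: check mdadm --detail
    block_bytes = chunk_kib * 1024 * (n_disks - 1)   # one data stripe
    # HYPOTHETICAL partition start for sda2; read the real one from fdisk -lu.
    stripe = stripe_for_disk_lba(213408, partition_start=208845,
                                 data_offset=0, chunk_kib=chunk_kib)
    # With -b equal to one data stripe, badblocks block N on /dev/md1 is
    # stripe N; the trailing arguments are last_block then first_block.
    print(f"badblocks -b {block_bytes} -n /dev/md1 {stripe} {stripe}")
    # badblocks -b 262144 -n /dev/md1 17 17
```

The LBA 213408 is the first failing sector decoded from the `res` line in the log; with a different partition start or chunk size the stripe number changes, so treat the printed command as illustrative only.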