From: Phil Lobbes <phil@perkpartners.com>
To: linux-raid@vger.kernel.org
Subject: RAID1 == two different ARRAY in scan, and Q on read error corrected
Date: Fri, 18 Apr 2008 15:35:59 -0400
Message-ID: <27567.1208547359@perkpartners.com>
Hi,
I have been lurking on the mailing list for a little while and doing
some investigation on my own. I don't mean to impose and hopefully this
is the right forum for these questions. If anyone has some
suggestions/recommendations/guidance on the following two questions I'm
all ears!
_________________________________________________________________
Q1: RAID1 == two different ARRAY in scan
I recently upgraded my server from Fedora Core 5 to Fedora 8 and along
with that I noticed something that I either overlooked before or that
was perhaps caused during the upgrade. On that system I have a 300G
RAID1 mirror:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[0] sdd1[1]
      293049600 blocks [2/2] [UU]
unused devices: <none>
When I use mdadm --examine --scan my 300G RAID1 mirror returns two
separate UUIDs with different devices for each:
* (correct) a "complete disk partition" aka /dev/sd{c,d}1
* (bogus) an entire device aka /dev/sd{c,d}
# mdadm --examine --scan --verbose
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=12c2d7a3:0b791468:9e965247:f4354b36
   devices=/dev/sdd,/dev/sdc
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7b879b21:7cc83b9c:765dd3f3:2af46d19
   devices=/dev/sdd1,/dev/sdc1
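
For anyone wanting to spot this condition mechanically, here is a minimal sketch that groups the ARRAY entries from the scan output by md device and flags any device claimed by more than one UUID. The function name arrays_by_md_device is made up for illustration, and the parsing only assumes the two-line ARRAY/devices= layout shown above.

```python
import re
from collections import defaultdict

# Scan output copied from the post above.
SCAN = """\
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=12c2d7a3:0b791468:9e965247:f4354b36
   devices=/dev/sdd,/dev/sdc
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7b879b21:7cc83b9c:765dd3f3:2af46d19
   devices=/dev/sdd1,/dev/sdc1
"""

def arrays_by_md_device(scan_text):
    """Group UUIDs and member devices by the md device named on each ARRAY line.

    Hypothetical helper: assumes each ARRAY line is followed by one
    'devices=' continuation line, as in the output above.
    """
    found = defaultdict(list)
    current = None
    for raw in scan_text.splitlines():
        line = raw.strip()
        match = re.match(r"ARRAY\s+(\S+)\s+.*UUID=(\S+)", line)
        if match:
            current = {"uuid": match.group(2), "devices": []}
            found[match.group(1)].append(current)
        elif line.startswith("devices=") and current is not None:
            current["devices"] = line[len("devices="):].split(",")
    return dict(found)

for md, entries in arrays_by_md_device(SCAN).items():
    if len(entries) > 1:
        print(f"{md} is claimed by {len(entries)} different UUIDs:")
        for entry in entries:
            print(f"  UUID={entry['uuid']} devices={','.join(entry['devices'])}")
```
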
I didn't find a match in a FAQ or other posting so I was hoping to get
some insight/pointers here.
Should I:
a. Ignore this?
b. Zero out the superblock on sd{c,d}? I'm no expert here, so I'm not
positive this is a good option. My theory is that a superblock for
sdc must be different than a superblock for sdc1 so if that is
correct the "fix" might be something like:
# mdadm --zero-superblock /dev/sdc /dev/sdd
Is this correct and safe? No worries about it somehow impacting
/dev/sdc1 and /dev/sdd1 and the good mirror, right?
c. Something else altogether?
For what it's worth, I suppose there is a chance I may have caused this
by trying to 'rename' the md# used by the ARRAY /dev/md0 => /dev/md3.
-----------------------------------------------------------------
* Disk/Partition info:
NOTE: Valid mirror is for partition /dev/sd{c,d}1 (not device
/dev/sd{c,d})
# fdisk -l /dev/sdc /dev/sdd

Disk /dev/sdc: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       36483   293049666   fd  Linux raid autodetect

Disk /dev/sdd: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       36483   293049666   fd  Linux raid autodetect
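
The theory in option (b), that the whole-disk superblock and the partition superblock sit in different places, can be sanity-checked with a little arithmetic. The sketch below assumes the array uses version 0.90 metadata, whose superblock lives in the last 64 KiB-aligned 64 KiB block of its device. The post does not state the metadata version, but that assumption reproduces the 293049600 blocks reported in /proc/mdstat. It also assumes /dev/sdc1 starts at sector 63, which is consistent with the fdisk geometry (36483 cylinders of 16065 sectors) but is not printed directly. The name sb090_offset is made up for illustration.

```python
SECTOR = 512      # bytes per sector
RESERVED = 128    # 64 KiB expressed in 512-byte sectors

def sb090_offset(device_sectors):
    """Sector offset of a 0.90 superblock from the start of its device.

    ASSUMPTION: 0.90 metadata. Round the device size down to a 64 KiB
    boundary, then step back one more 64 KiB block.
    """
    return (device_sectors & ~(RESERVED - 1)) - RESERVED

disk_sectors = 300090728448 // SECTOR       # disk size in bytes, from fdisk
part_sectors = 293049666 * 1024 // SECTOR   # fdisk "Blocks" are 1 KiB units
part_start = 63                             # ASSUMPTION: first partition starts at sector 63

disk_sb = sb090_offset(disk_sectors)                 # superblock seen via /dev/sdc
part_sb = part_start + sb090_offset(part_sectors)    # superblock seen via /dev/sdc1
part_end = part_start + part_sectors - 1             # last sector of /dev/sdc1

# Usable array size in 1 KiB blocks, for comparison with /proc/mdstat.
array_kib = sb090_offset(part_sectors) * SECTOR // 1024

print(f"array size          : {array_kib} KiB")
print(f"partition superblock: absolute sector {part_sb}")
print(f"partition ends at   : absolute sector {part_end}")
print(f"whole-disk superblock: absolute sector {disk_sb}")
print(f"whole-disk superblock lies past the partition: {disk_sb > part_end}")
```

Under those assumptions the whole-disk superblock would sit in the unpartitioned tail of the disk, past the end of /dev/sdc1, so zeroing it should not overlap the partition's superblock or data. Comparing `mdadm --examine /dev/sdc` against `mdadm --examine /dev/sdc1` before zeroing anything remains the real check.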
_________________________________________________________________
Q2: On read error corrected messages
On an unrelated note, during/after the upgrade I noticed that I'm now
seeing a few of these events logged:
Apr 15 11:07:14 kernel: raid1: sdc1: rescheduling sector 517365296
Apr 15 11:07:54 kernel: raid1:md0: read error corrected (8 sectors at 517365296 on sdc1)
Apr 15 11:07:54 kernel: raid1: sdc1: redirecting sector 517365296 to another mirror
Apr 15 11:08:32 kernel: raid1: sdc1: rescheduling sector 517365472
Apr 15 11:09:09 kernel: raid1:md0: read error corrected (8 sectors at 517365472 on sdc1)
Apr 15 11:09:09 kernel: raid1: sdc1: redirecting sector 517365472 to another mirror
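
To keep track of how often this happens and where, a small sketch like the following can pull the corrected-read events out of the log. The regular expression is written against the exact message format shown above, and the helper name corrected_reads is made up for illustration.

```python
import re

# Kernel lines copied from the log excerpt above.
LOG = """\
Apr 15 11:07:14 kernel: raid1: sdc1: rescheduling sector 517365296
Apr 15 11:07:54 kernel: raid1:md0: read error corrected (8 sectors at 517365296 on sdc1)
Apr 15 11:07:54 kernel: raid1: sdc1: redirecting sector 517365296 to another mirror
Apr 15 11:08:32 kernel: raid1: sdc1: rescheduling sector 517365472
Apr 15 11:09:09 kernel: raid1:md0: read error corrected (8 sectors at 517365472 on sdc1)
Apr 15 11:09:09 kernel: raid1: sdc1: redirecting sector 517365472 to another mirror
"""

CORRECTED = re.compile(r"read error corrected \((\d+) sectors at (\d+) on (\w+)\)")

def corrected_reads(log_text):
    """Return (device, start_sector, sector_count) for each corrected read error."""
    return [(dev, int(start), int(count))
            for count, start, dev in CORRECTED.findall(log_text)]

events = corrected_reads(LOG)
for dev, start, count in events:
    print(f"{dev}: {count} sectors ({count * 512} bytes) corrected at sector {start}")

# Distance between the two corrected regions, in sectors.
print("gap between events:", events[1][1] - events[0][1], "sectors")
```

Both events are on sdc1 and 176 sectors apart, so they fall in the same small region of the drive.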
And also more of these:
Apr 18 14:01:45 smartd[2104]: Device: /dev/sdc, 3 Currently unreadable (pending) sectors
Apr 18 14:01:45 smartd[2104]: Device: /dev/sdc, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 240 to 241
Apr 18 14:01:45 smartd[2104]: Device: /dev/sdd, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 238 to 239
Here's some info from smartctl:
# smartctl -a /dev/sdc
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
Device Model: Maxtor 6B300S0
Serial Number: B60370HH
Firmware Version: BANC1980
User Capacity: 300,090,728,448 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Fri Apr 18 15:09:02 2008 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
SMART Error Log Version: 1
ATA Error Count: 36 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 36 occurred at disk power-on lifetime: 27108 hours (1129 days + 12 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
5e 00 00 00 00 00 a0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
00 00 00 00 00 00 a0 00 18d+12:45:51.593 NOP [Abort queued commands]
00 00 08 1f 5f d6 e0 00 18d+12:45:48.339 NOP [Abort queued commands]
00 00 00 00 00 00 e0 00 18d+12:45:48.338 NOP [Abort queued commands]
00 00 00 00 00 00 a0 00 18d+12:45:48.335 NOP [Abort queued commands]
00 03 46 00 00 00 a0 00 18d+12:45:48.332 NOP [Reserved subcommand]
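
As a side note on reading the timestamps in this output, the two time figures smartctl prints can be checked with simple arithmetic. The sketch below assumes the 49.710-day wrap comes from a 32-bit millisecond counter; smartctl's text does not say so explicitly, but the numbers agree.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# ASSUMPTION: Powered_Up_Time is held in a 32-bit millisecond counter.
wrap_days = 2**32 / MS_PER_DAY
print(f"32-bit millisecond counter wraps after {wrap_days:.3f} days")

# Power-on lifetime reported for error 36, converted from hours.
days, hours = divmod(27108, 24)
print(f"27108 hours = {days} days + {hours} hours")
```

Because of the wrap, the 18d+12:45 values in the command list are relative to the last wrap or power-up and cannot be lined up with the power-on lifetime directly.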
Luckily, I'm not an expert on hard drives (nor their failures) but I'm
hoping that somebody might be able to give me some insight on any of
this and whether I should be concerned or should just consider these
unreadable sectors as "normal" in the life of the drive.
Sincerely,
Phil