RAID1 == two different ARRAY in scan, and Q on read error corrected

From: Phil Lobbes <phil@perkpartners.com>
To: linux-raid@vger.kernel.org
Subject: RAID1 == two different ARRAY in scan, and Q on read error corrected
Date: Fri, 18 Apr 2008 15:35:59 -0400
Message-ID: <27567.1208547359@perkpartners.com>

Hi,

I have been lurking for a little while on the mail list and been doing
some investigation on my own.  I don't mean to impose and hopefully this
is the right forum for these questions.  If anyone has some
suggestions/recommendations/guidance on the following two questions I'm
all ears!

_________________________________________________________________
Q1: RAID1 == two different ARRAY in scan

I recently upgraded my server from Fedora Core 5 to Fedora 8 and along
with that I noticed something that I either overlooked before or that
was perhaps caused during the upgrade.  On that system I have a 300G
RAID1 mirror:

  # cat /proc/mdstat
  Personalities : [raid1]
  md0 : active raid1 sdc1[0] sdd1[1]
        293049600 blocks [2/2] [UU]

  unused devices: <none>

When I use mdadm --examine --scan my 300G RAID1 mirror returns two
separate UUIDs with different devices for each:
* (correct) a "complete disk partition" aka /dev/sd{c,d}1
* (bogus) an entire device aka /dev/sd{c,d}

  # mdadm --examine --scan --verbose
  ARRAY /dev/md0 level=raid1 num-devices=2 UUID=12c2d7a3:0b791468:9e965247:f4354b36
     devices=/dev/sdd,/dev/sdc
  ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7b879b21:7cc83b9c:765dd3f3:2af46d19
     devices=/dev/sdd1,/dev/sdc1
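
A minimal sketch for telling the two entries apart at a glance, assuming
the two-line-per-array layout shown above.  The function name
classify_arrays is hypothetical, and the "name ends in a digit means
partition" rule is an assumption that holds for sdX-style names only
(not for names such as nvme0n1):

```shell
#!/bin/sh
# Label each ARRAY entry by whether its members are whole disks or partitions.
classify_arrays() {
    awk '
        /^[ \t]*ARRAY/ {
            # Remember the UUID of the array whose devices= line follows.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^UUID=/) uuid = substr($i, 6)
        }
        /^[ \t]*devices=/ {
            sub(/^[ \t]*devices=/, "")
            # Assumption: a member name ending in a digit is a partition.
            kind = ($0 ~ /[0-9](,|$)/) ? "partition" : "whole-disk"
            print uuid, kind, $0
        }
    '
}

# Demo on the output quoted above; on a live system the input would be
#   mdadm --examine --scan --verbose | classify_arrays
classify_arrays <<'EOF'
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=12c2d7a3:0b791468:9e965247:f4354b36
   devices=/dev/sdd,/dev/sdc
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7b879b21:7cc83b9c:765dd3f3:2af46d19
   devices=/dev/sdd1,/dev/sdc1
EOF
```

With the sample input this labels the 12c2d7a3... entry as whole-disk
and the 7b879b21... entry as partition.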

I didn't find a match in a FAQ or other posting so I was hoping to get
some insight/pointers here.

Should I:
a. Ignore this?

b. Zero out the superblock on sd{c,d}?  I'm no expert here so not
   positive this is a good option.  My theory is that a superblock for
   sdc must be different than a superblock for sdc1 so if that is
   correct the "fix" might be something like:

   # mdadm --zero-superblock /dev/sdc /dev/sdd

   Is this correct and safe?  No worries about it somehow impacting
   /dev/sdc1 and /dev/sdd1 and the good mirror, right?

c. Something else altogether?

For what it's worth, I suppose there is a chance I may have caused this
by trying to 'rename' the md# used by the ARRAY /dev/md0 => /dev/md3.
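
The theory in option b can be checked with some arithmetic.  The sketch
below assumes both arrays use version-0.90 metadata (the mdstat output
above shows no "super 1.x" tag), where the superblock sits in the last
64 KiB-aligned 64 KiB block of the device, and that sdc1 starts at
sector 63 (inferred from the fdisk figures: 36483 * 16065 sectors minus
the 586099332 sectors in the partition).  The function name
sb_offset_090 is hypothetical:

```shell
#!/bin/sh
# Offset, in 512-byte sectors from the start of a device, of a
# version-0.90 md superblock: round the size down to a multiple of
# 128 sectors (64 KiB), then step back one more 64 KiB block.
sb_offset_090() {
    echo $(( ($1 / 128) * 128 - 128 ))
}

part_start=63                              # assumed first sector of sdc1
disk_sectors=$(( 300090728448 / 512 ))     # disk size from fdisk, in sectors
part_sectors=$(( 293049666 * 2 ))          # fdisk "Blocks" are 1 KiB each

disk_sb=$(sb_offset_090 "$disk_sectors")
part_sb=$(( part_start + $(sb_offset_090 "$part_sectors") ))
part_end=$(( part_start + part_sectors - 1 ))

echo "superblock for /dev/sdc  would be at disk sector $disk_sb"
echo "superblock for /dev/sdc1 would be at disk sector $part_sb"
echo "/dev/sdc1 ends at disk sector $part_end"
```

Under these assumptions the whole-disk superblock (sector 586114560)
lies past the end of sdc1 (sector 586099394), in the unpartitioned tail
of the disk, while the sdc1 superblock sits at sector 586099263.  As a
cross-check, the partition-relative offset 586099200 sectors is
293049600 KiB, which matches the array size in /proc/mdstat.  Comparing
`mdadm --examine /dev/sdc` with `mdadm --examine /dev/sdc1` before
zeroing anything would confirm which superblock is which.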

-----------------------------------------------------------------
* Disk/Partition info:

NOTE: Valid mirror is for partition /dev/sd{c,d}1 (not device
/dev/sd{c,d})

# fdisk -l /dev/sdc /dev/sdd

Disk /dev/sdc: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       36483   293049666   fd  Linux raid autodetect

Disk /dev/sdd: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       36483   293049666   fd  Linux raid autodetect

_________________________________________________________________
* Q2: On read error corrected messages

On an unrelated note, during/after the upgrade I noticed that I'm now
seeing a few of these events logged:

Apr 15 11:07:14  kernel: raid1: sdc1: rescheduling sector 517365296
Apr 15 11:07:54  kernel: raid1:md0: read error corrected (8 sectors at 517365296 on sdc1)
Apr 15 11:07:54  kernel: raid1: sdc1: redirecting sector 517365296 to another mirror
Apr 15 11:08:32  kernel: raid1: sdc1: rescheduling sector 517365472
Apr 15 11:09:09  kernel: raid1:md0: read error corrected (8 sectors at 517365472 on sdc1)
Apr 15 11:09:09  kernel: raid1: sdc1: redirecting sector 517365472 to another mirror
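
A small sketch for pulling the corrected sectors out of the log and
mapping them to whole-disk sector numbers, which may help when
comparing against LBAs reported by SMART tools.  It assumes the sector
in the kernel message is relative to the start of sdc1 and that sdc1
starts at sector 63; both are assumptions, and the function names are
hypothetical:

```shell
#!/bin/sh
part_start=63   # assumed first sector of sdc1 on the disk

# Print "<device> <sector> <count>" for each "read error corrected" line.
corrected_sectors() {
    sed -n 's/.*read error corrected (\([0-9]*\) sectors at \([0-9]*\) on \([a-z0-9]*\)).*/\3 \2 \1/p'
}

# Add the partition start to get a sector number relative to the whole disk.
to_disk_lba() {
    while read -r dev sector count; do
        echo "$dev: $count sectors at partition sector $sector, disk sector $(( sector + part_start ))"
    done
}

# Demo on the two "corrected" lines quoted above.
corrected_sectors <<'EOF' | to_disk_lba
Apr 15 11:07:54  kernel: raid1:md0: read error corrected (8 sectors at 517365296 on sdc1)
Apr 15 11:09:09  kernel: raid1:md0: read error corrected (8 sectors at 517365472 on sdc1)
EOF
```

Each corrected region is 8 sectors, i.e. 4 KiB, and the two regions are
176 sectors apart.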

And also more of these:

Apr 18 14:01:45  smartd[2104]: Device: /dev/sdc, 3 Currently unreadable (pending) sectors
Apr 18 14:01:45  smartd[2104]: Device: /dev/sdc, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 240 to 241
Apr 18 14:01:45  smartd[2104]: Device: /dev/sdd, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 238 to 239
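
To see whether the pending count grows over time, the smartd lines can
be reduced to device/count pairs.  This is only a sketch that assumes
the message wording shown above; the function name pending_sectors is
hypothetical:

```shell
#!/bin/sh
# Print "<device> <count>" for each smartd pending-sector message.
pending_sectors() {
    sed -n 's/.*Device: \([^,]*\), \([0-9]*\) Currently unreadable (pending) sectors.*/\1 \2/p'
}

# Demo on lines quoted above; only the first one reports pending sectors.
pending_sectors <<'EOF'
Apr 18 14:01:45  smartd[2104]: Device: /dev/sdc, 3 Currently unreadable (pending) sectors
Apr 18 14:01:45  smartd[2104]: Device: /dev/sdc, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 240 to 241
EOF
```

Run against the full system log, a count that keeps rising for the same
device would be more worrying than one that stays at 3.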

Here's some info from smartctl:

# smartctl -a /dev/sdc
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
Device Model:     Maxtor 6B300S0
Serial Number:    B60370HH
Firmware Version: BANC1980
User Capacity:    300,090,728,448 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Fri Apr 18 15:09:02 2008 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...

SMART Error Log Version: 1
ATA Error Count: 36 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 36 occurred at disk power-on lifetime: 27108 hours (1129 days + 12 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  5e 00 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 00 00 00 00 a0 00  18d+12:45:51.593  NOP [Abort queued commands]
  00 00 08 1f 5f d6 e0 00  18d+12:45:48.339  NOP [Abort queued commands]
  00 00 00 00 00 00 e0 00  18d+12:45:48.338  NOP [Abort queued commands]
  00 00 00 00 00 00 a0 00  18d+12:45:48.335  NOP [Abort queued commands]
  00 03 46 00 00 00 a0 00  18d+12:45:48.332  NOP [Reserved subcommand]

Luckily, I'm not an expert on hard drives (nor their failures) but I'm
hoping that somebody might be able to give me some insight on any of
this, and on whether I should be concerned or should just consider
these unreadable sectors "normal" in the life of the drive.

Sincerely,
Phil
