Re: Need urgent help in fixing raid5 array

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mike Myers <mikesm559@yahoo.com>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: linux-raid@vger.kernel.org, john lists <john4lists@gmail.com>
Subject: Re: Need urgent help in fixing raid5 array
Date: Fri, 2 Jan 2009 10:46:13 -0800 (PST)	[thread overview]
Message-ID: <746863.34803.qm@web30803.mail.mud.yahoo.com> (raw)
In-Reply-To: alpine.DEB.1.10.0901021319330.26159@p34.internal.lan

Well, I can read from sdg1 just fine.  It seems to work ok, at least for a few GB of data.   I'll try this on some of the other disks, but it is possible for to pull the disks out of the backplane and run the SFF-8087 fanout cables direct to each drive and bypass the backplane completely.  It certainly would be easy to do this for the at least the sdo1 drive and see if I can get better results going direct to the disk.  I have moved the disks around the backplane a bit to deal with the issues of the controller failure, so I am pretty sure it's not just one bad slot or the like.  

So you've seen a backplane fail in away that the disks come up fine at boot but have corrupted data transfers across them?  I wonder about the sata cables in that case as well.  I could hook up a pair of PMP's to my SI3132's and bypass the 8077 cables as well.

Thx
Mike






----- Original Message ----
From: Justin Piszcz <jpiszcz@lucidpixels.com>
To: Mike Myers <mikesm559@yahoo.com>
Cc: linux-raid@vger.kernel.org; john lists <john4lists@gmail.com>
Sent: Friday, January 2, 2009 10:22:29 AM
Subject: Re: Need urgent help in fixing raid5 array



On Fri, 2 Jan 2009, Mike Myers wrote:

> Thanks for the response.  When I try and assemble the array with just 6 disks (the 5 good ones and one of sdo1 or sdg1) I get:
>
> (none):~> mdadm /dev/md1 --assemble /dev/sdf1   /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdo1 /dev/sdd1 --force
> mdadm: /dev/md1 assembled from 5 drives - not enough to start the array.
>
>
> (none):~> mdadm /dev/md1 --assemble /dev/sdf1  /dev/sdg1 /dev/sdh1 /dev/sdj1 /dev/sdk1  /dev/sdd1 --force
> mdadm: /dev/md1 assembled from 5 drives - not enough to start the array.
>
> As for the smart info:
>
> (none):~> smartctl -i /dev/sdo1
> smartctl 5.39 2008-05-08 21:56 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Hitachi Deskstar 7K1000
> Device Model:     Hitachi HDS721010KLA330
> Serial Number:    GTJ000PAG552VC
> Firmware Version: GKAOA70M
> User Capacity:    1,000,204,886,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
> Local Time is:    Fri Jan  2 09:32:07 2009 PST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> and
>
> (none):~> smartctl -i /dev/sdg1
> smartctl 5.39 2008-05-08 21:56 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Hitachi Deskstar 7K1000
> Device Model:     Hitachi HDS721010KLA330
> Serial Number:    GTA000PAG5R0AA
> Firmware Version: GKAOA70M
> User Capacity:    1,000,204,886,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
> Local Time is:    Fri Jan  2 10:04:55 2009 PST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
>
> When I tried to read the smart data from the sdo1, the drive went offline and I get a controller error!
I would figure out why this happens first and fix it if possible. Backplane?
Cable? Controller?  Btw: The interest bits from smartctl-- need to see
smartctl -a so we can see the statistics for each of the identifiers.

>
>
> Here's what I get talking to sdg1:
>
> (none):~> smartctl -l error  /dev/sdg1
> smartctl 5.39 2008-05-08 21:56 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> ATA Error Count: 1
>        CR = Command Register [HEX]
>        FR = Features Register [HEX]
>        SC = Sector Count Register [HEX]
>        SN = Sector Number Register [HEX]
>        CL = Cylinder Low Register [HEX]
>        CH = Cylinder High Register [HEX]
>        DH = Device/Head Register [HEX]
>        DC = Device Command Register [HEX]
>        ER = Error register [HEX]
>        ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 1 occurred at disk power-on lifetime: 6388 hours (266 days + 4 hours)
>  When the command that caused the error occurred, the device was active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  84 51 00 00 00 00 a0
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  ec 00 00 00 00 00 a0 08   6d+23:19:44.200  IDENTIFY DEVICE
>  25 00 01 01 00 00 00 04   6d+23:19:44.000  READ DMA EXT
>  25 00 80 be 1b ba ef ff   6d+23:19:42.500  READ DMA EXT
>  25 00 c0 7f 1b ba e0 08   6d+23:19:42.500  READ DMA EXT
>  25 00 40 3f 1b ba e0 08   6d+23:19:30.300  READ DMA EXT
>
>
>
> As for RAID6, well this array started off as a 3 disk RAID5, and then got incrementally grown as capacity needs grew.  I wasn't going to go beyond 7 disks in the raid set, but since you can't reshape raid5 into raid6, and so I would have another few TB of disk available to move the raid set data to.  Since I use XFS, I can't just move off data and then shrink the filesystem to minimize the needs.  md and XFS make it easy to add disks, but very hard to remove them.  :-(
>
> It looks like my best bet is to try and get sd1g back into the raid set somehow, but I don't understand why md isn't assembling it into the set.  Should I try and clone sdo1 to a new disk, or sdg1?  But I am not sure what help that would be if md won't assemble with it.
>
> thx
> mike

As far as re-assembling the array, I would wait for Neil or someone who has
done this a few times but you need to find out why disks are giving I/O errors.

If you run:

dd if=/dev/sda of=/dev/null bs=1M &
dd if=/dev/sdb of=/dev/null bs=1M &

for each disk, can you do that for all disks in the raid array and then
see if any errors occur?  if you flood your system with that much I/O and
it doesnt have any problems I'd say you're good to go, but if you run those
commands and background them/run them simultaenously and drives start dropping
left and right, I'd wonder about the backplane myself..

Justin.


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2009-01-02 18:46 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <451872.61166.qm@web30802.mail.mud.yahoo.com>
2009-01-01 15:40 ` Need urgent help in fixing raid5 array Justin Piszcz
2009-01-01 17:51   ` Mike Myers
2009-01-01 18:29     ` Justin Piszcz
2009-01-01 18:40       ` Jon Nelson
2009-01-01 20:38         ` Mike Myers
2009-01-02  6:19       ` Mike Myers
2009-01-02 12:10         ` Justin Piszcz
2009-01-02 18:12           ` Mike Myers
2009-01-02 18:22             ` Justin Piszcz
2009-01-02 18:46               ` Mike Myers [this message]
2009-01-02 18:57                 ` Justin Piszcz
2009-01-02 20:46                   ` Mike Myers
2009-01-02 20:56                   ` Mike Myers
2009-01-02 21:37                   ` Mike Myers
2009-01-03  4:19                   ` Mike Myers
2009-01-03  4:43                     ` Guy Watkins
2009-01-03  5:02                       ` Mike Myers
2009-01-03 12:46                         ` John Robinson
2009-01-03 15:49                           ` Mike Myers
2009-01-03 16:14                             ` John Robinson
2009-01-03 16:47                               ` Mike Myers
2009-01-03 19:03                               ` Mike Myers
2009-01-05 22:11         ` Neil Brown
2009-01-05 22:22           ` Mike Myers
2009-01-05 22:53             ` NeilBrown
2009-01-06  2:46               ` Mike Myers
2009-01-06  4:00                 ` NeilBrown
2009-01-06  5:55                   ` Mike Myers
2009-01-06 23:23                     ` Neil Brown
2009-01-06  6:24                   ` Mike Myers
2009-01-06 23:31                     ` Neil Brown
2009-01-06 23:54                       ` Mike Myers
2009-01-07  0:19                         ` NeilBrown
2009-01-13  5:38                       ` Mike Myers
2009-01-13  5:57                         ` Mike Myers
2009-01-01 15:31 Mike Myers
  -- strict thread matches above, loose matches on Subject: below --
2008-12-05 17:03 Mike Myers
2008-12-06  0:18 ` Mike Myers
2008-12-06  0:24   ` Justin Piszcz
2008-12-06  0:47     ` Mike Myers
2008-12-06  0:51       ` Justin Piszcz
2008-12-06  0:58         ` Mike Myers
2008-12-06 19:02         ` Mike Myers
2008-12-06 19:30           ` Mike Myers
2008-12-06 20:14             ` Mike Myers
2008-12-06  0:52     ` David Lethe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=746863.34803.qm@web30803.mail.mud.yahoo.com \
    --to=mikesm559@yahoo.com \
    --cc=john4lists@gmail.com \
    --cc=jpiszcz@lucidpixels.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).