Re: Need urgent help in fixing raid5 array

linux-raid.vger.kernel.org archive mirror

From: "Jon Nelson" <jnelson-linux-raid@jamponi.net>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: Mike Myers <mikesm559@yahoo.com>,
	linux-raid@vger.kernel.org, john lists <john4lists@gmail.com>
Subject: Re: Need urgent help in fixing raid5 array
Date: Thu, 1 Jan 2009 12:40:18 -0600
Message-ID: <cccedfc60901011040m518f2502gee0d252063612b4e@mail.gmail.com>
In-Reply-To: <alpine.DEB.1.10.0901011328380.17888@p34.internal.lan>

Also the contents of /etc/mdadm.conf
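
A minimal sketch for gathering everything requested in this thread into one file that can be pasted into a reply: mdadm -D for each array, /proc/mdstat, recent dmesg output, and mdadm.conf. It assumes a POSIX shell run as root, since mdadm -D and dmesg may need it. The function name collect_md_diag and the two mdadm.conf locations it checks are assumptions, not something taken from the thread.

```shell
#!/bin/sh
# Sketch: collect md diagnostics into one text file.
# Assumes a POSIX shell; run as root for complete output.
# collect_md_diag is a hypothetical helper name.
collect_md_diag() {
  out="${1:-md-diag.txt}"
  {
    echo "=== /proc/mdstat ==="
    cat /proc/mdstat 2>&1

    echo "=== mdadm.conf ==="
    # The location varies by distribution, so check both common paths.
    for conf in /etc/mdadm.conf /etc/mdadm/mdadm.conf; do
      if [ -r "$conf" ]; then
        echo "--- $conf"
        cat "$conf"
      fi
    done

    echo "=== mdadm -D ==="
    if command -v mdadm >/dev/null 2>&1; then
      # Detail for every /dev/mdN block device that exists.
      for md in /dev/md[0-9]*; do
        if [ -b "$md" ]; then
          mdadm -D "$md" 2>&1
        fi
      done
    else
      echo "mdadm not found in PATH"
    fi

    echo "=== dmesg (last 200 lines) ==="
    dmesg 2>&1 | tail -n 200
  } > "$out"
}

# Usage (as root):
#   collect_md_diag /tmp/md-diag.txt
```

Errors from individual commands are captured in the file rather than aborting, so a partial report is still produced on a system where, for example, dmesg is restricted.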


On Thu, Jan 1, 2009 at 12:29 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> I think some output would be pertinent here:
>
> mdadm -D /dev/md0, /dev/md1, /dev/md2, etc.
>
> cat /proc/mdstat
>
> dmesg/syslog output showing the errors you are seeing, etc.
>
>
>
> On Thu, 1 Jan 2009, Mike Myers wrote:
>
>> The disks that are problematic are still online as far as the OS can tell.
>> I can do a dd from them and pull off data at normal speeds, so if that's
>> the case, I don't understand why the backplane would be a problem here.  I
>> can try moving them to another slot, however (I have a 20-slot SATA
>> backplane in there), and see if that changes how md deals with it.
>>
>> The OS sees the drive and it initializes fine, but md shows it as removed
>> and won't let me add it back to the array because the "device is busy".  I
>> guess I don't understand the criteria md uses to add a drive.  The UUID
>> looks fine, and if the event count is off, the -f flag should take care of
>> that.  I've never seen a "device busy" failure on an add before.
>>
>> thx
>> mike
>>
>>
>>
>>
>> ----- Original Message ----
>> From: Justin Piszcz <jpiszcz@lucidpixels.com>
>> To: Mike Myers <mikesm559@yahoo.com>
>> Cc: linux-raid@vger.kernel.org; john lists <john4lists@gmail.com>
>> Sent: Thursday, January 1, 2009 7:40:21 AM
>> Subject: Re: Need urgent help in fixing raid5 array
>>
>>
>>
>> On Thu, 1 Jan 2009, Mike Myers wrote:
>>
>>> Well, thanks for all your help last month.  As I posted, things came
>>> back up and I survived the failure.  Now I have yet another problem.
>>> :(  After 5 years of running a Linux server as a dedicated NAS, I am
>>> hitting some very weird problems.  This server started as a
>>> single-processor AMD system with four 320 GB drives, and has been
>>> upgraded multiple times so that it is now a quad-core Intel
>>> rackmounted 4U system with fourteen 1 TB drives.  I have never lost
>>> data in any of the upgrades of CPU, motherboard, disk controller
>>> hardware, and disk drives.  Now, after last month's near-death
>>> experience, I am faced with another serious problem in less than a
>>> month.  Any help you guys could give me would be most appreciated.
>>> This is a sucky way to start the new year.
>>>
>>> The array I had problems with last month (md2, composed of seven 1 TB
>>> drives in a RAID5 config) is running just fine.  md1, which is built
>>> of seven 1 TB Hitachi 7K1000 drives, is now having problems.  We
>>> returned from a 10-day family visit with everything running just fine.
>>> There was a brief power outage today, about 3 minutes, but I can't see
>>> how that could be related, as the server is on a high-quality 3U
>>> rackmount APC UPS that handled the outage just fine.  I was working on
>>> the system getting X to work again after an nvidia driver update, and
>>> when that was working, I checked the disks and discovered that md1 was
>>> in a degraded state, with /dev/sdl1 kicked out of the array (removed).
>>> I tried to do a dd from the drive to verify its location in the rack,
>>> but I got an I/O error.  This was most odd, so I went to the rack,
>>> pulled the disk, and reinserted it.  No system log entries recorded
>>> the device being pulled or reinstalled, so I am thinking that a cable
>>> has somehow come loose.  I power the system down, pull it out of the
>>> rack, and look at the cable that goes to the drive; everything looks
>>> fine.
>>>
>>> So I reboot the system, and now the array won't come online, because
>>> in addition to the drive that shows as (removed), one of the other
>>> drives shows as a faulty spare.  Well, learning from the last go-around,
>>> I reassemble the array with the --force option, and the array comes
>>> back up.  But LVM won't come back up because it sees the physical
>>> volume that maps to md1 as missing.  Now I am very concerned.  After
>>> trying a bunch of things, I do a pvcreate with the missing UUID on md1,
>>> restart the VG, and the logical volume comes back up.  I was thinking I
>>> may have told LVM to use an array of bad data, but to my surprise, I
>>> mounted the filesystem and everything looked intact!  OK, sometimes you
>>> win.  So I do one more reboot to get the system back up in multiuser
>>> mode so I can back up some of the more important media stored on the
>>> volume to another server that has some space on it while I figure out
>>> what has been happening.  (It's got about 10 TB used; most of that is
>>> PVR recordings, but there is a lot of ripped music and DVDs that I
>>> really don't want to rerip.)
>>>
>>> The reboot again fails because of a problem with md1.  This time,
>>> another one of the drives shows as removed (/dev/sdm1), and I can't
>>> reassemble the array with the --force option.  It is acting like
>>> /dev/sdl1 (the other removed unit): even though I can read from the
>>> drives fine and their UUIDs are fine, md does not consider them part
>>> of the array.  /dev/sdo1 (the drive that looked like a faulty spare)
>>> seems OK when trying to do the assemble.  sdm1 seemed just fine before
>>> the reboot and was showing no problems.  They are not hooked up to the
>>> same controller cable (a SAS-to-SATA fanout), and the LSI MPT
>>> controller card seems to talk to the other disks just fine.
>>>
>>> Anyway, I have no idea what's going on.  When I try to add sdm1 or
>>> sdl1 back into the array, md complains that the device is busy, which
>>> is very odd because it's not part of another array or doing anything
>>> else in the system.
>>>
>>> Any idea as to what could be happening here?  I am beyond frustrated.
>>>
>>> thanks,
>>> Mike
>>>
>>>
>>>
>>
>> If you are using a hot-swap chassis, then it has some sort of SATA
>> backplane.  I have seen backplanes go bad in the past; that would be
>> my first replacement.
>>
>> Justin.
>>
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
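
On the "device busy" error when adding sdl1 or sdm1: one possible cause, and this is an assumption rather than a diagnosis established in this thread, is that something already holds the partition open, such as an inactive md array that claimed it during boot-time assembly. The kernel exposes what is stacked on a block device under /sys/class/block/<dev>/holders. A minimal sketch that reports the holders of each device given as an argument; the function name check_holders is hypothetical.

```shell
#!/bin/sh
# Sketch: report what the kernel says is stacked on top of each
# block device, using the sysfs "holders" directory.
# check_holders is a hypothetical helper name.
check_holders() {
  for dev in "$@"; do
    dev="${dev#/dev/}"            # accept "sdm1" or "/dev/sdm1"
    holders="/sys/class/block/$dev/holders"
    if [ ! -d "$holders" ]; then
      echo "$dev: not present in /sys/class/block"
      continue
    fi
    held_by=$(ls "$holders")
    if [ -n "$held_by" ]; then
      # Unquoted on purpose: joins multiple holders onto one line.
      echo "$dev: held by" $held_by
    else
      echo "$dev: no holders"
    fi
  done
}

# Usage:
#   check_holders sdl1 sdm1
```

If a member turns out to be held by an unexpected mdN (it would normally also appear in /proc/mdstat, often marked inactive), stopping that array with mdadm --stop /dev/mdN before retrying the add would be the usual next step.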



-- 
Jon


Thread overview: 46+ messages
     [not found] <451872.61166.qm@web30802.mail.mud.yahoo.com>
2009-01-01 15:40 ` Need urgent help in fixing raid5 array Justin Piszcz
2009-01-01 17:51   ` Mike Myers
2009-01-01 18:29     ` Justin Piszcz
2009-01-01 18:40       ` Jon Nelson [this message]
2009-01-01 20:38         ` Mike Myers
2009-01-02  6:19       ` Mike Myers
2009-01-02 12:10         ` Justin Piszcz
2009-01-02 18:12           ` Mike Myers
2009-01-02 18:22             ` Justin Piszcz
2009-01-02 18:46               ` Mike Myers
2009-01-02 18:57                 ` Justin Piszcz
2009-01-02 20:46                   ` Mike Myers
2009-01-02 20:56                   ` Mike Myers
2009-01-02 21:37                   ` Mike Myers
2009-01-03  4:19                   ` Mike Myers
2009-01-03  4:43                     ` Guy Watkins
2009-01-03  5:02                       ` Mike Myers
2009-01-03 12:46                         ` John Robinson
2009-01-03 15:49                           ` Mike Myers
2009-01-03 16:14                             ` John Robinson
2009-01-03 16:47                               ` Mike Myers
2009-01-03 19:03                               ` Mike Myers
2009-01-05 22:11         ` Neil Brown
2009-01-05 22:22           ` Mike Myers
2009-01-05 22:53             ` NeilBrown
2009-01-06  2:46               ` Mike Myers
2009-01-06  4:00                 ` NeilBrown
2009-01-06  5:55                   ` Mike Myers
2009-01-06 23:23                     ` Neil Brown
2009-01-06  6:24                   ` Mike Myers
2009-01-06 23:31                     ` Neil Brown
2009-01-06 23:54                       ` Mike Myers
2009-01-07  0:19                         ` NeilBrown
2009-01-13  5:38                       ` Mike Myers
2009-01-13  5:57                         ` Mike Myers
2009-01-01 15:31 Mike Myers
  -- strict thread matches above, loose matches on Subject: below --
2008-12-05 17:03 Mike Myers
2008-12-06  0:18 ` Mike Myers
2008-12-06  0:24   ` Justin Piszcz
2008-12-06  0:47     ` Mike Myers
2008-12-06  0:51       ` Justin Piszcz
2008-12-06  0:58         ` Mike Myers
2008-12-06 19:02         ` Mike Myers
2008-12-06 19:30           ` Mike Myers
2008-12-06 20:14             ` Mike Myers
2008-12-06  0:52     ` David Lethe
