removing faulty drive on 3ware 9xxx card

* removing faulty drive on 3ware 9xxx card
From: Richard Jacobsen @ 2005-06-09 19:08 UTC (permalink / raw)
  To: linux-raid

Hello everyone,

I have a drive which is constantly putting out:

3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=4,

However the 3ware cli reports it as still a valid member of the array:

//beautemps> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     2328.2    ON     OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     232.88 GB   488397168     WD-WMAEP28256 
p1     OK               u0     232.88 GB   488397168     WD-WMAEP28252 
p2     OK               u0     232.88 GB   488397168     WD-WMAEP27015 
p3     OK               u0     232.88 GB   488397168     WD-WMAEP28280 
p4     OK               u0     232.88 GB   488397168     WD-WMAEP28256 
p5     OK               u0     232.88 GB   488397168     WD-WMAEP28257 
p6     OK               u0     232.88 GB   488397168     WD-WMAEP28253 
p7     OK               u0     232.88 GB   488397168     WD-WMAEP28252 
p8     OK               u0     232.88 GB   488397168     WD-WMAEP28566 
p9     OK               u0     232.88 GB   488397168     WD-WMAEP25657 
p10    OK               u0     232.88 GB   488397168     WD-WMAEP28584 
p11    OK               -      232.88 GB   488397168     WD-WMAEP28250 

Since I'm assuming that this constant drive timeout is what is making my array
slow to a crawl, I'd like to remove p4 from the array, have the hot spare on
p11 take over, then replace p4.

I'm thinking that:

maint remove c0 p4

is the command I'm looking for.  Any caveats before I try?

Thanks,
Richard

-- 
"a professional is simply one who gets paid for doing what an amateur does for
love."
    -- Ursula K. Le Guin

* Re: removing faulty drive on 3ware 9xxx card
From: Harry Mangalam @ 2005-06-09 20:08 UTC (permalink / raw)
  To: Richard Jacobsen, linux-raid



That looks right, though you haven't mentioned what version of the SW you're 
using. And you DO have the docs, right? ;)

If not, go here to get them:
http://www.3ware.com/support/downloadpageeng.asp?SNO=7

Or you could test the robustness of the system and just yank it.  I'd be 
interested in the results.. :)

After the bad disk is pulled, the rebuild should start immediately on your hot 
spare AFAIK, and when you replace the bad disk, you should then be able to 
specify it as the hot spare.
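
Before pulling anything it may be worth confirming which port is actually
free to take over. The following is only a sketch: it parses the port table
quoted at the top of this thread and lists the ports that are OK but not
assigned to a unit. The names PORT_TABLE, parse_ports and unassigned_ports
are made up for illustration, and the regular expression assumes the column
layout shown in that listing; other tw_cli versions may print it differently.

```python
import re

# Port listing as quoted in the original message (output of "info c0").
PORT_TABLE = """\
p0     OK               u0     232.88 GB   488397168     WD-WMAEP28256
p1     OK               u0     232.88 GB   488397168     WD-WMAEP28252
p2     OK               u0     232.88 GB   488397168     WD-WMAEP27015
p3     OK               u0     232.88 GB   488397168     WD-WMAEP28280
p4     OK               u0     232.88 GB   488397168     WD-WMAEP28256
p5     OK               u0     232.88 GB   488397168     WD-WMAEP28257
p6     OK               u0     232.88 GB   488397168     WD-WMAEP28253
p7     OK               u0     232.88 GB   488397168     WD-WMAEP28252
p8     OK               u0     232.88 GB   488397168     WD-WMAEP28566
p9     OK               u0     232.88 GB   488397168     WD-WMAEP25657
p10    OK               u0     232.88 GB   488397168     WD-WMAEP28584
p11    OK               -      232.88 GB   488397168     WD-WMAEP28250
"""

# Assumed row layout: port, status, unit, size in GB, block count, serial.
ROW = re.compile(r"^(p\d+)\s+(\S+)\s+(\S+)\s+([\d.]+) GB\s+(\d+)\s+(\S+)\s*$")


def parse_ports(text):
    """Return one dict per port row; lines that do not match are skipped."""
    ports = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            port, status, unit, size_gb, blocks, serial = match.groups()
            ports.append({
                "port": port,
                "status": status,
                "unit": unit,
                "size_gb": float(size_gb),
                "blocks": int(blocks),
                "serial": serial,
            })
    return ports


def unassigned_ports(ports):
    """Ports reported OK whose Unit column is '-', i.e. not part of any unit."""
    return [p["port"] for p in ports if p["status"] == "OK" and p["unit"] == "-"]


print(unassigned_ports(parse_ports(PORT_TABLE)))
```

On the listing above this reports p11 as the only free port, which matches
the plan of letting p11 take over from p4.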

The web version of their SW (3dm2) works for me and is considerably more 
intuitive than the tw_cli (though that's not saying a lot).

You might also try to get the SMART info from the disk (the 3ware SW can 
extract the raw numbers but will not interpret them).  

Also, Konstantin Olchanski <olchansk@sam.triumf.ca> recently wrote:

> I use the 3ware driver that comes with the Red Hat kernels; the
> additional monitoring tools from 3ware do not work. SMART monitoring
> works via "smartctl -a -d 3ware,0 /dev/twe0".

and added offline:

> BTW, I had to mknod /dev/twe0 manually; this is what it looks like:

[root@tw00 ~]# ls -l /dev/twe0
crw-------  1 root root 254, 0 Jun  8 15:03 /dev/twe0
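
A note on adapting that smartctl line to the drive in this thread, offered
as a sketch rather than a tested recipe. In "-d 3ware,N" the N is the port
number on the controller, so the suspect disk would be 3ware,4. Also, going
by the smartctl man page, /dev/tweN is the device for the older 3w-xxxx
driver, while cards using the 3w-9xxx driver are addressed as /dev/twaN.
The helper name smartctl_argv and its defaults are hypothetical; the code
only builds the command line and does not run it.

```python
import shlex

# Character device prefix per kernel driver, as described in smartctl(8).
DEVICE_PREFIX = {
    "3w-xxxx": "/dev/twe",  # older controllers
    "3w-9xxx": "/dev/twa",  # 9xxx controllers, as in this thread
}


def smartctl_argv(port, controller=0, driver="3w-9xxx"):
    """Build (but do not execute) a smartctl call for one port on a 3ware card."""
    if port < 0 or controller < 0:
        raise ValueError("port and controller must be non-negative")
    device = f"{DEVICE_PREFIX[driver]}{controller}"
    return ["smartctl", "-a", "-d", f"3ware,{port}", device]


# Command line for the drive on port 4 of the first 9xxx controller.
print(shlex.join(smartctl_argv(4)))
```

The list form can be handed to subprocess.run() as is when run as root on
the machine with the card; whether the device node already exists or has to
be created by hand, as above, depends on the distribution.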



Here's the section of the man page for my version of tw_cli (2.00.00.042):

[maint] rebuild cid uid pid [ignoreECC]
    This command allows you to rebuild a DEGRADED unit by using the
    specified port. Rebuild only applies to redundant arrays such as
    RAID-1, RAID-5, RAID-10 and RAID-50. During rebuild, bad sectors on
    the source disk will cause the rebuild to fail. You can allow for the
    operation to continue via ignoreECC. Rebuild process is a background
    task and will change the state of a unit to REBUILDING. Various info
    commands also show a percent completion as rebuilding progresses.

    Note that the port (disk) to be used to rebuild a unit, must be a
    SPARE or configured disk.
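
If the rebuild does not start on its own and has to be kicked off by hand,
the syntax above would be filled in roughly as follows. This is a sketch
under assumptions: it writes cid, uid and pid in the c0 / u0 / p11 style
used by "maint remove c0 p4" earlier in the thread, the function name
rebuild_command is invented here, and nothing is sent to the controller.
The status check mirrors the man page's statement that rebuild applies to
a DEGRADED unit.

```python
def rebuild_command(controller, unit, port, unit_status, ignore_ecc=False):
    """Return the tw_cli rebuild line for a DEGRADED unit, as a string.

    Follows "[maint] rebuild cid uid pid [ignoreECC]" from the man page
    excerpt; the c/u/p prefixes are an assumption based on this thread.
    """
    if unit_status != "DEGRADED":
        raise ValueError(f"unit is {unit_status}; rebuild applies to DEGRADED units")
    parts = ["maint", "rebuild", f"c{controller}", f"u{unit}", f"p{port}"]
    if ignore_ecc:
        # Lets the rebuild continue past bad sectors on the source disk.
        parts.append("ignoreECC")
    return " ".join(parts)


# Rebuild unit u0 on controller c0 using the spare on port p11.
print(rebuild_command(0, 0, 11, "DEGRADED"))
```

While the unit still shows OK, as in the listing in the original message,
the function refuses to produce a command, since there would be nothing to
rebuild yet.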

Let us know what happens...
hjm


-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm@tacgi.com 
            <<plain text preferred>>
