* removing faulty drive on 3ware 9xxx card
@ 2005-06-09 19:08 Richard Jacobsen
2005-06-09 20:08 ` Harry Mangalam
From: Richard Jacobsen @ 2005-06-09 19:08 UTC
To: linux-raid
Hello everyone,
I have a drive which is constantly putting out:
3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=4,
However the 3ware cli reports it as still a valid member of the array:
//beautemps> info c0
Unit  UnitType  Status  %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK      -      64K     2328.2    ON     OFF      OFF

Port  Status  Unit  Size       Blocks     Serial
---------------------------------------------------------------
p0    OK      u0    232.88 GB  488397168  WD-WMAEP28256
p1    OK      u0    232.88 GB  488397168  WD-WMAEP28252
p2    OK      u0    232.88 GB  488397168  WD-WMAEP27015
p3    OK      u0    232.88 GB  488397168  WD-WMAEP28280
p4    OK      u0    232.88 GB  488397168  WD-WMAEP28256
p5    OK      u0    232.88 GB  488397168  WD-WMAEP28257
p6    OK      u0    232.88 GB  488397168  WD-WMAEP28253
p7    OK      u0    232.88 GB  488397168  WD-WMAEP28252
p8    OK      u0    232.88 GB  488397168  WD-WMAEP28566
p9    OK      u0    232.88 GB  488397168  WD-WMAEP25657
p10   OK      u0    232.88 GB  488397168  WD-WMAEP28584
p11   OK      -     232.88 GB  488397168  WD-WMAEP28250
Since I'm assuming that this constant drive timeout is what is making my array
slow to a crawl, I'd like to remove p4 from the array, have the hot spare on
p11 take over, then replace p4.
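Before pulling a drive, it can help to confirm that port 4 is the only port logging timeouts. The following is a minimal Python sketch, not part of the original message, that tallies timeout AENs per port. It assumes the log lines look like the single AEN line quoted above; the function name `count_timeouts` and the sample input are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the one AEN line quoted in this thread. Other driver or
# firmware versions may word the message differently (assumption).
TIMEOUT_RE = re.compile(r"Drive timeout detected:\s*port=(\d+)")


def count_timeouts(lines):
    """Return a Counter mapping port number to the number of timeout AENs."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice the lines would come from the
    # kernel log.
    sample = [
        "3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=4,",
        "3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=4,",
        "some unrelated kernel message",
    ]
    for port, hits in count_timeouts(sample).most_common():
        print(f"port {port}: {hits} timeout(s)")
```

If more than one port shows up in the tally, the problem may lie with cabling or the controller rather than a single disk.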
I'm thinking that:
maint remove c0 p4
Is the command I'm looking for. Any caveats before I try?
Thanks,
Richard
--
"a professional is simply one who gets paid for doing what an amateur does for
love."
-- Ursula K. Le Guin
* Re: removing faulty drive on 3ware 9xxx card
2005-06-09 19:08 removing faulty drive on 3ware 9xxx card Richard Jacobsen
@ 2005-06-09 20:08 ` Harry Mangalam
From: Harry Mangalam @ 2005-06-09 20:08 UTC
To: Richard Jacobsen, linux-raid
That looks right, tho you haven't mentioned what version of the SW you're
using. And you DO have the docs, right? ;)
If not, go here to get them:
http://www.3ware.com/support/downloadpageeng.asp?SNO=7
Or you could test the robustness of the system and just yank it. I'd be
interested in the results.. :)
After the bad disk is pulled, the rebuild should start immediately on your hot
spare AFAIK, and when you replace the bad disk, you should then be able to
specify it as the hot spare.
The web version of their SW (3dm2) works for me and is considerably more
intuitive than the tw_cli (tho that's not saying a lot).
You might also try to get the SMART info from the disk (the 3ware SW can
extract the raw numbers but will not interpret them).
Also, Konstantin Olchanski <olchansk@sam.triumf.ca> recently wrote:
> I use the 3ware driver that comes with the Red Hat kernels, the
> additional monitoring tools from 3ware do not work. SMART monitoring
> works via "smartctl -a -d 3ware,0 /dev/twe0".
and added offline:
> BTW, I had to mknod /dev/twe0 manually, this
> is what it looks like:
> [root@tw00 ~]# ls -l /dev/twe0
> crw------- 1 root root 254, 0 Jun 8 15:03 /dev/twe0
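In the smartctl invocation quoted above, the number after `3ware,` selects the port on the controller, so the suspect drive would be `3ware,4` rather than `3ware,0`. Here is a minimal Python sketch, not from the thread, that builds that command line for a given port. The helper name `smartctl_argv` is hypothetical. The device node is an assumption: the quote uses /dev/twe0, while the 3w-9xxx driver normally exposes /dev/twa0, so check which node exists on your system.

```python
import shlex


def smartctl_argv(port, device="/dev/twe0"):
    """Build the smartctl argument list for one port behind a 3ware card.

    The default device matches the node quoted in this thread; pass a
    different node (for example /dev/twa0) if that is what your driver uses.
    """
    if port < 0:
        raise ValueError("port must be non-negative")
    return ["smartctl", "-a", "-d", f"3ware,{port}", device]


if __name__ == "__main__":
    # Port 4 is the drive logging timeouts. /dev/twa0 is an assumption for a
    # 9xxx-series card.
    print(shlex.join(smartctl_argv(4, device="/dev/twa0")))
```

The sketch only prints the command; running it against the controller is left to the operator.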
Here's the relevant section of the man page for my version of tw_cli (2.00.00.042):
[maint] rebuild cid uid pid [ignoreECC]
This command allows you to rebuild a DEGRADED unit by using the specified
port. Rebuild only applies to redundant arrays such as RAID-1, RAID-5,
RAID-10 and RAID-50. During rebuild, bad sectors on the source disk will
cause the rebuild to fail. You can allow for the operation to continue via
ignoreECC. Rebuild process is a background task and will change the state of
a unit to REBUILDING. Various info commands also show a percent completion as
rebuilding progresses.
Note that the port (disk) to be used to rebuild a unit, must be a SPARE or
configured disk.
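To make the synopsis above concrete, here is a minimal Python sketch, not from the man page, that formats a rebuild command as it would be typed at the tw_cli prompt. The `c0`/`u0`/`p11` identifier form is an assumption inferred from the `info c0` and `maint remove c0 p4` commands earlier in this thread. The helper name `rebuild_command` is hypothetical.

```python
def rebuild_command(controller, unit, port, ignore_ecc=False):
    """Format a command following 'maint rebuild cid uid pid [ignoreECC]'.

    The cN/uN/pN identifier style is assumed from commands shown elsewhere
    in this thread, not taken from the man page excerpt itself.
    """
    for name, value in (("controller", controller), ("unit", unit), ("port", port)):
        if value < 0:
            raise ValueError(f"{name} must be non-negative")
    parts = ["maint", "rebuild", f"c{controller}", f"u{unit}", f"p{port}"]
    if ignore_ecc:
        # Lets the rebuild continue past bad sectors on the source disks.
        parts.append("ignoreECC")
    return " ".join(parts)


if __name__ == "__main__":
    # Rebuild unit u0 on controller c0 onto the disk at port p11.
    print(rebuild_command(0, 0, 11))
```

This would only be needed if the rebuild onto p11 does not start on its own once p4 is removed.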
Let us know what happens...
hjm
--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm@tacgi.com
<<plain text preferred>>