Manually failing a disk during a md raid6 reshape


* Manually failing a disk during a md raid6 reshape
From: EJ Vincent @ 2012-12-04 18:44 UTC (permalink / raw)
  To: linux-raid mailing list

Greetings,

I currently have an md raid6 reshape (growth, adding a disk) underway.
Dmesg is beginning to fill up with:

[23661.913556] ata15.00: status: { DRDY }
[23661.914443] ata15.00: failed command: WRITE FPDMA QUEUED
[23661.915326] ata15.00: cmd 61/80:f0:00:34:e9/00:00:45:00:00/40 tag 30 ncq 65536 out
[23661.915329]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[23661.917110] ata15.00: status: { DRDY }
[23661.917977] ata15: hard resetting link
[23664.116087] ata15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[23664.118667] ata15.00: configured for UDMA/100
[23664.118685] ata15.00: device reported invalid CHS sector 0

What's the best way to identify ata15.00 and is it permissible to 
manually fail this device, without damaging and/or interrupting the 
reshape?

I'm also perfectly OK with waiting for it to complete on its own,
although by my estimate the reshape has slowed by about 25% since it
began.

Also, any conjecture on what might lead to the above error(s)?

Thank you very much.

Best,
-EJ


* Re: Manually failing a disk during a md raid6 reshape
From: Mikael Abrahamsson @ 2012-12-05 13:24 UTC (permalink / raw)
  To: EJ Vincent; +Cc: linux-raid mailing list

On Tue, 4 Dec 2012, EJ Vincent wrote:

> What's the best way to identify ata15.00 and is it permissible to 
> manually fail this device, without damaging and/or interrupting the 
> reshape?

With "iostat -x 5" you may see the await value (or a similar latency
column) shoot up on one of the drives while this is happening; that
would be a good indicator of which disk it is. Apart from that, it's
not easy to map ataXX names to sdX names.
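
As a rough illustration of the await idea, the sketch below computes an
average time per completed I/O for each block device from two samples
of /proc/diskstats. It assumes the standard Linux diskstats field
layout (reads completed, ms spent reading, writes completed, ms spent
writing); the function names are hypothetical, and the result is only
an approximation of what iostat prints, not a replacement for it.

```python
import time


def read_diskstats(text):
    """Parse /proc/diskstats text into {device: (completed_ios, total_ms)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        name = fields[2]
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def await_ms(before, after):
    """Average milliseconds per completed I/O between two samples."""
    result = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        delta_ios = ios_after - ios_before
        delta_ms = ms_after - ms_before
        result[name] = delta_ms / delta_ios if delta_ios > 0 else 0.0
    return result


def watch(interval=5, path="/proc/diskstats"):
    """Take two samples `interval` seconds apart; print slowest devices first."""
    with open(path) as fh:
        first = read_diskstats(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = read_diskstats(fh.read())
    ranked = sorted(await_ms(first, second).items(), key=lambda kv: -kv[1])
    for name, value in ranked:
        print(f"{name:10s} {value:10.1f} ms")
```

Calling watch() on the affected machine lists devices by latency; a
member disk whose value sits far above its peers during the link resets
is the likely suspect.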

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Manually failing a disk during a md raid6 reshape
From: Jack Wang @ 2012-12-06  3:47 UTC (permalink / raw)
  To: EJ Vincent; +Cc: linux-raid mailing list

2012/12/5 EJ Vincent <ej@ejane.org>:
> Greetings,
>
> I currently have a md raid6 reshape (growth, adding a disk) under-way.
> Dmesg is beginning to fill up with:
>
> [23661.913556] ata15.00: status: { DRDY }
> [23661.914443] ata15.00: failed command: WRITE FPDMA QUEUED
> [23661.915326] ata15.00: cmd 61/80:f0:00:34:e9/00:00:45:00:00/40 tag 30 ncq 65536 out
> [23661.915329]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [23661.917110] ata15.00: status: { DRDY }
> [23661.917977] ata15: hard resetting link
> [23664.116087] ata15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [23664.118667] ata15.00: configured for UDMA/100
> [23664.118685] ata15.00: device reported invalid CHS sector 0
>
>
> Also, any conjecture on what might lead to the above error(s)?

If you have not enabled async scan, you may be able to find the ataXX
to sdX mapping by looking through dmesg.
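
As an alternative to reading dmesg (this is a different technique, not
the one described above), the mapping can often be recovered from
sysfs. The sketch below assumes a kernel where the resolved path of
/sys/block/sdX contains an "ataN" component for libata-attached disks;
on kernels or controllers where it does not, the function simply finds
nothing. The function names are hypothetical.

```python
import os
import re

# Matches the ATA port component in a resolved sysfs device path.
ATA_PORT = re.compile(r"/ata(\d+)/")


def ata_port_of(sysfs_path):
    """Return the ATA port number found in a resolved sysfs path, or None."""
    match = ATA_PORT.search(sysfs_path)
    return int(match.group(1)) if match else None


def map_ata_to_disks(sys_block="/sys/block"):
    """Return {ata_port_number: [block device names]} by resolving symlinks."""
    mapping = {}
    for name in sorted(os.listdir(sys_block)):
        resolved = os.path.realpath(os.path.join(sys_block, name))
        port = ata_port_of(resolved)
        if port is not None:
            mapping.setdefault(port, []).append(name)
    return mapping
```

With this, map_ata_to_disks().get(15) would list the sdX name(s)
behind ata15, if the assumption about the sysfs layout holds on the
system in question.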

The error appears to occur because the disk does not respond to an NCQ
write command, so the upper layer times the command out, which in turn
triggers the error handler.

I guess the cause is a faulty disk that does not return a response in
time.
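
To make the timed-out command concrete, here is a small sketch that
decodes the "cmd" line from the log. It assumes libata's usual print
order (command/feature:nsect:lbal:lbam:lbah/ followed by the five
high-order bytes, then the device byte), the NCQ convention that the
sector count is carried in the feature fields and the tag in nsect
bits 7:3, and 512-byte logical sectors. The function name is
hypothetical.

```python
import re

CMD_RE = re.compile(
    r"cmd\s+([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})\s+tag\s+(\d+)"
)


def decode_ncq_cmd(line, sector_size=512):
    """Decode an NCQ read/write 'cmd' line as printed by libata."""
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("not a libata cmd line")
    command = int(match.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (
        int(b, 16) for b in match.group(2).split(":")
    )
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in match.group(3).split(":")
    )
    # 48-bit LBA: high-order bytes first, then the low-order bytes.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    # For NCQ commands a sector count of 0 means 65536 sectors.
    sectors = (hob_feature << 8 | feature) or 65536
    return {
        "command": command,
        "tag": int(match.group(5)),
        "tag_from_nsect": nsect >> 3,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }
```

Applied to the line in the log, this yields command 0x61 (WRITE FPDMA
QUEUED), tag 30, and 128 sectors, which matches the "ncq 65536 out"
byte count printed by the kernel.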

Jack
