linux-ide.vger.kernel.org archive mirror
* Permanent disk shutdown instead of soft/hard reset?
@ 2007-10-11  0:12 Andrew Paprocki
  2007-10-23  8:18 ` Tejun Heo
From: Andrew Paprocki @ 2007-10-11  0:12 UTC (permalink / raw)
  To: linux-ide

I'm currently running into a situation where I have 4 SATA drives in a
striped array and one of the drives is failing (or has already failed).
The single drive failure manifests itself as ext3 errors and libata SCSI
media errors, which occur non-stop as software attempts to read/write
to the mounted array. Because libata is seeing media errors, the bad
drive is soft reset endlessly while the software is still running and
attempting to access the drive. This winds up hanging the entire
system because the software (consider it a 'find' command running on
the drive) is started from the init.d boot scripts. The end result is
that a login prompt is not reached until the software finishes what it
is doing, after hours of soft resets have occurred.

Is there any way that this behavior can be stopped by permanently
disconnecting the drive after a configurable number of errors that
would otherwise trigger a soft reset? Does the libata layer allow for
the concept of a full disk shutdown rather than a reset? I assume this
would have to forcefully unmount any active mounts which use the
drive/array to ensure that no subsequent commands would cause libata to
attempt to reconnect to the bad drive(s). Is this even possible?

Using smartd is invaluable for detecting failing drives, but when the
failed drive prevents the system from booting, it is hard to recover
remotely. It may not be possible to "recover" (e.g. if the failed
drive is the boot drive), but that should be up to the system
designer. In my case, I would still want to boot into the system (I do
not boot from the array), establish network connectivity, and "phone
home" that a permanent hardware failure has occurred in the array.

Thanks,
-Andrew

* Re: Permanent disk shutdown instead of soft/hard reset?
  2007-10-11  0:12 Permanent disk shutdown instead of soft/hard reset? Andrew Paprocki
@ 2007-10-23  8:18 ` Tejun Heo
From: Tejun Heo @ 2007-10-23  8:18 UTC (permalink / raw)
  To: Andrew Paprocki; +Cc: linux-ide

Andrew Paprocki wrote:
> I'm currently running into a situation where I have 4 SATA drives in a
> striped array and one of the drives is failing (or has already failed).
> The single drive failure manifests itself as ext3 errors and libata SCSI
> media errors, which occur non-stop as software attempts to read/write
> to the mounted array. Because libata is seeing media errors, the bad
> drive is soft reset endlessly while the software is still running and
> attempting to access the drive. This winds up hanging the entire
> system because the software (consider it a 'find' command running on
> the drive) is started from the init.d boot scripts. The end result is
> that a login prompt is not reached until the software finishes what it
> is doing, after hours of soft resets have occurred.
>
> Is there any way that this behavior can be stopped by permanently
> disconnecting the drive after a configurable number of errors that
> would otherwise trigger a soft reset? Does the libata layer allow for
> the concept of a full disk shutdown rather than a reset? I assume this
> would have to forcefully unmount any active mounts which use the
> drive/array to ensure that no subsequent commands would cause libata to
> attempt to reconnect to the bad drive(s). Is this even possible?

If it's a RAID array, that's the job of md (or dm).  If it isn't a RAID
array, giving up on the drive doesn't really help anybody.  As long as
the drive is up and willing to talk, libata talks to the drive as
requested by upper layers.

-- 
tejun

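For the non-RAID case discussed above, one way to approximate "give up
after N errors" is a small userspace watchdog rather than a change in
libata. The following is only a minimal sketch under stated assumptions:
the log line pattern ("I/O error, dev sdX, sector N") is assumed and
varies between kernel versions, the function names are hypothetical, and
the script would need to run as root. It relies on the SCSI sysfs
"delete" attribute, which asks the kernel to remove the device. It does
not unmount anything, so a filesystem on top of the removed drive will
still report I/O errors.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed log format, e.g. "end_request: I/O error, dev sdc, sector 1234".
# Adjust the pattern to whatever your kernel actually prints.
IO_ERROR_RE = re.compile(r"I/O error, dev (sd[a-z]+), sector \d+")


def devices_over_threshold(log_lines, max_errors):
    """Return sorted device names that logged more than max_errors I/O errors."""
    counts = Counter()
    for line in log_lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(dev for dev, n in counts.items() if n > max_errors)


def detach_device(dev, sysfs_root="/sys"):
    """Ask the SCSI layer to remove the device via its sysfs 'delete' file.

    sysfs_root is a parameter only so the function can be tried against a
    scratch directory instead of the real /sys.
    """
    delete_path = Path(sysfs_root) / "block" / dev / "device" / "delete"
    delete_path.write_text("1")
    return delete_path
```

A caller could feed the output of dmesg (or a tail of the kernel log)
into devices_over_threshold() early in the boot scripts and call
detach_device() for each name returned, then "phone home" as described
in the original message. For a redundant md array, marking the member
as failed through md is the more appropriate route, as the reply notes.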