* Bad Hardware / Software Disk Detection for Production Systems
@ 2006-06-02 12:21 Bill Rees
  2006-06-02 12:40 ` Mark Nipper
From: Bill Rees @ 2006-06-02 12:21 UTC (permalink / raw)
  To: reiserfs-list

Hi,

This is a general question but I thought I'd post it to the list to see if
anyone has any suggestions.

We have some production systems running SUSE 9.3 and reiserfs that are very
IO intensive using 3ware controllers in JBOD mode for the high throughput.
On occasion we will have a disk problem that brings the system to a virtual
standstill. Usually the problems cause an eventual system lockup that can't
even be resolved with a software reboot. One has to hit the reset switch to
get the system back and then take the offending disk offline. Usually there
are tons of errors from the kernel indicating drive problems. 90% of the
time the failure is due to a hardware issue with the drive running out of
spare sectors but sometimes it is a filesystem corruption issue. I am looking
for a way to prevent the system lockup.

Is there a way to accomplish this without RAID 5? We have the smartmon utils
installed on all of our systems and most times even setting a drive to be
fsck'ed on reboot does nothing when a system boots. Ideally, I'd just like
to recognize when a disk might be having issues, stop using it, and then
notify someone to manually check into it.

Is there a quick and dirty check to determine if a reiserfs disk is hosed on
boot? Setting a flag in fstab doesn't seem to do the trick. I'd be willing
to write a custom mount script to accomplish this as well.
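
For illustration only, a custom pre-mount check of the kind described above
might look roughly like the sketch below. It is an assumption-laden outline,
not a tested procedure: the function names decide() and check_disk() are
made up for this example, it assumes smartctl (from the smartmon utils) and
reiserfsck are on the PATH, and it assumes the installed reiserfsck accepts
--yes to skip its confirmation prompt. The smartctl exit status is treated
as a bitmask, as its man page describes. Note that reiserfsck --check reads
the whole filesystem tree, so it is not quick on a large disk.

```python
#!/usr/bin/env python3
"""Pre-mount health check sketch: decide whether a disk looks safe to mount."""
import subprocess

# smartctl's exit status is a bitmask. Bit 3 (value 8) means the SMART
# overall-health check reported the disk as failing.
SMART_DISK_FAILING = 0x08
# Bits 0 and 1: the command line did not parse, or the device open failed.
SMART_COULD_NOT_RUN = 0x03


def decide(smart_rc, fsck_rc):
    """Map the two exit codes to 'mount', 'skip', or 'unknown' (hypothetical labels)."""
    if smart_rc & SMART_DISK_FAILING:
        return "skip"
    if smart_rc & SMART_COULD_NOT_RUN:
        return "unknown"
    # reiserfsck --check exits 0 only when it finds no errors.
    if fsck_rc != 0:
        return "skip"
    return "mount"


def check_disk(smart_args, partition):
    """Run both checks (neither modifies the disk) and return the decision."""
    smart = subprocess.run(["smartctl", "-H", *smart_args])
    fsck = subprocess.run(["reiserfsck", "--check", "--yes", partition])
    return decide(smart.returncode, fsck.returncode)
```

A mount script could call something like
check_disk(["-d", "3ware,0", "/dev/twe0"], "/dev/sda1") for each disk, mount
only on "mount", and send a notification otherwise. The device names and the
3ware port number here are placeholders, not values from our systems.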

Any input would be appreciated.

thanks,
Bill Rees


Thread overview: 4+ messages
2006-06-02 12:21 Bad Hardware / Software Disk Detection for Production Systems Bill Rees
2006-06-02 12:40 ` Mark Nipper
2006-06-02 13:10   ` Bill Rees
2006-06-02 16:47     ` Hans Reiser
