From: Konstantin Olchanski <olchansk@triumf.ca>
To: Carlos Carvalho <carlos@fisica.ufpr.br>
Cc: linux-raid@vger.kernel.org
Subject: Re: disks becoming slow but not explicitly failing anyone?
Date: Wed, 26 Apr 2006 20:31:45 -0700
Message-ID: <20060427033145.GE2828@send.triumf.ca>
In-Reply-To: <17482.35982.996526.551753@fisica.ufpr.br>

On Sat, Apr 22, 2006 at 05:05:34PM -0300, Carlos Carvalho wrote:
> We've been hit by a strange problem for about 9 months already. Our
> main server suddenly becomes very unresponsive, the load skyrockets
> and if demand is high enough it collapses. top shows many processes
> stuck in D state. There are no raid or disk error messages, either in
> the console or logs.

Yes, I see similar behaviour with IDE and SATA disks, on random
interfaces, including one 3ware 8506-12 SATA 12 port unit.

I am using a disk testing program that basically
does "dd if=/dev/disk of=/dev/null" but does not give up
on i/o errors and that also measures and reports
the response time of every read() system call. I run
this program on all my disks every Tuesday.
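
A minimal sketch of a test program along these lines, in Python. It is
not the actual program described above: the function name scan_device,
the 1 MiB block size and the 1 second reporting threshold are all
assumptions made for illustration. It reads sequentially, times every
read() call, and on an i/o error skips the block and keeps going
instead of giving up.

```python
import os
import time


def scan_device(path, block_size=1 << 20, slow_seconds=1.0):
    """Read `path` sequentially, timing every read() and continuing past errors.

    Returns a list of (offset, seconds, status) tuples for every read that
    failed or took at least `slow_seconds` (hypothetical reporting threshold).
    """
    events = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            os.lseek(fd, offset, os.SEEK_SET)
            want = min(block_size, size - offset)
            start = time.monotonic()
            try:
                data = os.read(fd, want)
                status = "ok"
            except OSError as exc:
                # Unlike dd, do not give up: record the error and move on.
                data = b""
                status = f"error: {exc}"
            elapsed = time.monotonic() - start
            if status != "ok" or elapsed >= slow_seconds:
                events.append((offset, elapsed, status))
            if status == "ok" and not data:
                break  # unexpected end of data
            # Advance by what was read, or skip the whole block after an error.
            offset += len(data) if data else want
    finally:
        os.close(fd)
    return events


if __name__ == "__main__":
    import sys

    for offset, seconds, status in scan_device(sys.argv[1]):
        print(f"offset {offset}: {seconds:.2f} s, {status}")
```

Reading a raw device normally requires root. These are buffered reads,
so blocks already in the page cache will return quickly and can hide a
slow sector on a repeated run.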

Most disks respond to every read in under 1 second. (There
is always some variation and delays caused by other
programs accessing the disks while the test is running).

Sometimes, some disks take 5-10 seconds to respond and
I now consider this "normal". It's "just" "hard to read" sectors.

Sometimes, some disks take 30-40 seconds to respond,
and sometimes the reads end in i/o errors to the user code (timeout +
reset on the hardware side). Sometimes SMART errors are logged,
but not always. The "md" driver does not like these errors
and causes RAID5 and "RAID1/mirror" faults. "RAID0/striped" arrays
seem to survive. I consider these disks "defective" and replace
them as soon as possible. They usually fail vendor diagnostics
and I do warranty exchanges.
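
The triage described above can be sketched as a small classifier. The
exact cutoffs are assumptions: the text only gives typical ranges
(under 1 second, 5-10 seconds, 30-40 seconds), so the boundaries at
1 and 30 seconds and the name classify_read are illustrative choices,
as is treating any failed read as "defective".

```python
def classify_read(seconds, failed=False, slow_cutoff=1.0, bad_cutoff=30.0):
    """Classify one read() by its response time in seconds.

    slow_cutoff and bad_cutoff are assumed boundaries, not values from the
    original message, which only quotes typical ranges.
    """
    if failed or seconds >= bad_cutoff:
        return "defective"      # timeout + reset territory, or an i/o error
    if seconds >= slow_cutoff:
        return "hard-to-read"   # slow but eventually successful
    return "ok"


if __name__ == "__main__":
    for t in (0.2, 7.0, 35.0):
        print(t, classify_read(t))
```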

I once had a disk that on some days did all reads in under 1 sec,
but on other days took more than 30 seconds (ide timeout + reset +
i/o error). It was probably correlated with the disk temperature.

I now have two SATA disks in the same enclosure: one consistently
gives i/o errors (there is one unreadable bad sector, also
reported by SMART), the other one gives errors maybe every other
time (i.e. it has a "hard to read" sector). (For logistics reasons
I am slow at replacing both disks.)

K.O.


> 
> The machine has 4 IDE disks in a software raid5 array, connected to a
> 3Ware 7506. Only once I saw warnings of scsi resets of the 3Ware due
> to timeouts.
> 
> This 3Ware card has leds which are on when there's activity in the IDE
> channel. As expected, all leds turn on and off almost simultaneously
> during normal operation of the raid5, however when the problem appears
> one of the leds stays on much longer than the others for each burst of
> activity. This shows that the disk is getting much slower than the
> others, holding the whole array.
> 
> Several times a smart test of the disk shows read failures but not
> always. I've changed cables, 3Ware card and even connected the slow
> disk in the IDE channel of the motherboard to no avail. Changing the
> disk and reconstructing the array restores normal operation.
> 
> This has happened with 7 (seven!!) disks already, 80GB and 120GB,
> Maxtor and Seagate. Has anyone else seen this?

-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
