From: Arto Jantunen <viiru@debian.org>
To: Rogier Wolff <R.E.Wolff@BitWizard.nl>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Slow disks.
Date: Tue, 21 Dec 2010 14:29:47 +0200
Message-ID: <871v5bnwas.fsf@viiru.iki.fi>
In-Reply-To: <fa.C+PyZdFdHUxRFDJDF3KlrfaJASk@ifi.uio.no> (Rogier Wolff's message of "Mon\, 20 Dec 2010 14\:16\:01 UTC")
Rogier Wolff <R.E.Wolff@BitWizard.nl> writes:
> Hi,
>
> A friend of mine has a server in a datacenter somewhere. His machine
> is not working properly: most of his disks take 10-100 times longer
> to process each IO request than normal.
>
> iostat -kx 10 output:
> Device: rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz  await  svctm %util
> sdd       0.30   0.00 0.40 1.20  2.80  1.10     4.88     0.43 271.50 271.44 43.43
>
> shows that in this 10 second period, the disk was busy for 4.3 seconds
> and serviced 15-16 requests during that time.
>
> Normal disks show "svctm" of around 10-20ms.
>
> Now you might say: It's his disk that's broken.
> Well no: I don't believe that all four of his disks are broken.
> (I just showed you output about one disk, but there are 4 disks in there
> all behaving similar, but some are worse than others.)
>
> Or you might say: It's his controller that's broken. So we thought
> too. We replaced the onboard sata controller with a 4-port sata
> card. Now they are running off the external sata card... Slightly
> better, but not by much.
>
> Or you might say: it's hardware. But suppose the disk doesn't properly
> transfer the data 9 times out of 10, wouldn't the driver tell us
> SOMETHING in the syslog that things are not fine and dandy? Moreover,
> In the case above, 12kb were transferred in 4.3 seconds. If CRC errors
> were happening, the interface would've been able to transfer over
> 400Mb during that time. So every transfer would need to be retried on
> average 30000 times... Not realistic. If that were the case, we'd
> surely hit a maximum retry limit every now and then?
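
As a quick sanity check of the quoted iostat line, the request count and busy
time can be recomputed from the r/s, w/s and %util columns. This is only a
sketch of the arithmetic; the function name summarize is made up for
illustration, and the figures are copied from the sdd row above.

```python
# Figures copied from the quoted iostat row for sdd (10-second interval).
INTERVAL_S = 10.0
R_PER_S, W_PER_S = 0.40, 1.20
UTIL_PERCENT = 43.43


def summarize(interval_s, r_per_s, w_per_s, util_percent):
    """Return (requests, busy_seconds, ms_per_request) for one iostat row."""
    # Requests completed during the interval: reads plus writes per second.
    requests = (r_per_s + w_per_s) * interval_s
    # %util is the share of the interval during which the device was busy.
    busy_s = util_percent / 100.0 * interval_s
    # Busy time divided by requests approximates the svctm column.
    ms_per_request = busy_s / requests * 1000.0
    return requests, busy_s, ms_per_request


requests, busy_s, ms_per_request = summarize(
    INTERVAL_S, R_PER_S, W_PER_S, UTIL_PERCENT
)
print(f"{requests:.0f} requests, {busy_s:.2f} s busy, "
      f"{ms_per_request:.0f} ms per request")
```

That gives about 16 requests and about 4.3 seconds of busy time, or roughly
271 ms per request, which agrees with the svctm column.
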
I had something similar happen on an Areca RAID card with four disks in
RAID5. The first symptom was that the machine was extremely slow, which was
tracked down to IO being slow. Looking at the IO pattern showed that it was
very bursty: it did a few requests, froze for about 30 seconds, and then did
a few requests again.

The cause turned out to be one of the disks being faulty in a way that did not
get it dropped out of the array. While the machine was frozen and not doing
any IO, the activity LED on the faulty disk was constantly on; when it went
off, a burst of IO happened.

I'm not sure what kind of disk failure caused this, but you could test for it
either by simply watching the activity LEDs (this may not show anything in all
cases, I don't know) or by removing the disks one by one and checking whether
the problem disappears. I didn't get much log output in this case; I think the
Areca driver occasionally complained about timeouts while communicating with
the controller.
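
If watching LEDs or pulling disks is not practical on a remote machine, a
software comparison of the disks might help narrow it down. This is not what I
did on the Areca box; it is a rough sketch that assumes the disks are visible
individually in /proc/diskstats (not the case behind a hardware RAID card).
The function names and the 100 ms threshold are made up for illustration.

```python
def parse_diskstats(text):
    """Map device name -> (ios_completed, ms_spent_on_io) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip malformed or truncated lines
        name = fields[2]
        # Fields 3 and 7: reads and writes completed.
        # Fields 6 and 10: milliseconds spent reading and writing.
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def avg_ms_per_io(before, after):
    """Average milliseconds per completed IO for each device between two snapshots."""
    result = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        ios = ios_after - ios_before
        if ios > 0:  # idle devices have no meaningful average
            result[name] = (ms_after - ms_before) / ios
    return result


def suspects(latencies, threshold_ms=100.0):
    """Names of devices whose average time per IO exceeds the (assumed) threshold."""
    return sorted(name for name, ms in latencies.items() if ms > threshold_ms)
```

To use it, read /proc/diskstats twice some seconds apart, pass both texts
through parse_diskstats(), and hand the results to avg_ms_per_io(). A single
disk that stands out from the others would be the first one to pull.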
--
Arto Jantunen