Linux RAID subsystem development
From: Stan Hoeppner <stan@hardwarefreak.com>
To: Andrei Banu <andrei.banu@redhost.ro>
Cc: linux-raid@vger.kernel.org
Subject: Re: Incredibly poor performance of mdraid-1 with 2 SSD Samsung 840 PRO
Date: Mon, 22 Apr 2013 21:51:06 -0500
Message-ID: <5175F71A.8030805@hardwarefreak.com>
In-Reply-To: <5a749bb56c88b6d6b8a806694b163bda@redhost.ro>

On 4/22/2013 5:19 AM, Andrei Banu wrote:
> Hello!
> 
> First off allow me to apologize if my rumbling sent you in a wrong
> direction and thank you for assisting.

No harm done, and you're welcome.

> The actual problem is that when I write any larger file hundreds of MB
> or more to the server (from network or from the same server) the server
> starts to overload. The server can overload to over 100 for files of ~
> 5GB. I mean this server has an average load of 0.52 (sar -q) but it can
> spike to 3 digit server loads in a few minutes from making or
> downloading a larger cPanel backup file. I have to rely only on R1Soft
> for backups right now because the normal cPanel backups make the server
> unstable when it backs up accounts over 1GB (many).

Describing this problem in terms of load average isn't very helpful.
What would help is 'perf top -U' output, so we can see what is eating
CPU, captured at the same time as 'iotop' output, so we can see what is
eating IO.
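
A minimal sketch of capturing both over the same time window, assuming
root and that perf and iotop are installed.  The function name, the
60 second default and the /tmp/load-capture directory are made up for
illustration, and some perf builds may behave differently when
'perf top' output is redirected to a file, so adjust as needed.

```shell
#!/bin/sh
# Sketch: record CPU and IO consumers over the same window while the
# big-file copy is reproduced in another terminal. Needs root.
# capture_cpu_and_io and its defaults are illustrative assumptions.
capture_cpu_and_io() {
    seconds=${1:-60}
    out_dir=${2:-/tmp/load-capture}
    mkdir -p "$out_dir" || return 1

    # iotop: -b batch mode, -o only tasks actually doing IO,
    # -t timestamps, -d 1 one-second samples, -n number of samples.
    iotop -b -o -t -d 1 -n "$seconds" > "$out_dir/iotop.log" 2>&1 &
    iotop_pid=$!

    # perf top: -U hides user-space symbols, --stdio prints plain text.
    perf top -U --stdio > "$out_dir/perf-top.log" 2>&1 &
    perf_pid=$!

    # Let both run for the requested window, then stop perf.
    sleep "$seconds"
    kill "$perf_pid" 2>/dev/null || true
    wait "$iotop_pid" "$perf_pid" 2>/dev/null || true
    echo "logs written to $out_dir"
}
```

Start it with something like 'capture_cpu_and_io 120', kick off the
large copy right away, and post both log files.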

> So I concluded this is due to very low write speeds so I ran the 'dd'

It's most likely that the low disk throughput is a symptom of the
problem, which is lurking elsewhere awaiting discovery.

> 1. Some said the low write speed might be due to a bad cable. 

Very unlikely, but possible.  This is easy to verify.  Does dmesg show
hundreds of "hard resetting link" messages?
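
One way to check, sketched as two small helpers that read kernel log
text on stdin.  The function names are made up for illustration, and
the per-port pattern assumes the usual "ataN:" prefix that libata puts
on these messages.

```shell
#!/bin/sh
# Count SATA link reset messages in kernel log text read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence || true.
count_link_resets() {
    grep -c 'hard resetting link' || true
}

# Break the resets down by ATA port, printing "port count" per line.
link_resets_by_port() {
    grep -o 'ata[0-9][0-9.]*: hard resetting link' \
        | cut -d: -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: 'dmesg | count_link_resets' and 'dmesg | link_resets_by_port'.
A handful of resets at boot is normal; hundreds on one port point at
that port's cable, drive or controller.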

> 2. I have observed a very big difference between /dev/sda and /dev/sdb
> and I thought it might me indicative of a problem somewhere. If I run
> hdparm -t /dev/sda I get about 215MB/s but on /dev/sdb I get about
> 80-90MB/s. Only if I add --direct flag I get 260MB/s for /dev/sda.
> Previously when I added --direct for /dev/sdb I was getting about
> 180MB/s but now I get ~85MB/s with or without --direct.

I simply chalked up the difference to IO load variance between test runs
of hdparm.  If one SSD is always that much slower there may be a problem
with the drive or controller, but it's not likely.  If you haven't
already, swap the cable on the slow drive with a new one.  In fact, SATA
cables are cheap as dirt so I'd swap them both just for peace of mind.

> root [/]# hdparm -t /dev/sdb
> Timing buffered disk reads:  262 MB in  3.01 seconds =  86.92 MB/sec
> 
> root [/]# hdparm --direct -t /dev/sdb
> Timing O_DIRECT disk reads:  264 MB in  3.08 seconds =  85.74 MB/sec
...
> This is something new. /dev/sdb no longer gets to nearly 200MB/s (with
> --direct) but stays under 100MB/s in all cases. Maybe indeed it's a
> problem with the cable or with the device itself.
...
> And a 30 minutes later update: /dev/sdb returned to 90MB/s read speed
> WITHOUT --direct and 180MB/s WITH --direct. /dev/sda is constant (215
> without --direct and 260 with --direct). What do you make of this?

Show your partition tables again.  My gut instinct tells me you have a
swap partition on /dev/sdb, and/or some other partition that is neither
part of the RAID1 nor mirrored on /dev/sda, and that it is accessed
heavily at some times and not others, hence the throughput discrepancy.
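
A quick way to test the swap half of that guess, sketched as a helper
that reads /proc/swaps on stdin.  The function name is made up for
illustration.  It only matches swap areas that sit directly on the
named disk, so swap on an md device (e.g. /dev/md1) will not show up
here.

```shell
#!/bin/sh
# List swap areas located directly on the given disk (e.g. "sdb").
# stdin: contents of /proc/swaps; the first line is a header.
# Prints "device used_kB" for each matching swap area.
swap_on_disk() {
    disk=$1
    awk -v d="/dev/$disk" 'NR > 1 && index($1, d) == 1 { print $1, $4 }'
}
```

Usage: 'swap_on_disk sdb < /proc/swaps'.  No output means no swap area
lives directly on that disk.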

If this is the case, and the kernel is low on RAM due to an application
memory leak or just normal process load, that swap partition may become
critical.  When you start the $big_file copy, the kernel goes into
overdrive swapping and/or dropping cache to make room for $big_file in
the write buffers.  This could explain both your triple digit system
load and the decreased throughput on /dev/sdb.
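
To see whether the kernel really swaps during the copy, one option is
to compare the pswpin/pswpout page counters from /proc/vmstat before
and after.  A minimal sketch follows; the helper names are made up for
illustration.

```shell
#!/bin/sh
# Print "pages_swapped_in pages_swapped_out" from /proc/vmstat text
# read on stdin.
swap_counters() {
    awk '$1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i + 0, o + 0 }'
}

# Given two "in out" samples, print how many pages were swapped in
# and out between them.
swap_delta() {
    # Split both samples into $1..$4: in_before out_before in_after out_after
    set -- $1 $2
    echo "$(( $3 - $1 )) $(( $4 - $2 ))"
}
```

Usage: take 'before=$(swap_counters < /proc/vmstat)', run the copy,
take 'after=$(swap_counters < /proc/vmstat)', then run
'swap_delta "$before" "$after"'.  Large numbers mean heavy swap
traffic during the copy.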

The fdisk output you provided previously showed only 3 partitions per
SSD, all RAID autodetect, all in md/RAID1 I assume.  However, the
symptoms you're reporting tend to suggest the partition layout I just
described, and could be responsible for the odd up/down throughput on sdb.
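
A sketch of how to cross-check that, listing partitions the kernel
knows about that are not members of any md array.  The function names
are made up for illustration, and the pattern assumes plain sdX
device naming.

```shell
#!/bin/sh
# Print the member devices named in /proc/mdstat text read on stdin,
# e.g. "sdb1[1] sda1[0]" yields sda1 and sdb1.
md_members() {
    grep -o '[a-z][a-z0-9]*\[[0-9][0-9]*\]' | sed 's/\[.*//' | sort -u
}

# Print sdXN partitions that are not members of any md array.
# $1: file in /proc/partitions format, $2: file in /proc/mdstat format.
non_md_partitions() {
    members=$(md_members < "$2")
    awk 'NR > 2 && $4 ~ /^sd[a-z]+[0-9]+$/ { print $4 }' "$1" |
    while read -r part; do
        echo "$members" | grep -qx "$part" || echo "$part"
    done
}
```

Usage: 'non_md_partitions /proc/partitions /proc/mdstat'.  Anything it
prints is a partition outside the RAID1 arrays and worth a closer look.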

-- 
Stan

