linux-raid.vger.kernel.org archive mirror
From: "Keld Jørn Simonsen" <keld@dkuug.dk>
To: David Rees <drees76@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: raind-1 resync speed slow down to 50% by the time it finishes
Date: Sat, 1 Aug 2009 19:57:34 +0200
Message-ID: <20090801175734.GB27831@rap.rap.dk>
In-Reply-To: <72dbd3150908010813n3aed3804s3a0c0265cb2c2c6d@mail.gmail.com>

On Sat, Aug 01, 2009 at 08:13:45AM -0700, David Rees wrote:
> 2009/8/1 Keld Jørn Simonsen <keld@dkuug.dk>:
> > On Fri, Jul 31, 2009 at 01:10:37PM -0700, David Rees wrote:
> >> Let's use some data from a real disk, the Velociraptor and a 2-disk
> >> array and streaming reads/writes.  At the beginning of the disk you
> >> can read about 120MB/s.  At the end of the disk, you can read about
> >> 80MB/s.
> >
> > This is not actual figures from some benchmarking you did, true?
> 
> Those are actual numbers from a Velociraptor, but the numbers are just
> estimates.
> 
> >> Data on the "beginning" of array, RAID0 = 240MB/s
> >> Data on the "end" of array, RAID0 = 160MB/s.
> >> Data on the "beginning" of array, RAID10,n2 = 120MB/s
> >> Data on the "end" of array, RAID10,n2 = 80MB/s.
> >> Data on the "beginning" of array, RAID10,f2 = 200MB/s
> >
> > Should be:
> >
> > Data on the "beginning" of array, RAID10,f2 = 230MB/s
> 
> No - you're getting 120 MB/s from one disk and 80MB/s from another.
> How that would add up to 230MB/s defies logic...

Why only 80 MB/s when reading? With raid10,f2, reads on both disks are done at
the beginning of the disks, thus getting about 115 MB/s from each of them.
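
The two sets of figures can be compared with a small sketch. It assumes
that per-disk speed falls linearly from 120 MB/s at the start of the disk
to 80 MB/s at the end (the linear model and the function names are
illustrative assumptions, not measurements), and that raid10,f2 stripes
reads over the first half of each disk only:

```python
OUTER_MBS = 120.0  # per-disk streaming read at the start of the disk (from this thread)
INNER_MBS = 80.0   # per-disk streaming read at the end of the disk (from this thread)

def disk_speed(frac):
    """Per-disk speed at a position given as a fraction of the disk (0 = start).
    Linear interpolation is an assumption made for illustration only."""
    return OUTER_MBS + (INNER_MBS - OUTER_MBS) * frac

def raid0_read(pos, disks=2):
    # RAID0 stripes over the whole of every disk.
    return disks * disk_speed(pos)

def raid10_n2_read(pos):
    # Model used in the quoted figures: one stream runs at single-disk speed.
    return disk_speed(pos)

def raid10_f2_read(pos, disks=2):
    # Reads are striped, but only the first half of each disk is touched.
    return disks * disk_speed(pos / 2)

for name, fn in (("RAID0", raid0_read),
                 ("RAID10,n2", raid10_n2_read),
                 ("RAID10,f2", raid10_f2_read)):
    print(f"{name:10s} begin={fn(0.0):.0f} MB/s  end={fn(1.0):.0f} MB/s")
```

Under these assumptions each disk in the f2 case runs between 120 and
100 MB/s, which brackets the figure of about 115 MB/s per disk, and the
array never has to read from the slow half.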

> >> With a f2 setup you'll read at something less than 120+80 = 200MB/s.
> >
> > When? at the beginning or the end?
> 
> The whole thing, on average.  But the whole point of f2 is to even out
> performance from beginning of the array and let you stripe reads.
> 
> > Random read performance on a single disk, in one test, was 34 MB/s while
> > seq read on same disk was 82. In raid10,f2 with 2 disks random read was
> > 79 MB/s. This is 235 % of the random read on one disk (34 MB/s). This is
> > a likely result as you should expect a doubling in speed from the 2
> > disks, and then some additional speed from the faster sectors of the
> > outer disks, and then the shorter access times on the outer disk
> > sectors. Geometry says that on average the transfer speeds are 17 %
> > higher on the outer half part of the disk, compared to the whole disk.
> > So that gives some 235 % speed improvement (2 * 1.17). The head
> > improvement should also give a little, but maybe the elevator algorithm
> > of the file system eliminates most of that factor.
> 
> Sorry - I'm having a hard time wrapping my head around that you can
> simply ignore access to the slow half the disk in a multi-threaded
> random IO test. 

Reading in raid10,f2 is restricted to the faster half of the disk, by
design.

It is different when writing: there both halves, fast and slow, are
used.

> The only way I might believe that you can get 235%
> improvement is in a single threaded test with a queue depth of 1 which
> lets the f2 setup only use the fast half the disks. 

The test was multi-threaded, with many processes running, say
about 200 processes. It was set up to mimic an FTP mirror.

> If that is your
> assumption, then, OK.  But then getting 34MB/s out of a rotating
> disk isn't random IO, either.  Random IO on a rotating disk is
> normally an order of magnitude slower.

Agreed. The 34 MB/s is random IO in a multi-threaded environment, and an elevator
algorithm is in operation.

If you only do the individual random reading in a single thread, it
would be much slower. However, the same speedups will occur for
raid10,f2. There will be a doubling from reading from 2 disks at the
same time, and only using the faster half of the disks will give both a
better overall transfer rate and quicker access times.
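
As a quick check of the arithmetic, here is a minimal sketch that compares
the expected speedup (2 disks times the 17 % outer-half gain) with the
ratio of the measured figures quoted above. The constant and function
names are only illustrative:

```python
# Figures quoted earlier in this thread.
SINGLE_DISK_RANDOM_MBS = 34.0   # random read on one disk
F2_RANDOM_MBS = 79.0            # random read on raid10,f2 with 2 disks
OUTER_HALF_GAIN = 1.17          # claimed transfer-rate gain of the outer half

def expected_f2_speedup(disks=2, outer_gain=OUTER_HALF_GAIN):
    # Doubling from two disks, times the gain from staying on the fast half.
    return disks * outer_gain

def measured_speedup(array_mbs=F2_RANDOM_MBS, single_mbs=SINGLE_DISK_RANDOM_MBS):
    return array_mbs / single_mbs

print(f"expected: {expected_f2_speedup():.2f}x")
print(f"measured: {measured_speedup():.2f}x")
```

The two values come out within a few percent of each other. The sketch
deliberately leaves out any gain from shorter head movement.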

> >> >> 1. The array mostly sees write activity, streaming reads aren't that common.
> >> >> 2. I can only get about 120 MB/s out of the external enclosure because
> >> >> of the PCIe card [1] , so being able to stripe reads wouldn't help get
> >> >> any extra performance out of those disks.
> >> >> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
> >> >
> >> > Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
> >> > Wikipedia says 250 MB/s. It is strange that you only can get 120 MB/s.
> >> > That is the speed of a PCI 32 bit bus. I looked at your reference [1]
> >> > for the 3132 model. Have you tried it out in practice?
> >>
> >> Yes, in practice, IO reached exactly 120MB/s out of the controller.  I
> >> ran dd read/write tests on individual disks and found that overall
> >> throughput peaked exactly at 120MB/s.
> >
> > Hmm, get another controller, then. A cheap PCIe controller should be able
> > to do about 300 MB/s on a x1 PCIe.
> 
> Please read my reference again.  It's a motherboard limitation.  I
> already _have_ a good, cheap PCIe controller.

OK, I read:
[1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
as being the description of the PCIe controller, especially SIL 3132 -
the PCIe controller. And that it was the controller that was restricted to
120 MB/s - not the mobo. Anyway, you could get a new mobo; they are cheap these days and
many of them come with either 4 or 8 SATA interfaces. If you have bought
Velociraptors then it must be for the speed, and quite cheap mobos could
enhance your performance considerably.
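
For reference, the bus figures mentioned earlier in this thread can be
worked out as below. This is only a sketch: it assumes a first-generation
PCIe lane (2.5 Gbit/s raw with 8b/10b line coding) and a 32-bit PCI bus at
33.33 MHz. The thread does not state which PCIe generation is involved,
and the function names are illustrative:

```python
def pcie_gen1_x1_raw_mb_s(raw_mbit_s=2500):
    # Raw line rate of one lane, before line-coding overhead.
    return raw_mbit_s / 8

def pcie_gen1_x1_payload_mb_s(raw_mbit_s=2500):
    # 8b/10b coding: only 8 of every 10 bits on the wire carry data.
    payload_mbit_s = raw_mbit_s * 8 / 10
    return payload_mbit_s / 8

def pci_32bit_mb_s(clock_mhz=33.33):
    # Classic PCI moves 4 bytes per clock cycle.
    return 4 * clock_mhz

print(f"PCIe x1 raw:     {pcie_gen1_x1_raw_mb_s():.1f} MB/s")
print(f"PCIe x1 payload: {pcie_gen1_x1_payload_mb_s():.1f} MB/s")
print(f"PCI 32-bit:      {pci_32bit_mb_s():.1f} MB/s")
```

The raw rate corresponds to the "about 300 MB/s" figure, the payload rate
to the 250 MB/s from Wikipedia, and the PCI figure is close to the
120 MB/s ceiling that was observed. These are theoretical maxima, so
real throughput will be somewhat lower.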

> >> But the primary reason I built it was to handle lots of small random
> >> writes/reads, so being limited to 120MB/s out of the enclosure isn't
> >> noticeable most of the time in practice as you say.
> >
> > Yes, for random read/write you only get something like 45 % out of the
> > max transfer bandwidth. So 120 MB/s would be close to the max that your
> > 5 disks on the PCIe controller can deliver. With a faster PCIe
> > controller you should be able to get better performance on random reads
> > with raid10,f2. Anyway 180 MB/s may be fast enough for your application.
> 
> Again - your idea of "random" IO is completely different than mine.
> My random IO workloads can only get a couple MB/s out of a single
> disk.

Yes, it seems we have different usage scenarios. I am serving reasonably
big files, say 700 MB ISO images, or .rpm packages of several MB; you are
probably doing some database access.

> Here's a benchmark which tests SSDs and rotational disks.  All the
> rotational disks are getting less than 1MB/s in the random IO test.
> http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=25  It's a
> worst case scenario, but not far from my workloads which obviously
> read a bit more data on each read.

What are your average read or write block sizes? Is it some database
usage?
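
The block size matters because of how much each access costs. A minimal
sketch, assuming a hypothetical 8 ms average access time (seek plus
rotational latency, not a measured value), the 82 MB/s sequential rate
quoted above, one request at a time and no elevator gains:

```python
def random_read_mb_s(block_bytes, access_time_s=0.008, seq_mb_s=82.0):
    """Rough throughput of random reads of a given size on one disk.
    access_time_s is an assumed figure; seq_mb_s is taken from this thread."""
    transfer_s = block_bytes / (seq_mb_s * 1e6)
    return block_bytes / (access_time_s + transfer_s) / 1e6

for label, size in (("4 KiB", 4096), ("1 MB", 1_000_000), ("700 MB", 700_000_000)):
    print(f"{label:>7s}: {random_read_mb_s(size):6.2f} MB/s")
```

With these assumptions, 4 KiB reads land at roughly 0.5 MB/s while 1 MB
reads reach roughly 50 MB/s, so both sets of numbers in this thread can
be right for their own workloads.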

best regards
keld
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
