public inbox for linux-xfs@vger.kernel.org
* RE: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
       [not found]       ` <20080609142717.GB24950@rap.rap.dk>
@ 2008-06-09 14:56         ` David Lethe
  2008-06-09 23:15           ` Keld Jørn Simonsen
  0 siblings, 1 reply; 2+ messages in thread
From: David Lethe @ 2008-06-09 14:56 UTC (permalink / raw)
  To: Keld Jørn Simonsen
  Cc: thomas62186218, dan.j.williams, jpiszcz, linux-kernel, linux-raid,
	xfs, ap



-----Original Message-----
From: Keld Jørn Simonsen [mailto:keld@dkuug.dk] 
Sent: Monday, June 09, 2008 9:27 AM
To: David Lethe
Cc: thomas62186218@aol.com; dan.j.williams@gmail.com; jpiszcz@lucidpixels.com; linux-kernel@vger.kernel.org; linux-raid@vger.kernel.org; xfs@oss.sgi.com; ap@solarrain.com
Subject: Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors

On Mon, Jun 09, 2008 at 08:41:18AM -0500, David Lethe wrote:
> For faster random I/O:
>  * Decrease chunk size
>  * Migrate files that have higher random I/O to a RAID1 set, using disks
> with the lowest access time/latency
>  * If possible, use the /dev/shm file system 
>  * Determine the I/O size of the apps that produce most of the random I/O, and
> make sure that md+filesystem matches it. If most random I/O is 32KB, then
> don't waste bandwidth by making md read 256KB at a time, or making it
> read 2x16KB I/Os (see the sketch after this list). Also don't build md sets
> like a 4-drive RAID5 (do a 5-drive RAID5 set instead), because the non-parity
> data width isn't a power of 2. A 10-drive RAID5 set with heavy random I/O is
> also profoundly wrong, because you are just removing the opportunity to have
> all of those heads processing random I/O independently.
>  * If you only have one partition on an md set, then partition it into a
> few file systems. This may provide greater opportunity for caching I/Os.
>  * Experiment with different file systems, and optimize accordingly.  
>  * Turn off journaling, or at least move the journals to RAID1 devices.
>  * Add RAM and try to increase the buffer cache in an attempt to improve the
> cache hit percentage (this works up to a point).
>  * Buy a small SSD and migrate files that get pounded with random I/O to
> that device. (Make sure you don't get a flash SSD, but a DRAM-based SSD
> that satisfies random I/O in nanoseconds instead of milliseconds.) They are
> expensive, but the appropriate device. This is how companies such as
> Google & eBay manage to get things done.
> The biggest thing to remember about random I/O is that it is
> expensive, so just step back and think about ways to minimize the I/O
> requests to disk in the first place, and/or to spread the I/O across
> multiple raidsets that can work independently to satisfy your load. Not
> all of the suggestions above will work for everybody. You must understand
> the nature of the bottleneck.


For faster random IO I would suggest using raid10,f2 for the random
reading; it performs like raid0, something like more than double the
speed of a normal single-drive file system. For random writes raid10,f2
performs like most other mirrored raids, given that data needs to be
written twice.
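
A minimal example of building such an array (two disks assumed; the partition
names are placeholders):

  # two far copies (f2): raid0-like striped reads, mirrored redundancy
  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 \
      /dev/sda1 /dev/sdb1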

Try and see if you can get any HW raids to match that performance.

best regards
keld

--------------------------------------------------------------------------------
Keld:
That is counter-intuitive. The issue is random IOPs, not throughput. I do not
understand how a RAID10 would provide more IOs per sec than RAID1. Or, since
you are using RAID10, how could RAID10 serve more random I/Os than a pair
of RAID1 filesystems? RAID0 dictates that each disk will supply half
of the data you want per application I/O request. At least with RAID1, each
disk can supply all the data you want with a single request, and dual-porting/load
balancing will allow both disks to work independently of each other on reads, so
the disk with the least load at any time can serve the request. That is why RAID1
can be faster than JBOD.

Granted, writes are handled differently, but with any RAID0 implementation you still have to write
half of the data to each disk, requiring 2 I/Os + journaling & housekeeping.
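
To put rough numbers on it (the chunk sizes here are illustrative only): a 32KB
application read on a 2-disk RAID0 with 16KB chunks costs two 16KB transfers,
i.e. two seeks and two rotational latencies, while the same read on RAID1 is one
32KB transfer on whichever mirror is least busy, leaving the other spindle free
to serve a second, independent request in parallel.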


David


* Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
  2008-06-09 14:56         ` Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors David Lethe
@ 2008-06-09 23:15           ` Keld Jørn Simonsen
  0 siblings, 0 replies; 2+ messages in thread
From: Keld Jørn Simonsen @ 2008-06-09 23:15 UTC (permalink / raw)
  To: David Lethe
  Cc: thomas62186218, dan.j.williams, jpiszcz, linux-kernel, linux-raid,
	xfs, ap

On Mon, Jun 09, 2008 at 09:56:14AM -0500, David Lethe wrote:
> 
> 
> From: Keld Jørn Simonsen [mailto:keld@dkuug.dk] 
> Sent: Monday, June 09, 2008 9:27 AM
> To: David Lethe
> Cc: thomas62186218@aol.com; dan.j.williams@gmail.com; jpiszcz@lucidpixels.com; linux-kernel@vger.kernel.org; linux-raid@vger.kernel.org; xfs@oss.sgi.com; ap@solarrain.com
> Subject: Re: Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors
> 
> For faster random IO I would suggest using raid10,f2 for the random
> reading; it performs like raid0, something like more than double the
> speed of a normal single-drive file system. For random writes raid10,f2
> performs like most other mirrored raids, given that data needs to be
> written twice.
> 
> Try and see if you can get any HW raids to match that performance.
> 
> best regards
> keld
> 
> --------------------------------------------------------------------------------
> Keld:
> That is counter-intuitive. The issue is random IOPs, not throughput.

That probably depends on your use. I run Linux mirrors, and for that
purpose thruput of random IO, especially reading, is key.

For databases it is probably something else, probably IOPs. Here I also
think that Linux MD raid has good performance. Once again I think my pet
RAID type, raid10,f2, has something to offer, especially with lower
random seek rates, as the track span is shorter and on the outer,
faster tracks.
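
Roughly, the far-2 layout on two disks looks like this (chunk numbers shown;
per md(4) the second copy is rotated into the back half of each disk; take
the exact picture as a sketch):

  disk 0:  0  2  4  6 ... | 1  3  5  7 ...
  disk 1:  1  3  5  7 ... | 0  2  4  6 ...
           (outer half)     (inner half)

All reads can be served, raid0-style, from the outer halves alone, so the arm
stays within half the platter, on the faster outer tracks.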

And other uses may have other bottlenecks. In general I think that
thruput is an important figure, as it shows how fast a system can
process a given amount of data. Areas where this may count include web servers,
file servers, print servers, and ordinary workstations.

I actually think those two measures for random IO, IO thruput and IO
transactions per second, for read and write, are the most important ones.

For the IO transactions per second I agree that your suggestions are good
advice.

I would like to have good benchmarking tools for this, and I would also
like figures on how Linux MD compares to different HW RAIDs.
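
For the record, fio is one tool that can report both IOPs and thruput for
random IO; a possible invocation (the device name, block size, and queue
depth are placeholders; random reads against a raw md device are
non-destructive, but use a test box):

  fio --name=randread --filename=/dev/md0 --direct=1 --rw=randread \
      --bs=32k --ioengine=libaio --iodepth=16 --runtime=60 --time_based \
      --group_reporting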

> I do not
> understand how a RAID10 would provide more IOs per sec than RAID1. Or, since
> you are using RAID10, how could RAID10 serve more random I/Os than a pair
> of RAID1 filesystems?

In theory you are right. The MD implementation of RAID1 does not seem to
handle random seeks so well, AFAIK. And the seeks are confined with
raid10,f2 to less than half of the disk arm movement, which does speed
things up a little.

> RAID0 dictates that each disk will supply half
> of the data you want per application I/O request. At least with RAID1, each
> disk can supply all the data you want with a single request, and dual-porting/load
> balancing will allow both disks to work independently of each other on reads, so
> the disk with the least load at any time can serve the request. That is why RAID1
> can be faster than JBOD.
> 
> Granted, writes are handled differently, but with any RAID0 implementation you still have to write
> half of the data to each disk, requiring 2 I/Os + journaling & housekeeping.

yes, indeed.

best regards
keld


end of thread

Thread overview: 2+ messages
     [not found] <alpine.DEB.1.10.0806071015110.23323@p34.internal.lan>
     [not found] ` <e9c3a7c20806071846w60b74aeegdf3e39afb36b5d9a@mail.gmail.com>
     [not found]   ` <8CA981CB5C2B4D6-E68-18E2@MBLK-M14.sysops.aol.com>
     [not found]     ` <A20315AE59B5C34585629E258D76A97CA535E9@34093-C3-EVS3.exchange.rackspace.com>
     [not found]       ` <20080609142717.GB24950@rap.rap.dk>
2008-06-09 14:56         ` Linux MD RAID 5 Benchmarks Across (3 to 10) 300 Gigabyte VelociRaptors David Lethe
2008-06-09 23:15           ` Keld Jørn Simonsen
