linux-raid.vger.kernel.org archive mirror
From: "Keld Jørn Simonsen" <keld@dkuug.dk>
To: thomas62186218@aol.com
Cc: soltys@ziu.info, mauermann@gmail.com, linux-raid@vger.kernel.org
Subject: Re: LVM on raid10,f2 performance issues
Date: Mon, 19 Jan 2009 13:17:24 +0100	[thread overview]
Message-ID: <20090119121724.GA23623@rap.rap.dk> (raw)
In-Reply-To: <8CB47EBD70FE9E8-AC4-981@WEBMAIL-DG06.sim.aol.com>

Hmm, 

Why is the command

 blockdev --setra 65536 /dev/md0

really needed? I think the kernel should set a reasonable default here.

What is the logic here? In the following I try to discuss what would be
a reasonable default for a kernel patch to achieve.

The command sets the readahead to 32 MiB (65536 sectors of 512 bytes).
Is that really wanted? I understand that it is important for our
benchmarks to give good results, but is it useful in real operation, or
can a smaller value solve the problem?
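
You can query the value currently in effect like this (a sketch,
assuming /dev/md0; the value is reported in 512-byte sectors):

 blockdev --getra /dev/md0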

Reading 32 MiB takes about 300 - 500 ms - and this needs to be done for
every read, even for small reads. That is a lot. For database operations
this would limit us to 2 to 3 transactions per second. A normal
7200 rpm drive is capable of, say, about 100 tps, so this would slow
such transactions down by a factor of 30 to 50...

Maybe a blockdev parameter of 16384 - or 8 MiB - would be sufficient?
This would limit the time spent on each transaction to about 100 ms.
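
Spelled out (my arithmetic, assuming roughly 80 MB/s of sequential
throughput per request):

 # 16384 sectors * 512 B = 8 MiB; 8 MiB / 80 MB/s = 100 ms
 blockdev --setra 16384 /dev/md0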

And this could depend on the relevant parameters, say the number of
drives and the chunk size. Maybe the trick is to read a full stripe set,
that is, the number of drives times the chunk size. For a 4-drive array
with a chunk size of 256 KiB this would be 1 MiB, or a --setra parameter
of 2048.
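
Something like this could derive the value automatically (just a
sketch; it assumes /dev/md0 and the "Chunk Size" / "Raid Devices" lines
of English mdadm output):

 # chunk size in KiB and number of member drives, from mdadm
 chunk_kib=$(mdadm --detail /dev/md0 | awk '/Chunk Size/ {print int($4)}')
 ndisks=$(mdadm --detail /dev/md0 | awk '/Raid Devices/ {print $4}')
 # one full stripe set, in 512-byte sectors: drives * chunk_KiB * 2
 blockdev --setra $(( ndisks * chunk_kib * 2 )) /dev/md0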

Maybe the trick is to read more stripe sets at the same time.
For raid5 and raid6 reads the parity chunks need not be read, so it
would be a waste to read the full stripe set.
I am not fully sure what is going on. Maybe somebody can enlighten me.

Or maybe the readahead is not the real parameter that needs to be set
correctly - maybe something else needs to be fixed instead, perhaps some
logic elsewhere in the kernel.

best regards
keld

On Sun, Jan 18, 2009 at 08:24:42PM -0500, thomas62186218@aol.com wrote:
> Hi everyone,
> 
> I too was seeing miserable read performance with LVM2 volumes on top of 
> md RAID 10s on my Ubuntu 8.04 64-bit machine. My RAID 10 has 12 x 
> 300GB 15K SAS drives on a 4-port LSI PCIe SAS controller.
> 
> I use:
> blockdev --setra 65536 /dev/md0
> 
> And this dramatically increased my RAID 10 read performance.
> 
> You MUST do the same for your LVM2 volumes for them to see a comparable 
> performance boost.
> 
> blockdev --setra 65536 /dev/mapper/raid10-testvol
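> 
> (You can check what is in effect before and after - same device names 
> assumed:)
> 
> blockdev --getra /dev/md0
> blockdev --getra /dev/mapper/raid10-testvol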
> 
> Otherwise, your LVM volume will default to a read-ahead value of 256 
> sectors, which stinks. I increased my read performance by 3.5x with 
> this one change! See below:
> 
> root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
> count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 50.8923 s, 206 MB/s
> 
> root@b410:~# blockdev --setra 65536 /dev/mapper/raid10twelve256k-testvol
> 
> root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
> count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 14.4057 s, 728 MB/s
> 
> 
> Enjoy!
> -Thomas
> 
> -----Original Message-----
> From: Michal Soltys <soltys@ziu.info>
> To: Holger Mauermann <mauermann@gmail.com>
> Cc: Keld Jørn Simonsen <keld@dkuug.dk>; linux-raid@vger.kernel.org
> Sent: Wed, 3 Dec 2008 1:43 am
> Subject: Re: LVM on raid10,f2 performance issues
> 
> Holger Mauermann wrote:
> >Keld Jørn Simonsen wrote:
> >>How is it if you use the raid10,f2 without lvm?
> >>What are the numbers?
> >
> >After a fresh installation LVM performance is now somewhat better. I
> >don't know what was wrong before. However, it is still not as fast as
> >the raid10...
> >
> >dd on raw devices
> >-----------------
> >
> >raid10,f2:
> >  read : 409 MB/s
> >  write: 212 MB/s
> >
> >raid10,f2 + lvm:
> >  read : 249 MB/s
> >  write: 158 MB/s
> >
> >sda:  sdb:  sdc:  sdd:
> >----------------------
> >YYYY  ....  ....  XXXX
> >....  ....  ....  ....
> >XXXX  YYYY  ....  ....
> >....  ....  ....  ....
> 
> Regarding the layout from your first mail - this is how it's supposed
> to be. LVM's header took 3*64KB (you can control that with
> --metadatasize, and verify with e.g. pvs -o+pe_start), and then the
> first 4MB extent (controlled with --physicalextentsize) of the first
> logical volume started - on sdd and continued on sda. Mirrored data was
> set "far" from that, and shifted one disk to the right - as expected
> from raid10,f2.
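> 
> For example (a sketch - the device name is illustrative):
> 
>   # pvs -o +pe_start /dev/md0
> 
> shows where the data area begins, and something like
> 
>   # pvcreate --metadatasize 250k /dev/md0
> 
> makes lvm round the data offset up to the next 64 KiB boundary, giving
> 256 KiB here.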
> 
> As for performance, hmmm. Overall there are a few things to consider
> when doing lvm on top of the raid:
> 
> - stripe vs. extent alignment
> - stride vs. stripe vs. extent size
> - filesystem's awareness that there's also a raid layer below
> - lvm's readahead (iirc, only the uppermost layer matters - functioning
>   as a hint for the filesystem)
> 
> But those are particularly important for raid with parities. Here
> everything is aligned already, and parity doesn't exist.
> 
> But the last point can be relevant - and you did test with a filesystem
> after all. Try setting readahead with blockdev or lvchange (the latter
> will be permanent across lv activations). E.g.
> 
>   # lvchange -r 2048 /dev/mapper...
> 
> and compare to the raw raid10:
> 
>   # blockdev --setra 2048 /dev/md...
> 
> If you did your tests with ext2/3, also try to create the filesystem
> with the -E stride=,stripe-width= options in both cases. Similarly with
> sunit/swidth if you used xfs.
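> 
> For ext3 that could look like this (a sketch - the numbers assume 256
> KiB chunks, 4 KiB blocks and 4 drives; adjust for your layout):
> 
>   # mkfs.ext3 -E stride=64,stripe-width=256 /dev/md...
> 
> where stride = chunk / block = 256/4 = 64 and stripe-width = stride *
> drives = 64 * 4 = 256. The xfs equivalent takes sunit/swidth in
> 512-byte sectors:
> 
>   # mkfs.xfs -d sunit=512,swidth=2048 /dev/md...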
> 
> You might also create the volume group with a larger extent - such as
> 512MB (as 4MB granularity is often overkill). Performance-wise it
> shouldn't matter in this case though.
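> 
> E.g. (a sketch - the vg name is illustrative):
> 
>   # vgcreate -s 512M bigvg /dev/md...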
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thread overview: 12+ messages
2008-12-01  0:00 LVM on raid10,f2 performance issues Holger Mauermann
2008-12-01 16:42 ` Keld Jørn Simonsen
2008-12-02 23:28   ` Holger Mauermann
2008-12-03  7:15     ` Keld Jørn Simonsen
2008-12-03  9:43     ` Michal Soltys
2009-01-19  1:24       ` thomas62186218
2009-01-19  7:28         ` Peter Rabbitson
2009-01-26 19:06           ` Bill Davidsen
2009-01-19  7:30         ` Michal Soltys
2009-01-19 12:17         ` Keld Jørn Simonsen [this message]
2009-01-19 12:24           ` Peter Rabbitson
2009-01-19 13:59             ` Keld Jørn Simonsen
