linux-raid.vger.kernel.org archive mirror
* LVM on raid10,f2 performance issues
@ 2008-12-01  0:00 Holger Mauermann
  2008-12-01 16:42 ` Keld Jørn Simonsen
  0 siblings, 1 reply; 12+ messages in thread
From: Holger Mauermann @ 2008-12-01  0:00 UTC (permalink / raw)
  To: linux-raid

When I set up LVM on top of a 4-disk RAID 10 with the f2 layout, read/write
performance is really poor - it's far below single-drive speed...

I did some testing with dd and hexdump and I noticed that after writing
a 128k file (with 64k "X" and 64k "Y") to the lvol the data on the raw
disks looks somewhat weird:

sda:  sdb:  sdc:  sdd:
----------------------
YYYY  ....  ....  XXXX
....  ....  ....  ....
XXXX  YYYY  ....  ....
....  ....  ....  ....

Any hints how to tune this? With LVM on a standard RAID 10 (n2) the data
appears as expected on the disks:

sda:  sdb:  sdc:  sdd:
----------------------
YYYY  YYYY  XXXX  XXXX
....  ....  ....  ....
....  ....  ....  ....
....  ....  ....  ....
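
For reference, a test along those lines can be reproduced roughly as
follows (volume and device names are only examples, and the dd write is
destructive, so use a scratch LV):

# build a 128k pattern: 64k of "X" followed by 64k of "Y"
( yes X | tr -d '\n' | head -c 65536
  yes Y | tr -d '\n' | head -c 65536 ) > /tmp/pattern
# write it to the start of a test logical volume
dd if=/tmp/pattern of=/dev/vg0/testlv bs=64k oflag=direct
# then look for the first occurrence of the pattern on each member disk
hexdump -C /dev/sda | grep -m 1 XXXX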



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2008-12-01  0:00 LVM on raid10,f2 performance issues Holger Mauermann
@ 2008-12-01 16:42 ` Keld Jørn Simonsen
  2008-12-02 23:28   ` Holger Mauermann
  0 siblings, 1 reply; 12+ messages in thread
From: Keld Jørn Simonsen @ 2008-12-01 16:42 UTC (permalink / raw)
  To: Holger Mauermann; +Cc: linux-raid

On Mon, Dec 01, 2008 at 01:00:32AM +0100, Holger Mauermann wrote:
> When I set up LVM on top of a 4-disk RAID 10 with the f2 layout, read/write
> performance is really poor - it's far below single-drive speed...

How is it if you use the raid10,f2 without lvm?


What are the numbers?

Did you use it with a live file system, or with just the raw raid?

Live filesystem performance is much better than raw device performance; in
practice the raw numbers are not really representative of real-life use for
raid10,f2. This is because the elevator improves operation quite a lot.

best regards
keld


> I did some testing with dd and hexdump and I noticed that after writing
> a 128k file (with 64k "X" and 64k "Y") to the lvol the data on the raw
> disks looks somewhat weird:
> 
> sda:  sdb:  sdc:  sdd:
> ----------------------
> YYYY  ....  ....  XXXX
> ....  ....  ....  ....
> XXXX  YYYY  ....  ....
> ....  ....  ....  ....
> 
> Any hints how to tune this? With LVM on a standard RAID 10 (n2) the data
> appears as expected on the disks:
> 
> sda:  sdb:  sdc:  sdd:
> ----------------------
> YYYY  YYYY  XXXX  XXXX
> ....  ....  ....  ....
> ....  ....  ....  ....
> ....  ....  ....  ....
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2008-12-01 16:42 ` Keld Jørn Simonsen
@ 2008-12-02 23:28   ` Holger Mauermann
  2008-12-03  7:15     ` Keld Jørn Simonsen
  2008-12-03  9:43     ` Michal Soltys
  0 siblings, 2 replies; 12+ messages in thread
From: Holger Mauermann @ 2008-12-02 23:28 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: linux-raid

Keld Jørn Simonsen schrieb:
> How is it if you use the raid10,f2 without lvm?
> What are the numbers?

After a fresh installation LVM performance is now somewhat better. I
don't know what was wrong before. However, it is still not as fast as
the raid10...

dd on raw devices
-----------------

raid10,f2:
  read : 409 MB/s
  write: 212 MB/s

raid10,f2 + lvm:
  read : 249 MB/s
  write: 158 MB/s
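
These are plain sequential dd runs on the raw devices; invocations along
these lines give comparable figures (block size, count and the direct-I/O
flags are just examples):

dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct         # read
dd if=/dev/zero of=/dev/vg0/testlv bs=1M count=4096 oflag=direct  # write

(/dev/md0 for the bare array, the LV device node for the raid+lvm case.)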


Holger
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2008-12-02 23:28   ` Holger Mauermann
@ 2008-12-03  7:15     ` Keld Jørn Simonsen
  2008-12-03  9:43     ` Michal Soltys
  1 sibling, 0 replies; 12+ messages in thread
From: Keld Jørn Simonsen @ 2008-12-03  7:15 UTC (permalink / raw)
  To: Holger Mauermann; +Cc: linux-raid

On Wed, Dec 03, 2008 at 12:28:51AM +0100, Holger Mauermann wrote:
> Keld Jørn Simonsen schrieb:
> > How is it if you use the raid10,f2 without lvm?
> > What are the numbers?
> 
> After a fresh installation LVM performance is now somewhat better. I
> don't know what was wrong before. However, it is still not as fast as
> the raid10...
> 
> dd on raw devices
> -----------------
> 
> raid10,f2:
>   read : 409 MB/s
>   write: 212 MB/s
> 
> raid10,f2 + lvm:
>   read : 249 MB/s
>   write: 158 MB/s


What is the performance with raid0?

What is the performance with a filesystem?

Did you try 

blockdev --setra 65536 /dev/md3

How many disks are you using?

best regards
keld
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2008-12-02 23:28   ` Holger Mauermann
  2008-12-03  7:15     ` Keld Jørn Simonsen
@ 2008-12-03  9:43     ` Michal Soltys
  2009-01-19  1:24       ` thomas62186218
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Soltys @ 2008-12-03  9:43 UTC (permalink / raw)
  To: Holger Mauermann; +Cc: Keld Jørn Simonsen, linux-raid

Holger Mauermann wrote:
> Keld Jørn Simonsen schrieb:
>> How is it if you use the raid10,f2 without lvm?
>> What are the numbers?
> 
> After a fresh installation LVM performance is now somewhat better. I
> don't know what was wrong before. However, it is still not as fast as
> the raid10...
> 
> dd on raw devices
> -----------------
> 
> raid10,f2:
>   read : 409 MB/s
>   write: 212 MB/s
> 
> raid10,f2 + lvm:
>   read : 249 MB/s
>   write: 158 MB/s
> 
> 
> sda:  sdb:  sdc:  sdd:
> ----------------------
> YYYY  ....  ....  XXXX
> ....  ....  ....  ....
> XXXX  YYYY  ....  ....
> ....  ....  ....  ....



Regarding the layout from your first mail - this is how it's supposed to 
be. LVM's header took 3*64KB (you can control that with --metadatasize, 
and verify with e.g. pvs -o+pe_start), and then the first 4MB extent 
(controlled with --physicalextentsize) of the first logical volume 
started - on sdd and continued on sda. Mirrored data was set "far" from 
that, and shifted one disk to the right - as expected from raid10,f2.
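
A quick way to inspect or control that offset (device name is only an
example; the exact pe_start depends on LVM's internal rounding, so always
verify):

pvcreate --metadatasize 512k /dev/md0    # reserve a larger metadata area
pvs -o +pe_start /dev/md0                # show where the first extent begins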

As for performance, hmmm. Overall, there are a few things to consider 
when doing lvm on top of the raid:

- stripe vs. extent alignment
- stride vs. stripe vs. extent size
- the filesystem's awareness that there's also a raid layer below
- lvm's readahead (iirc, only the uppermost layer matters - functioning as a 
hint for the filesystem)

These points are particularly important for raid with parities, though; 
here everything is already aligned and there is no parity.

But the last point can be relevant - and you did test with a filesystem 
after all. Try setting readahead with blockdev or lvchange (the latter 
will be permanent across lv activations). E.g.

#lvchange -r 2048 /dev/mapper...

and compare to the raw raid10:

#blockdev --setra 2048 /dev/md...

If you did your tests with ext2/3, also try creating it with the -E stride= 
and stripe-width= options in both cases; similarly with sunit/swidth if you 
used xfs.
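
A sketch of what that could look like, assuming 4 KiB filesystem blocks, a
64 KiB chunk and two data-bearing chunks per stripe on a 4-disk raid10
(check the chunk size of your array and the exact option spelling against
mke2fs(8) / mkfs.xfs(8)):

# stride = chunk / block = 64k / 4k = 16; stripe width = 2 data chunks * 16 = 32
mkfs.ext3 -b 4096 -E stride=16,stripe_width=32 /dev/vg0/testlv
# xfs equivalent: stripe unit = chunk size, stripe width = data-bearing chunks
mkfs.xfs -d su=64k,sw=2 /dev/vg0/testlv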

You might also create the volume group with a larger extent size - such as 
512MB (4MB granularity is often overkill). Performance-wise it shouldn't 
matter in this case, though.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2008-12-03  9:43     ` Michal Soltys
@ 2009-01-19  1:24       ` thomas62186218
  2009-01-19  7:28         ` Peter Rabbitson
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: thomas62186218 @ 2009-01-19  1:24 UTC (permalink / raw)
  To: soltys, mauermann; +Cc: keld, linux-raid

Hi everyone,

I too was seeing miserable read-performance with LVM2 volumes on top of 
md RAID 10's on my Ubuntu 8.04 64-bit machine. My RAID 10 has 12 x 
300GB 15K SAS drives on a 4-port LSI PCIe SAS controller.

I use:
blockdev --setra 65536 /dev/md0

And this dramatically increased my RAID 10 read performance.

You MUST do the same for your LVM2 volumes for them to see a comparable 
performance boost.

blockdev --setra 65536 /dev/mapper/raid10-testvol

Otherwise, your LVM will default to 256 read-ahead value, which stinks. 
I increased my read performance by 3.5x with this one change! See below:

root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 50.8923 s, 206 MB/s

root@b410:~# blockdev --setra 65536 /dev/mapper/raid10twelve256k-testvol

root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 14.4057 s, 728 MB/s
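
To check what a given device is currently using (blockdev reports and sets
the value in 512-byte sectors):

blockdev --getra /dev/md0
blockdev --getra /dev/mapper/raid10twelve256k-testvol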


Enjoy!
-Thomas

-----Original Message-----
From: Michal Soltys <soltys@ziu.info>
To: Holger Mauermann <mauermann@gmail.com>
Cc: Keld Jørn Simonsen <keld@dkuug.dk>; linux-raid@vger.kernel.org
Sent: Wed, 3 Dec 2008 1:43 am
Subject: Re: LVM on raid10,f2 performance issues

Holger Mauermann wrote:
> Keld Jørn Simonsen schrieb:
>> How is it if you use the raid10,f2 without lvm?
>> What are the numbers?
>
> After a fresh installation LVM performance is now somewhat better. I
> don't know what was wrong before. However, it is still not as fast as
> the raid10...
>
> dd on raw devices
> -----------------
>
> raid10,f2:
>   read : 409 MB/s
>   write: 212 MB/s
>
> raid10,f2 + lvm:
>   read : 249 MB/s
>   write: 158 MB/s
>
>
> sda:  sdb:  sdc:  sdd:
> ----------------------
> YYYY  ....  ....  XXXX
> ....  ....  ....  ....
> XXXX  YYYY  ....  ....
> ....  ....  ....  ....

Regarding the layout from your first mail - this is how it's supposed to
be. LVM's header took 3*64KB (you can control that with --metadatasize,
and verify with e.g. pvs -o+pe_start), and then the first 4MB extent
(controlled with --physicalextentsize) of the first logical volume
started - on sdd and continued on sda. Mirrored data was set "far" from
that, and shifted one disk to the right - as expected from raid10,f2.

As for performance, hmmm. Overall, there are a few things to consider
when doing lvm on top of the raid:

- stripe vs. extent alignment
- stride vs. stripe vs. extent size
- the filesystem's awareness that there's also a raid layer below
- lvm's readahead (iirc, only the uppermost layer matters - functioning as a
hint for the filesystem)

These points are particularly important for raid with parities, though;
here everything is already aligned and there is no parity.

But the last point can be relevant - and you did test with a filesystem
after all. Try setting readahead with blockdev or lvchange (the latter
will be permanent across lv activations). E.g.

#lvchange -r 2048 /dev/mapper...

and compare to the raw raid10:

#blockdev --setra 2048 /dev/md...

If you did your tests with ext2/3, also try creating it with the -E stride=
and stripe-width= options in both cases; similarly with sunit/swidth if you
used xfs.

You might also create the volume group with a larger extent size - such as
512MB (4MB granularity is often overkill). Performance-wise it shouldn't
matter in this case, though.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2009-01-19  1:24       ` thomas62186218
@ 2009-01-19  7:28         ` Peter Rabbitson
  2009-01-26 19:06           ` Bill Davidsen
  2009-01-19  7:30         ` Michal Soltys
  2009-01-19 12:17         ` Keld Jørn Simonsen
  2 siblings, 1 reply; 12+ messages in thread
From: Peter Rabbitson @ 2009-01-19  7:28 UTC (permalink / raw)
  To: thomas62186218; +Cc: soltys, mauermann, keld, linux-raid

thomas62186218@aol.com wrote:
> Hi everyone,
> 
> I too was seeing miserable read-performance with LVM2 volumes on top of
> md RAID 10's on my Ubuntu 8.04 64-bit machine. My RAID 10 has 12 x 300GB
> 15K SAS drives on a 4-port LSI PCIe SAS controller.
> 
> I use:
> blockdev --setra 65536 /dev/md0
> 
> And this dramatically increased my RAID 10 read performance.
> 
> You MUST do the same for your LVM2 volumes for them to see a comparable
> performance boost.
> 
> blockdev --setra 65536 /dev/mapper/raid10-testvol
> 

This is incorrect. Only the readahead setting of the _last_ block device
matters. So in case you have a raid6 of 10 drives, with LUKS on top,
with LVM on top - only the readahead settings of the individual LVs
matter, nothing further down the chain is consulted.
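
A quick way to compare the settings across the whole stack is blockdev's
report mode; the RA column is each device's readahead in 512-byte sectors:

blockdev --report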

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2009-01-19  1:24       ` thomas62186218
  2009-01-19  7:28         ` Peter Rabbitson
@ 2009-01-19  7:30         ` Michal Soltys
  2009-01-19 12:17         ` Keld Jørn Simonsen
  2 siblings, 0 replies; 12+ messages in thread
From: Michal Soltys @ 2009-01-19  7:30 UTC (permalink / raw)
  To: thomas62186218; +Cc: mauermann, keld, linux-raid

thomas62186218@aol.com wrote:
> Hi everyone,
> 
> I too was seeing miserable read-performance with LVM2 volumes on top of 
> md RAID 10's on my Ubuntu 8.04 64-bit machine. My RAID 10 has 12 x 300GB 
> 15K SAS drives on a 4-port LSI PCIe SAS controller.
> 
> I use:
> blockdev --setra 65536 /dev/md0
> 
> And this dramatically increased my RAID 10 read performance.
> 
> You MUST do the same for your LVM2 volumes for them to see a comparable 
> performance boost.
> 
> blockdev --setra 65536 /dev/mapper/raid10-testvol
> 
> Otherwise, your LVM will default to 256 read-ahead value, which stinks. 
> I increased my read performance by 3.5x with this one change! See below:
> 

Yeah, that's the general idea, besides proper alignment. It worked nicely 
for Holger as well. Specifying 65536 (popular for some reason) is quite 
overkill though, imho.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2009-01-19  1:24       ` thomas62186218
  2009-01-19  7:28         ` Peter Rabbitson
  2009-01-19  7:30         ` Michal Soltys
@ 2009-01-19 12:17         ` Keld Jørn Simonsen
  2009-01-19 12:24           ` Peter Rabbitson
  2 siblings, 1 reply; 12+ messages in thread
From: Keld Jørn Simonsen @ 2009-01-19 12:17 UTC (permalink / raw)
  To: thomas62186218; +Cc: soltys, mauermann, linux-raid

Hmm, 

Why is the command

 blockdev --setra 65536 /dev/md0

really needed? I think the kernel should set a reasonable default here.

What is the logic? In the following I am trying to work out what would be
reasonable for a kernel patch to achieve.

The command sets the readahead to 32 MiB. Is that really wanted?
I understand that it really is important for our benchmarks to give good
results. But is it useful in real operation? Or can a smaller value
solve the problem?

Reading 32 MB takes about 300 - 500 ms, and this needs to be done for
every read, even for small reads. That is a lot. For database operations
this would limit us to 2 to 3 transactions per second. A normal
7200 rpm drive is capable of roughly 100 tps, so this would slow such
transactions down by a factor of 30 to 50...

Maybe a blockdev parameter of 16384 - that is, 8 MiB - would be
sufficient? This would limit the time spent on each transaction to about
100 ms.

And this could depend on the relevant parameters, say the drive count and
the chunk size. Maybe the trick is to read a full stripe set, that is, the
number of drives times the chunk size. For a 4-drive array with a 256 KiB
chunk size this would be 1 MiB, or a --setra parameter of 2048.
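
(For the units: --setra counts 512-byte sectors, so
 2048 sectors  * 512 B = 1 MiB
 65536 sectors * 512 B = 32 MiB.)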

Maybe the trick is to read several stripe sets at the same time.
For raid5 and raid6 reads the parity chunks need not be read, so it
would be a waste to read the full stripe set.
I am not fully sure what is going on. Maybe somebody can enlighten me.

Or maybe the readahead is not the real parameter that needs to be set
correctly - but maybe something else needs to be fixed, maybe some
logic.

best regards
keld

On Sun, Jan 18, 2009 at 08:24:42PM -0500, thomas62186218@aol.com wrote:
> Hi everyone,
> 
> I too was seeing miserable read-performance with LVM2 volumes on top of 
> md RAID 10's on my Ubuntu 8.04 64-bit machine. My RAID 10 has 12 x 
> 300GB 15K SAS drives on a 4-port LSI PCIe SAS controller.
> 
> I use:
> blockdev --setra 65536 /dev/md0
> 
> And this dramatically increased my RAID 10 read performance.
> 
> You MUST do the same for your LVM2 volumes for them to see a comparable 
> performance boost.
> 
> blockdev --setra 65536 /dev/mapper/raid10-testvol
> 
> Otherwise, your LVM will default to 256 read-ahead value, which stinks. 
> I increased my read performance by 3.5x with this one change! See below:
> 
> root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
> count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 50.8923 s, 206 MB/s
> 
> root@b410:~# blockdev --setra 65536 /dev/mapper/raid10twelve256k-testvol
> 
> root@b410:~# dd if=/dev/raid10twelve256k/testvol of=/dev/null bs=1M 
> count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 14.4057 s, 728 MB/s
> 
> 
> Enjoy!
> -Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2009-01-19 12:17         ` Keld Jørn Simonsen
@ 2009-01-19 12:24           ` Peter Rabbitson
  2009-01-19 13:59             ` Keld Jørn Simonsen
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Rabbitson @ 2009-01-19 12:24 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: thomas62186218, soltys, mauermann, linux-raid

Keld Jørn Simonsen wrote:
> Hmm, 
> 
> Why is the command
> 
>  blockdev --setra 65536 /dev/md0
> 
> really needed? I think the kernel should set a reasonable default here.

The in-kernel default for a block device is 256 (128k), which is way too
low. The MD subsystem tries to be a bit smarter and assigns the md
device readahead according to the number of devices/raid level. For
streaming (i.e. file server) workloads these values are also too low. LVs
can take a readahead specification at creation time and use that, but this
is manual.
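
For completeness, the per-LV value can be given up front at creation
time, e.g. (VG name and size are only examples):

lvcreate --readahead 2048 -L 100G -n testvol vg0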

It is arguable what the typical workload is, but I would lean towards
big long linear reads (fileserver) vs short scattered ones (database).

The real solution to the problem was proposed a long time ago, and it
seems it got lost in the attic: http://lwn.net/Articles/155510/
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2009-01-19 12:24           ` Peter Rabbitson
@ 2009-01-19 13:59             ` Keld Jørn Simonsen
  0 siblings, 0 replies; 12+ messages in thread
From: Keld Jørn Simonsen @ 2009-01-19 13:59 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: thomas62186218, soltys, mauermann, linux-raid

On Mon, Jan 19, 2009 at 01:24:39PM +0100, Peter Rabbitson wrote:
> Keld Jørn Simonsen wrote:
> > Hmm, 
> > 
> > Why is the command
> > 
> >  blockdev --setra 65536 /dev/md0
> > 
> > really needed? I think the kernel should set a reasonable default here.
> 
> The in-kernel default for a block device is 256 (128k), which is way too
> low. The MD subsystem tries to be a bit smarter and assigns the md
> device readahead according to the number of devices/raid level. For
> streaming (i.e. file server) workloads these values are also too low. LVs
> can take a readahead specification at creation time and use that, but this
> is manual.

I would like to have something done automatically in the kernel, so that
you do not need to do it manually. People tend not to know that you need
to add the blockdev statement, e.g. in /etc/rc.local, to get decent
performance. And this is needed even for simpler arrays, such as a 4-drive
raid10,f2, which can be set up on many recent motherboards with SATA-II
support, directly off the board.
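
That is, today one has to put something like this in /etc/rc.local (device
names and the value are only examples):

blockdev --setra 16384 /dev/md0
blockdev --setra 16384 /dev/vg0/testvol    # the LV the filesystem actually sits on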

> It is arguable what the typical workload is, but I would lean towards
> big long linear reads (fileserver) vs short scattered ones (database).

My understanding is that readahead is only done when the kernel
thinks it is doing sequential reads. This is probably not the case when
doing database operations. So we are kind of safe here, IMHO.
> 
> The real solution to the problem was proposed a long time ago, and it
> seems it got lost in the attic: http://lwn.net/Articles/155510/

Yes, interesting.

The patch may not be ready for inclusion for some time due to complexity
and lack of testing.

So I am wondering if we could come up with a formula to set the readahead
for raid. It seems like a big readahead would not affect random reading.
It would then only be overkill for sequential reading of smallish files.

So how does the kernel detect that it is doing sequential reading?
Maybe it detects that the new block to read on a given file
descriptor immediately follows the previous read on the same FD?

And then we normally read a full chunk for the raid, which is at least
something like 64 KiB? This would take care of most database
transactions. 

I would think we then should find the smallest readahead value for a
given array, from chunk size and drive count, that gets the array to
perform as expected.
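
One possible starting point along those lines - purely a sketch, not a
tested rule:

  readahead (512 B sectors) = nr_drives * chunk_size_in_KiB * 2
  e.g. 4 drives * 256 KiB * 2 = 2048 sectors = 1 MiB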

best regards
keld
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: LVM on raid10,f2 performance issues
  2009-01-19  7:28         ` Peter Rabbitson
@ 2009-01-26 19:06           ` Bill Davidsen
  0 siblings, 0 replies; 12+ messages in thread
From: Bill Davidsen @ 2009-01-26 19:06 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: thomas62186218, soltys, mauermann, keld, linux-raid

Peter Rabbitson wrote:
> thomas62186218@aol.com wrote:
>   
>> Hi everyone,
>>
>> I too was seeing miserable read-performance with LVM2 volumes on top of
>> md RAID 10's on my Ubuntu 8.04 64-bit machine. My RAID 10 has 12 x 300GB
>> 15K SAS drives on a 4-port LSI PCIe SAS controller.
>>
>> I use:
>> blockdev --setra 65536 /dev/md0
>>
>> And this dramatically increased my RAID 10 read performance.
>>
>> You MUST do the same for your LVM2 volumes for them to see a comparable
>> performance boost.
>>
>> blockdev --setra 65536 /dev/mapper/raid10-testvol
>>
>>     
>
> This is incorrect. Only the readahead setting of the _last_ block device
> matters. So in case you have a raid6 of 10 drives, with LUKS on top,
> with LVM on top - only the readahead settings of the individual LVs
> matter, nothing further down the chain is consulted.

I have some old numbers which indicate that with ext3 the use of 
"stride=" can improve performance, although I was measuring write and 
just got the read numbers without really caring about them.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-01-26 19:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-01  0:00 LVM on raid10,f2 performance issues Holger Mauermann
2008-12-01 16:42 ` Keld Jørn Simonsen
2008-12-02 23:28   ` Holger Mauermann
2008-12-03  7:15     ` Keld Jørn Simonsen
2008-12-03  9:43     ` Michal Soltys
2009-01-19  1:24       ` thomas62186218
2009-01-19  7:28         ` Peter Rabbitson
2009-01-26 19:06           ` Bill Davidsen
2009-01-19  7:30         ` Michal Soltys
2009-01-19 12:17         ` Keld Jørn Simonsen
2009-01-19 12:24           ` Peter Rabbitson
2009-01-19 13:59             ` Keld Jørn Simonsen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).