* [linux-lvm] More information on my LV with bad read performance..
From: Robert Macaulay @ 2001-10-26 0:02 UTC (permalink / raw)
To: linux-lvm
I realized I didn't include an lvdisplay -v of my volume. Here it is.
The disks are spread out over 4 SCSI buses.
Thanks again.
--- Logical volume ---
LV Name                /dev/vgOracle/foo
VG Name                vgOracle
LV Write Access        read/write
LV Status              available
LV #                   52
# open                 0
LV Size                9.04 GB
Current LE             2314
Allocated LE           2314
Stripes                26
Stripe size (KByte)    64
Allocation             next free
Read ahead sectors     120
Block device           58:51
--- Distribution of logical volume on 26 physical volumes ---
   PV Name       PE on PV    reads    writes
   /dev/sdh1        89       13629    173625
   /dev/sdi1        89       13616    173386
   /dev/sdj1        89       13630    173372
   /dev/sdl1        89       13619    173354
   /dev/sdm1        89       13625    173369
   /dev/sdn1        89       13619    173384
   /dev/sdo1        89       13635    173391
   /dev/sdp1        89       13632    173387
   /dev/sdq1        89       13641    173401
   /dev/sdr1        89       13633    173386
   /dev/sds1        89       13639    173398
   /dev/sdt1        89       13633    173388
   /dev/sdu1        89       13625    173367
   /dev/sdv1        89       13617    173357
   /dev/sdw1        89       13625    173367
   /dev/sdx1        89       13617    173358
   /dev/sdy1        89       13624    173366
   /dev/sdz1        89       13618    173354
   /dev/sdaa1       89       13606    173366
   /dev/sdab1       89       13600    173388
   /dev/sdac1       89       13609    173366
   /dev/sdad1       89       13603    173356
   /dev/sdae1       89       13609    173364
   /dev/sdaf1       89       13600    173361
   /dev/sdag1       89       13607    173366
   /dev/sdah1       89       13602    173354
--- logical volume i/o statistic ---
354113 reads 4507931 writes
--cut--
* Re: [linux-lvm] More information on my LV with bad read performance..
From: Andreas Dilger @ 2001-10-26 2:06 UTC (permalink / raw)
To: linux-lvm
On Oct 26, 2001 00:03 -0500, Robert Macaulay wrote:
> I realized I didn't include an lvdisplay -v of my volume. Here it is.
> The disks are spread out over 4 SCSI buses.
>
> --- Logical volume ---
> LV Name                /dev/vgOracle/foo
> VG Name                vgOracle
> LV Write Access        read/write
> LV Status              available
> LV #                   52
> # open                 0
> LV Size                9.04 GB
> Current LE             2314
> Allocated LE           2314
> Stripes                26
> Stripe size (KByte)    64
> Allocation             next free
> Read ahead sectors     120
> Block device           58:51
Well, there was a patch in 2.4.13 to the LVM code that changes the
readahead code. First off, it makes the default readahead 1024 sectors
(512 kB), which may be the maximum SCSI request size (I don't know the
details exactly). It also sets a global read_ahead array, so that may
have an impact as well. As you can see above, your read-ahead (120
sectors = 60 kB) is smaller than a single 64 kB stripe, so it isn't
really doing you much good.
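For what it's worth, a quick way to check what read-ahead is actually in
effect on an LV (a sketch; it assumes the LVM1 tools and the blockdev
utility from util-linux are both available on your system):

  # per-LV value, in sectors, as reported by the LVM tools
  lvdisplay /dev/vgOracle/foo | grep -i 'read ahead'
  # what the block layer itself has for the device (also in sectors)
  blockdev --getra /dev/vgOracle/foo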
However, it is also possible that striping across 26 disks is somewhat
pointless, especially for Oracle. You are far better off doing some
intelligent allocation of the disks based on known usage patterns
(e.g. put tables and their indexes on separate disks, put rollback
files on separate disks, put heavily used tables on their own disks,
and put temporary tablespaces on their own disks).
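For example, LVM lets you pin an LV to specific disks by listing PVs at
the end of the lvcreate command line. A hypothetical layout (names,
sizes, and PV assignments are made up, not Robert's actual setup):

  # 4-way stripe for tables on one set of spindles
  lvcreate -n tables  -L 4G -i 4 -I 64 vgOracle /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdl1
  # indexes striped over a different set of spindles/buses
  lvcreate -n indexes -L 2G -i 4 -I 64 vgOracle /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1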
With LVM, you can easily monitor which PVs/PEs are busiest, and even out
the I/O load by moving LVs/PEs with pvmove (although you CANNOT do this
while the database is active).
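A sketch of that workflow (PV names are taken from the listing above,
the destination PV is arbitrary, and again, only with the database shut
down):

  # per-PV read/write counters, as in the listing above
  lvdisplay -v /dev/vgOracle/foo
  # migrate all PEs off a hot PV to a quieter one
  pvmove /dev/sdh1 /dev/sdab1
  # or move only the PEs belonging to one LV
  pvmove -n foo /dev/sdh1 /dev/sdab1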
Make sure you keep backups of your LVM metadata (both vgcfgbackup, and
also save the text output of "pvdata -avP" and "lvdisplay -v").
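Something along these lines, run before any reconfiguration (the paths
and the PV globs are only an example; adjust them to the PVs actually in
your VG -- note there is no sdk1 in the listing above):

  vgcfgbackup vgOracle
  mkdir -p /root/lvm-meta
  for pv in /dev/sd[h-j]1 /dev/sd[l-z]1 /dev/sda[a-h]1; do
      pvdata -avP $pv > /root/lvm-meta/pvdata.$(basename $pv).txt
  done
  lvdisplay -v /dev/vgOracle/foo > /root/lvm-meta/lvdisplay.foo.txt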
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
* Re: [linux-lvm] More information on my LV with bad read performance..
From: Heinz J . Mauelshagen @ 2001-10-26 3:13 UTC (permalink / raw)
To: linux-lvm
On Fri, Oct 26, 2001 at 01:06:56AM -0600, Andreas Dilger wrote:
> On Oct 26, 2001 00:03 -0500, Robert Macaulay wrote:
> > I realized I didn't include an lvdisplay -v of my volume. Here it is.
> > The disks are spread out over 4 SCSI buses.
> >
> > --- Logical volume ---
> > LV Name                /dev/vgOracle/foo
> > VG Name                vgOracle
> > LV Write Access        read/write
> > LV Status              available
> > LV #                   52
> > # open                 0
> > LV Size                9.04 GB
> > Current LE             2314
> > Allocated LE           2314
> > Stripes                26
> > Stripe size (KByte)    64
> > Allocation             next free
> > Read ahead sectors     120
> > Block device           58:51
>
> Well, there was a patch in 2.4.13 to the LVM code that changes the
> readahead code.
Andreas,
what patch are you referring to? I still see the per-major read_ahead
code in 2.4.13, which is only partially useful in the best case.
Heinz
> First off, it makes the default readahead 1024 sectors
> (512 kB), which may be the maximum SCSI request size (I don't know the
> details exactly). It also sets a global read_ahead array, so that may
> have an impact as well. As you can see above, your read-ahead (120
> sectors = 60 kB) is smaller than a single 64 kB stripe, so it isn't
> really doing you much good.
>
> However, it is also possible that striping across 26 disks is somewhat
> pointless, especially for Oracle. You are far better off doing some
> intelligent allocation of the disks based on known usage patterns
> (e.g. put tables and their indexes on separate disks, put rollback
> files on separate disks, put heavily used tables on their own disks,
> and put temporary tablespaces on their own disks).
>
> With LVM, you can easily monitor which PVs/PEs are busiest, and even out
> the I/O load by moving LVs/PEs with pvmove (although you CANNOT do this
> while the database is active).
>
> Make sure you keep backups of your LVM metadata (both vgcfgbackup, and
> also save the text output of "pvdata -avP" and "lvdisplay -v").
>
> Cheers, Andreas
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Sistina Software Inc.
Senior Consultant/Developer Am Sonnenhang 11
56242 Marienrachdorf
Germany
Mauelshagen@Sistina.com +49 2626 141200
FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
* Re: [linux-lvm] More information on my LV with bad read performance..
From: Robert Macaulay @ 2001-10-26 8:26 UTC (permalink / raw)
To: linux-lvm
On Fri, 26 Oct 2001, Andreas Dilger wrote:
> However, it is also possible that striping across 26 disks is somewhat
> pointless, especially for Oracle. You are far better off doing some
> intelligent allocation of the disks based on known usage patterns
> (e.g. put tables and their indexes on separate disks, put rollback
> files on separate disks, put heavily used tables on their own disks,
> and put temporary tablespaces on their own disks).
True. I have done that. This is a "let's see if it goes really fast
with a lot of disks" test. We typically have the disks divided into
stripe sets no bigger than 8, all separated out by function. I was just
playing around with a massive stripe and ran into this oddity.
* Re: [linux-lvm] More information on my LV with bad read performance..
From: Robert Macaulay @ 2001-10-26 8:38 UTC (permalink / raw)
To: linux-lvm
On Fri, 26 Oct 2001, Macaulay, Robert wrote:
>
> True. I have done that. This is a "let's see if it goes really fast
> with a lot of disks" test. We typically have the disks divided into
> stripe sets no bigger than 8, all separated out by function. I was just
> playing around with a massive stripe and ran into this oddity.
I made two 12-way stripes with the md tools, then put another layer of
raid0 on top of that, just to see if it would make any difference. The
md setup got about the same write performance, but its read performance
was roughly equal to its write performance, i.e. much higher than LVM's.
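For reference, that layered md setup would look roughly like this in
/etc/raidtab for the old raidtools (a sketch, abbreviated; chunk-size
matches the 64 kB LVM stripe, /dev/md1 repeats the /dev/md0 pattern for
the second set of 12 disks, and each device is built with mkraid):

  raiddev /dev/md0              # first 12-way stripe
      raid-level            0
      nr-raid-disks         12
      persistent-superblock 1
      chunk-size            64
      device                /dev/sdh1
      raid-disk             0
      device                /dev/sdi1
      raid-disk             1
      # ... and so on for the remaining 10 disks ...

  raiddev /dev/md2              # raid0 over the two 12-way stripes
      raid-level            0
      nr-raid-disks         2
      persistent-superblock 1
      chunk-size            64
      device                /dev/md0
      raid-disk             0
      device                /dev/md1
      raid-disk             1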
Where is the patch you referred to for increasing the read-ahead? Part
of our Oracle testing (with volumes separated by use) involves
sequential table scans, which sound like they could benefit greatly
from this patch. Thanks,
Robert
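As an aside, a quick way to compare the raw sequential-read path of the
two configurations (a sketch; a raw dd read only approximates a table
scan, and /dev/md2 is the top-level device name assumed in the raidtab
sketch above):

  # read 1 GB sequentially in 1 MB chunks from the striped LV
  time dd if=/dev/vgOracle/foo of=/dev/null bs=1024k count=1024
  # same thing against the layered md device
  time dd if=/dev/md2 of=/dev/null bs=1024k count=1024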
* Re: [linux-lvm] More information on my LV with bad read performance..
From: Andreas Dilger @ 2001-10-26 12:28 UTC (permalink / raw)
To: linux-lvm
On Oct 26, 2001 08:39 -0500, Robert Macaulay wrote:
> Where is the patch you referred to for increasing the read-ahead? Part
> of our Oracle testing (with volumes separated by use) involves
> sequential table scans, which sound like they could benefit greatly
> from this patch. Thanks,
I'm not sure of the exact time when the changes went in, but I noticed
them when I updated to 2.4.13 and wanted to update the LVM code as well;
the patch is below (note that its whitespace may be broken because of
cut-and-paste).
There was also a small discussion about read-ahead on the kernel mailing
list, so this may be a result of that, something along the lines of "all
readahead is broken because ...".
Cheers, Andreas
=============================================================================
--- kernel/lvm.c    2001/10/15 09:23:27
+++ kernel/lvm.c    2001/10/26 17:21:47
@@ -270,9 +270,13 @@
 #include "lvm-internal.h"
-#define LVM_CORRECT_READ_AHEAD( a) \
-    if ( a < LVM_MIN_READ_AHEAD || \
-         a > LVM_MAX_READ_AHEAD) a = LVM_MAX_READ_AHEAD;
+#define LVM_CORRECT_READ_AHEAD(a) \
+do { \
+    if ((a) < LVM_MIN_READ_AHEAD || \
+        (a) > LVM_MAX_READ_AHEAD) \
+        (a) = LVM_DEFAULT_READ_AHEAD; \
+    read_ahead[MAJOR_NR] = (a); \
+} while(0)
 #ifndef WRITEA
 # define WRITEA WRITE
@@ -1040,6 +1045,7 @@
         (long) arg > LVM_MAX_READ_AHEAD)
             return -EINVAL;
         lv_ptr->lv_read_ahead = (long) arg;
+        read_ahead[MAJOR_NR] = lv_ptr->lv_read_ahead;
         break;
--- kernel/lvm.h    2001/10/03 14:46:47    1.34
+++ kernel/lvm.h    2001/10/26 17:24:16
@@ -274,8 +274,9 @@
 #define LVM_MAX_STRIPES     128    /* max # of stripes */
 #define LVM_MAX_SIZE        ( 1024LU * 1024 / SECTOR_SIZE * 1024 * 1024)    /* 1TB[sectors] */
 #define LVM_MAX_MIRRORS     2      /* future use */
-#define LVM_MIN_READ_AHEAD  2      /* minimum read ahead sectors */
-#define LVM_MAX_READ_AHEAD  120    /* maximum read ahead sectors */
+#define LVM_MIN_READ_AHEAD  0      /* minimum read ahead sectors */
+#define LVM_DEFAULT_READ_AHEAD 1024 /* sectors for 512k scsi segments */
+#define LVM_MAX_READ_AHEAD  10000  /* maximum read ahead sectors */
 #define LVM_MAX_LV_IO_TIMEOUT 60   /* seconds I/O timeout (future use) */
 #define LVM_PARTITION       0xfe   /* LVM partition id */
 #define LVM_NEW_PARTITION   0x8e   /* new LVM partition id (10/09/1999) */
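As a usage note: with the new limits above, the per-LV read-ahead can be
raised well past a single stripe. A sketch (assuming lvchange -r still
performs the same ioctl shown in the lvm.c hunk, so the value takes
effect immediately on a patched kernel):

  # 2048 sectors = 1 MB of read-ahead, now within the 0..10000 range
  lvchange -r 2048 /dev/vgOracle/foo
  # confirm what the kernel ended up with
  lvdisplay /dev/vgOracle/foo | grep -i 'read ahead'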
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert