* [linux-lvm] More information on my LV with bad read performance..
  From: Robert Macaulay @ 2001-10-26  0:02 UTC
  To: linux-lvm

I realized I didn't include a lvdisplay -v of my volume. Here it is.
The disks are spread out over 4 scsi busses. Thanks again.

--- Logical volume ---
LV Name                /dev/vgOracle/foo
VG Name                vgOracle
LV Write Access        read/write
LV Status              available
LV #                   52
# open                 0
LV Size                9.04 GB
Current LE             2314
Allocated LE           2314
Stripes                26
Stripe size (KByte)    64
Allocation             next free
Read ahead sectors     120
Block device           58:51

--- Distribution of logical volume on 26 physical volumes ---
   PV Name        PE on PV    reads     writes
   /dev/sdh1            89    13629     173625
   /dev/sdi1            89    13616     173386
   /dev/sdj1            89    13630     173372
   /dev/sdl1            89    13619     173354
   /dev/sdm1            89    13625     173369
   /dev/sdn1            89    13619     173384
   /dev/sdo1            89    13635     173391
   /dev/sdp1            89    13632     173387
   /dev/sdq1            89    13641     173401
   /dev/sdr1            89    13633     173386
   /dev/sds1            89    13639     173398
   /dev/sdt1            89    13633     173388
   /dev/sdu1            89    13625     173367
   /dev/sdv1            89    13617     173357
   /dev/sdw1            89    13625     173367
   /dev/sdx1            89    13617     173358
   /dev/sdy1            89    13624     173366
   /dev/sdz1            89    13618     173354
   /dev/sdaa1           89    13606     173366
   /dev/sdab1           89    13600     173388
   /dev/sdac1           89    13609     173366
   /dev/sdad1           89    13603     173356
   /dev/sdae1           89    13609     173364
   /dev/sdaf1           89    13600     173361
   /dev/sdag1           89    13607     173366
   /dev/sdah1           89    13602     173354

--- logical volume i/o statistic ---
   354113 reads    4507931 writes

--cut--
* Re: [linux-lvm] More information on my LV with bad read performance..
  From: Andreas Dilger @ 2001-10-26  2:06 UTC
  To: linux-lvm

On Oct 26, 2001  00:03 -0500, Robert Macaulay wrote:
> I realized I didn't include a lvdisplay -v of my volume. Here it is.
> The disks are spread out over 4 scsi busses.
>
> --- Logical volume ---
> LV Name                /dev/vgOracle/foo
> VG Name                vgOracle
> LV Write Access        read/write
> LV Status              available
> LV #                   52
> # open                 0
> LV Size                9.04 GB
> Current LE             2314
> Allocated LE           2314
> Stripes                26
> Stripe size (KByte)    64
> Allocation             next free
> Read ahead sectors     120
> Block device           58:51

Well, there was a patch in 2.4.13 to the LVM code to change the readahead
code.  First off, it makes the default readahead 1024 sectors (512kB),
which may be the maximum SCSI request size (I don't know the details
exactly).  It also sets a global read_ahead array, so this may have an
impact as well.  See above: you have a read ahead that is smaller than a
single stripe, so it isn't really doing you much good.

However, it is also possible that striping across 26 disks is kind of
pointless, especially for Oracle.  You are far better off doing some
intelligent allocation of the disks based on known usage patterns
(e.g. put tables and their indexes on separate disks, put rollback
files on separate disks, put heavily used tables on their own disks,
put temporary tablespaces on their own disks).

With LVM, you can easily monitor which PVs/PEs are busiest, and even out
the I/O load by moving LVs/PEs with pvmove (although you CANNOT do this
while the database is active).

Make sure you keep backups of your LVM metadata (both vgcfgbackup, and
also save the text output of "pvdata -avP" and "lvdisplay -v").

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
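(As a rough sketch of the monitoring/rebalancing workflow described above,
using the LVM1 utilities named in this message; the VG, LV, and PV names are
the ones from this thread, the pvmove destination is purely illustrative,
and exact option syntax should be checked against the local man pages:)

    # snapshot the LVM metadata before touching anything
    vgcfgbackup vgOracle
    pvdata -avP /dev/sdh1 > pvdata.sdh1.txt
    lvdisplay -v /dev/vgOracle/foo > lvdisplay.foo.txt

    # the per-PV read/write counters in "lvdisplay -v" show which disks
    # are hottest; with the database shut down (see the warning above),
    # shift extents off a busy PV onto a quieter one:
    pvmove /dev/sdh1 /dev/sdz1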
* Re: [linux-lvm] More information on my LV with bad read performance..
  From: Heinz J. Mauelshagen @ 2001-10-26  3:13 UTC
  To: linux-lvm

On Fri, Oct 26, 2001 at 01:06:56AM -0600, Andreas Dilger wrote:
> On Oct 26, 2001  00:03 -0500, Robert Macaulay wrote:
> > I realized I didn't include a lvdisplay -v of my volume. Here it is.
> > The disks are spread out over 4 scsi busses.
> >
> > --- Logical volume ---
> > LV Name                /dev/vgOracle/foo
> > VG Name                vgOracle
> > LV Write Access        read/write
> > LV Status              available
> > LV #                   52
> > # open                 0
> > LV Size                9.04 GB
> > Current LE             2314
> > Allocated LE           2314
> > Stripes                26
> > Stripe size (KByte)    64
> > Allocation             next free
> > Read ahead sectors     120
> > Block device           58:51
>
> Well, there was a patch in 2.4.13 to the LVM code to change the readahead
> code.

Andreas,

which patch are you referring to? I still see the per-major read_ahead
code in 2.4.13, which is partially useful in the best case.

Heinz

> First off, it makes the default readahead 1024 sectors (512kB)
> which may be the maximum SCSI request size (don't know the details
> exactly).  It also sets a global read_ahead array, so this may impact
> it also.  See above, you have a "read ahead" that is smaller than a
> single stripe, so it isn't really doing you much good.
>
> However, it is also possible that striping across 26 disks is kind of
> pointless, especially for Oracle.  You are far better off to do some
> intelligent allocation of the disks depending on known usage patterns
> (e.g. put tables and their indexes on separate disks, put rollback
> files on separate disks, put heavily used tables on their own disks,
> put temporary tablespaces on their own disks).
>
> With LVM, you can easily monitor which PVs/PEs are busiest, and even out
> the I/O load by moving LVs/PEs with pvmove (although you CANNOT do this
> while the database is active).
>
> Make sure you keep backups of your LVM metadata (both vgcfgbackup, and
> also save the text output of "pvdata -avP" and "lvdisplay -v").
>
> Cheers, Andreas

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen                                 Sistina Software Inc.
Senior Consultant/Developer                       Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen@Sistina.com                           +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
* Re: [linux-lvm] More information on my LV with bad read performance..
  From: Robert Macaulay @ 2001-10-26  8:26 UTC
  To: linux-lvm

On Fri, 26 Oct 2001, Andreas Dilger wrote:
> However, it is also possible that striping across 26 disks is kind of
> pointless, especially for Oracle.  You are far better off to do some
> intelligent allocation of the disks depending on known usage patterns
> (e.g. put tables and their indexes on separate disks, put rollback
> files on separate disks, put heavily used tables on their own disks,
> put temporary tablespaces on their own disks).

True. I have done that. This is a "let's see if it goes really fast
with a lot of disks" test. We typically have the disks divided up into
stripe sets no bigger than 8, all separated out by function. I was just
playing around with a massive stripe, and ran into this oddity.
* Re: [linux-lvm] More information on my LV with bad read performance..
  From: Robert Macaulay @ 2001-10-26  8:38 UTC
  To: linux-lvm

On Fri, 26 Oct 2001, Macaulay, Robert wrote:
> True. I have done that. This is a "let's see if it goes really fast
> with a lot of disks" test. We typically have the disks divided up into
> stripe sets no bigger than 8, all separated out by function. I was just
> playing around with a massive stripe, and ran into this oddity.

I made 2 12-way stripes with mdtools, then layered another raid0 on top
of them, just to see if it would make any difference. The md setup got
about the same write performance as LVM, but its reads ran at roughly
the same rate as its writes, i.e. much higher than LVM's reads.

Where is the patch you referred to for increasing the read ahead? Part
of our Oracle testing (with volumes separated by use) involves
sequential table scans, which sound like they could benefit greatly
from this patch. Thx

Robert
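(For reference, a sketch of the raidtools setup being described: an
/etc/raidtab with raid0 stripe sets and a second raid0 layered on top.
The device names are borrowed from the disks in this thread, the 64k
chunk size matches the LV's stripe size, and only three disks per set
are shown for brevity where the actual test used twelve:)

    # /etc/raidtab -- two bottom-layer stripe sets
    raiddev /dev/md0
        raid-level              0
        nr-raid-disks           3
        persistent-superblock   1
        chunk-size              64
        device                  /dev/sdh1
        raid-disk               0
        device                  /dev/sdi1
        raid-disk               1
        device                  /dev/sdj1
        raid-disk               2

    raiddev /dev/md1
        raid-level              0
        nr-raid-disks           3
        persistent-superblock   1
        chunk-size              64
        device                  /dev/sdl1
        raid-disk               0
        device                  /dev/sdm1
        raid-disk               1
        device                  /dev/sdn1
        raid-disk               2

    # the top-level raid0 striped over the two md devices
    raiddev /dev/md2
        raid-level              0
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              64
        device                  /dev/md0
        raid-disk               0
        device                  /dev/md1
        raid-disk               1

    # build the bottom layers first, then the stripe over them
    mkraid /dev/md0
    mkraid /dev/md1
    mkraid /dev/md2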
* Re: [linux-lvm] More information on my LV with bad read performance..
  From: Andreas Dilger @ 2001-10-26 12:28 UTC
  To: linux-lvm

On Oct 26, 2001  08:39 -0500, Robert Macaulay wrote:
> Where is the patch you referred to for increasing the read ahead? Part
> of our Oracle testing (with volumes separated by use) involves
> sequential table scans, which sound like they could benefit greatly
> from this patch. Thx

I'm not sure exactly when the changes went in, but when I updated to
2.4.13 and went to update the LVM code as well, I saw the patch appended
below (note: the patch whitespace may be broken because of cut-n-paste).

There was also a small discussion about read ahead on the kernel mailing
list, so this may be a result of that.  Something along the lines of
"all readahead is broken because ..."

Cheers, Andreas
=============================================================================
--- kernel/lvm.c	2001/10/15 09:23:27
+++ kernel/lvm.c	2001/10/26 17:21:47
@@ -270,9 +270,13 @@

 #include "lvm-internal.h"

-#define LVM_CORRECT_READ_AHEAD( a) \
-	if ( a < LVM_MIN_READ_AHEAD || \
-	     a > LVM_MAX_READ_AHEAD) a = LVM_MAX_READ_AHEAD;
+#define LVM_CORRECT_READ_AHEAD(a) \
+do { \
+	if ((a) < LVM_MIN_READ_AHEAD || \
+	    (a) > LVM_MAX_READ_AHEAD) \
+		(a) = LVM_DEFAULT_READ_AHEAD; \
+	read_ahead[MAJOR_NR] = (a); \
+} while(0)

 #ifndef WRITEA
 #  define WRITEA WRITE
@@ -1040,6 +1045,7 @@
 		    (long) arg > LVM_MAX_READ_AHEAD)
 			return -EINVAL;
 		lv_ptr->lv_read_ahead = (long) arg;
+		read_ahead[MAJOR_NR] = lv_ptr->lv_read_ahead;
 		break;

--- kernel/lvm.h	2001/10/03 14:46:47	1.34
+++ kernel/lvm.h	2001/10/26 17:24:16
@@ -274,8 +274,9 @@
 #define LVM_MAX_STRIPES		128	/* max # of stripes */
 #define LVM_MAX_SIZE		( 1024LU * 1024 / SECTOR_SIZE * 1024 * 1024)	/* 1TB[sectors] */
 #define LVM_MAX_MIRRORS		2	/* future use */
-#define LVM_MIN_READ_AHEAD	2	/* minimum read ahead sectors */
-#define LVM_MAX_READ_AHEAD	120	/* maximum read ahead sectors */
+#define LVM_MIN_READ_AHEAD	0	/* minimum read ahead sectors */
+#define LVM_DEFAULT_READ_AHEAD	1024	/* sectors for 512k scsi segments */
+#define LVM_MAX_READ_AHEAD	10000	/* maximum read ahead sectors */
 #define LVM_MAX_LV_IO_TIMEOUT	60	/* seconds I/O timeout (future use) */
 #define LVM_PARTITION		0xfe	/* LVM partition id */
 #define LVM_NEW_PARTITION	0x8e	/* new LVM partition id (10/09/1999) */

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
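(A hypothetical follow-up for anyone running the patched code, assuming
the matching LVM1 user tools are rebuilt so lvchange -r accepts the new,
larger range; the 1024-sector value mirrors LVM_DEFAULT_READ_AHEAD above,
and the LV path is the one from this thread:)

    # raise the LV's readahead to 1024 sectors (512 kB)
    lvchange -r 1024 /dev/vgOracle/foo
    # confirm the new value took effect
    lvdisplay /dev/vgOracle/foo | grep 'Read ahead'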