* stride / stripe alignment on LVM ?
From: Janek Kozicki @ 2007-11-01 9:10 UTC
To: linux-raid

Hello,

I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
created an LVM volume group called 'raid5', and finally a logical
volume 'backup'.

Then I formatted it with the command:

   mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup

And because LVM puts its own metadata on /dev/md1, the ext3
partition is shifted by some (unknown to me) number of bytes from
the beginning of /dev/md1.

I was wondering how big the shift is, and whether it would hurt
performance/safety if the `ext3 stride=32` didn't align perfectly
with the physical stripes on the HDDs.

PS: the resize option is to make sure that I can grow this fs
in the future.

PPS: I looked in the archive but didn't find this question asked
before. I'm sorry if it really was asked.

--
Janek Kozicki
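The stride value in the command above follows from the chunk size and the filesystem block size: the mke2fs man page describes stride as the number of filesystem blocks written to one disk before moving to the next. A minimal sketch of that arithmetic, assuming the chunk is given in KiB and the block size in bytes; `stride_blocks` is a hypothetical helper name, not part of e2fsprogs:

```shell
#!/bin/sh
# Sketch only: stride_blocks is a made-up helper for illustration.
# stride = chunk size / filesystem block size, both converted to bytes.
stride_blocks() {
    chunk_kib=$1      # md chunk size in KiB (128 in the message above)
    block_bytes=$2    # filesystem block size in bytes (4096 above)
    echo $(( chunk_kib * 1024 / block_bytes ))
}

stride=$(stride_blocks 128 4096)
echo "stride=$stride"    # prints "stride=32"
```

This matches the `-E stride=32` used with `--chunk=128` and `-b 4096`.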
* Re: stride / stripe alignment on LVM ?
From: Neil Brown @ 2007-11-01 23:27 UTC
To: Janek Kozicki; +Cc: linux-raid

On Thursday November 1, janek_listy@wp.pl wrote:
> Hello,
>
> I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
> created an LVM volume group called 'raid5', and finally a logical
> volume 'backup'.
>
> Then I formatted it with the command:
>
>    mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup
>
> And because LVM puts its own metadata on /dev/md1, the ext3
> partition is shifted by some (unknown to me) number of bytes from
> the beginning of /dev/md1.
>
> I was wondering how big the shift is, and whether it would hurt
> performance/safety if the `ext3 stride=32` didn't align perfectly
> with the physical stripes on the HDDs.

It is probably better to ask this question on an ext3 list, as people
there might know exactly what 'stride' does.

I *think* it causes the inode tables to be offset in different
block-groups so that they are not all on the same drive. If that is
the case, then an offset caused by LVM isn't going to make any
difference at all.

NeilBrown

>
> PS: the resize option is to make sure that I can grow this fs
> in the future.
>
> PPS: I looked in the archive but didn't find this question asked
> before. I'm sorry if it really was asked.

Thanks for trying!
* Re: stride / stripe alignment on LVM ?
From: Bill Davidsen @ 2007-11-02 13:01 UTC
To: Neil Brown; +Cc: Janek Kozicki, linux-raid

Neil Brown wrote:
> On Thursday November 1, janek_listy@wp.pl wrote:
>
>> Hello,
>>
>> I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
>> created an LVM volume group called 'raid5', and finally a logical
>> volume 'backup'.
>>
>> Then I formatted it with the command:
>>
>>    mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup
>>
>> And because LVM puts its own metadata on /dev/md1, the ext3
>> partition is shifted by some (unknown to me) number of bytes from
>> the beginning of /dev/md1.
>>
>> I was wondering how big the shift is, and whether it would hurt
>> performance/safety if the `ext3 stride=32` didn't align perfectly
>> with the physical stripes on the HDDs.
>>
>
> It is probably better to ask this question on an ext3 list, as people
> there might know exactly what 'stride' does.
>
> I *think* it causes the inode tables to be offset in different
> block-groups so that they are not all on the same drive. If that is
> the case, then an offset caused by LVM isn't going to make any
> difference at all.
>

Actually, I think that all of the performance evil Doug was mentioning
will apply to LVM as well. If things are poorly aligned, they will be
poorly handled: a stripe-sized write will not fit in one stripe but
will overlap chunks, causing the data from all chunks to be read back
for a new raid-5 parity calculation.

So I would expect this to make a very large performance difference;
even if it works, it would do so slowly.

--
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
* Re: stride / stripe alignment on LVM ?
From: Janek Kozicki @ 2007-11-02 22:16 UTC
Cc: linux-raid

Bill Davidsen said: (by the date of Fri, 02 Nov 2007 09:01:05 -0400)

> So I would expect this to make a very large performance difference;
> even if it works, it would do so slowly.

I was trying to find out the stripe layout for a few hours, using
hexedit and dd. And I'm baffled:

md1 : active raid5 hda3[0] sda3[1]
      969907968 blocks super 1.1 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
      bitmap: 8/8 pages [32KB], 32768KB chunk

I fill md1 with random data:

   # dd bs=128k count=64 if=/dev/urandom of=/dev/md1

   # hexedit /dev/md1

I copy/paste (and remove formatting from) the first 32 bytes of
/dev/md1, then I search for those 32 bytes in /dev/hda3 and in
/dev/sda3:

   # hexedit /dev/hda3
   # hexedit /dev/sda3

And no luck! I'd expect the first bytes of /dev/md1 to be at the
beginning of the first drive (hda3).

I pick the next 20 bytes from /dev/md1 and I can find them on
/dev/hda3 starting just after address 0x10000. The bytes before and
after those 20 bytes are similar to those on /dev/md1.

So now I hexedit /dev/md1 and write by hand 32 bytes of 0xAA. Then I
look at address 0x10000 on /dev/hda3, and there is no 0xAA at all.

Well, it's not critical for me, so you can just ignore my mumbling;
I was just wondering what obvious thing I missed. There seems to be
more XORing (or something else) involved than I expected. Maybe the
disc did not flush writes, and what I see on /dev/md1 is not yet
present on /dev/hda3 (how's that possible?)

Nevertheless, I think that I will give up on LVM and just put ext3
directly on /dev/md1, to avoid this stripe misalignment. I wanted LVM
here only because I might have wanted to use lvm-snapshot, but I can
live without that. I can already grow /dev/md1 without LVM, using
mdadm --grow.

best regards
--
Janek Kozicki
* Re: stride / stripe alignment on LVM ?
From: Doug Ledford @ 2007-11-03 18:40 UTC
To: Janek Kozicki; +Cc: linux-raid

On Fri, 2007-11-02 at 23:16 +0100, Janek Kozicki wrote:
> Bill Davidsen said: (by the date of Fri, 02 Nov 2007 09:01:05 -0400)
>
> > So I would expect this to make a very large performance difference;
> > even if it works, it would do so slowly.
>
> I was trying to find out the stripe layout for a few hours, using
> hexedit and dd. And I'm baffled:
>
> md1 : active raid5 hda3[0] sda3[1]
>       969907968 blocks super 1.1 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
                               ^^^

You have the raid superblock at the front of hda3 and sda3, as well as
a bitmap I assume, which means that any data you write to md1 will
actually be written to sda3/hda3 *after* the superblock and bitmap. If
you run mdadm -D /dev/md1 it will tell you the data offset (in sectors
IIRC). When you hexedit hda3, you need to jump forward the same number
of sectors to get to the beginning of the actual md1 data. Of course,
being raid5 with one disk missing, the data may or may not be present
in its non-parity format, depending on exactly which block you are
looking at.

However, you don't really need to do anything to figure out the stripe
size on your array; you have it already. All the talk about figuring
out stripe layouts is for external raid devices that hide the raid
layout from you. When you are talking about your own raid device that
you created with mdadm, you specified the stripe layout when you
created the array. In your case, the chunk size is 128K, and since you
have a 3 disk raid5 array and one chunk in each stripe of a raid5
array is parity, the amount of data stored per stripe is
chunk size * (number of disks - 1), so 256K in your case.

But you don't even have to align the lvm to the stripe, just to a
chunk, so you really only need to align the lvm superblock so that
data starts at a 128K offset into the raid array.

--
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
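The two rules in the message above (data per stripe is chunk size times the number of data disks, and LVM data only needs to start on a chunk boundary) can be sketched as plain shell arithmetic. The helper names `stripe_data_kib` and `is_chunk_aligned` are hypothetical and exist only for this illustration; all sizes are in KiB:

```shell
#!/bin/sh
# Sketch only: both helpers are made-up names, sizes are in KiB.

# Data stored per RAID5 stripe: one chunk per stripe holds parity,
# so only (disks - 1) chunks carry data.
stripe_data_kib() {
    echo $(( $1 * ($2 - 1) ))
}

# Succeeds (exit status 0) when the offset is a whole multiple of the chunk.
is_chunk_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

stripe=$(stripe_data_kib 128 3)
echo "data per stripe: ${stripe}K"    # prints "data per stripe: 256K"

if is_chunk_aligned 128 128; then echo "128K offset: chunk-aligned"; fi
if ! is_chunk_aligned 192 128; then echo "192K offset: not chunk-aligned"; fi
```

The offset passed to `is_chunk_aligned` would be wherever LVM places the first extent, measured from the start of the md device.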
* Re: stride / stripe alignment on LVM ?
From: Janek Kozicki @ 2007-11-03 20:21 UTC
To: linux-raid

Doug Ledford said: (by the date of Sat, 03 Nov 2007 14:40:48 -0400)

> so you really only need to align the
> lvm superblock so that data starts at a 128K offset into the raid array.

Sorry, I thought that it would be easier to figure this out
experimentally: put LVM here or there, write 128k of data to the
disc (inside the LVM partition), then see (with hexedit) whether this
data is really split across several discs or not.

In fact I even managed to find where the LVM superblock starts inside
the RAID; the problem for me was that I wasn't sure where it ends and
where the actual data starts, and *THAT* data has to be aligned on a
128K offset. Now I know that I should simply look more carefully at
the LVM manuals to see exactly what the size of the LVM superblock is.

So I was unable to do a simple 128k test like this:

   # dd if=./128k_of_0xAA of=/dev/lvm_raid5/test

and then look for 128k (or 64k or 32k) of 0xAA on hda3 and sda3. But
most of the time was spent searching for the pattern (scanning the
disc). So my efficiency was low, and in fact I should have simply used
smaller test partitions (e.g. hda4, sda4 with just 20MB), so that
scanning would be faster.

With smaller test partitions perhaps I'd have enough time to
overcome the main difficulty: dealing with a degraded array (and
encoded data).

Possibly I'll try this next time, when I buy a fourth disc for the
array (next year), so I'll be able to have two degraded arrays of two
discs at the same time. Then I could use LVM again and "dd" all data
from the old array to the new one, then grow the new array to use all
4 HDDs.

Currently I just formatted /dev/md1 with ext3, without LVM.

Thanks, I have to remember that in 1.1 the superblock is at the front.
And I shouldn't forget about the bitmap either :)

> If you run mdadm -D /dev/md1 it will tell you the data offset
> (in sectors IIRC).

Uh, I don't see it:

backup:~# mdadm -D /dev/md1
/dev/md1:
        Version : 01.01.03
  Creation Time : Fri Nov 2 23:35:37 2007
     Raid Level : raid5
     Array Size : 966807296 (922.02 GiB 990.01 GB)
    Device Size : 966807296 (461.01 GiB 495.01 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sat Nov 3 20:59:06 2007
          State : active, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : backup:1 (local to host backup)
           UUID : 22f22c35:99613d52:31d407a6:55bdeb84
         Events : 39975

    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1       8        3        1      active sync   /dev/sda3
       2       0        0        2      removed

thanks again for all your helpful responses!
--
Janek Kozicki
* Re: stride / stripe alignment on LVM ?
From: Doug Ledford @ 2007-11-04 1:02 UTC
To: Janek Kozicki; +Cc: linux-raid

On Sat, 2007-11-03 at 21:21 +0100, Janek Kozicki wrote:

> > If you run mdadm -D /dev/md1 it will tell you the data offset
> > (in sectors IIRC).
>
> Uh, I don't see it:

Sorry, it's part of mdadm -E instead:

[root@firewall ~]# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x1
     Array UUID : c746e4f5:b015ffac:7216dbbd:48d973a7
           Name : firewall:home:2
  Creation Time : Mon May 28 20:47:07 2007
     Raid Level : raid1
   Raid Devices : 2

  Used Dev Size : 625137018 (298.09 GiB 320.07 GB)
     Array Size : 625137018 (298.09 GiB 320.07 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 7efd05d5:dd921536:1d1a1750:6ba49303

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Nov 3 21:01:24 2007
       Checksum : 27b3958f - correct
         Events : 2

     Array Slot : 0 (0, 1)
    Array State : Uu
[root@firewall ~]#

--
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
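To jump to the start of the array data in hexedit, the "Data Offset" value has to be converted from 512-byte sectors to a byte address. A small sketch of that conversion, using the 264 sectors from the sample output above; note that this figure comes from Doug's raid1 member /dev/sdc1, so the offset on hda3/sda3 may well differ. `sectors_to_bytes` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: sectors_to_bytes is a made-up helper for illustration.
# mdadm -E reports Data Offset in 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# 264 is taken from the sample mdadm -E output, not from Janek's array.
offset_bytes=$(sectors_to_bytes 264)
printf 'array data starts at byte %d (0x%x)\n' "$offset_bytes" "$offset_bytes"
# prints "array data starts at byte 135168 (0x21000)"
```

For the array in question, the same conversion would be applied to whatever `mdadm -E /dev/hda3` reports.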
* Re: stride / stripe alignment on LVM ?
From: Goswin von Brederlow @ 2007-11-07 9:00 UTC
To: Janek Kozicki; +Cc: linux-raid

Janek Kozicki <janek_listy@wp.pl> writes:

> Doug Ledford said: (by the date of Sat, 03 Nov 2007 14:40:48 -0400)
>
>> so you really only need to align the
>> lvm superblock so that data starts at a 128K offset into the raid array.
>
> Sorry, I thought that it would be easier to figure this out
> experimentally: put LVM here or there, write 128k of data to the
> disc (inside the LVM partition), then see (with hexedit) whether this
> data is really split across several discs or not.
>
> In fact I even managed to find where the LVM superblock starts inside
> the RAID; the problem for me was that I wasn't sure where it ends and
> where the actual data starts, and *THAT* data has to be aligned on a
> 128K offset. Now I know that I should simply look more carefully at
> the LVM manuals to see exactly what the size of the LVM superblock is.

I would just check with "dmsetup table dev" which byte offsets the
logical volume gets mapped to.

MfG
        Goswin
* Re: stride / stripe alignment on LVM ?
From: Alasdair G Kergon @ 2007-11-11 23:53 UTC
To: Goswin von Brederlow, Janek Kozicki, linux-raid

On Wed, Nov 07, 2007 at 10:00:39AM +0100, Goswin von Brederlow wrote:
> I would just check with "dmsetup table dev" which byte offsets the
> logical volume gets mapped to.

Or use:

   pvs -o+pe_start

(optionally with --units)

Alasdair
--
agk@redhat.com
* Re: stride / stripe alignment on LVM ?
From: Goswin von Brederlow @ 2007-11-07 9:04 UTC
To: Neil Brown; +Cc: Janek Kozicki, linux-raid

Neil Brown <neilb@suse.de> writes:

> On Thursday November 1, janek_listy@wp.pl wrote:
>> Hello,
>>
>> I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
>> created an LVM volume group called 'raid5', and finally a logical
>> volume 'backup'.
>>
>> Then I formatted it with the command:
>>
>>    mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup
>>
>> And because LVM puts its own metadata on /dev/md1, the ext3
>> partition is shifted by some (unknown to me) number of bytes from
>> the beginning of /dev/md1.
>>
>> I was wondering how big the shift is, and whether it would hurt
>> performance/safety if the `ext3 stride=32` didn't align perfectly
>> with the physical stripes on the HDDs.
>
> It is probably better to ask this question on an ext3 list, as people
> there might know exactly what 'stride' does.
>
> I *think* it causes the inode tables to be offset in different
> block-groups so that they are not all on the same drive. If that is
> the case, then an offset caused by LVM isn't going to make any
> difference at all.
>
> NeilBrown

AFAIK that is true, and I never found any significant speed difference
in ext3 no matter what stripe size I select. The natural speed
fluctuations of e.g. bonnie seem to be bigger than the difference the
stripe size option makes. But then again, I test with large files, so
inode operations are not that common. You probably have to test
creating lots of dirs and small files, and file deletion, to see any
effect.

MfG
        Goswin
* Re: stride / stripe alignment on LVM ?
From: Michal Soltys @ 2007-11-02 12:10 UTC
To: linux-raid; +Cc: Janek Kozicki

Janek Kozicki wrote:
>
> And because LVM puts its own metadata on /dev/md1, the ext3
> partition is shifted by some (unknown to me) number of bytes from
> the beginning of /dev/md1.
>

It seems to be a multiple of 64 KiB. You can specify it during
pvcreate with the --metadatasize option. It will be rounded to a
multiple of 64 KiB, and another 64 KiB will be added on top. Extents
follow directly after that. The 4 sectors mentioned in pvcreate's man
page are covered by that option as well.

So, e.g., if you have a 1 MiB chunk, then pvcreate ... --metadatasize
960K ... should give you chunk-aligned logical volumes, assuming the
extent size is set appropriately as well. If you use the default chunk
size, you shouldn't need any extra options.

Verify that it really is this way after creating the pv, vg, and first
lv. I found it experimentally, so ymmv.
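The rule of thumb in the message above can be written down as arithmetic. This is only a sketch of the experimentally observed behaviour described there, not documented LVM behaviour, and it assumes that "rounded" means rounded up; `pe_start_kib` is a hypothetical helper name and all sizes are in KiB:

```shell
#!/bin/sh
# Sketch only: models the observation reported in the message above.
# Assumption: --metadatasize is rounded UP to a multiple of 64 KiB,
# then a further 64 KiB is added before the first extent.
pe_start_kib() {
    rounded=$(( ($1 + 63) / 64 * 64 ))
    echo $(( rounded + 64 ))
}

pe_start=$(pe_start_kib 960)
echo "first extent at ${pe_start}K"    # prints "first extent at 1024K"
```

Under the same assumption, `pe_start_kib 64` gives 128, which would line up with a 128K chunk; the actual value on a real physical volume should still be checked with `pvs -o+pe_start`.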