* Alignment of RAID on specific boundary
@ 2010-01-04 19:30 Khelben Blackstaff
2010-01-04 21:14 ` David Rees
2010-01-05 6:57 ` Michael Evans
0 siblings, 2 replies; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-04 19:30 UTC (permalink / raw)
To: linux-raid
Hello and happy new year.
I browsed the archive to find similar threads and learned valuable info
but every thread i found was about aligning something on top of raid
(mostly lvm) not aligning the raid itself. I apologize if it is already answered
and i missed it.
A friend of mine has two SSDs and wants a setup of RAID1->LUKS->LVM.
I have done this setup before but never bothered to align them.
Instead of RAID1 i thought to set up RAID10,f2. I know it cannot be grown,
but by the time SSD prices drop enough for him to buy larger ones,
grow support will probably be implemented.
I have tried the procedure in Virtualbox first to make sure i don't
make any mistakes.
Here are the details:
Disk /dev/sda: 83886080 sectors, 40.0 GiB
No   Start     End        Size        Name
1    40        255        108.0 KiB   BIOS boot partition
2    256       262399     128.0 MiB   Linux RAID
3    262400    83886046   39.9 GiB    Linux RAID
I thought to align partitions on a 128K boundary to match the erase block,
so i aligned them to 256 sectors. The disks use a GPT label.
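A minimal sketch of that boundary check, assuming 512-byte logical sectors (so 128K is 256 sectors). The start sectors are copied from the table above; the names are only illustrative.

```python
SECTOR_BYTES = 512                     # assumed logical sector size
ERASE_BLOCK_BYTES = 128 * 1024         # erase block size stated in the post
BOUNDARY = ERASE_BLOCK_BYTES // SECTOR_BYTES   # boundary expressed in sectors

# Start sectors copied from the partition table above.
starts = {1: 40, 2: 256, 3: 262400}

def is_aligned(sector, boundary=BOUNDARY):
    """True when the sector falls exactly on an erase-block boundary."""
    return sector % boundary == 0

alignment = {no: is_aligned(start) for no, start in starts.items()}

for no, ok in alignment.items():
    state = "aligned" if ok else "not aligned"
    print(f"partition {no}: start {starts[no]} is {state}")
```

Only the two Linux RAID partitions need to pass; the BIOS boot partition is not expected to sit on the boundary.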
GPT doesn't provide a GUID for 0xDA Non-FS Data so i used Linux RAID.
I don't think there is a danger of a rescue cd messing with the partition,
because few if any know about GPT. Otherwise, i can also use Linux Reserved.
I will install LILO but i created the BIOS boot partition in case he later
wants to use grub2. The 128MB array will be RAID1 /boot and uses 0.90
metadata because neither lilo nor grub2 can boot from v1 metadata (unless
the grub2 wiki is old).
My main concern is to align the large array.
% mdadm -V
mdadm - v3.1.1 - 19th November 2009
% mdadm -C /dev/md1 -l 10 -e 1.1 -p f2 -n 2 --name=vbmd /dev/sd[ab]3
% mdadm -E /dev/sda3
Version : 1.1
Avail Dev Size : 83623511
Array Size : 83621888
Used Dev Size : 83621888
Data Offset : 136 sectors
Super Offset : 0 sectors
Layout : far=2
Chunk Size : 512K
So i have 2 disks with a 512K chunk (which is divisible by 128K, so that part
is fine) and a full stripe is 1MB.
% cryptsetup -c aes-xts-plain -s 512 --align-payload=2048 luksFormat /dev/md1
% cryptsetup luksDump /dev/md1
Payload offset: 4096
% cryptsetup luksOpen /dev/md1 vbcrypt
% pvcreate /dev/mapper/vbcrypt
% pvs -o +pe_start
1st PE 1.00m
As you see LUKS payload starts at 2MB and LVM payload starts at 1MB offset.
I will use a 16MB extent size in LVM, which is a multiple of the 1MB stripe.
So if i am not mistaken, everything is correctly aligned on top of RAID.
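The arithmetic behind that claim can be sketched as follows. This only checks offsets relative to the start of /dev/md1, not relative to the disk, and it assumes 512-byte sectors and that the 1.00m reported by pvs is measured from the start of the LUKS payload.

```python
SECTOR_BYTES = 512                       # assumed logical sector size
MIB = 1024 * 1024

chunk_bytes = 512 * 1024                 # "Chunk Size : 512K" from mdadm -E
full_stripe_bytes = 2 * chunk_bytes      # the 1MB full stripe stated in the post

# Offsets measured from the start of /dev/md1.
luks_payload_bytes = 4096 * SECTOR_BYTES # "Payload offset: 4096" from luksDump
pe_start_bytes = 1 * MIB                 # "1st PE 1.00m", relative to the payload
extent_bytes = 16 * MIB                  # planned LVM extent size

first_pe_bytes = luks_payload_bytes + pe_start_bytes

checks = {
    "LUKS payload": luks_payload_bytes % full_stripe_bytes == 0,
    "first PE": first_pe_bytes % full_stripe_bytes == 0,
    "extent size": extent_bytes % full_stripe_bytes == 0,
}

for name, ok in checks.items():
    print(f"{name}: {'multiple of full stripe' if ok else 'NOT a multiple'}")
```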
The only problem i have is the RAID alignment. "mdadm -D /dev/md1"
doesn't mention anything, but "mdadm -E /dev/sda3" reports a data offset
of 136 sectors. Does that mean that the actual RAID data starts 136
sectors into the partition? If yes, then the RAID isn't aligned with the SSD,
which in turn breaks every alignment on top of it. I tried to create the array
with an internal bitmap, hoping the bitmap would occupy some space and
increase the offset, but it didn't work. The offset is still 136 sectors.
Is there a way to make the RAID data start at a particular offset ?
(In my case 512 sectors) ?
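To make the concern concrete, here is a small sketch that adds up the offsets from the disk start to the first LVM extent for a few candidate md data offsets. It assumes 512-byte sectors and that the offsets simply stack (partition start, then md data offset, then LUKS payload, then first PE); the function names are hypothetical.

```python
BOUNDARY = 256            # sectors per 128K erase block, assuming 512-byte sectors

partition_start = 262400  # start of /dev/sda3, from the partition table
luks_payload = 4096       # sectors, from "Payload offset: 4096"
pe_start = 2048           # sectors, from "1st PE 1.00m"

def first_pe_on_disk(data_offset):
    """Absolute sector of the first LVM extent for a given md data offset."""
    return partition_start + data_offset + luks_payload + pe_start

def misalignment(data_offset):
    """Sectors past the previous erase-block boundary (0 means aligned)."""
    return first_pe_on_disk(data_offset) % BOUNDARY

for offset in (136, 256, 512):
    print(offset, first_pe_on_disk(offset), misalignment(offset))
```

Under these assumptions every layer inherits the 136-sector shift, which is why fixing the md data offset (or the partition start) is enough to realign the whole stack.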
If my thoughts are flawed then please correct me.
Thank you for your time.
* Re: Alignment of RAID on specific boundary
2010-01-04 19:30 Alignment of RAID on specific boundary Khelben Blackstaff
@ 2010-01-04 21:14 ` David Rees
2010-01-04 21:51 ` Khelben Blackstaff
2010-01-05 6:57 ` Michael Evans
1 sibling, 1 reply; 11+ messages in thread
From: David Rees @ 2010-01-04 21:14 UTC (permalink / raw)
To: Khelben Blackstaff; +Cc: linux-raid
On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
<eye.of.the.8eholder@gmail.com> wrote:
> Is there a way to make the RAID data start at a particular offset ?
> (In my case 512 sectors) ?
http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
May have to tweak your fdisk parameters a bit.
Don't worry about aligning your /boot partition since you don't
read/write from that much.
-Dave
* Re: Alignment of RAID on specific boundary
2010-01-04 21:14 ` David Rees
@ 2010-01-04 21:51 ` Khelben Blackstaff
2010-01-05 8:24 ` Antonio Perez
0 siblings, 1 reply; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-04 21:51 UTC (permalink / raw)
To: David Rees; +Cc: linux-raid
Hello Mr. Rees and thank you for replying.
I have read this post and it is very good but it mentions alignment
of the partitions using special CHS values and LVM/ext4 options.
My partitions and both LUKS/LVM are aligned properly (at least i think
they are). I am worried about the alignment of RAID itself.
2010/1/4 David Rees <drees76@gmail.com>:
> On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
> <eye.of.the.8eholder@gmail.com> wrote:
>> Is there a way to make the RAID data start at a particular offset ?
>> (In my case 512 sectors) ?
>
> http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
>
> May have to tweak your fdisk parameters a bit.
>
> Don't worry about aligning your /boot partition since you don't
> read/write from that much.
>
> -Dave
>
* Re: Alignment of RAID on specific boundary
2010-01-04 19:30 Alignment of RAID on specific boundary Khelben Blackstaff
2010-01-04 21:14 ` David Rees
@ 2010-01-05 6:57 ` Michael Evans
2010-01-05 7:50 ` Michal Soltys
2010-01-05 13:44 ` Khelben Blackstaff
1 sibling, 2 replies; 11+ messages in thread
From: Michael Evans @ 2010-01-05 6:57 UTC (permalink / raw)
To: Khelben Blackstaff; +Cc: linux-raid
On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
<eye.of.the.8eholder@gmail.com> wrote:
He's using SSDs with 128KByte erase blocks.
> Is there a way to make the RAID data start at a particular offset ?
> (In my case 512 sectors) ?
>
> Thank you for your time.
> --
You want the RAID data chunks aligned so that they start on a 512byte
sector address that fulfills (sector & 0xFF) == 0.
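A quick sketch of that test, assuming 512-byte sectors so that 256 sectors make up one 128K erase block. Masking with 0xFF keeps the low 8 bits, which are all zero only for multiples of 256.

```python
def on_128k_boundary(sector):
    """True when a 512-byte sector address is a multiple of 256 sectors (128K)."""
    # 0xFF masks the low 8 bits; they are zero only for multiples of 256.
    return (sector & 0xFF) == 0

# The mask test agrees with the plain modulo test for every sector checked here.
mask_matches_modulo = all(
    on_128k_boundary(s) == (s % 256 == 0) for s in range(4096)
)

for sector in (0, 136, 256, 512):
    print(sector, on_128k_boundary(sector))
```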
As far as I am aware, using metadata 0.90 OR 1.0 will start the actual
data at the beginning of the partition, reserve some space near the
end, and use the very end for the metadata.
Format 1.1 will reserve the space at the beginning, and I have just checked;
00000400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00003000 (LVM2 header)
It starts the LVM2 header after 0x3000 bytes (64K chunk size); I'd
hope to see it around 0x10000.
It looks like the data isn't padded up to the desired offset.
I agree, it would be useful to have an option to specify the offset of
the first data chunk.
* Re: Alignment of RAID on specific boundary
2010-01-05 6:57 ` Michael Evans
@ 2010-01-05 7:50 ` Michal Soltys
2010-01-05 13:44 ` Khelben Blackstaff
1 sibling, 0 replies; 11+ messages in thread
From: Michal Soltys @ 2010-01-05 7:50 UTC (permalink / raw)
To: Michael Evans; +Cc: Khelben Blackstaff, linux-raid
Michael Evans wrote:
>
> It starts the LVM2 header after 0x3000 bytes (64K chunk size); I'd
> hope to see it around 0x10000.
>
> It looks like the data isn't padded up to the desired offset.
>
> I agree, it would be useful to have an option to specify the offset of
> the first data chunk.
Regarding lvm, you can use --dataalignment and --metadatasize for that purpose.
Recent lvm versions, when creating directly on md raid (and assuming recent kernel),
should align themselves automatically:
http://www.redhat.com/archives/linux-lvm/2009-September/msg00092.html
* Re: Alignment of RAID on specific boundary
2010-01-04 21:51 ` Khelben Blackstaff
@ 2010-01-05 8:24 ` Antonio Perez
2010-01-05 13:02 ` Khelben Blackstaff
0 siblings, 1 reply; 11+ messages in thread
From: Antonio Perez @ 2010-01-05 8:24 UTC (permalink / raw)
To: linux-raid
Khelben Blackstaff wrote:
> Hello Mr. Rees and thank you for replying.
>
> I have read this post and it is very good but it mentions alignment
> of the partitions using special CHS values and LVM/ext4 options.
Yes, and that seems to be the only way to align partitions to a 128k
boundary. You do need to use such rules to create aligned partitions.
Then, you need to align RAID created from such partitions to 128k Blocks, by
setting the chunk-size to 128k: "mdadm --chunk=128k" ( well, a multiple of
that, anyway ).
And, then, align LVM ( with the --dataalignment option for pvcreate ) and
the Filesystem ( with the -E stripe-width= to the mkfs command ) to such
boundaries as well. That applies to ext3 AND ext4 filesystems AFAIK.
> My partitions and both LUKS/LVM are aligned properly (at least i think
> they are). I am worried about the alignment of RAID itself.
On that link I read this comment:
If your SSD has an 128k erase block size, and you are creating the file
system with the default 4k block size, you just have to specify a strip
width when you create the file system, like so:
# mke2fs -t ext4 -E stripe-width=32,resize=500G /dev/ssd/root
So, to align your RAID to the SSD boundary, you need to use the "stripe-width"
parameter while creating the filesystem, assuming the partition start
has been set to a 128k boundary as commented before.
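The value 32 in the quoted command follows from simple division. A sketch of that calculation, assuming the 128k erase block and the default 4k filesystem block mentioned in the quoted comment:

```python
erase_block_bytes = 128 * 1024   # erase block size from the quoted comment
fs_block_bytes = 4 * 1024        # default 4k block size from the quoted comment

def stripe_width_blocks(stripe_bytes, block_bytes):
    """Number of filesystem blocks that fit in one stripe of the given size."""
    if stripe_bytes % block_bytes != 0:
        raise ValueError("stripe size must be a multiple of the block size")
    return stripe_bytes // block_bytes

stripe_width = stripe_width_blocks(erase_block_bytes, fs_block_bytes)
print(f"-E stripe-width={stripe_width}")
```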
> 2010/1/4 David Rees <drees76@gmail.com>:
>> On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
>> <eye.of.the.8eholder@gmail.com> wrote:
>>> Is there a way to make the RAID data start at a particular offset ?
>>> (In my case 512 sectors) ?
>>
>> http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
>>
>> May have to tweak your fdisk parameters a bit.
>>
>> Don't worry about aligning your /boot partition since you don't
>> read/write from that much.
>>
>> -Dave
--
Antonio Perez
* Re: Alignment of RAID on specific boundary
2010-01-05 8:24 ` Antonio Perez
@ 2010-01-05 13:02 ` Khelben Blackstaff
0 siblings, 0 replies; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-05 13:02 UTC (permalink / raw)
To: ap23563m; +Cc: linux-raid
2010/1/5 Michal Soltys <soltys@ziu.info>:
> Michael Evans wrote:
>
> Regarding lvm, you can use --dataalignment and --metadatasize for that
> purpose. Recent lvm versions, when creating directly on md raid (and
> assuming recent kernel), should align themselves automatically:
>
> http://www.redhat.com/archives/linux-lvm/2009-September/msg00092.html
>
Yes, i used the latest lvm and it aligned correctly automatically. I am
interested in aligning the RAID itself.
2010/1/5 Antonio Perez <ap23563m@gmx.com>:
>> I have read this post and it is very good but it mentions alignment
>> of the partitions using special CHS values and LVM/ext4 options.
>
> Yes, and that seems to be the only way to align partitions to a 128k
> boundary. You do need to use such rules to create aligned partitions.
>
> Then, you need to align RAID created from such partitions to 128k Blocks, by
> setting the chunk-size to 128k: "mdadm --chunk=128k" ( well, a multiple of
> that, anyway ).
>
> And, then, align LVM ( with the --dataalignment option for pvcreate ) and
> the Filesystem ( with the -E stripe-width= to the mkfs command ) to such
> boundaries as well. That applies to ext3 AND ext4 filesystems AFAIK.
>
These non-default CHS values are used with MBR partitioning scheme
to make aligning easy. fdisk (when it is run without the -u option) aligns
partitions on a cylinder boundary and because the cylinder size is a multiple
of 128K all the partitions (except the first) are automatically aligned.
But i use GPT so i don't need them. If you read my original post i manually
created the partitions to be aligned with 128K.
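The cylinder argument can be checked with a little arithmetic. In the sketch below, the geometry of 224 heads and 56 sectors per track is an assumed example of such non-default CHS values (the linked post should be consulted for the values it actually recommends), and 255/63 is the common default geometry for comparison.

```python
SECTOR_BYTES = 512                 # assumed logical sector size
ERASE_BLOCK_BYTES = 128 * 1024

def cylinder_bytes(heads, sectors_per_track):
    """Size of one cylinder in bytes for the given fake geometry."""
    return heads * sectors_per_track * SECTOR_BYTES

def cylinder_is_aligned(heads, sectors_per_track):
    """True when every cylinder boundary is also an erase-block boundary."""
    return cylinder_bytes(heads, sectors_per_track) % ERASE_BLOCK_BYTES == 0

custom = cylinder_is_aligned(224, 56)    # assumed example geometry
classic = cylinder_is_aligned(255, 63)   # common default geometry

print("224/56 aligned:", custom)
print("255/63 aligned:", classic)
```

With GPT and explicit start sectors none of this is needed, which is the point made above.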
As for the RAID, i chose the default 512K chunk which is multiple of 128K
so this is fine too. But i read in the "mdadm -E" output that there is a
"136 sectors data offset". If i understand this correctly the start of the
md device is 136 sectors from the start of the underlying partition. This
breaks the alignment with the partition and all correct alignments on top
of RAID.
There is of course the possibility that i didn't understand it correctly and
this "data offset" means something else. Mr. Brown can clarify that.
Thank you again for your time.
* Re: Alignment of RAID on specific boundary
2010-01-05 6:57 ` Michael Evans
2010-01-05 7:50 ` Michal Soltys
@ 2010-01-05 13:44 ` Khelben Blackstaff
2010-01-05 15:35 ` John Robinson
1 sibling, 1 reply; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-05 13:44 UTC (permalink / raw)
To: Michael Evans; +Cc: linux-raid
2010/1/5 Michael Evans <mjevans1983@gmail.com>:
> On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
> <eye.of.the.8eholder@gmail.com> wrote:
>
> You want the RAID data chunks aligned so that they start on a 512byte
> sector address that fulfills (sector & 0xFF) == 0.
>
> Format 1.1 will reserve the space at the beginning, and I have just checked;
>
> 00000400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00003000 (LVM2 header)
>
> It starts the LVM2 header after 0x3000 bytes (64K chunk size); I'd
> hope to see it around 0x10000.
>
> It looks like the data isn't padded up to the desired offset.
>
> I agree, it would be useful to have an option to specify the offset of
> the first data chunk.
>
Thank you for bringing hexdump to my attention. I had completely
forgotten to run it. After reading your post i ran hexdump too.
I wrote 10MB of 1s (0x31 in HEX) to both partitions. So "0x31"
means "space untouched by mdadm". Then created the array
and wrote 10KB of 2s (0x32) to it. So "0x32" means "actual data".
The hexdump output is the following:
0000110 FFFF FFFF FFFF FFFF
*
0000400 3131 3131 3131 3131
*
0011000 3232 3232 3232 3232
The data written by mdadm (the superblock i guess) ends at 0x400
like you posted. So the v1.1 raid superblock has 1K size i guess.
In my case the payload starts at 0x11000 (the space between 0x400
and 0x11000 is maybe where the bitmap lives ?)
0x11000 = 69632 bytes = 136 sectors
Then my understanding of "data offset" is correct. I need to move
this offset from 136 sectors to 512 sectors.
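The conversion above, plus the nearest aligned offsets, can be sketched like this. It assumes 512-byte sectors and a 256-sector (128K) boundary; round_up is a hypothetical helper.

```python
SECTOR_BYTES = 512      # assumed logical sector size
BOUNDARY = 256          # sectors per 128K erase block

payload_byte_offset = 0x11000                      # where the 0x32 bytes begin
data_offset = payload_byte_offset // SECTOR_BYTES  # same value in sectors
remainder = data_offset % BOUNDARY                 # distance past the last boundary

def round_up(sector, boundary=BOUNDARY):
    """Smallest multiple of boundary that is >= sector."""
    return -(-sector // boundary) * boundary

next_aligned = round_up(data_offset)

print(payload_byte_offset, data_offset, remainder, next_aligned)
```

Any multiple of 256 sectors would restore the 128K alignment; 512 sectors is one such value.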
* Re: Alignment of RAID on specific boundary
2010-01-05 13:44 ` Khelben Blackstaff
@ 2010-01-05 15:35 ` John Robinson
2010-01-06 1:46 ` Michael Evans
0 siblings, 1 reply; 11+ messages in thread
From: John Robinson @ 2010-01-05 15:35 UTC (permalink / raw)
To: Khelben Blackstaff; +Cc: linux-raid
On 05/01/2010 13:44, Khelben Blackstaff wrote:
[...]
> Then my understanding of "data offset" is correct. I need to move
> this offset from 136 sectors to 512 sectors.
Or move your partition start back by 136 sectors, or use version 0.90 or
1.0 superblocks which have the superblock at the end.
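A sketch of the first option: pick the partition start so that start plus the 136-sector data offset lands on a 256-sector boundary. The sector numbers come from the partition table earlier in the thread; the 512-byte sector size and the helper name are assumptions.

```python
BOUNDARY = 256        # sectors per 128K erase block, assuming 512-byte sectors
DATA_OFFSET = 136     # md v1.1 data offset reported by mdadm -E
PART2_END = 262399    # last sector of partition 2 in the original layout

def start_for(aligned_data_start, data_offset=DATA_OFFSET):
    """Partition start that puts the md data at aligned_data_start."""
    return aligned_data_start - data_offset

# Keep the data where the partition used to start: the partition moves back.
moved_back = start_for(262400)
overlaps_part2 = moved_back <= PART2_END

# Or target the next boundary so partition 2 is left untouched.
moved_forward = start_for(262400 + BOUNDARY)

for start in (moved_back, moved_forward):
    print(start, (start + DATA_OFFSET) % BOUNDARY)
```

With the layout posted earlier, moving back would run into partition 2 unless that partition is shrunk, so targeting the next boundary may be the simpler variant.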
Cheers,
John.
* Re: Alignment of RAID on specific boundary
2010-01-05 15:35 ` John Robinson
@ 2010-01-06 1:46 ` Michael Evans
[not found] ` <7b3edae11001060500g2d63f2cav81bf8a90fc0b946b@mail.gmail.com>
0 siblings, 1 reply; 11+ messages in thread
From: Michael Evans @ 2010-01-06 1:46 UTC (permalink / raw)
To: John Robinson; +Cc: Khelben Blackstaff, linux-raid
On Tue, Jan 5, 2010 at 7:35 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> On 05/01/2010 13:44, Khelben Blackstaff wrote:
> [...]
>> Then my understanding of "data offset" is correct. I need to move
>> this offset from 136 sectors to 512 sectors.
>
> Or move your partition start back by 136 sectors, or use version 0.90 or 1.0
> superblocks which have the superblock at the end.
>
> Cheers,
>
> John.
>
Indeed, that was one route I covered earlier in my message.
However, it would still be a good improvement for mdadm to allow the
user to specify a non-default offset for the start of the
data-partition (for a number of reasons).
* Re: Alignment of RAID on specific boundary
[not found] ` <7b3edae11001060500g2d63f2cav81bf8a90fc0b946b@mail.gmail.com>
@ 2010-01-06 14:44 ` Khelben Blackstaff
0 siblings, 0 replies; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-06 14:44 UTC (permalink / raw)
To: linux-raid
2010/1/6 Michael Evans <mjevans1983@gmail.com>:
> <john.robinson@anonymous.org.uk> wrote:
>> Or move your partition start back by 136 sectors, or use version 0.90 or 1.0
>> superblocks which have the superblock at the end.
> Indeed, that was one route I covered earlier in my message.
>
> However, it would still be a good improvement for mdadm to allow the
> user to specify a non-default offset for the start of the
> data-partition (for a number of reasons).
>
Yes, it will be nice if Mr. Brown implements this when he finds
the time. I will use v1.0 as you both suggested till then.
Since i am experimenting in Virtualbox, i tried to
hexedit the superblock. I modified 0x80 (offset)
from 136 to 256 and subtracted 256-136=120
from 0x88 (size). It seems to work fine :)
Of course i did this because it is a test array in VB.
I wouldn't mess with the real array. I just wanted
to see if it will work.
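For illustration only, the same two-field change can be sketched on an in-memory buffer rather than a real device. The assumptions here: the fields at 0x80 and 0x88 are 64-bit little-endian values, the size field holds the "Avail Dev Size" reported by mdadm -E, and the buffer is a synthetic stand-in for a superblock. The sketch does not touch any checksum stored in a real superblock, which an actual edit would presumably have to account for.

```python
import struct

DATA_OFFSET_POS = 0x80   # per the message above: data offset field
DATA_SIZE_POS = 0x88     # per the message above: size field

def shift_data_offset(sb, new_offset):
    """Return a copy of sb with the data offset moved and the size reduced to match."""
    out = bytearray(sb)
    (old_offset,) = struct.unpack_from("<Q", out, DATA_OFFSET_POS)
    (old_size,) = struct.unpack_from("<Q", out, DATA_SIZE_POS)
    delta = new_offset - old_offset          # 256 - 136 = 120 in this thread
    struct.pack_into("<Q", out, DATA_OFFSET_POS, new_offset)
    struct.pack_into("<Q", out, DATA_SIZE_POS, old_size - delta)
    return bytes(out)

# Synthetic 256-byte buffer standing in for a superblock; not read from a device.
fake = bytearray(256)
struct.pack_into("<Q", fake, DATA_OFFSET_POS, 136)
struct.pack_into("<Q", fake, DATA_SIZE_POS, 83623511)   # "Avail Dev Size" from mdadm -E

patched = shift_data_offset(bytes(fake), 256)
(new_offset,) = struct.unpack_from("<Q", patched, DATA_OFFSET_POS)
(new_size,) = struct.unpack_from("<Q", patched, DATA_SIZE_POS)
print(new_offset, new_size)
```

The reduced size still has to stay at or above the "Used Dev Size" of the array (83621888 sectors here) for the change to make sense.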
Thank you all for your help.