linux-raid.vger.kernel.org archive mirror
* Alignment of RAID on specific boundary
@ 2010-01-04 19:30 Khelben Blackstaff
  2010-01-04 21:14 ` David Rees
  2010-01-05  6:57 ` Michael Evans
  0 siblings, 2 replies; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-04 19:30 UTC (permalink / raw)
  To: linux-raid

Hello and happy new year.

I browsed the archive for similar threads and learned valuable info,
but every thread I found was about aligning something on top of RAID
(mostly LVM), not aligning the RAID itself. I apologize if this has already
been answered and I missed it.

A friend of mine has two SSDs and wants a setup of RAID1->LUKS->LVM.
I have done this setup before but never bothered to align the layers.
Instead of RAID1 I thought to set up RAID10,f2. I know it cannot be grown,
but by the time SSD prices drop enough for him to buy larger ones,
grow support will probably be implemented.

I have tried the procedure in VirtualBox first to make sure I don't
make any mistakes.
Here are the details:

Disk /dev/sda: 83886080 sectors, 40.0 GiB

No   Start      End        Size         Name
1    40         255        108.0 KiB    BIOS boot partition
2    256        262399     128.0 MiB    Linux RAID
3    262400     83886046   39.9 GiB     Linux RAID

I thought to align partitions on a 128K boundary to match the erase block,
so I aligned them to 256 sectors. The disks use a GPT label.
GPT doesn't provide a GUID for 0xDA Non-FS Data, so I used Linux RAID.
I don't think there is a danger of a rescue CD messing with the partition,
because few if any know about GPT. Otherwise, I can also use Linux Reserved.
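
The 128K reasoning above can be checked with a few lines of arithmetic. This is only a sketch: it assumes 512-byte sectors and a 128 KiB erase block, as stated in the message, and the helper name is_aligned is made up for illustration.

```python
SECTOR_BYTES = 512
ERASE_BLOCK_BYTES = 128 * 1024  # erase block size assumed in the message

# Partition start sectors copied from the partition table above.
partition_starts = {1: 40, 2: 256, 3: 262400}


def is_aligned(start_sector, boundary_bytes=ERASE_BLOCK_BYTES):
    """Return True if the byte address of start_sector is a multiple of boundary_bytes."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0


for number, start in partition_starts.items():
    print(f"partition {number}: start sector {start}, aligned: {is_aligned(start)}")
```

Only the two Linux RAID partitions need to pass this check; the BIOS boot partition at sector 40 is not expected to.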

I will install LILO, but I created the BIOS boot partition in case he
later wants to use GRUB2. The 128MB array will be RAID1 /boot and uses 0.90
metadata because neither LILO nor GRUB2 can boot from v1 metadata (unless the
GRUB2 wiki is out of date).

My main concern is to align the large array.

% mdadm -V
mdadm - v3.1.1 - 19th November 2009

% mdadm -C /dev/md1 -l 10 -e 1.1 -p f2 -n 2 --name=vbmd /dev/sd[ab]3
% mdadm -E /dev/sda3
Version : 1.1
Avail Dev Size : 83623511
Array Size : 83621888
Used Dev Size : 83621888
Data Offset : 136 sectors
Super Offset : 0 sectors
Layout : far=2
Chunk Size : 512K

So I have 2 disks with a 512K chunk (which is divisible by 128K, so everything
is fine), so a full stripe is 1MB.

% cryptsetup -c aes-xts-plain -s 512 --align-payload=2048 luksFormat /dev/md1
% cryptsetup luksDump /dev/md1
Payload offset: 4096


% cryptsetup luksOpen /dev/md1 vbcrypt
% pvcreate /dev/mapper/vbcrypt
% pvs -o +pe_start
1st PE 1.00m

As you see, the LUKS payload starts at 2MB and the LVM payload starts at a 1MB
offset. I will use a 16MB extent size in LVM, which is divisible by the 1MB stripe.
So if I am not mistaken, everything is correctly aligned on top of RAID.
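
The layering described above can be written out as a small calculation. This is a sketch under the message's own figures (LUKS payload offset of 4096 sectors, first PE at 1 MiB, a 1 MiB full stripe, 16 MiB extents); the variable and function names are invented for the example, and offsets are measured from the start of /dev/md1, not from the start of the disk.

```python
SECTOR_BYTES = 512
MIB = 1024 * 1024

luks_payload_offset = 4096 * SECTOR_BYTES  # "Payload offset: 4096" (sectors)
lvm_pe_start = 1 * MIB                     # "1st PE 1.00m"
full_stripe = 1 * MIB                      # full stripe as stated in the message
extent_size = 16 * MIB                     # planned LVM extent size


def first_extent_offset():
    """Byte offset of the first LVM extent, relative to the start of /dev/md1."""
    return luks_payload_offset + lvm_pe_start


def stack_is_aligned():
    """True if the first extent and every later extent start on a full-stripe boundary."""
    return first_extent_offset() % full_stripe == 0 and extent_size % full_stripe == 0


print(first_extent_offset() // MIB, "MiB from the start of /dev/md1")
print("aligned to the full stripe:", stack_is_aligned())
```

This only says that the layers are aligned relative to the md device. Whether the md device itself starts on an erase-block boundary is a separate question.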

The only problem I have is the RAID alignment. "mdadm -D /dev/md1"
doesn't mention anything, but "mdadm -E /dev/sda3" reports a data offset of
136 sectors. Does that mean that the actual RAID data starts 136 sectors
into the partition? If yes, then the RAID isn't aligned with the SSD, which in
turn breaks every alignment on top of it. I tried to create the array with
an internal bitmap, so that the bitmap occupies some space and thus
increases the offset, but it didn't work. The offset is still 136 sectors.
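
To put a number on the concern above: if "Data Offset" is counted from the start of the member partition (an assumption at this point in the thread), the first data sector can be located on the disk and tested against the 256-sector boundary. The function names below are hypothetical.

```python
SECTORS_PER_ERASE_BLOCK = 256  # 128 KiB expressed in 512-byte sectors


def overshoot(partition_start, data_offset, boundary=SECTORS_PER_ERASE_BLOCK):
    """Sectors by which the first data sector lies past the previous boundary."""
    return (partition_start + data_offset) % boundary


def padding_needed(partition_start, data_offset, boundary=SECTORS_PER_ERASE_BLOCK):
    """Extra sectors of offset that would push the data start to the next boundary."""
    return -(partition_start + data_offset) % boundary


# Values from the message: /dev/sda3 starts at sector 262400, data offset is 136.
print("overshoot:", overshoot(262400, 136))
print("padding needed:", padding_needed(262400, 136))
```

A non-zero overshoot means the array data does not begin on an erase-block boundary even though the partition does.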

Is there a way to make the RAID data start at a particular offset ?
(In my case 512 sectors) ?

If my thoughts are flawed then please correct me.

Thank you for your time.

* Re: Alignment of RAID on specific boundary
  2010-01-04 19:30 Alignment of RAID on specific boundary Khelben Blackstaff
@ 2010-01-04 21:14 ` David Rees
  2010-01-04 21:51   ` Khelben Blackstaff
  2010-01-05  6:57 ` Michael Evans
  1 sibling, 1 reply; 11+ messages in thread
From: David Rees @ 2010-01-04 21:14 UTC (permalink / raw)
  To: Khelben Blackstaff; +Cc: linux-raid

On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
<eye.of.the.8eholder@gmail.com> wrote:
> Is there a way to make the RAID data start at a particular offset ?
> (In my case 512 sectors) ?

http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/

May have to tweak your fdisk parameters a bit.

Don't worry about aligning your /boot partition since you don't
read/write from that much.

-Dave

* Re: Alignment of RAID on specific boundary
  2010-01-04 21:14 ` David Rees
@ 2010-01-04 21:51   ` Khelben Blackstaff
  2010-01-05  8:24     ` Antonio Perez
  0 siblings, 1 reply; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-04 21:51 UTC (permalink / raw)
  To: David Rees; +Cc: linux-raid

Hello Mr. Rees and thank you for replying.

I have read this post and it is very good, but it covers alignment
of the partitions using special CHS values and LVM/ext4 options.

My partitions and both LUKS and LVM are aligned properly (at least I think
they are). I am worried about the alignment of the RAID itself.

2010/1/4 David Rees <drees76@gmail.com>:
> On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
> <eye.of.the.8eholder@gmail.com> wrote:
>> Is there a way to make the RAID data start at a particular offset ?
>> (In my case 512 sectors) ?
>
> http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
>
> May have to tweak your fdisk parameters a bit.
>
> Don't worry about aligning your /boot partition since you don't
> read/write from that much.
>
> -Dave
>

* Re: Alignment of RAID on specific boundary
  2010-01-04 19:30 Alignment of RAID on specific boundary Khelben Blackstaff
  2010-01-04 21:14 ` David Rees
@ 2010-01-05  6:57 ` Michael Evans
  2010-01-05  7:50   ` Michal Soltys
  2010-01-05 13:44   ` Khelben Blackstaff
  1 sibling, 2 replies; 11+ messages in thread
From: Michael Evans @ 2010-01-05  6:57 UTC (permalink / raw)
  To: Khelben Blackstaff; +Cc: linux-raid

On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
<eye.of.the.8eholder@gmail.com> wrote:

He's using SSDs with 128KByte erase blocks.

> Is there a way to make the RAID data start at a particular offset ?
> (In my case 512 sectors) ?
>
> Thank you for your time.
> --

You want the RAID data chunks aligned so that they start on a 512-byte
sector address that fulfills (sector & 0xFF) == 0.
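
The mask test above works because 256 is a power of two: the low eight bits of the sector number are zero exactly when the sector is a multiple of 256. A small sketch generalising it to any power-of-two boundary follows; on_boundary is a name chosen for illustration.

```python
def on_boundary(sector, boundary_sectors=256):
    """Return True if sector is a multiple of boundary_sectors (a power of two)."""
    if boundary_sectors <= 0 or boundary_sectors & (boundary_sectors - 1):
        raise ValueError("boundary_sectors must be a power of two")
    # For boundary_sectors == 256 the mask is 0xFF, as in the message.
    return (sector & (boundary_sectors - 1)) == 0


print(on_boundary(262400))        # start of the partition
print(on_boundary(262400 + 136))  # partition start plus the reported data offset
```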

As far as I am aware, using metadata 0.90 OR 1.0 will start the actual
data at the beginning of the partition, reserve some space near the
end, and use the very end for the metadata.

Format 1.1 will reserve the space at the beginning, and I have just checked;

00000400  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003000  (LVM2 header)

It starts the LVM2 header after 0x3000 bytes (64K chunk size); I'd
hope to see it around 0x10000.

It looks like the data isn't padded up to the desired offset.

I agree, it would be useful to have an option to specify the offset of
the first data chunk.

* Re: Alignment of RAID on specific boundary
  2010-01-05  6:57 ` Michael Evans
@ 2010-01-05  7:50   ` Michal Soltys
  2010-01-05 13:44   ` Khelben Blackstaff
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Soltys @ 2010-01-05  7:50 UTC (permalink / raw)
  To: Michael Evans; +Cc: Khelben Blackstaff, linux-raid

Michael Evans wrote:
> 
> It starts the LVM2 header after 0x3000 bytes (64K chunk size); I'd
> hope to see it around 0x10000.
> 
> It looks like the data isn't padded up to the desired offset.
> 
> I agree, it would be useful to have an option to specify the offset of
> the first data chunk.


Regarding lvm, you can use --dataalignment and --metadatasize for that purpose. 
Recent lvm versions, when creating directly on md raid (and assuming recent kernel), 
should align themselves automatically:

http://www.redhat.com/archives/linux-lvm/2009-September/msg00092.html


* Re: Alignment of RAID on specific boundary
  2010-01-04 21:51   ` Khelben Blackstaff
@ 2010-01-05  8:24     ` Antonio Perez
  2010-01-05 13:02       ` Khelben Blackstaff
  0 siblings, 1 reply; 11+ messages in thread
From: Antonio Perez @ 2010-01-05  8:24 UTC (permalink / raw)
  To: linux-raid

Khelben Blackstaff wrote:

> Hello Mr. Rees and thank you for replying.
> 
> I have read this post and it is very good but it mentions alignment
> of the partitions using special CHS values and LVM/ext4 options.

Yes, and that seems to be the only way to align partitions to a 128k 
boundary. You do need to use such rules to create aligned partitions.

Then, you need to align RAID created from such partitions to 128k Blocks, by 
setting the chunk-size to 128k: "mdadm --chunk=128k" ( well, a multiple of 
that, anyway ).

And, then, align LVM ( with the --dataalignment option for pvcreate ) and 
the Filesystem ( with the -E stripe-width= to the mkfs command ) to such 
boundaries as well. That applies to ext3 AND ext4 filesystems AFAIK.
 
> My partitions and both LUKS/LVM are aligned properly (at least i think
> they are). I am worried about the alignment of RAID itself.

On that link I read this comment:

	If your SSD has an 128k erase block size, and you are creating the file
	system with the default 4k block size, you just have to specify a strip
	width when you create the file system, like so:

	# mke2fs -t ext4 -E stripe-width=32,resize=500G /dev/ssd/root

So, to align your RAID to the SSD boundary, you need to use the
"stripe-width" parameter while creating the filesystem, assuming the partition
start has been set to a 128k boundary as described above.

> 2010/1/4 David Rees <drees76@gmail.com>:
>> On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
>> <eye.of.the.8eholder@gmail.com> wrote:
>>> Is there a way to make the RAID data start at a particular offset ?
>>> (In my case 512 sectors) ?
>>
>> http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
>>
>> May have to tweak your fdisk parameters a bit.
>>
>> Don't worry about aligning your /boot partition since you don't
>> read/write from that much.
>>
>> -Dave

-- 
Antonio Perez


* Re: Alignment of RAID on specific boundary
  2010-01-05  8:24     ` Antonio Perez
@ 2010-01-05 13:02       ` Khelben Blackstaff
  0 siblings, 0 replies; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-05 13:02 UTC (permalink / raw)
  To: ap23563m; +Cc: linux-raid

2010/1/5 Michal Soltys <soltys@ziu.info>:
> Michael Evans wrote:
>
> Regarding lvm, you can use --dataalignment and --metadatasize for that
> purpose. Recent lvm versions, when creating directly on md raid (and
> assuming recent kernel), should align themselves automatically:
>
> http://www.redhat.com/archives/linux-lvm/2009-September/msg00092.html
>

Yes, I used the latest LVM and it aligned itself correctly and automatically.
I am interested in aligning the RAID itself.

2010/1/5 Antonio Perez <ap23563m@gmx.com>:
>> I have read this post and it is very good but it mentions alignment
>> of the partitions using special CHS values and LVM/ext4 options.
>
> Yes, and that seems to be the only way to align partitions to a 128k
> boundary. You do need to use such rules to create aligned partitions.
>
> Then, you need to align RAID created from such partitions to 128k Blocks, by
> setting the chunk-size to 128k: "mdadm --chunk=128k" ( well, a multiple of
> that, anyway ).
>
> And, then, align LVM ( with the --dataalignment option for pvcreate ) and
> the Filesystem ( with the -E stripe-width= to the mkfs command ) to such
> boundaries as well. That applies to ext3 AND ext4 filesystems AFAIK.
>

These non-default CHS values are used with the MBR partitioning scheme
to make aligning easy. fdisk (when it is run without the -u option) aligns
partitions on a cylinder boundary, and because the cylinder size is a multiple
of 128K, all the partitions (except the first) are automatically aligned.
But I use GPT, so I don't need them. As described in my original post, I
manually created the partitions so that they are aligned to 128K.

As for the RAID, I chose the default 512K chunk, which is a multiple of 128K,
so this is fine too. But I read in the "mdadm -E" output that there is a
"136 sectors data offset". If I understand this correctly, the data of the
md device starts 136 sectors from the start of the underlying partition. This
breaks the alignment with the partition and all correct alignments on top
of RAID.

There is of course a possibility that I didn't understand it correctly and
this "data offset" means something else. Mr. Brown can clarify that.

Thank you again for your time.

* Re: Alignment of RAID on specific boundary
  2010-01-05  6:57 ` Michael Evans
  2010-01-05  7:50   ` Michal Soltys
@ 2010-01-05 13:44   ` Khelben Blackstaff
  2010-01-05 15:35     ` John Robinson
  1 sibling, 1 reply; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-05 13:44 UTC (permalink / raw)
  To: Michael Evans; +Cc: linux-raid

2010/1/5 Michael Evans <mjevans1983@gmail.com>:
> On Mon, Jan 4, 2010 at 11:30 AM, Khelben Blackstaff
> <eye.of.the.8eholder@gmail.com> wrote:
>
> You want the RAID data chunks aligned so that they start on a 512byte
> sector address that fulfills (sector & 0xFF) == 0.
>
> Format 1.1 will reserve the space at the beginning, and I have just checked;
>
> 00000400  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00003000  (LVM2 header)
>
> It starts the LVM2 header after 0x3000 bytes (64K chunk size); I'd
> hope to see it around 0x10000.
>
> It looks like the data isn't padded up to the desired offset.
>
> I agree, it would be useful to have an option to specify the offset of
> the first data chunk.
>

Thank you for bringing hexdump to my attention. I had completely
forgotten to run it. After reading your post I thought to run hexdump too.

I wrote 10MB of 1s (0x31 in hex) to both partitions, so "0x31"
means "space untouched by mdadm". Then I created the array
and wrote 10KB of 2s (0x32) to it, so "0x32" means "actual data".
The hexdump output is the following:

0000110 FFFF FFFF FFFF FFFF
*
0000400 3131 3131 3131 3131
*
0011000 3232 3232 3232 3232

The data written by mdadm (the superblock, I guess) ends at 0x400
as you posted, so the v1.1 RAID superblock is apparently 1K in size.
In my case the payload starts at 0x11000 (the space between 0x400
and 0x11000 is maybe where the bitmap lives?)

0x11000 = 69632 bytes = 136 sectors

Then my understanding of "data offset" is correct. I need to move
this offset from 136 sectors to 512 sectors.
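
The conversion from the hexdump address to the "Data Offset" figure can be spelled out as below. It is a minimal sketch assuming 512-byte sectors and a hexdump taken from the start of the member partition; the function name is hypothetical.

```python
SECTOR_BYTES = 512


def byte_offset_to_sectors(offset):
    """Convert a byte offset to whole sectors, rejecting offsets inside a sector."""
    sectors, remainder = divmod(offset, SECTOR_BYTES)
    if remainder:
        raise ValueError(f"offset {offset:#x} is not on a sector boundary")
    return sectors


# 0x11000 is where the 0x32 bytes (actual array data) begin in the hexdump above.
print(byte_offset_to_sectors(0x11000))
```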
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Alignment of RAID on specific boundary
  2010-01-05 13:44   ` Khelben Blackstaff
@ 2010-01-05 15:35     ` John Robinson
  2010-01-06  1:46       ` Michael Evans
  0 siblings, 1 reply; 11+ messages in thread
From: John Robinson @ 2010-01-05 15:35 UTC (permalink / raw)
  To: Khelben Blackstaff; +Cc: linux-raid

On 05/01/2010 13:44, Khelben Blackstaff wrote:
[...]
> Then my understanding of "data offset" is correct. I need to move
> this offset from 136 sectors to 512 sectors.

Or move your partition start back by 136 sectors, or use version 0.90 or 
1.0 superblocks which have the superblock at the end.
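
For the first suggestion, the partition start that puts the array data on a boundary can be computed rather than guessed. The sketch below assumes the data offset stays at 136 sectors when the array is recreated and uses a 256-sector boundary; aligned_partition_starts is a name made up for this example.

```python
BOUNDARY_SECTORS = 256  # 128 KiB in 512-byte sectors


def aligned_partition_starts(current_start, data_offset, boundary=BOUNDARY_SECTORS):
    """Return the nearest earlier and later partition starts whose data start is aligned."""
    overshoot = (current_start + data_offset) % boundary
    earlier = current_start - overshoot
    later = current_start + (boundary - overshoot) % boundary
    return earlier, later


earlier, later = aligned_partition_starts(262400, 136)
print("earlier start:", earlier, "-> data begins at sector", earlier + 136)
print("later start:  ", later, "-> data begins at sector", later + 136)
```

In the layout from the first message the preceding partition ends at sector 262399, so the earlier start would only be usable if that partition were shortened; the later start costs 120 sectors instead.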

Cheers,

John.

* Re: Alignment of RAID on specific boundary
  2010-01-05 15:35     ` John Robinson
@ 2010-01-06  1:46       ` Michael Evans
       [not found]         ` <7b3edae11001060500g2d63f2cav81bf8a90fc0b946b@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Evans @ 2010-01-06  1:46 UTC (permalink / raw)
  To: John Robinson; +Cc: Khelben Blackstaff, linux-raid

On Tue, Jan 5, 2010 at 7:35 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> On 05/01/2010 13:44, Khelben Blackstaff wrote:
> [...]
>> Then my understanding of "data offset" is correct. I need to move
>> this offset from 136 sectors to 512 sectors.
>
> Or move your partition start back by 136 sectors, or use version 0.90 or 1.0
> superblocks which have the superblock at the end.
>
> Cheers,
>
> John.
>

Indeed, that was one route I covered earlier in my message.

However, it would still be a good improvement for mdadm to allow the
user to specify a non-default offset for the start of the
data-partition (for a number of reasons).

* Re: Alignment of RAID on specific boundary
       [not found]         ` <7b3edae11001060500g2d63f2cav81bf8a90fc0b946b@mail.gmail.com>
@ 2010-01-06 14:44           ` Khelben Blackstaff
  0 siblings, 0 replies; 11+ messages in thread
From: Khelben Blackstaff @ 2010-01-06 14:44 UTC (permalink / raw)
  To: linux-raid

2010/1/6 Michael Evans <mjevans1983@gmail.com>:
> <john.robinson@anonymous.org.uk> wrote:
>> Or move your partition start back by 136 sectors, or use version 0.90 or 1.0
>> superblocks which have the superblock at the end.
> Indeed, that was one route I covered earlier in my message.
>
> However it is still a good idea improvement for mdadm to allow the
> user to specify a non-default offset for the start of the
> data-partition (for a number of reasons).
>

Yes, it would be nice if Mr. Brown implemented this when he finds
the time. I will use v1.0 as you both suggested until then.

Since I am experimenting in VirtualBox, I tried to
hex-edit the superblock. I changed the value at 0x80 (offset)
from 136 to 256 and subtracted 256-136=120
from the value at 0x88 (size). It seems to work fine :)
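
The arithmetic of that experiment can be sketched without touching a real device. The code below works on an in-memory buffer only. It assumes the two fields sit at 0x80 and 0x88 as reported above and are little-endian 64-bit integers; the helper names are invented, the sample size value is simply the "Avail Dev Size" from the first message, and any superblock checksum is deliberately left out of scope.

```python
import struct

DATA_OFFSET_POS = 0x80  # position reported in the message (assumption)
DATA_SIZE_POS = 0x88    # position reported in the message (assumption)


def read_offset_and_size(superblock):
    """Read the two 64-bit little-endian fields from a superblock image."""
    return struct.unpack_from("<QQ", superblock, DATA_OFFSET_POS)


def shifted_values(data_offset, data_size, new_offset):
    """Return (new_offset, new_size) after moving the data start forward."""
    delta = new_offset - data_offset
    if delta < 0 or delta > data_size:
        raise ValueError("new offset must lie inside the existing data area")
    return new_offset, data_size - delta


# Synthetic 256-byte buffer standing in for the start of a superblock.
image = bytearray(256)
struct.pack_into("<QQ", image, DATA_OFFSET_POS, 136, 83623511)

offset, size = read_offset_and_size(image)
new_offset, new_size = shifted_values(offset, size, 256)
struct.pack_into("<QQ", image, DATA_OFFSET_POS, new_offset, new_size)
print(read_offset_and_size(image))
```

The size shrinks by exactly the amount the offset grows, so the end of the data area stays where it was.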

Of course I did this because it is a test array in VirtualBox.
I wouldn't mess with the real array. I just wanted
to see if it would work.

Thank you all for your help.
