* Issue with md and 4K sector alignment
From: Kyle Brantley @ 2012-08-19 21:06 UTC
  To: linux-raid

I've got a set of 9x3TB drives that I'm trying to place in RAID6. These 
drives use 512B logical / 4096B physical sector emulation (512e):

Model: ATA ST3000DM001-1CH1 (scsi)
Disk /dev/sda: 3001GB
Sector size (logical/physical): 512B/4096B

This is being run on CentOS6:

mdadm - v3.2.3 - 23rd December 2011
Linux vmbox 2.6.32-279.5.1.el6.x86_64 #1 SMP Tue Aug 14 23:54:45 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


In general, I'm having a hard time telling the md subsystem to align to 
4K sectors. This is evident in a few ways:

* resync speed / time

Default 512k chunk:
       20511855616 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
       [>....................]  resync =  0.0% (187904/2930265088) finish=1559.2min speed=31317K/sec

4k chunk size (no meaningful change in resync speed):
       20511857968 blocks super 1.2 level 6, 4k chunk, algorithm 2 [9/9] [UUUUUUUUU]
       [>....................]  resync =  0.0% (475820/2930265424) finish=1436.6min speed=33987K/sec
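
As a rough cross-check of the two status lines above, the following sketch recomputes the array capacity and the remaining resync time. It assumes that the "blocks" figures are 1 KiB units and that the speed stays constant; the function names are made up for illustration.

```python
# Figures are copied from the /proc/mdstat output above.
# Assumption: "blocks" are 1 KiB units and speed is in KiB per second.

def raid6_capacity_blocks(per_device_blocks, n_devices):
    """Usable RAID6 capacity: two devices' worth of space holds parity."""
    return per_device_blocks * (n_devices - 2)


def resync_minutes_left(done_blocks, total_blocks, speed_kib_per_s):
    """Remaining resync time in minutes, assuming a constant speed."""
    return (total_blocks - done_blocks) / speed_kib_per_s / 60


# 512k chunk run
print(raid6_capacity_blocks(2930265088, 9))
print(resync_minutes_left(187904, 2930265088, 31317))

# 4k chunk run
print(raid6_capacity_blocks(2930265424, 9))
print(resync_minutes_left(475820, 2930265424, 33987))
```

Both runs work out to roughly a day of resync, so the chunk size alone does not seem to explain the slow rate.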


I rebuilt the array with --assume-clean and the default chunk size, and 
then ran some simple tests with dd.

* Read test, not 4K aligned:
[root@vmbox ~]# dd if=/dev/md127 of=/dev/zero
12228837376 bytes (12 GB) copied, 30.568 s, 400 MB/s
24344251904 bytes (24 GB) copied, 60.9207 s, 400 MB/s

* Read test, manually 4K aligned:
[root@vmbox ~]# dd if=/dev/md127 of=/dev/zero bs=4096
18783485952 bytes (19 GB) copied, 30.7766 s, 610 MB/s
37306327040 bytes (37 GB) copied, 61.1433 s, 610 MB/s

* Write test, not 4K aligned:
[root@vmbox ~]# dd if=/dev/zero of=/dev/md127
774734336 bytes (775 MB) copied, 31.1458 s, 24.9 MB/s
1438485504 bytes (1.4 GB) copied, 61.5351 s, 23.4 MB/s

* Write test, manually 4K aligned, and run over a much longer period of 
time to ensure that the buffers don't get in the way:
30602686464 bytes (31 GB) copied, 121.036 s, 253 MB/s
63765032960 bytes (64 GB) copied, 301.284 s, 212 MB/s

Or, in other words, I'm seeing a roughly 200MB/sec (about 1.5x) read 
boost if I manually align the I/O, and a roughly 200MB/sec (about 10x) 
write boost. Note how the non-aligned dd write run more or less matches 
the resync speeds listed above.
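
The ratios quoted above can be recomputed from the dd output with a small sketch like this one. It uses the first sample of each run and assumes dd's "MB" means decimal megabytes. Note that dd falls back to a 512-byte block size when bs= is not given, so the "not 4K aligned" runs are also issuing much smaller requests.

```python
MB = 1_000_000  # assumption: dd reports decimal megabytes


def throughput_mb_s(bytes_copied, seconds):
    """Average throughput in MB/s for one dd progress sample."""
    return bytes_copied / seconds / MB


# First sample of each dd run quoted above.
read_default = throughput_mb_s(12228837376, 30.568)
read_4k = throughput_mb_s(18783485952, 30.7766)
write_default = throughput_mb_s(774734336, 31.1458)
write_4k = throughput_mb_s(30602686464, 121.036)

print(f"read:  {read_default:.0f} -> {read_4k:.0f} MB/s "
      f"({read_4k / read_default:.1f}x)")
print(f"write: {write_default:.1f} -> {write_4k:.0f} MB/s "
      f"({write_4k / write_default:.1f}x)")
```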

I understand that I may need to work on the higher layers (LVM, 
partitioning -- and if there is any insight here, it would be 
appreciated!) with respect to the alignment, but my concern is the 
resync times. I've tried building the array off of both the raw disks 
and 4K aligned partitions placed on the disks -- the resync performance 
is identical, and poor.

How exactly should I construct this array to fix the resync time / align 
the I/O? I've searched everywhere I can think of but have yet to find a 
solution.

Thanks for any insight!
--Kyle


* Re: Issue with md and 4K sector alignment
From: Mikael Abrahamsson @ 2012-08-20  5:36 UTC
  To: Kyle Brantley; +Cc: linux-raid

On Sun, 19 Aug 2012, Kyle Brantley wrote:

> I understand that I may need to work on the higher layers (LVM, 
> partitioning -- and if there is any insight here, it would be 
> appreciated!) with respect to the alignment, but my concern is the 
> resync times. I've tried building the array off of both the raw disks 
> and 4K aligned partitions placed on the disks -- the resync performance 
> is identical, and poor.

If you create the md directly on the device instead of doing partitions, 
you don't have to worry. LVM and md are already 4k aligned.

If you create partitions, then you have to make sure they're 4k aligned.
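
A minimal sketch of that check, assuming the 512B logical / 4096B physical sector sizes reported for /dev/sda earlier in the thread. The start sectors are example values only (63 is the old DOS-style default), not taken from anyone's actual partition table.

```python
LOGICAL_SECTOR = 512    # bytes, as reported for /dev/sda in this thread
PHYSICAL_SECTOR = 4096  # bytes


def is_4k_aligned(start_sector):
    """True if a partition starting at this logical sector begins on a
    physical-sector boundary."""
    return (start_sector * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0


# Example start sectors (hypothetical, for illustration only).
for start in (63, 64, 2048):
    print(start, is_4k_aligned(start))
```

The start sector of an existing partition can be read from the partitioning tool's sector-unit listing and fed into the same test.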

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Issue with md and 4K sector alignment
From: David Brown @ 2012-08-20  7:28 UTC
  To: Mikael Abrahamsson; +Cc: Kyle Brantley, linux-raid

On 20/08/2012 07:36, Mikael Abrahamsson wrote:
> On Sun, 19 Aug 2012, Kyle Brantley wrote:
>
>> I understand that I may need to work on the higher layers (LVM,
>> partitioning -- and if there is any insight here, it would be
>> appreciated!) with respect to the alignment, but my concern is the
>> resync times. I've tried building the array off of both the raw disks
>> and 4K aligned partitions placed on the disks -- the resync
>> performance is identical, and poor.
>
> If you create the md directly on the device instead of doing partitions,
> you don't have to worry. LVM and md are already 4k aligned.
>
> If you create partitions, then you have to make sure they're 4k aligned.
>

It would be nice if LVM and md moved towards a bigger native alignment. 
4K is good enough for hard disks, but for SSDs you want to align 
partitions on erase block boundaries.  The standard used by modern fdisk 
and gparted (and tools from the-OS-that-must-not-be-named) is 1 MB - a 
bigger alignment than is necessary for current SSDs, but one that will 
be good enough for the foreseeable future.
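
To make the 1 MB convention concrete, here is a small sketch that rounds a start sector up to the next boundary. It assumes "1 MB" means 1 MiB (2048 sectors of 512 bytes); the helper name and the sample sectors are illustrative only.

```python
SECTOR = 512          # logical sector size in bytes
MIB = 1024 * 1024     # assumption: "1 MB" alignment means 1 MiB


def align_up(sector, alignment_bytes=MIB, sector_size=SECTOR):
    """Round a start sector up to the next alignment boundary."""
    per_boundary = alignment_bytes // sector_size
    # Ceiling division without floating point.
    return -(-sector // per_boundary) * per_boundary


print(align_up(63))    # legacy start sector pushed to the first boundary
print(align_up(2049))  # just past a boundary, so pushed to the next one
```

Because 1 MiB is a multiple of 4096 bytes, anything aligned this way is automatically 4K aligned as well.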



* Re: Issue with md and 4K sector alignment
From: H. Peter Anvin @ 2012-09-06 20:34 UTC
  To: David Brown; +Cc: Mikael Abrahamsson, Kyle Brantley, linux-raid

On 08/20/2012 12:28 AM, David Brown wrote:
>
> It would be nice if LVM and md moved towards a bigger native alignment.
> 4K is good enough for hard disks, but for SSDs you want to align
> partitions on erase block boundaries.  The standard used by modern fdisk
> and gparted (and tools from the-OS-that-must-not-be-named) is 1 MB - a
> bigger alignment than is necessary for current SSDs, but one that will
> be good enough for the foreseeable future.
>

This is one advantage with superblock 1.0 -- the alignment of the 
underlying device is preserved since the metadata is at the end.
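
A sketch of why the metadata location matters: with the superblock at the end, array data starts at offset 0 of the member device, so it inherits the partition's alignment. With metadata at the start, the data area is shifted by some data offset. The offsets below are hypothetical example values, not read from any real array.

```python
SECTOR = 512
MIB = 1024 * 1024


def data_start_bytes(partition_start_sector, data_offset_sectors,
                     sector_size=SECTOR):
    """Absolute byte position on the disk where array data begins."""
    return (partition_start_sector + data_offset_sectors) * sector_size


def is_aligned(byte_offset, alignment):
    return byte_offset % alignment == 0


# Metadata at the end: data offset 0, partition alignment is preserved.
end_meta = data_start_bytes(2048, 0)
# Metadata at the start: hypothetical data offset of 8 sectors.
start_meta = data_start_bytes(2048, 8)

print(is_aligned(end_meta, MIB), is_aligned(end_meta, 4096))
print(is_aligned(start_meta, MIB), is_aligned(start_meta, 4096))
```

In this made-up case the shifted layout is still 4K aligned but no longer sits on a 1 MiB boundary.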

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: Issue with md and 4K sector alignment
From: Mikael Abrahamsson @ 2012-09-07  6:19 UTC
  To: H. Peter Anvin; +Cc: David Brown, Kyle Brantley, linux-raid

On Thu, 6 Sep 2012, H. Peter Anvin wrote:

> This is one advantage with superblock 1.0 -- the alignment of the 
> underlying device is preserved since the metadata is at the end.

This whole problem becomes quite complicated when one starts to take into 
account the whole block device stack, for instance 
drives->md->crypto->lvm->fs, and wants the filesystem to understand where 
a complete md RAID6 stride (?) starts and ends. It gets harder still when 
this changes over time, as drives are added at a later date or the LV is 
moved around, even while the filesystem is mounted.

I don't know how hard it would be to add infrastructure for an fs to find 
out this underlying information through all these layers, or whether there 
is interest in doing so. I have just gotten accustomed to seeing 
approximately 10% reads when I write files to my RAID6. I tried to 
align it from the beginning, but now I just try to make sure it's 4k 
aligned (which is not hard, since I don't use partitions, and cryptsetup, 
md and lvm are all 4k aligned already).

xfs does know my stripe size, but I suspect it doesn't correctly 
understand where my 64k stripes start and end, even though I tried to 
align it. Since I have added drives over time, I suspect that the first 
xfs block isn't aligned with the start of a stripe on the 1:st drive. So 
even though xfs knows that a complete stride set is X drives and Y k per 
stripe, this isn't correctly aligned to the 1:st drive, and a complete 
stride write from xfs would span two different md strides.
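
The effect described above can be sketched numerically. The example assumes that the "64k stripe size" is the per-drive chunk and uses a made-up 6-drive RAID6, since the thread does not give the drive count. A write that covers one full row of data chunks but starts one chunk late touches two rows, each only partially, which would force md to read before it can update parity.

```python
KIB = 1024


def full_stripe_bytes(chunk_kib, n_devices, parity_devices=2):
    """Data bytes in one full stripe (one chunk per data device)."""
    return chunk_kib * KIB * (n_devices - parity_devices)


def stripes_touched(offset_bytes, length_bytes, stripe_bytes):
    """Number of full-stripe rows a write overlaps."""
    first = offset_bytes // stripe_bytes
    last = (offset_bytes + length_bytes - 1) // stripe_bytes
    return last - first + 1


# Hypothetical array: 64 KiB chunk, 6 drives (4 data + 2 parity).
stripe = full_stripe_bytes(chunk_kib=64, n_devices=6)

print(stripes_touched(0, stripe, stripe))         # aligned full-stripe write
print(stripes_touched(64 * KIB, stripe, stripe))  # same write, one chunk late
```

For the 9-drive, 512k-chunk array from the start of the thread, the same formula gives a full stripe of 7 x 512 KiB.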

PS. I am not sure I got the stripe/stride wording correct here; I hope 
what I mean can still be understood.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
