* Issue with md and 4K sector alignment
@ 2012-08-19 21:06 Kyle Brantley
2012-08-20 5:36 ` Mikael Abrahamsson
0 siblings, 1 reply; 5+ messages in thread
From: Kyle Brantley @ 2012-08-19 21:06 UTC
To: linux-raid
I've got a set of 9x3TB drives that I'm trying to place in RAID6. These
have the 512B/4096B logical/physical compatibility emulation:
Model: ATA ST3000DM001-1CH1 (scsi)
Disk /dev/sda: 3001GB
Sector size (logical/physical): 512B/4096B
This is being run on CentOS 6:
mdadm - v3.2.3 - 23rd December 2011
Linux vmbox 2.6.32-279.5.1.el6.x86_64 #1 SMP Tue Aug 14 23:54:45 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
In general, I'm having a hard time telling the md subsystem to align to
4K sectors. This is evident in a few ways:
* resync speed / time
Default 512k chunk:
20511855616 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
[>....................] resync = 0.0% (187904/2930265088) finish=1559.2min speed=31317K/sec
4k chunk size (no functional change):
20511857968 blocks super 1.2 level 6, 4k chunk, algorithm 2 [9/9] [UUUUUUUUU]
[>....................] resync = 0.0% (475820/2930265424) finish=1436.6min speed=33987K/sec
I rebuilt the array with --assume-clean and default chunk size, and
then ran some simple tests with dd.
* Read test, not 4K aligned:
[root@vmbox ~]# dd if=/dev/md127 of=/dev/zero
12228837376 bytes (12 GB) copied, 30.568 s, 400 MB/s
24344251904 bytes (24 GB) copied, 60.9207 s, 400 MB/s
* Read test, manually 4K aligned:
[root@vmbox ~]# dd if=/dev/md127 of=/dev/zero bs=4096
18783485952 bytes (19 GB) copied, 30.7766 s, 610 MB/s
37306327040 bytes (37 GB) copied, 61.1433 s, 610 MB/s
* Write test, not 4K aligned:
[root@vmbox ~]# dd if=/dev/zero of=/dev/md127
774734336 bytes (775 MB) copied, 31.1458 s, 24.9 MB/s
1438485504 bytes (1.4 GB) copied, 61.5351 s, 23.4 MB/s
* Write test, manually 4K aligned, and run over a much longer period of
time to ensure that buffering doesn't skew the result:
30602686464 bytes (31 GB) copied, 121.036 s, 253 MB/s
63765032960 bytes (64 GB) copied, 301.284 s, 212 MB/s
In other words, manually aligning the I/O gives a gain of roughly
200MB/sec for reads (about 1.5x the unaligned rate) and roughly
200MB/sec for writes (about 10x the unaligned rate). Note how the
non-aligned dd write run more or less matches the resync speeds listed
above.
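The gains quoted above can be reproduced from the dd figures. The
following is only a rough sketch in Python that uses the throughput
numbers from the runs above and nothing else; the helper names are made
up for illustration. One detail worth keeping in mind when reading the
results: dd uses a 512-byte block size when bs= is not given, so the
"not 4K aligned" runs were issuing 512-byte requests.

```python
# Throughput figures (MB/s) copied from the dd runs in this message.
READ_UNALIGNED = 400.0
READ_ALIGNED = 610.0
WRITE_UNALIGNED = [24.9, 23.4]
WRITE_ALIGNED = [253.0, 212.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def speedup(before, after):
    """Return (absolute gain in MB/s, ratio after/before)."""
    return after - before, after / before


read_gain, read_ratio = speedup(READ_UNALIGNED, READ_ALIGNED)
write_gain, write_ratio = speedup(mean(WRITE_UNALIGNED), mean(WRITE_ALIGNED))

print(f"read:  +{read_gain:.0f} MB/s, {read_ratio:.1f}x")
print(f"write: +{write_gain:.0f} MB/s, {write_ratio:.1f}x")
```

Averaging the two samples of each write run gives a gain of a little
over 200 MB/s and a ratio between 9x and 10x, which is consistent with
the rounded figures in the text.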
I understand that I may need to work on the higher layers (LVM,
partitioning -- and if there is any insight here, it would be
appreciated!) with respect to the alignment, but my concern is the
resync times. I've tried building the array off of both the raw disks
and 4K aligned partitions placed on the disks -- the resync performance
is identical, and poor.
How exactly should I construct this array to fix the resync time / align
the I/O? I've searched everywhere that I can find but have yet to find a
solution.
Thanks for any insight!
--Kyle
* Re: Issue with md and 4K sector alignment
From: Mikael Abrahamsson @ 2012-08-20 5:36 UTC
To: Kyle Brantley; +Cc: linux-raid
On Sun, 19 Aug 2012, Kyle Brantley wrote:
> I understand that I may need to work on the higher layers (LVM,
> partitioning -- and if there is any insight here, it would be
> appreciated!) with respect to the alignment, but my concern is the
> resync times. I've tried building the array off of both the raw disks
> and 4K aligned partitions placed on the disks -- the resync performance
> is identical, and poor.
If you create the md directly on the device instead of doing partitions,
you don't have to worry. LVM and md are already 4k aligned.
If you create partitions, then you have to make sure they're 4k aligned.
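Checking whether a partition is 4k aligned comes down to one modulo
operation on its start offset. A minimal sketch in Python, assuming the
start is given in 512-byte logical sectors (which is how the drives in
this thread report themselves); the function name and the sample start
sectors are illustrative only. On Linux the start sector of a partition
can usually be read from /sys/block/<disk>/<partition>/start.

```python
LOGICAL_SECTOR = 512     # bytes, logical sector size reported by the drives
PHYSICAL_SECTOR = 4096   # bytes, physical sector size reported by the drives


def is_aligned(start_sector, logical=LOGICAL_SECTOR, boundary=PHYSICAL_SECTOR):
    """True if a partition starting at start_sector begins on a boundary."""
    return (start_sector * logical) % boundary == 0


# Sample start sectors (assumed values, not taken from the thread):
# 63 is the old DOS-style default, 2048 corresponds to a 1 MiB offset.
for start in (63, 64, 2048):
    print(start, is_aligned(start))
```

A start sector of 63 fails the check, while 64 and 2048 pass it.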
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Issue with md and 4K sector alignment
From: David Brown @ 2012-08-20 7:28 UTC
To: Mikael Abrahamsson; +Cc: Kyle Brantley, linux-raid
On 20/08/2012 07:36, Mikael Abrahamsson wrote:
> On Sun, 19 Aug 2012, Kyle Brantley wrote:
>
>> I understand that I may need to work on the higher layers (LVM,
>> partitioning -- and if there is any insight here, it would be
>> appreciated!) with respect to the alignment, but my concern is the
>> resync times. I've tried building the array off of both the raw disks
>> and 4K aligned partitions placed on the disks -- the resync
>> performance is identical, and poor.
>
> If you create the md directly on the device instead of doing partitions,
> you don't have to worry. LVM and md are already 4k aligned.
>
> If you create partitions, then you have to make sure they're 4k aligned.
>
It would be nice if LVM and md moved towards a bigger native alignment.
4K is good enough for hard disks, but for SSD's you want to align
partitions on erase block boundaries. The standard used by modern fdisk
and gparted (and tools from the-OS-that-must-not-be-named) is 1 MB - a
bigger alignment than is necessary for current SSDs, but one that will
be good enough for the foreseeable future.
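To illustrate the 1 MB convention: rounding a start offset up to the
next 1 MiB boundary also makes it 4K aligned, since 4096 divides
1048576. A small sketch in Python; align_up is a hypothetical helper,
not something taken from fdisk or gparted.

```python
MIB = 1024 * 1024   # 1 MiB in bytes
SECTOR = 512        # logical sector size in bytes


def align_up(offset_bytes, alignment=MIB):
    """Round offset_bytes up to the next multiple of alignment."""
    return -(-offset_bytes // alignment) * alignment


# An old-style partition start at sector 63 moves to the 1 MiB mark.
start = align_up(63 * SECTOR)
print(start, start // SECTOR, start % 4096 == 0)
```

With 512-byte logical sectors, the 1 MiB mark is sector 2048.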
* Re: Issue with md and 4K sector alignment
From: H. Peter Anvin @ 2012-09-06 20:34 UTC
To: David Brown; +Cc: Mikael Abrahamsson, Kyle Brantley, linux-raid
On 08/20/2012 12:28 AM, David Brown wrote:
>
> It would be nice if LVM and md moved towards a bigger native alignment.
> 4K is good enough for hard disks, but for SSD's you want to align
> partitions on erase block boundaries. The standard used by modern fdisk
> and gparted (and tools from the-OS-that-must-not-be-named) is 1 MB - a
> bigger alignment than is necessary for current SSDs, but one that will
> be good enough for the foreseeable future.
>
This is one advantage with superblock 1.0 -- the alignment of the
underlying device is preserved since the metadata is at the end.
-hpa
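A rough sketch of why the metadata location matters, in Python. With
the metadata at the end, the array data starts at offset 0 of the member
device, so it inherits the member's alignment. With the metadata at the
start, the data begins at some data offset, and that offset has to be
added before checking. The numbers below are assumed values for
illustration, not taken from any array in this thread, and the helper
names are made up.

```python
SECTOR = 512          # bytes per logical sector
MIB = 1024 * 1024     # 1 MiB in bytes


def data_start_bytes(member_start_sectors, data_offset_sectors):
    """Absolute byte offset on the disk where the array data begins."""
    return (member_start_sectors + data_offset_sectors) * SECTOR


def aligned_to(offset_bytes, boundary_bytes):
    """True if offset_bytes is a multiple of boundary_bytes."""
    return offset_bytes % boundary_bytes == 0


# Member partition starting at sector 2048 (1 MiB), assumed.
at_end = data_start_bytes(2048, 0)        # metadata at end: data offset 0
at_start = data_start_bytes(2048, 272)    # hypothetical 272-sector data offset

for label, off in (("metadata at end", at_end), ("metadata at start", at_start)):
    print(label, off, aligned_to(off, 4096), aligned_to(off, MIB))
```

In this made-up case both layouts stay 4K aligned, but only the
metadata-at-end layout keeps the 1 MiB alignment of the partition.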
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: Issue with md and 4K sector alignment
From: Mikael Abrahamsson @ 2012-09-07 6:19 UTC
To: H. Peter Anvin; +Cc: David Brown, Kyle Brantley, linux-raid
On Thu, 6 Sep 2012, H. Peter Anvin wrote:
> This is one advantage with superblock 1.0 -- the alignment of the
> underlying device is preserved since the metadata is at the end.
This whole problem becomes quite complicated once you take the whole
block device stack into account, for instance
drives->md->crypto->lvm->fs. The filesystem has to understand where a
complete md RAID6 stride (?) starts and ends, and that can change over
time: drives get added at a later date, or the LV gets moved around,
even while the filesystem is mounted.
I don't know how hard it would be to add infrastructure that lets an fs
find out this underlying information through all these layers, or
whether there is any interest in doing so. I have just gotten accustomed
to seeing approximately 10% reads when I write files to my RAID6. I
tried to align it from the beginning, but now I just try to make sure
it's 4k aligned (which is not hard, since I don't use partitions, and
cryptsetup, md and lvm are all 4k aligned already). xfs does know my
stripe size, but I suspect it doesn't correctly understand where my 64k
stripes start and end, even though I tried to align it. Since I have
added drives over time, I suspect the first xfs block isn't aligned with
the start of the first stripe. So even though xfs knows that a complete
stride set is X drives and Y k per stripe, it isn't correctly aligned to
the first drive, and a complete stride write from xfs would span two
different md strides.
PS. I am not sure I got the stripe/stride wording correct here, hope it
can still be understood what I mean.
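To put numbers on this, here is a small sketch in Python. It assumes the
64k figure above is the per-drive chunk size, and the drive counts (6
growing to 7) are made up for illustration; only the 9-drive, 512k case
comes from the first message in this thread. A RAID6 row carries data on
all drives except two, so the amount of data in one complete row is
(drives - 2) * chunk.

```python
def raid6_full_stripe_kib(n_drives, chunk_kib):
    """Data carried by one complete RAID6 row, in KiB (two drives hold parity)."""
    if n_drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (n_drives - 2) * chunk_kib


def starts_on_stripe(offset_kib, full_stripe_kib):
    """True if a write starting at offset_kib begins on a row boundary."""
    return offset_kib % full_stripe_kib == 0


# The 9-drive array with the default 512k chunk from the first message.
print(raid6_full_stripe_kib(9, 512))

# Hypothetical growth from 6 to 7 drives with a 64k chunk.
before = raid6_full_stripe_kib(6, 64)
after = raid6_full_stripe_kib(7, 64)

# Write offsets a filesystem would pick if it still assumed the old geometry.
old_offsets = [i * before for i in range(1, 6)]
still_aligned = [o for o in old_offsets if starts_on_stripe(o, after)]
print(before, after, old_offsets, still_aligned)
```

In this example the row grows from 256 KiB to 320 KiB, and of the first
five offsets chosen for the old geometry only 1280 KiB still lands on a
row boundary. Every other write would straddle two rows, which is one
way the extra reads described above can come about.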
--
Mikael Abrahamsson email: swmike@swm.pp.se