* RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Michele Codutti @ 2011-12-29 23:28 UTC
To: linux-raid

Hi all, I'm writing to this mailing list because I cannot figure out why
I am seeing performance issues with my three WD20EARS (2TB Western
Digital "Green" 4K/AF drives).

These drives have a sequential write throughput of around 100 MB/s each.
When I combine them in a RAID0 configuration the throughput is around
300 MB/s, and in a RAID1 configuration they preserve the single-drive
performance of 100 MB/s. But when I combine all three drives in a RAID5
configuration, the per-drive performance falls to around 40 MB/s. I get
the same performance level when I do individual misaligned writes (e.g.
dd if=/dev/zero bs=6K of=/dev/sda).

The drives are not partitioned. I'm using the default chunk size (512K)
and the default metadata superblock version (1.2). I did not format the
RAID or any single drive during my tests; I used the raw devices
directly. I'm running Ubuntu 11.10 with the 3.0.0 Linux kernel and mdadm
3.1.4. The hardware is an HP MicroServer.

Could you give me some advice?
Thanks in advance.

Michele
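P.S. For reference, I created the arrays with plain defaults and tested
them with dd against the raw md device, along these lines (sda/sdb/sdc
and md0 stand in for the actual device names):

  # mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sda /dev/sdb /dev/sdc
  # dd if=/dev/zero of=/dev/md0 bs=1M count=8192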
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Zdenek Kaspar @ 2011-12-30 2:00 UTC
To: linux-raid

On 30.12.2011 0:28, Michele Codutti wrote:
> Hi all, I'm writing to this mailing list because I cannot figure out why
> I am seeing performance issues with my three WD20EARS (2TB Western
> Digital "Green" 4K/AF drives).
> [...]
> But when I combine all three drives in a RAID5 configuration, the
> per-drive performance falls to around 40 MB/s.
> [...]

There must be some misalignment somewhere :( Do all drives really report
as 4K to the OS - physical_block_size and logical_block_size under
/sys/block/sdX/queue/?

NB: how does it perform with partitions starting at sector 2048? (Check
all disks with fdisk -lu /dev/sdX.)

HTH, Z.
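P.S. A quick way to dump what the kernel sees for all three disks at
once (the sd[abc] glob assumes your device names):

  $ grep . /sys/block/sd[abc]/queue/*_block_size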
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Marcus Sorensen @ 2011-12-30 4:48 UTC
To: Zdenek Kaspar; Cc: linux-raid

No, those drives generally DON'T report 4k to the OS, even though they
are 4k internally. If they did, there'd be fewer problems. They lie and
say 512b sectors for compatibility.

My only suggestion would be to experiment with various partitioning,
starting the first partition at 2048s or at various other offsets, to
see if you can find a placement that aligns the partitions properly. I'm
sure there's an explanation, but I'm not in the mood to put on my
thinking hat to figure it out at the moment.

It may also be worth using a different superblock version, as 1.2 sits
4k from the start of the device, which might be messing with alignment
(although I would expect that to affect all array levels). It's worth
trying 0.90, which goes at the end of the device.

On Thu, Dec 29, 2011 at 7:00 PM, Zdenek Kaspar <zkaspar82@gmail.com> wrote:
> There must be some misalignment somewhere :( Do all drives really report
> as 4K to the OS - physical_block_size and logical_block_size under
> /sys/block/sdX/queue/?
> [...]
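P.S. For example, to retry with the old end-of-device superblock (device
names are examples, and recreating the array destroys its contents):

  # mdadm --create /dev/md0 --metadata=0.90 --level=5 --raid-devices=3 \
      /dev/sda /dev/sdb /dev/sdc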
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Mikael Abrahamsson @ 2011-12-30 4:52 UTC
To: Marcus Sorensen; Cc: Zdenek Kaspar, linux-raid

On Thu, 29 Dec 2011, Marcus Sorensen wrote:

> My only suggestion would be to experiment with various partitioning,

The poster already said they're not partitioned:

>> The drives are not partitioned. I'm using the default chunk size (512K)
>> and the default metadata superblock version (1.2).

My recommendation would be to look into the stripe-cache settings and to
check the iostat -x 5 output. What is most likely happening is that when
writing to the RAID5, md is reading as well (most likely to calculate
parity) and not just writing. iostat will confirm whether this is indeed
the case.

Also, using RAID5 with 2TB drives or larger is not recommended; use
RAID6
<http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162>.
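P.S. e.g. run this alongside the write test and watch the read columns
(r/s, rkB/s) on the member disks; sustained reads during a pure write
workload mean md is doing read-modify-write:

  $ iostat -x 5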
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Marcus Sorensen @ 2011-12-30 5:45 UTC
To: Mikael Abrahamsson; Cc: Zdenek Kaspar, linux-raid

On Thu, Dec 29, 2011 at 9:52 PM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
>> My only suggestion would be to experiment with various partitioning,
>
> The poster already said they're not partitioned.

Correct. Using partitioning allows you to adjust the alignment, so, for
example, if the MD superblock at the front moves the start of the
exported MD device out of alignment with the base disks, you could
compensate for it by starting your partition at the correct offset.

> My recommendation would be to look into the stripe-cache settings and
> to check the iostat -x 5 output. What is most likely happening is that
> when writing to the RAID5, md is reading as well (most likely to
> calculate parity) and not just writing.
> [...]

If he's writing full stripes, he doesn't need to calculate parity by
reading. I'm not sure how the MD layer determines this, though; unless
he's adding a sync or O_DIRECT flag to his test, he should be writing
full stripes regardless of the block size he sets.
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Marcus Sorensen @ 2011-12-30 6:09 UTC
To: Mikael Abrahamsson; Cc: Zdenek Kaspar, linux-raid

I think we need more info on his test. If he's running the dd until he
exhausts his writeback cache to see what the disk speed is, then yes,
he'll run into having to read stripes to calculate parity, since he'll
effectively be forced to write 4k blocks synchronously (prior to kernel
3.1, where his thread will still get to use dirty memory but will just
be forced to sleep if the disk can't keep up). I have seen bumping the
stripe cache help significantly in these cases, and in the real world,
where you're not writing large full-stripe files.

Instead of doing a monster sequential write to find my disk speed, I
generally find it more useful to add conv=fdatasync to the dd, so that
the dirty buffers are utilized as they are in most real-world working
environments, but I don't get a result until the data is on disk.

On Thu, Dec 29, 2011 at 10:45 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
> If he's writing full stripes, he doesn't need to calculate parity by
> reading. I'm not sure how the MD layer determines this, though [...]
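P.S. A sketch of what I mean (target device and sizes are examples, and
writing to the raw md device destroys whatever is on it):

  $ dd if=/dev/zero of=/dev/md0 bs=1M count=4096 conv=fdatasync

dd won't print its throughput until the final fdatasync completes, so
the figure reflects what actually hit the disks.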
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Mikael Abrahamsson @ 2011-12-31 3:12 UTC
To: Marcus Sorensen; Cc: Zdenek Kaspar, linux-raid

On Thu, 29 Dec 2011, Marcus Sorensen wrote:

>> The poster already said they're not partitioned.
>
> Correct. Using partitioning allows you to adjust the alignment [...]

Unless he has used the XP jumper, it's impossible to misalign MD when
running without partitions, afaik.

> If he's writing full stripes, he doesn't need to calculate parity by
> reading. I'm not sure how the MD layer determines this, though; unless
> he's adding a sync or O_DIRECT flag to his test, he should be writing
> full stripes regardless of the block size he sets.

I've seen MD do 10% reads in this situation. I believe the handling of
this is not optimal, and sometimes there will be reads. Neil can
probably tell us a lot more about what might be going on.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Brad Campbell @ 2011-12-30 6:24 UTC
To: Michele Codutti; Cc: linux-raid

On 30/12/11 07:28, Michele Codutti wrote:
> But when I combine all three drives in a RAID5 configuration, the
> per-drive performance falls to around 40 MB/s.
> [...]

Just a thought, but do you have the "XP mode" jumper removed on all
drives?

Regards,
Brad
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Michele Codutti @ 2011-12-30 21:04 UTC
To: linux-raid

Hi all, thanks for the tips. I'll reply to everyone in one aggregated
message:

> Just a thought, but do you have the "XP mode" jumper removed on all
> drives?

Yes.

> Instead of doing a monster sequential write to find my disk speed, I
> generally find it more useful to add conv=fdatasync to the dd [...]

Done; same results (40 MB/s).

> Correct. Using partitioning allows you to adjust the alignment [...]
> you could compensate for it by starting your partition at the correct
> offset.

Done. I created one big partition using parted with "-a optimal". The
partition layout is (fdisk-friendly output):

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00077f06

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  3907028991  1953513472   fd  Linux raid autodetect

I redid the test with the conv=fdatasync option as above: same results.

> My only suggestion would be to experiment with various partitioning,
> [...] It may also be worth using a different superblock version [...]

I've tried all the superblock versions: 0, 0.9, 1.0, 1.1 and 1.2. Same
results.

> No, those drives generally DON'T report 4k to the OS, even though they
> are 4k internally. [...] They lie and say 512b sectors for
> compatibility.

Yes, they are dirty liars. It's the same for the EADS series, not only
the EARS ones.

> My recommendation would be to look into the stripe-cache settings and
> to check the iostat -x 5 output. What is most likely happening is that
> when writing to the RAID5, md is reading as well (most likely to
> calculate parity) and not just writing.

Could you explain how I can look into the stripe-cache settings?

This is one of many similar outputs from iostat -x 5 during the initial
rebuilding phase:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.00   0.00    13.29     0.00    0.00  86.71

Device:  rrqm/s  wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda     6585.60    0.00 4439.20    0.00 44099.20     0.00    19.87     6.14  1.38    1.38    0.00  0.09 39.28
sdb     6280.40    0.00 4746.60    0.00 44108.00     0.00    18.59     5.20  1.10    1.10    0.00  0.07 35.04
sdc        0.00 9895.40    0.00 1120.80     0.00 44152.80    78.79    12.03 10.73    0.00   10.73  0.82 92.32

I also built a RAID6 (with one drive missing): same results.

> There must be some misalignment somewhere :(

Yes, it looks like the same behavior.

> Do all drives really report as 4K to the OS - physical_block_size and
> logical_block_size under /sys/block/sdX/queue/?

No, they lie about the block size, as you can also see in the fdisk
output above.

> NB: how does it perform with partitions starting at sector 2048?

The same.

Any other suggestions?

I almost forgot: I've also booted OpenSolaris and created a zfs pool
(aligned to 4k sectors) from the same three drives, and they perform
very well, individually and together. I know that I'm comparing apples
and oranges, but... there must be a solution!
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Zdenek Kaspar @ 2011-12-30 23:17 UTC
To: linux-raid

On 30.12.2011 22:04, Michele Codutti wrote:
>> Just a thought, but do you have the "XP mode" jumper removed on all
>> drives?
> Yes.
> [...]
> I almost forgot: I've also booted OpenSolaris and created a zfs pool
> (aligned to 4k sectors) from the same three drives, and they perform
> very well, individually and together. I know that I'm comparing apples
> and oranges, but... there must be a solution!

WTF is the jumper for, then? (on a 512B drive)

Does it change any of these:
/sys/block/sdX/queue/physical_block_size
/sys/block/sdX/queue/logical_block_size
/sys/block/sdX/alignment_offset

If osol can handle it (enforcing 4k), that's a good sign. (You used
ashift=12 for the pool, right?)

Z.
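P.S. If I remember right, zdb can show the ashift the pool actually got,
something like this (the pool name is an example):

  # zdb -C tank | grep ashift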
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Marcus Sorensen @ 2011-12-31 22:20 UTC
To: Zdenek Kaspar; Cc: linux-raid

> WTF is the jumper for, then? (on a 512B drive)
> Does it change any of these:
> /sys/block/sdX/queue/physical_block_size
> /sys/block/sdX/queue/logical_block_size
> /sys/block/sdX/alignment_offset

No, it doesn't change what the OS sees at all. Off the top of my head, I
think it just changes how the drive maps sectors internally, so that an
OS which normally starts the first partition on sector 63 will align
correctly, hence the "XP mode" name. That might not be exactly what's
going on, but it's something along those lines.

> Could you explain how I can look into the stripe-cache settings?

It's in /sys/block/md0/md/stripe_cache_size. This allows the system to
keep the contents of recently read stripes in memory, so if they need to
be modified again, it doesn't have to read from disk to calculate
parity.
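P.S. For example, to read it and raise it (md0 is an example name; the
value is a number of cache entries, each holding one page per member
device, default 256):

  # cat /sys/block/md0/md/stripe_cache_size
  256
  # echo 8192 > /sys/block/md0/md/stripe_cache_size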
* Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: John Robinson @ 2011-12-31 15:53 UTC
To: Michele Codutti; Cc: linux-raid

On 30/12/2011 21:04, Michele Codutti wrote:
[...]
> This is one of many similar outputs from iostat -x 5 during the initial
> rebuilding phase:
> [...]
> I also built a RAID6 (with one drive missing): same results.

Hang on, are you saying you see the 40MB/s speeds during the initial
rebuilding phase? Yes, you will get those results. You are seeing
degraded mode performance in the RAID5, just as you are in the RAID6
with a missing drive. When the array is fully built, which may well take
a day or two, you can expect better. Check /proc/mdstat for the progress
of the initial build.

If you happen to know that your array is already in sync (which three
brand-new all-zero drives would be for RAID5), or you want to test
without waiting for a rebuild, you can use --assume-clean when creating
the array.

Cheers,

John.
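P.S. Something along these lines (device names are examples;
--assume-clean skips the initial resync, so only use it when the members
really are in sync, e.g. three factory-zeroed drives for RAID5):

  # mdadm --create /dev/md0 --level=5 --raid-devices=3 --assume-clean \
      /dev/sda /dev/sdb /dev/sdc
  $ cat /proc/mdstat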