From: Zdenek Kaspar <zkaspar82@gmail.com>
To: linux-raid@vger.kernel.org
Subject: Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
Date: Sat, 31 Dec 2011 00:17:34 +0100
Message-ID: <4EFE468E.3010007@gmail.com>
In-Reply-To: <4085BCC9-B901-4D96-B530-4D580A162D20@gmail.com>
On 30.12.2011 22:04, Michele Codutti wrote:
> Hi all, thanks for the tips. I'll reply to everyone in one aggregated message:
>> Just a thought, but do you have the "XP mode" jumper removed on all drives?
> Yes.
>
>> Instead of doing a monster sequential write to find my disk speed, I
>> generally find it more useful to add conv=fdatasync to a dd so that
>> the dirty buffers are utilized as they are in most real-world working
>> environments, but I don't get a result until the test is on-disk.
> Done, same results (40 MB/s)
>
>>>> My only suggestion would be to experiment with various partitioning,
>>>
>>>
>>> Poster already said they're not partitioned.
>>
>> Correct. Using partitioning allows you to adjust the alignment, so for
>> example if the MD superblock at the front moves the start of the
>> exported MD device out of alignment with the base disks, you could
>> compensate for it by starting your partition on the correct offset.
> Done. I've created one big partition using parted with "-a optimal".
> The partition layout is (fdisk friendly output):
> Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00077f06
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1            2048  3907028991  1953513472   fd  Linux raid autodetect
> I redid the test with the "conv=fdatasync" option as above: same results.
>
>> My only suggestion would be to experiment with various partitioning,
>> starting the first partition at 2048s or various points to see if you
>> can find a placement that aligns the partitions properly. I'm sure
>> there's an explanation, but I'm not in the mood to put on my thinking
>> hat to figure it out at the moment. May also be worth using a
>> different superblock version, as 1.2 is 4k from the start of the
>> drives, which might be messing with alignment (although I would expect
>> it on all arrays), worth trying the .9 which goes to the end of the
>> device.
> I've tried all the superblock versions 0, 0.9, 1, 1.1 and 1.2. Same results.
>
>> No, those drives generally DON'T report 4k to the OS, even though they
>> are. If they were, there'd be fewer problems. They lie and say 512b
>> sectors for compatibility.
> Yes, they are dirty liars. The same goes for the EADS series, not only the EARS ones.
>
>> My recommendation would be to look into the stripe-cache settings and check
>> iostat -x 5 output. What is most likely happening is that when writing to
>> the raid5, it's reading some (to calculate parity most likely) and not just
>> writing. iostat will confirm if this is indeed the case.
> Could you explain how I could look into the stripe-cache settings?
> This is one of many similar outputs from iostat -x 5 from the initial rebuilding phase:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00   13.29    0.00    0.00   86.71
>
> Device:   rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
> sda      6585.60     0.00  4439.20     0.00 44099.20     0.00    19.87     6.14     1.38     1.38     0.00     0.09    39.28
> sdb      6280.40     0.00  4746.60     0.00 44108.00     0.00    18.59     5.20     1.10     1.10     0.00     0.07    35.04
> sdc         0.00  9895.40     0.00  1120.80     0.00 44152.80    78.79    12.03    10.73     0.00    10.73     0.82    92.32
> I also built a RAID6 (with one drive missing): same results.
>
>> There must be some misalignment somewhere :(
> Yes, it's the same behavior.
>
>> Do all drives really report as 4K to the OS - physical_block_size, logical_block_size under
>> /sys/block/sdX/queue/ ??
> No, they lie about the block size, as you can also see in the fdisk output above.
>
>> NB: how does it perform with partitions starting at sector 2048 (check
>> all disks with fdisk -lu /dev/sdX).
> They perform the same.
>
> Any other suggestion?
>
> I almost forgot: I've also booted OpenSolaris and created a zfs pool (aligned to 4k sectors) from the same three drives, and they perform very well, individually and together. I know that I'm comparing apples and oranges but ... there must be a solution!
>
WTF is the jumper for, then, on a drive that reports 512B sectors?
Does it somehow change any of these:
/sys/block/sdX/queue/physical_block_size
/sys/block/sdX/queue/logical_block_size
/sys/block/sdX/alignment_offset
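
To compare those three values across all member disks (and before/after
touching the jumper), something like the following could be used. It is
only a minimal sketch: the helper names read_geometry() and
looks_like_512e() are made up for illustration, the device names are the
ones from the iostat output, and it assumes the usual sysfs layout. A
missing attribute is reported as None rather than guessed.

```python
from pathlib import Path

# The three sysfs attributes named above, relative to /sys/block/<dev>/.
ATTRS = {
    "physical_block_size": "queue/physical_block_size",
    "logical_block_size": "queue/logical_block_size",
    "alignment_offset": "alignment_offset",
}


def read_geometry(dev, sysfs_root="/sys/block"):
    """Return the reported values as ints; None where an attribute is absent."""
    base = Path(sysfs_root) / dev
    geom = {}
    for name, rel in ATTRS.items():
        path = base / rel
        geom[name] = int(path.read_text().strip()) if path.is_file() else None
    return geom


def looks_like_512e(geom):
    """True if the drive admits to 4K physical sectors behind 512B logical ones."""
    return (geom["logical_block_size"] == 512
            and geom["physical_block_size"] == 4096)


if __name__ == "__main__":
    for dev in ("sda", "sdb", "sdc"):  # device names from the iostat output
        geom = read_geometry(dev)
        verdict = "4K physical reported" if looks_like_512e(geom) else "no 4K reported"
        print(dev, geom, verdict)
```

If the drives keep reporting 512/512 whatever the jumper position, the
kernel has nothing to align to and the offsets have to be chosen by hand.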
If OpenSolaris can handle it (enforcing 4k), that's a good sign. (You used
ashift=12 for the pool, right?)
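
For reference, the arithmetic behind ashift=12 and the 2048-sector
partition start, as a rough sketch. The function names are hypothetical,
and start sector 63 is not from this thread; it is only included as the
well-known old DOS-style default to show what a misaligned start looks
like.

```python
LOGICAL_SECTOR = 512            # bytes, as fdisk reports for these drives
ASHIFT = 12                     # the pool setting asked about above
PHYSICAL_SECTOR = 1 << ASHIFT   # 2**12 = 4096 bytes


def byte_offset(start_sector, sector_size=LOGICAL_SECTOR):
    """Byte position on the disk at which a partition begins."""
    return start_sector * sector_size


def is_aligned(start_sector, boundary=PHYSICAL_SECTOR, sector_size=LOGICAL_SECTOR):
    """True if the partition start falls exactly on a physical-sector boundary."""
    return byte_offset(start_sector, sector_size) % boundary == 0


if __name__ == "__main__":
    # 2048 is the start of /dev/sdc1 in the fdisk output;
    # 63 is an illustrative value, not taken from the thread.
    for start in (2048, 63):
        print(start, byte_offset(start), is_aligned(start))
```

By this check the partition at sector 2048 starts on a 4K boundary, so
the partition start itself is unlikely to be the culprit.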
Z.