Subject: Horrible mirror write performance, alignment?
From: Tracy Reed
Date: 2010-04-28  6:41 UTC
To: linux-raid

Does anyone know why my mirror devices would be doing (apparently)
unaligned writes to my Ethernet SAN, causing horrible performance,
massive seeking, and lots of extra reading, while writing to the
device directly is very fast with no extra reads? I am measuring the
reads/writes on the SAN device with iostat, since it is a Linux box.

I am running the 2.6.18-164.11.1.el5xen Xen kernel which came with
CentOS 5.4.

After spending a lot of time banging my head on this I seem to have
finally tracked it down to the mirroring. I never would have thought
it would be this, but it is extremely reproducible. We're talking a
difference of 4-5x in write speed; reads are equally fast everywhere.

I am using the AoE v72 kernel module (initiator) on Dell R610s to
talk to vblade-19 (target) on Dell R710s, all running CentOS 5.4. I
have striped two 7200 RPM SATA disks and exported the md with AoE
(although I have done these tests with individual disks as well); a
rough sketch of the export commands follows the read numbers below.
Read performance from a raw device is excellent:

# dd of=/dev/null if=/dev/xvdg1 bs=4096 count=3000000
3000000+0 records in
3000000+0 records out
12288000000 bytes (12 GB) copied, 106.749 seconds, 115 MB/s

or from a mirror:

# dd if=foo of=/dev/null bs=4096
1073916+0 records in
1073916+0 records out
4398759936 bytes (4.4 GB) copied, 37.7441 seconds, 117 MB/s

foo is a 4.4G file I created in the filesystem.
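
For reference, the striped export on the target side is set up
roughly like the following. This is a sketch rather than a copy of
the actual config: the md number, member disks, shelf/slot, and
interface name are illustrative:

# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
# vbladed 0 1 eth1 /dev/md0

vbladed is the daemonizing wrapper that ships with vblade; once the
aoe module on the initiator discovers the export, it shows up there
as a /dev/etherd/eX.Y device.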

I always dropped the cache with:

echo 1 > /proc/sys/vm/drop_caches

on both target and initiator before starting each test. Those read
numbers are about as good as a single gig-e link gets, which suggests
that the network/SAN is fine.

And when writing directly to the raw exported device, iostat on the
target shows only writes and no reads happening.

However, write performance to a mirror is atrocious: typically around
20MB/s.

# dd if=/dev/zero of=foo bs=4096 count=3000000
1724073+0 records in
1724073+0 records out
7061803008 bytes (7.1 GB) copied, 324.606 seconds, 21.8 MB/s

It should be more like 70MB/s per disk or better (7200 RPM SATA) and
should max out my gig-e with write performance similar to the read
performance above. I mentioned earlier that I suspect these are
somehow unaligned writes because, when running iostat on the target
machine, I can see lots of reads happening which are surely causing
seeks and killing performance. Typical is something like 8MB/s of
reads while doing 16MB/s of writes.
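
For what it's worth, this is roughly how I am watching the target
(sysstat's iostat with extended stats in kB; the 2-second interval is
arbitrary):

# iostat -x -k 2

The rkB/s column on the exported disks is where the unexpected read
traffic shows up while the initiator is doing nothing but writes.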

I have tried manually aligning the disk by moving the beginning of
data on the partition from sector 63 to sector 64 (although I don't
think this should matter as much for a mirror as for a stripe or
RAID5, right?), and I have tried changing the disk geometry to
account for the extra partition table, which causes a half-block
page-cache misalignment as described by the ever-insightful Kelsey
Hudson in his writeup on the issue here:

http://copilotco.com/Virtualization/wiki/aoe-caching-alignment.pdf/at_download/file
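
Concretely, the realignment attempt was along these lines, with
sfdisk in sector units (the device name is just a placeholder for
whichever exported disk is being repartitioned, and this of course
rewrites its partition table; --force is there because sfdisk objects
to partitions that do not start on a cylinder/track boundary):

# echo '64,,83' | sfdisk -uS --force /dev/etherd/e0.1

That creates a single Linux partition whose data area starts at
sector 64 instead of the fdisk default of 63.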

All to no avail. The fact remains that whenever I write to the
mirrored disks performance is terrible, but when I write to each
individual block device with a filesystem on it performance is good.
It seems to point to some sort of problem with the mirroring.
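
In case the layout matters, the mirror itself is md RAID1 assembled
over the two SAN-backed devices, roughly like this (the device names
are placeholders for however the two AoE exports appear where the
mirror is built; in my case they show up as xvd devices in the domU):

# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/xvdg1 /dev/xvdh1

and foo lives on a filesystem created on top of /dev/md1.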

Any ideas or suggestions would be very much appreciated.

-- 
Tracy Reed
http://tracyreed.org
