From: Robert Petkus <rpetkus@bnl.gov>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: xfs@oss.sgi.com
Subject: Re: Poor performance -- poor config?
Date: Wed, 20 Jun 2007 17:16:41 -0400
Message-ID: <46799939.2080503@bnl.gov>
In-Reply-To: <Pine.LNX.4.64.0706201703310.27484@p34.internal.lan>

Justin Piszcz wrote:
>
>
> On Wed, 20 Jun 2007, Robert Petkus wrote:
>
>> Folks,
>> I'm trying to configure a system (server + DS4700 disk array) that
>> can offer the highest performance for our application. We will be
>> reading and writing multiple threads of 1-2GB files with 1MB block
>> sizes.
>> DS4700 config:
>> (16) 500 GB SATA disks
>> (3) 4+1 RAID 5 arrays and (1) hot spare == (3) 2TB LUNs.
>> (2) RAID arrays are on controller A, (1) RAID array is on controller B.
>> 512k segment size
>>
>> Server Config:
>> IBM x3550, 9GB RAM, RHEL 5 x86_64 (2.6.18)
>> The (3) LUNs are sdb, sdc {both controller A}, sdd {controller B}
>>
>> My original goal was to use XFS and create a highly optimized
>> config. Here is what I came up with:
>> Create separate partitions for XFS log files: sdd1, sdd2, sdd3 each
>> 150M -- 128MB is the maximum allowable XFS log size.
>> The XFS "stripe unit" (su) = 512k to match the DS4700 segment size
>> The "stripe width" ( (n-1)*sunit )= swidth=2048k = sw=4 (a multiple
>> of su)
>> 4k is the max block size allowable on x86_64 since 4k is the max
>> kernel page size
>>
>> [root@~]# mkfs.xfs -l logdev=/dev/sdd1,size=128m -d su=512k -d sw=4
>> -f /dev/sdb
>> [root@~]# mount -t xfs -o
>> context=system_u:object_r:unconfined_t,noatime,nodiratime,logbufs=8,logdev=/dev/sdd1
>> /dev/sdb /data0
>>
>> And the write performance is lousy compared to ext3 built like so:
>> [root@~]# mke2fs -j -m 1 -b4096 -E stride=128 /dev/sdc
>> [root@~]# mount -t ext3 -o
>> noatime,nodiratime,context="system_u:object_r:unconfined_t:s0",reservation
>> /dev/sdc /data1
>>
>> What am I missing?
>>
>> Thanks!
>>
>> --
>> Robert Petkus
>> RHIC/USATLAS Computing Facility
>> Brookhaven National Laboratory
>> Physics Dept. - Bldg. 510A
>> Upton, New York 11973
>>
>> http://www.bnl.gov/RHIC
>> http://www.acf.bnl.gov
>>
>>
>
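The stripe arithmetic in the quoted message can be double-checked with a short sketch. It assumes one 4+1 RAID 5 LUN (four data disks), the 512k segment size, and a 4k filesystem block, all taken from the message above; the function names are illustrative only. The conversion to 512-byte sectors is included because that is the unit used by the XFS sunit/swidth options.

```shell
#!/bin/sh
# Geometry check for one 4+1 RAID 5 LUN with a 512 KiB segment size.
# Input values come from the quoted message; function names are hypothetical.

# Stripe width in KiB = stripe unit in KiB * number of data disks.
stripe_width_kib() {
    echo $(( $1 * $2 ))
}

# Convert KiB to 512-byte sectors (the unit of the XFS sunit/swidth options).
kib_to_sectors() {
    echo $(( $1 * 2 ))
}

# ext3 stride in filesystem blocks = segment size in KiB / block size in KiB.
ext3_stride_blocks() {
    echo $(( $1 / $2 ))
}

su_kib=512      # DS4700 segment size
data_disks=4    # 4+1 RAID 5: four data disks plus one parity disk
block_kib=4     # filesystem block size

sw_kib=$(stripe_width_kib "$su_kib" "$data_disks")

echo "su     = ${su_kib} KiB ($(kib_to_sectors "$su_kib") sectors)"
echo "swidth = ${sw_kib} KiB ($(kib_to_sectors "$sw_kib") sectors)"
echo "stride = $(ext3_stride_blocks "$su_kib" "$block_kib") blocks"
```

These values agree with su=512k, sw=4 and stride=128 in the quoted commands, so the geometry passed to mkfs.xfs and mke2fs looks internally consistent.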
> What speeds are you getting?
dd if=/dev/zero of=/data0/bigfile bs=1024k count=5000
5242880000 bytes (5.2 GB) copied, 149.296 seconds, 35.1 MB/s

dd if=/data0/bigfile of=/dev/null bs=1024k count=5000
5242880000 bytes (5.2 GB) copied, 26.3148 seconds, 199 MB/s

iozone.linux -w -r 1m -s 1g -i0 -t 4 -e -w -f /data0/test1
Children see throughput for 4 initial writers = 28528.59 KB/sec
Parent sees throughput for 4 initial writers = 25212.79 KB/sec
Min throughput per process = 6259.05 KB/sec
Max throughput per process = 7548.29 KB/sec
Avg throughput per process = 7132.15 KB/sec

iozone.linux -w -r 1m -s 1g -i1 -t 4 -e -w -f /data0/test1
Children see throughput for 4 readers = 3059690.19 KB/sec
Parent sees throughput for 4 readers = 3055307.71 KB/sec
Min throughput per process = 757151.81 KB/sec
Max throughput per process = 776032.62 KB/sec
Avg throughput per process = 764922.55 KB/sec
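
As a rough sanity check on the iozone figures above, the sketch below converts the aggregate KB/sec numbers to MB/sec and compares them with an assumed link ceiling. Two things here are assumptions, not facts from this thread: that iozone's KB means 1024 bytes, and that a dual-port FC-4 HBA tops out near 800 MB/s (about 400 MB/s per port). The function names are illustrative.

```shell
#!/bin/sh
# Convert iozone aggregate throughput to MB/sec and compare it with an
# assumed link ceiling. The 800 MB/s ceiling is an assumption for dual FC-4.

# KB/sec (assuming 1 KB = 1024 bytes) to MB/sec with one decimal place.
kb_to_mb() {
    LC_ALL=C awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 }'
}

# Exit status 0 when the throughput in KB/sec exceeds the ceiling in MB/sec.
exceeds_ceiling() {
    LC_ALL=C awk -v kb="$1" -v cap="$2" 'BEGIN { exit !(kb / 1024 > cap) }'
}

link_ceiling_mb=800     # assumed: 2 ports * ~400 MB/s
write_kb=28528.59       # "Children see throughput for 4 initial writers"
read_kb=3059690.19      # "Children see throughput for 4 readers"

for kb in "$write_kb" "$read_kb"; do
    if exceeds_ceiling "$kb" "$link_ceiling_mb"; then
        echo "$(kb_to_mb "$kb") MB/s: above the assumed link ceiling"
    else
        echo "$(kb_to_mb "$kb") MB/s: within the assumed link ceiling"
    fi
done
```

Under those assumptions the read figure is well above what the Fibre Channel links could carry, which suggests the four 1 GB files were served largely from the page cache on this 9GB machine rather than from the array. The write figure is the one that reflects the array.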
>
> Have you tried a SW RAID with the 16 drives, if you do that, XFS will
> auto-optimize per the physical characteristics of the md array.
No, because this would waste an expensive disk array. I've done this
with various JBODs, even a SUN Thumper, with OK results...
>
> Also, most of those mount options besides the logdev/noatime don't do
> much with XFS from my personal benchmarks, you're better off with the
> defaults+noatime.
The security context option is in there because I run a strict SELinux
policy. Otherwise, I need logdev since the log is on a different disk.
BTW, the same filesystem without a separate log disk made no difference
in performance.
>
> What speed are you getting reads/writes, what do you expect? How are
> the drives attached/what type of controller? PCI?
I can get ~3x write performance with ext3. I have a dual-port FC-4 PCIe
HBA connected to (2) IBM DS4700 FC-4 controllers. There is lots of
headroom.
--
Robert Petkus
RHIC/USATLAS Computing Facility
Brookhaven National Laboratory
Physics Dept. - Bldg. 510A
Upton, New York 11973
http://www.bnl.gov/RHIC
http://www.acf.bnl.gov