From: Joe Williams <jwilliams315@gmail.com>
To: linux-raid@vger.kernel.org
Subject: increasing stripe_cache_size decreases RAID-6 read throughput
Date: Sat, 24 Apr 2010 16:36:20 -0700
Message-ID: <h2y11f0870e1004241636z1f3e302g913be494ec0aefa5@mail.gmail.com>
I am new to mdadm, and I just set up a RAID-6 array of five 2 TB
Samsung Spinpoint F3EGs with mdadm v3.1.2. I created the array with
the default parameters, including a 512 KB chunk size. It took about
6 hours to initialize; then I created an XFS filesystem:
# mkfs.xfs -f -d su=512k,sw=3 -l su=256k -l lazy-count=1 -L raidvol /dev/md0
meta-data=/dev/md0               isize=256    agcount=32, agsize=45776384 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1464843648, imaxpct=5
         =                       sunit=128    swidth=384 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Note that 256k is the maximum allowed by mkfs.xfs for the log stripe unit.
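(For reference, the array itself was created with mdadm defaults,
using a command along these lines; the member device names below are
an example rather than a copy from my shell history:

# mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sd[bcdef]

The su/sw values above follow from that geometry: su matches the
512 KB chunk, and sw=3 because two of the five drives hold parity,
leaving three data disks per stripe.)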
Then it was time to optimize performance. First I ran a benchmark
with the default settings (from a recent Arch Linux install) for the
following parameters:
# cat /sys/block/md0/md/stripe_cache_size
256
# cat /sys/block/md0/queue/read_ahead_kb
3072
# cat /sys/block/sdb/queue/read_ahead_kb
128
# cat /sys/block/md0/queue/scheduler
none
# cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# cat /sys/block/md0/queue/nr_requests
128
# cat /sys/block/sdb/queue/nr_requests
128
# cat /sys/block/md0/device/queue_depth
cat: /sys/block/md0/device/queue_depth: No such file or directory
# cat /sys/block/sdb/device/queue_depth
31
# cat /sys/block/md0/queue/max_sectors_kb
127
# cat /sys/block/sdb/queue/max_sectors_kb
512
Note that sdb is one of the 5 drives for the RAID volume, and the
other 4 have the same settings.
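In case anyone wants to collect the same numbers in one go, a small
loop like the following works; the sdb..sdf member list is specific
to my box, so adjust it for your own setup:

#!/bin/sh
# Print each tunable inspected above for the array and its members.
for attr in md/stripe_cache_size queue/read_ahead_kb queue/scheduler \
            queue/nr_requests queue/max_sectors_kb device/queue_depth; do
    for dev in md0 sdb sdc sdd sde sdf; do
        f=/sys/block/$dev/$attr
        # Skip attributes a device doesn't expose
        # (e.g. md0 has no device/queue_depth).
        [ -e "$f" ] && echo "$f: $(cat "$f")"
    done
done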
First question: is it normal for the md0 scheduler to be "none"? I
cannot change it by writing, e.g., "deadline" into the file.
Next question: is it normal for md0 to have no queue_depth setting?
Are there any other performance-relevant parameters I should be
looking at?
I booted the kernel with mem=1024M to keep the buffer cache small
(this machine has 72 GB of RAM), and ran an iozone benchmark:
Iozone: Performance Test of File I/O
Version $Revision: 3.338 $
Compiled for 64 bit mode.
Build: linux-AMD64
Auto Mode
Using Minimum Record Size 64 KB
Using Maximum Record Size 16384 KB
File size set to 4194304 KB
Include fsync in write timing
Command line used: iozone -a -y64K -q16M -s4G -e -f iotest -i0 -i1 -i2
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                           random    random
        KB  reclen    write  rewrite     read   reread       read     write
   4194304      64   133608   114920   191367   191559       7772     14718
   4194304     128   142748   113722   165832   161023      14055     20728
   4194304     256   127493   108110   165142   175396      24156     23300
   4194304     512   136022   112711   171146   165466      36147     25698
   4194304    1024   140618   110196   153134   148925      57498     39864
   4194304    2048   137110   108872   177201   193416      98759     50106
   4194304    4096   138723   113352   130858   129940      78636     64615
   4194304    8192   140100   114089   175240   168807     109858     84656
   4194304   16384   130633   116475   131867   142958     115147    102795
I was expecting somewhat faster sequential reads, but 191 MB/s is not
too bad. I'm not sure why it drops to 130-131 MB/s at some of the
larger record sizes.
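One read-side parameter I have not yet experimented with is the
array's read-ahead; I assume bumping it would look something like:

# echo 16384 > /sys/block/md0/queue/read_ahead_kb

but I have not measured whether that helps yet.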
But the writes were disappointing, so the first thing I tried tuning
was stripe_cache_size:
# echo 16384 > /sys/block/md0/md/stripe_cache_size
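As an aside, if I read Documentation/md.txt correctly, the stripe
cache consumes roughly PAGE_SIZE * stripe_cache_size * nr_disks of
memory, so with 4 KB pages this setting pins about:

# echo "$((4 * 16384 * 5 / 1024)) MiB"
320 MiB

which is worth keeping in mind given that I booted with mem=1024M.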
I re-ran the iozone benchmark:
                                                           random    random
        KB  reclen    write  rewrite     read   reread       read     write
   4194304      64   219206   264113   104751   108118       7240     12372
   4194304     128   232713   255337   153990   142872      13209     21979
   4194304     256   229446   242155   132753   131009      20858     32286
   4194304     512   236389   245713   144280   149283      32024     44119
   4194304    1024   234205   243135   141243   141604      53539     70459
   4194304    2048   219163   224379   134043   131765      84428     90394
   4194304    4096   226037   225588   143682   146620      60171    125360
   4194304    8192   214487   231506   135311   140918      78868    156935
   4194304   16384   210671   215078   138466   129098      96340    178073
And now the sequential writes are quite satisfactory, but the reads
are low. Next I tried 2560 for stripe_cache_size, since that is
                                                           random    random
        KB  reclen    write  rewrite     read   reread       read     write
--