From: Stan Hoeppner <stan@hardwarefreak.com>
To: "Arkadiusz Miśkiewicz" <arekm@maven.pl>
Cc: linux-raid@vger.kernel.org, "xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
Date: Sat, 21 Dec 2013 05:18:42 -0600
Message-ID: <52B57912.5080000@hardwarefreak.com>
In-Reply-To: <201312202343.47895.arekm@maven.pl>
I renamed the subject, as your question doesn't really apply to XFS or
to the OP's problem, but to md-RAID.
On 12/20/2013 4:43 PM, Arkadiusz Miśkiewicz wrote:
> I wonder why the kernel is giving defaults that everyone repeatedly
> recommends changing/increasing? Has anyone tried to bug-report that for
> the stripe_cache_size case?
The answer is balancing default md-RAID5/6 write performance against
kernel RAM consumption, with more weight given to the latter. The
formula, where 4096 is the page size in bytes (one page per cache entry
per member device):

(4096 * stripe_cache_size) * num_drives = RAM consumed for stripe cache
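Worked through in Python, a sketch of the arithmetic only (the function
name is mine, not from any md tool); it reproduces the table below:

    def stripe_cache_ram_mb(stripe_cache_size, num_drives, page_size=4096):
        # One page per stripe cache entry per member device.
        return page_size * stripe_cache_size * num_drives / 2**20

    print(stripe_cache_ram_mb(256, 4))   # 4.0  -- the default, 4 drives
    print(stripe_cache_ram_mb(4096, 5))  # 80.0 -- the 5x SSD case below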
High stripe_cache_size values will cause the kernel to eat non-trivial
amounts of RAM for the stripe cache buffer. This table demonstrates the
effect for typical RAID5/6 drive counts:
stripe_cache_size    drives    RAM consumed
              256         4            4 MB
                          8            8 MB
                         16           16 MB
              512         4            8 MB
                          8           16 MB
                         16           32 MB
             1024         4           16 MB
                          8           32 MB
                         16           64 MB
             2048         4           32 MB
                          8           64 MB
                         16          128 MB
             4096         4           64 MB
                          8          128 MB
                         16          256 MB
The powers that be, Linus in particular, are not fond of default
settings that create a lot of kernel memory structures. The default
md-RAID5/6 stripe_cache_size of 256 yields 1MB consumed per member
device.
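For reference, the knob lives in sysfs. A minimal Python sketch of
reading the current value and the RAM it implies (md0 is a placeholder
array name, not anything from your setup):

    import os

    md = "md0"  # placeholder device name; adjust
    with open(f"/sys/block/{md}/md/stripe_cache_size") as f:
        size = int(f.read())
    drives = len(os.listdir(f"/sys/block/{md}/slaves"))
    print(f"{size} entries x {drives} drives = "
          f"{size * 4096 * drives / 2**20:.0f} MB")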
With SSDs becoming mainstream, and ever faster, at some point the
md-RAID5/6 architecture will have to be redesigned because of the memory
footprint required for performance. Currently the required size of the
stripe cache appears to be directly proportional to the aggregate write
throughput of the RAID devices. Thus the optimal value will vary greatly
from one system to another, depending on the throughput of the drives.
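Finding that optimal value is empirical: raise the knob, measure,
repeat. A rough Python sketch of such a probe loop (the array name and
dd target are placeholders; writing to sysfs requires root, and a
careful test would repeat each run several times):

    import subprocess, time

    MD = "md0"  # placeholder array name
    KNOB = f"/sys/block/{MD}/md/stripe_cache_size"

    for size in (256, 512, 1024, 2048, 4096):
        with open(KNOB, "w") as f:
            f.write(str(size))
        start = time.time()
        # Write 4GB through the array; oflag=direct bypasses the page cache.
        subprocess.run(["dd", "if=/dev/zero", "of=/mnt/test/bigfile",
                        "bs=1M", "count=4096", "oflag=direct"],
                       check=True, stderr=subprocess.DEVNULL)
        mb_s = 4096 / (time.time() - start)
        print(f"stripe_cache_size={size}: {mb_s:.0f} MB/s")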
For example, back in January I assisted a user with 5x Intel SSDs whose
system required a stripe_cache_size of 4096, or 80MB of RAM for stripe
cache, to reach the maximum write throughput of the devices. This
yielded 600MB/s, 60% greater throughput than with 2048 (40MB of cache).
In his case the 75MB of RAM over the default was well worth it, as the
machine was an iSCSI target server with 8GB of RAM.
In the previous case, with 5x rust in RAID6, the 2048 value seemed
optimal (though not yet verified), requiring 40MB less RAM than the 5x
Intel SSDs. For a 3-drive modern rust RAID5 the default of 256, or 3MB,
is close to optimal but maybe a little low. Consider that 256 has been
the default for a very long time, selected back when average drive
throughput was much, much lower (50MB/s or less), SSDs hadn't yet been
invented, and system memories were much smaller.
Due to the massive difference in throughput between rust and SSD, any
meaningful change in the default really requires new code that sniffs
out what type of devices constitute the array, if that's even possible
(and it probably isn't), and sets a lowish default accordingly. Again,
SSDs didn't exist when md-RAID was coded, nor when this default was set,
and that throws a big monkey wrench into the spokes.
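(For what it's worth, sysfs does expose a per-device rotational flag
these days, so a crude version of that sniffing might look like the
Python sketch below. Whether it's reliable enough to base a kernel
default on is another question; the slave-to-disk name mapping here is
naive and assumed, not authoritative.)

    import os, re

    def array_is_all_ssd(md="md0"):  # placeholder array name
        """Best effort: True if every member reports non-rotational."""
        for slave in os.listdir(f"/sys/block/{md}/slaves"):
            # Strip a trailing partition number (sda1 -> sda). Crude;
            # NVMe-style names would need different handling.
            disk = re.sub(r"\d+$", "", slave)
            with open(f"/sys/block/{disk}/queue/rotational") as f:
                if f.read().strip() == "1":  # 1 = spinning rust
                    return False
        return True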
--
Stan