From: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: "Arkadiusz Miśkiewicz" <arekm@maven.pl>,
	linux-raid@vger.kernel.org, "xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
Date: Sat, 21 Dec 2013 13:20:14 +0100
Message-ID: <20131221122014.GA3909@lazy.lzy>
In-Reply-To: <52B57912.5080000@hardwarefreak.com>

On Sat, Dec 21, 2013 at 05:18:42AM -0600, Stan Hoeppner wrote:
> I renamed the subject as your question doesn't really apply to XFS, or
> the OP, but to md-RAID.
> 
> On 12/20/2013 4:43 PM, Arkadiusz Miśkiewicz wrote:
> 
> > I wonder why the kernel ships defaults that everyone repeatedly recommends
> > changing/increasing. Has anyone tried to file a bug report about the
> > stripe_cache_size case?
> 
> The answer is balancing default md-RAID5/6 write performance against
> kernel RAM consumption, with more weight given to the latter.  The formula:
> 
> RAM consumed for stripe cache (bytes) = 4096 * stripe_cache_size * num_drives
> 
> High stripe_cache_size values will cause the kernel to eat non-trivial
> amounts of RAM for the stripe cache buffer.  This table demonstrates the
> effect today for typical RAID5/6 disk counts.
> 
> stripe_cache_size	drives	RAM consumed
> 256			 4	  4 MB
> 			 8	  8 MB
> 			16	 16 MB
> 512			 4	  8 MB
> 			 8	 16 MB
> 			16	 32 MB
> 1024			 4	 16 MB
> 			 8	 32 MB
> 			16	 64 MB
> 2048			 4	 32 MB
> 			 8	 64 MB
> 			16	128 MB
> 4096			 4	 64 MB
> 			 8	128 MB
> 			16	256 MB
> 
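The formula and table above can be checked with a short Python sketch. It assumes 4 KiB pages (the 4096 in the formula) and reports MiB, which is what the table's "MB" column works out to (256 * 4096 * 4 bytes = 4 MiB). The function names are illustrative only, not part of any md tooling.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, matching the 4096 in the formula


def stripe_cache_bytes(stripe_cache_size: int, num_drives: int,
                       page_size: int = PAGE_SIZE) -> int:
    """Bytes of RAM used by the md-RAID5/6 stripe cache for one array."""
    return page_size * stripe_cache_size * num_drives


def to_mib(n_bytes: int) -> float:
    """Convert bytes to MiB (the table's "MB" column)."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Regenerate the table: stripe_cache_size, drives, RAM consumed.
    for size in (256, 512, 1024, 2048, 4096):
        for drives in (4, 8, 16):
            ram = to_mib(stripe_cache_bytes(size, drives))
            print(f"{size:>5}  {drives:>2}  {ram:>4.0f} MiB")
```

The same function covers the 5-drive SSD case discussed further on: stripe_cache_bytes(4096, 5) is 80 MiB, and the 256 default with five drives is 5 MiB.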
> The powers that be, Linus in particular, are not fond of default
> settings that create a lot of kernel memory structures.  The default
> md-RAID5/6 stripe_cache_size of 256 consumes 1MB per member device.
> 
> With SSDs becoming mainstream, and becoming ever faster, at some point
> the md-RAID5/6 architecture will have to be redesigned because of the
> memory footprint required for performance.  Currently the required size
> of the stripe cache appears directly proportional to the aggregate write
> throughput of the RAID devices.  Thus the optimal value will vary
> greatly from one system to another depending on the throughput of the
> drives.
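Since the optimal value varies so much from system to system, one practical approach is to measure rather than guess. A minimal sketch, assuming you have already benchmarked sequential write throughput at several stripe_cache_size values (the `results` mapping and the 5% tolerance are hypothetical inputs, not something md provides): pick the smallest tested value whose throughput is within tolerance of the best, and report its RAM cost.

```python
from __future__ import annotations


def pick_stripe_cache_size(results: dict[int, float], num_drives: int,
                           tolerance: float = 0.05,
                           page_size: int = 4096) -> tuple[int, int]:
    """Return (stripe_cache_size, RAM bytes) for the smallest tested value whose
    measured throughput is within `tolerance` of the best measured throughput.

    `results` maps stripe_cache_size -> measured write throughput (any unit),
    i.e. whatever your own benchmark produced.
    """
    if not results:
        raise ValueError("need at least one benchmark result")
    if not 0.0 <= tolerance < 1.0:
        raise ValueError("tolerance must be in [0, 1)")
    best = max(results.values())
    threshold = best * (1.0 - tolerance)
    # Walk candidate sizes from smallest to largest; the first one close
    # enough to the best wins, minimising RAM for near-peak throughput.
    for size in sorted(results):
        if results[size] >= threshold:
            return size, page_size * size * num_drives
    raise AssertionError("unreachable: the best result always qualifies")
```

The chosen value can be applied by writing it (as root) to /sys/block/<md device>/md/stripe_cache_size; that setting does not persist across reboots.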
> 
> For example, I assisted a user with 5x Intel SSDs back in January, and
> his system required 4096, or 80MB of RAM for stripe cache, to reach the
> maximum write throughput of the devices.  This yielded 600MB/s or 60%
> greater throughput than 2048, or 40MB of RAM for cache.  In his case the
> extra 75MB of RAM over the default (5MB for five drives) was well worth
> it, as the machine was an iSCSI target server with 8GB RAM.
> 
> In the previous case, with a 5x rust RAID6, the 2048 value seemed optimal
> (though not yet verified), requiring 40MB less RAM than the 5x Intel
> SSDs.  For a 3-drive RAID5 of modern rust, the default of 256, or 3MB, is
> close to optimal but maybe a little low.  Consider that 256 has been the
> default for a very long time, and was selected back when average drive
> throughput was much lower (50MB/s or less), SSDs weren't yet in common
> use, and system memories were much smaller.
> 
> Due to the massive difference in throughput between rust and SSD, any
> meaningful change in the default really requires new code to sniff out
> what type of devices constitute the array (if that's even possible, and
> it probably isn't) and set an appropriate default accordingly.  Again,
> SSDs weren't in common use when md-RAID was coded, nor when this default
> was set, and that throws a big monkey wrench into the works.
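Whether an array's members are SSDs can at least be guessed from sysfs: every block device exposes queue/rotational (1 for spinning media, 0 otherwise), and /sys/block/<md device>/slaves/ lists the array's members. The flag is only a hint, since some controllers, USB bridges and virtual disks misreport it. The sketch below is a hypothetical user-space policy, not md code; the 256/4096 values are placeholders loosely taken from the numbers in this thread.

```python
import os
from typing import List


def member_devices(md_name: str, sysfs: str = "/sys") -> List[str]:
    """Member block devices of an md array, e.g. ['sda1', 'sdb1']."""
    return sorted(os.listdir(os.path.join(sysfs, "block", md_name, "slaves")))


def is_rotational(dev: str, sysfs: str = "/sys") -> bool:
    """True if the kernel reports `dev` (a disk or a partition) as rotational."""
    path = os.path.realpath(os.path.join(sysfs, "class", "block", dev))
    # Partitions have no queue/ directory of their own; use the parent disk's.
    if not os.path.isdir(os.path.join(path, "queue")):
        path = os.path.dirname(path)
    with open(os.path.join(path, "queue", "rotational")) as f:
        return f.read().strip() == "1"


def suggested_stripe_cache_size(md_name: str, sysfs: str = "/sys",
                                rust: int = 256, ssd: int = 4096) -> int:
    """Hypothetical policy: use the larger value only if no member is rotational."""
    members = member_devices(md_name, sysfs)
    if members and not any(is_rotational(d, sysfs) for d in members):
        return ssd
    return rust
```

On a live system, suggested_stripe_cache_size("md0") reads the real /sys; the `sysfs` parameter exists only so the logic can be exercised against a fake directory tree.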

Hi Stan,

nice analytical report, as usual...

My dumb suggestion would be to simply use udev to
set up the drives.
Everything (stripe_cache_size, read_ahead, scterc, etc.)
can be configured, I suppose, by udev rules.
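Along those lines, a minimal Python sketch that renders udev rules for md arrays. It assumes the standard udev match and assignment keys (SUBSYSTEM, KERNEL, ACTION, TEST, ATTR, RUN); the function name, the chosen values and the smartctl path are illustrative assumptions. SCT Error Recovery Control is set with smartctl's `-l scterc,READ,WRITE`, in tenths of a second.

```python
from typing import List, Optional

MD_MATCH = 'SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change"'


def md_udev_rules(stripe_cache_size: int,
                  read_ahead_kb: Optional[int] = None,
                  scterc_deciseconds: Optional[int] = None,
                  smartctl: str = "/usr/sbin/smartctl") -> List[str]:
    """Render udev rule lines tuning md arrays and, optionally, SATA members."""
    rules = [
        # Only RAID4/5/6 arrays have md/stripe_cache_size, hence the TEST guard.
        f'{MD_MATCH}, TEST=="md/stripe_cache_size", '
        f'ATTR{{md/stripe_cache_size}}="{stripe_cache_size}"'
    ]
    if read_ahead_kb is not None:
        rules.append(f'{MD_MATCH}, ATTR{{queue/read_ahead_kb}}="{read_ahead_kb}"')
    if scterc_deciseconds is not None:
        t = scterc_deciseconds  # e.g. 70 = 7.0 s for both read and write ERC
        rules.append('SUBSYSTEM=="block", KERNEL=="sd[a-z]", ACTION=="add|change", '
                     f'RUN+="{smartctl} -l scterc,{t},{t} /dev/%k"')
    return rules


if __name__ == "__main__":
    print("\n".join(md_udev_rules(4096, read_ahead_kb=4096, scterc_deciseconds=70)))
```

The output could be saved to a file such as /etc/udev/rules.d/60-md-tuning.rules (a hypothetical name), followed by `udevadm control --reload` and `udevadm trigger` to apply it without rebooting.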

bye,

-- 

piergiorgio
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
