linux-raid.vger.kernel.org archive mirror
* Re: XFS blocked task in xlog_cil_force_lsn
       [not found]   ` <52B41B67.9030308@pzystorm.de>
@ 2013-12-20 22:43     ` Arkadiusz Miśkiewicz
  2013-12-21 11:18       ` md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint Stan Hoeppner
  0 siblings, 1 reply; 7+ messages in thread
From: Arkadiusz Miśkiewicz @ 2013-12-20 22:43 UTC (permalink / raw)
  To: xfs, xfs; +Cc: linux-raid, stan

On Friday 20 of December 2013, Kevin Richter wrote:

> >> $ cat /sys/block/md2/md/stripe_cache_size
> >> 256
> > 
> > 256 is the default and it is way too low.  This is limiting your write
> > throughput.  Increase this to a minimum of 1024, which will give you a
> > 20MB stripe cache buffer.  This should become active immediately.  Add
> > it to a startup script to make it permanent.

> > $ echo 256 > /sys/block/md2/md/stripe_cache_size
> > $ time cp -a /olddisk/testfolder /6tb/foo1/
> > real    25m38.925s
> > user    0m0.595s
> > sys     1m23.182s
> > 
> > $ echo 1024 > /sys/block/md2/md/stripe_cache_size
> > $ time cp -a /olddisk/testfolder /raid/foo2/
> > real    7m32.824s
> > user    0m0.438s
> > sys     1m6.759s
> > 
> > $ echo 2048 > /sys/block/md2/md/stripe_cache_size
> > $ time cp -a /olddisk/testfolder /raid/foo3/
> > real    5m32.847s
> > user    0m0.418s
> > sys     1m5.671s
> > 
> > $ echo 4096 > /sys/block/md2/md/stripe_cache_size
> > $ time cp -a /olddisk/testfolder /raid/foo4/
> > real    5m54.554s
> > user    0m0.437s
> > sys     1m6.268s
> 
> The difference is really amazing! So 2048 seems to be the best choice.
> 60GB in 5.5 minutes is about 180MB/s. That sounds a bit high, doesn't it?
> The RAID only consists of 5 SATA disks with 7200rpm.

I wonder why the kernel ships defaults that everyone repeatedly recommends
changing or increasing. Has anyone tried to file a bug report for the
stripe_cache_size case?
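
(For the "add it to a startup script" advice quoted above, a minimal sketch,
assuming the array is md2 and using the 2048 value from the tests, would be
something like:

  #!/bin/sh
  # re-apply the md stripe cache size at boot; value taken from the tests above
  echo 2048 > /sys/block/md2/md/stripe_cache_size

run from rc.local or whatever boot hook the distro provides.)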

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


* md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
  2013-12-20 22:43     ` XFS blocked task in xlog_cil_force_lsn Arkadiusz Miśkiewicz
@ 2013-12-21 11:18       ` Stan Hoeppner
  2013-12-21 12:20         ` Piergiorgio Sartor
                           ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Stan Hoeppner @ 2013-12-21 11:18 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: linux-raid, xfs@oss.sgi.com

I renamed the subject as your question doesn't really apply to XFS, or
the OP, but to md-RAID.

On 12/20/2013 4:43 PM, Arkadiusz Miśkiewicz wrote:

> I wonder why the kernel ships defaults that everyone repeatedly recommends
> changing or increasing. Has anyone tried to file a bug report for the
> stripe_cache_size case?

The answer is balancing default md-RAID5/6 write performance against
kernel RAM consumption, with more weight given to the latter.  The formula:

(4096 bytes per page * stripe_cache_size * num_drives) = RAM consumed for
the stripe cache, in bytes

High stripe_cache_size values will cause the kernel to eat non-trivial
amounts of RAM for the stripe cache buffer.  This table demonstrates the
effect today for typical RAID5/6 disk counts.

stripe_cache_size	drives	RAM consumed
256			 4	  4 MB
			 8	  8 MB
			16	 16 MB
512			 4	  8 MB
			 8	 16 MB
			16	 32 MB
1024			 4	 16 MB
			 8	 32 MB
			16	 64 MB
2048			 4	 32 MB
			 8	 64 MB
			16	128 MB
4096			 4	 64 MB
			 8	128 MB
			16	256 MB
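
As a concrete check against the table: with stripe_cache_size=4096 on a
5-drive array, 4096 entries * 4096 bytes * 5 drives = ~80MB, which is the
figure that comes up in the SSD example below.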

The powers that be, Linus in particular, are not fond of default
settings that create a lot of kernel memory structures.  The default
md-RAID5/6 stripe_cache_size yields 1MB consumed per member device.

With SSDs becoming mainstream, and becoming ever faster, at some point
the md-RAID5/6 architecture will have to be redesigned because of the
memory footprint required for performance.  Currently the required size
of the stripe cache appears directly proportional to the aggregate write
throughput of the RAID devices.  Thus the optimal value will vary
greatly from one system to another depending on the throughput of the
drives.
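
In practice the only way to find the sweet spot is to sweep the value and
measure, along the lines of the following rough sketch (assuming the array
is md2 and /raid is a scratch area with room for an 8GB test file):

  for s in 256 512 1024 2048 4096; do
      echo $s > /sys/block/md2/md/stripe_cache_size
      echo "stripe_cache_size=$s"
      dd if=/dev/zero of=/raid/sweep.tmp bs=1M count=8192 conv=fdatasync 2>&1 | tail -1
      rm -f /raid/sweep.tmp
  done

which is essentially what the OP did above with cp and a 60GB tree.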

For example, I assisted a user with 5x Intel SSDs back in January and
his system required 4096, or 80MB of RAM for stripe cache, to reach
maximum write throughput of the devices.  This yielded 600MB/s or 60%
greater throughput than 2048, or 40MB RAM for cache.  In his case 60MB
more RAM than the default was well worth the increase as the machine was
an iSCSI target server with 8GB RAM.

In the previous case with 5x rust RAID6 the 2048 value seemed optimal
(though not yet verified), requiring 40MB less RAM than the 5x Intel
SSDs.  For a 3-drive modern rust RAID5 the default of 256, or 3MB, is
close to optimal but maybe a little low.  Consider that 256 has been the
default for a very long time, and was selected back when average drive
throughput was much, much lower (50MB/s or less), SSDs hadn't yet been
invented, and system memories were much smaller.

Due to the massive difference in throughput between rust and SSD, any
meaningful change in the default really requires new code to sniff out
what type of devices constitute the array (if that's even possible, and
it probably isn't) and set a lowish default accordingly.  Again, SSDs
didn't exist when md-RAID was coded, nor when this default was set, and
that throws a big monkey wrench into the works.
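
(From userspace one can at least peek at the rotational flag of the member
devices, e.g.

  cat /sys/block/md2/slaves/*/queue/rotational   # 1 = rust, 0 = SSD

assuming whole-disk members; partition members lack a queue/ directory.  But
doing this sanely inside the md code at array assembly time is another
matter.)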

-- 
Stan


* Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
  2013-12-21 11:18       ` md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint Stan Hoeppner
@ 2013-12-21 12:20         ` Piergiorgio Sartor
  2013-12-22  1:41         ` Stan Hoeppner
  2013-12-26  8:55         ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Piergiorgio Sartor @ 2013-12-21 12:20 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Arkadiusz Miśkiewicz, linux-raid, xfs@oss.sgi.com

On Sat, Dec 21, 2013 at 05:18:42AM -0600, Stan Hoeppner wrote:
> I renamed the subject as your question doesn't really apply to XFS, or
> the OP, but to md-RAID.
> 
> On 12/20/2013 4:43 PM, Arkadiusz Miśkiewicz wrote:
> 
> > I wonder why the kernel ships defaults that everyone repeatedly recommends
> > changing or increasing. Has anyone tried to file a bug report for the
> > stripe_cache_size case?
> 
> The answer is balancing default md-RAID5/6 write performance against
> kernel RAM consumption, with more weight given to the latter.  The formula:
> 
> (4096 bytes per page * stripe_cache_size * num_drives) = RAM consumed for
> the stripe cache, in bytes
> 
> High stripe_cache_size values will cause the kernel to eat non-trivial
> amounts of RAM for the stripe cache buffer.  This table demonstrates the
> effect today for typical RAID5/6 disk counts.
> 
> stripe_cache_size	drives	RAM consumed
> 256			 4	  4 MB
> 			 8	  8 MB
> 			16	 16 MB
> 512			 4	  8 MB
> 			 8	 16 MB
> 			16	 32 MB
> 1024			 4	 16 MB
> 			 8	 32 MB
> 			16	 64 MB
> 2048			 4	 32 MB
> 			 8	 64 MB
> 			16	128 MB
> 4096			 4	 64 MB
> 			 8	128 MB
> 			16	256 MB
> 
> The powers that be, Linus in particular, are not fond of default
> settings that create a lot of kernel memory structures.  The default
> md-RAID5/6 stripe_cache_size yields 1MB consumed per member device.
> 
> With SSDs becoming mainstream, and becoming ever faster, at some point
> the md-RAID5/6 architecture will have to be redesigned because of the
> memory footprint required for performance.  Currently the required size
> of the stripe cache appears directly proportional to the aggregate write
> throughput of the RAID devices.  Thus the optimal value will vary
> greatly from one system to another depending on the throughput of the
> drives.
> 
> For example, I assisted a user with 5x Intel SSDs back in January and
> his system required 4096, or 80MB of RAM for stripe cache, to reach
> maximum write throughput of the devices.  This yielded 600MB/s or 60%
> greater throughput than 2048, or 40MB RAM for cache.  In his case 60MB
> more RAM than the default was well worth the increase as the machine was
> an iSCSI target server with 8GB RAM.
> 
> In the previous case with 5x rust RAID6 the 2048 value seemed optimal
> (though not yet verified), requiring 40MB less RAM than the 5x Intel
> SSDs.  For a 3-drive modern rust RAID5 the default of 256, or 3MB, is
> close to optimal but maybe a little low.  Consider that 256 has been the
> default for a very long time, and was selected back when average drive
> throughput was much, much lower (50MB/s or less), SSDs hadn't yet been
> invented, and system memories were much smaller.
> 
> Due to the massive difference in throughput between rust and SSD, any
> meaningful change in the default really requires new code to sniff out
> what type of devices constitute the array (if that's even possible, and
> it probably isn't) and set a lowish default accordingly.  Again, SSDs
> didn't exist when md-RAID was coded, nor when this default was set, and
> that throws a big monkey wrench into the works.

Hi Stan,

nice analytical report, as usual...

My dumb suggestion would be to simply use udev to set up the drives.
Everything (stripe_cache, read_ahead, stcerr, etc.) can, I suppose, be
configured by udev rules.
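
For example, a rough, untested sketch of such a rule, with the match and the
2048 value as placeholders, dropped into /etc/udev/rules.d/:

  SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", \
      ATTR{md/stripe_cache_size}="2048"

(one would probably want to restrict it so it only touches RAID4/5/6 arrays,
which are the only levels that have this attribute).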

bye,

-- 

piergiorgio

* Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
  2013-12-21 11:18       ` md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint Stan Hoeppner
  2013-12-21 12:20         ` Piergiorgio Sartor
@ 2013-12-22  1:41         ` Stan Hoeppner
  2013-12-26  8:55         ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Stan Hoeppner @ 2013-12-22  1:41 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: linux-raid, xfs@oss.sgi.com

On 12/21/2013 5:18 AM, Stan Hoeppner wrote:
...
> For example, I assisted a user with 5x Intel SSDs back in January and
> his system required 4096, or 80MB of RAM for stripe cache, to reach
> maximum write throughput of the devices.  This yielded 600MB/s or 60%
> greater throughput than 2048, or 40MB RAM for cache.  In his case 60MB
> more RAM than the default was well worth the increase as the machine was
> an iSCSI target server with 8GB RAM.

Correction here.  I said above that 80MB was 60MB greater than the
default for his 5 drives.  That should have said 75MB greater than the
default, which is 1MB per member device, or 5MB for 5 drives.
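
That is: the default 256 * 4096 bytes * 5 drives = 5MB, versus 4096 * 4096
bytes * 5 drives = 80MB, i.e. 75MB over the default.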

-- 
Stan


* Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
  2013-12-21 11:18       ` md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint Stan Hoeppner
  2013-12-21 12:20         ` Piergiorgio Sartor
  2013-12-22  1:41         ` Stan Hoeppner
@ 2013-12-26  8:55         ` Christoph Hellwig
  2013-12-26  9:24           ` Stan Hoeppner
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2013-12-26  8:55 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: linux-raid, xfs@oss.sgi.com

On Sat, Dec 21, 2013 at 05:18:42AM -0600, Stan Hoeppner wrote:
> The powers that be, Linus in particular, are not fond of default
> settings that create a lot of kernel memory structures.  The default
> md-RAID5/6 stripe_cache_size yields 1MB consumed per member device.

The default sizing is stupid, as it basically makes RAID unusable out
of the box; I always have to fix that up, along with setting a somewhat
reasonable chunk size for parity RAID to make it usable.  I'm also
pretty sure I complained about this at least once a while ago, but never
got a reply.
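
(For illustration, both can be set explicitly instead of relying on the
defaults, e.g.

  mdadm --create /dev/md2 --level=6 --raid-devices=5 --chunk=64 /dev/sd[b-f]
  echo 2048 > /sys/block/md2/md/stripe_cache_size

with made-up device names and values.)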


* Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
  2013-12-26  8:55         ` Christoph Hellwig
@ 2013-12-26  9:24           ` Stan Hoeppner
  2013-12-26 22:14             ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Stan Hoeppner @ 2013-12-26  9:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-raid, xfs@oss.sgi.com

On 12/26/2013 2:55 AM, Christoph Hellwig wrote:
> On Sat, Dec 21, 2013 at 05:18:42AM -0600, Stan Hoeppner wrote:
>> The powers that be, Linus in particular, are not fond of default
>> settings that create a lot of kernel memory structures.  The default
>> md-RAID5/6 stripe_cache_size yields 1MB consumed per member device.
> 
> The default sizing is stupid, as it basically makes RAID unusable out
> of the box; I always have to fix that up, along with setting a somewhat
> reasonable chunk size for parity RAID to make it usable.  I'm also
> pretty sure I complained about this at least once a while ago, but never
> got a reply.

IIRC you, Dave C., and I all voiced criticism after the default chunk
size was changed from 64KB to 512KB.  I guess we didn't make a strong
enough case to have it reduced, or maybe didn't use the right approach.

Maybe Neil is waiting for patches to be submitted for changing these
defaults, so the merits can be argued in that context instead of in pure
discussion?  Dunno.  Just guessing.  Maybe he'll read this and jump in.

-- 
Stan



* Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
  2013-12-26  9:24           ` Stan Hoeppner
@ 2013-12-26 22:14             ` NeilBrown
  0 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2013-12-26 22:14 UTC (permalink / raw)
  To: stan; +Cc: Christoph Hellwig, linux-raid, xfs@oss.sgi.com


On Thu, 26 Dec 2013 03:24:00 -0600 Stan Hoeppner <stan@hardwarefreak.com>
wrote:

> On 12/26/2013 2:55 AM, Christoph Hellwig wrote:
> > On Sat, Dec 21, 2013 at 05:18:42AM -0600, Stan Hoeppner wrote:
> >> The powers that be, Linus in particular, are not fond of default
> >> settings that create a lot of kernel memory structures.  The default
> >> md-RAID5/6 stripe_cache_size yields 1MB consumed per member device.
> > 
> > The default sizing is stupid, as it basically makes RAID unusable out
> > of the box; I always have to fix that up, along with setting a somewhat
> > reasonable chunk size for parity RAID to make it usable.  I'm also
> > pretty sure I complained about this at least once a while ago, but never
> > got a reply.
> 
> IIRC you, Dave C., and I all voiced criticism after the default chunk
> size was changed from 64KB to 512KB.  I guess we didn't make a strong
> enough case to have it reduced, or maybe didn't use the right approach.
> 
> Maybe Neil is waiting for patches to be submitted for changing these
> defaults, so the merits can be argued in that context instead of in pure
> discussion?  Dunno.  Just guessing.  Maybe he'll read this and jump in.
> 

Good guess.

NeilBrown

