From: Stan Hoeppner
Subject: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
Date: Sat, 21 Dec 2013 05:18:42 -0600
Message-ID: <52B57912.5080000@hardwarefreak.com>
References: <52B102FF.8040404@pzystorm.de> <52B2FE9E.50307@hardwarefreak.com> <52B41B67.9030308@pzystorm.de> <201312202343.47895.arekm@maven.pl>
In-Reply-To: <201312202343.47895.arekm@maven.pl>
Reply-To: stan@hardwarefreak.com
To: Arkadiusz Miśkiewicz
Cc: linux-raid@vger.kernel.org, "xfs@oss.sgi.com"
List-Id: linux-raid.ids

I renamed the subject as your question doesn't really apply to XFS, or
the OP, but to md-RAID.

On 12/20/2013 4:43 PM, Arkadiusz Miśkiewicz wrote:

> I wonder why kernel is giving defaults that everyone repeatly recommends to
> change/increase? Has anyone tried to bugreport that for stripe_cache_size
> case?

The answer is balancing default md-RAID5/6 write performance against
kernel RAM consumption, with more weight given to the latter.  The
formula:

((4096*stripe_cache_size)*num_drives) = RAM consumed for stripe cache

High stripe_cache_size values will cause the kernel to eat non-trivial
amounts of RAM for the stripe cache buffer.  This table demonstrates the
effect today for typical RAID5/6 disk counts.

stripe_cache_size    drives    RAM consumed
256                       4            4 MB
                          8            8 MB
                         16           16 MB
512                       4            8 MB
                          8           16 MB
                         16           32 MB
1024                      4           16 MB
                          8           32 MB
                         16           64 MB
2048                      4           32 MB
                          8           64 MB
                         16          128 MB
4096                      4           64 MB
                          8          128 MB
                         16          256 MB

The powers that be, Linus in particular, are not fond of default
settings that create a lot of kernel memory structures.  The default
md-RAID5/6 stripe_cache_size of 256 yields 1MB consumed per member
device.
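
To make the arithmetic behind the formula and table concrete, here is a
minimal Python sketch.  The function names are made up for illustration,
and the 4096 constant is simply the value used in the formula above; it
is assumed here to be the bytes held per cache entry per member device,
which may differ on systems with a different page size.

```python
# Hypothetical helpers illustrating the stripe cache formula from this
# message: RAM = 4096 * stripe_cache_size * num_drives.

BYTES_PER_ENTRY = 4096  # assumption: taken verbatim from the formula


def stripe_cache_bytes(stripe_cache_size: int, num_drives: int) -> int:
    """Return the RAM consumed by the stripe cache, in bytes."""
    return BYTES_PER_ENTRY * stripe_cache_size * num_drives


def stripe_cache_mib(stripe_cache_size: int, num_drives: int) -> float:
    """Same figure expressed in MB (binary megabytes, as in the table)."""
    return stripe_cache_bytes(stripe_cache_size, num_drives) / (1024 * 1024)


if __name__ == "__main__":
    # Print the same rows as the table above.
    print(f"{'stripe_cache_size':<20}{'drives':>6}{'RAM consumed':>16}")
    for size in (256, 512, 1024, 2048, 4096):
        for drives in (4, 8, 16):
            ram = stripe_cache_mib(size, drives)
            print(f"{size:<20}{drives:>6}{ram:>13.0f} MB")
```

Plugging in other drive counts works the same way, e.g.
stripe_cache_mib(4096, 5) for a 5-drive array.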

With SSDs becoming mainstream, and becoming ever faster, at some point
the md-RAID5/6 architecture will have to be redesigned because of the
memory footprint required for performance.  Currently the required size
of the stripe cache appears directly proportional to the aggregate write
throughput of the RAID devices.  Thus the optimal value will vary
greatly from one system to another depending on the throughput of the
drives.

For example, I assisted a user with 5x Intel SSDs back in January and
his system required 4096, or 80MB of RAM for stripe cache, to reach
maximum write throughput of the devices.  This yielded 600MB/s, or 60%
greater throughput than with 2048 (40MB of RAM for cache).  In his case
75MB more RAM than the default (5MB for 5 drives) was well worth the
increase as the machine was an iSCSI target server with 8GB RAM.

In the previous case with 5x rust RAID6 the 2048 value seemed optimal
(though not yet verified), requiring 40MB less RAM than the 5x Intel
SSDs.  For a 3-drive modern rust RAID5 the default of 256, or 3MB, is
close to optimal but maybe a little low.  Consider that 256 has been the
default for a very long time, and was selected back when average drive
throughput was much, much lower, as in 50MB/s or less, SSDs hadn't yet
been invented, and system memories were much smaller.

Due to the massive difference in throughput between rust and SSD, any
meaningful change in the default really requires new code to sniff out
what type of devices constitute the array, if that's possible, and it
probably isn't, and set a lowish default accordingly.  Again, SSDs
didn't exist when md-RAID was coded, nor when this default was set, and
this throws a big monkey wrench into these spokes.
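
For anyone wanting to experiment per array rather than wait for a new
default, the following is a rough Python sketch, not a tested tuning
tool.  It assumes the value is exposed at
/sys/block/<md device>/md/stripe_cache_size, that writing it requires
root, and it reuses the 4096 constant from the formula earlier in this
message.  The helper names and the "md0" device name are hypothetical.

```python
from pathlib import Path

BYTES_PER_ENTRY = 4096  # assumption: same constant as in the formula


def sysfs_path(md_name: str) -> Path:
    """Build the assumed sysfs location of stripe_cache_size."""
    return Path("/sys/block") / md_name / "md" / "stripe_cache_size"


def read_stripe_cache_size(path: Path) -> int:
    """Read the current value from the given file."""
    return int(path.read_text().strip())


def write_stripe_cache_size(path: Path, value: int) -> None:
    """Write a new value; on a real array this needs root privileges."""
    path.write_text(f"{value}\n")


def extra_ram_mib(old: int, new: int, num_drives: int) -> float:
    """Additional RAM (in MB) the cache consumes when going old -> new."""
    return BYTES_PER_ENTRY * (new - old) * num_drives / (1024 * 1024)


if __name__ == "__main__":
    path = sysfs_path("md0")  # hypothetical array name
    if path.exists():
        current = read_stripe_cache_size(path)
        print(f"current stripe_cache_size: {current}")
        # Cost of raising the value to 4096 on a 5-drive array.
        print(f"extra RAM at 4096: {extra_ram_mib(current, 4096, 5):.0f} MB")
    else:
        print(f"{path} not found; is md0 a RAID5/6 array on this host?")
```

The sketch only reports the memory cost of a change.  Whether a larger
value actually helps still has to be measured by re-running a write
benchmark at each setting, since the optimal value depends on the
throughput of the member devices.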

--
Stan