From mboxrd@z Thu Jan 1 00:00:00 1970
From: Corey Hickey
Subject: Re: RAID 5: low sequential write performance?
Date: Mon, 17 Jun 2013 10:14:14 -0700
Message-ID: <51BF43E6.4000900@fatooh.org>
References: <51BCF46B.40704@fatooh.org> <20926.11718.556180.928129@tree.ty.sabi.co.uk> <51BEAF12.10909@fatooh.org> <51BF1BAC.9020701@hardwarefreak.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <51BF1BAC.9020701@hardwarefreak.com>
Sender: linux-raid-owner@vger.kernel.org
To: stan@hardwarefreak.com
Cc: Peter Grandi, Linux RAID
List-Id: linux-raid.ids

On 2013-06-17 07:22, Stan Hoeppner wrote:
> On 6/17/2013 1:39 AM, Corey Hickey wrote:
>
>> 32768 seems to be the maximum for the stripe cache. I'm quite happy to
>> spend 32 MB for this. 256 KB seems quite low, especially since it's only
>> half the default chunk size.
>
> FULL STOP. Your stripe cache is consuming *384MB* of RAM, not 32MB.
> Check your actual memory consumption. The value plugged into
> stripe_cache_size is not a byte value. The value specifies the number
> of data elements in the stripe cache array. Each element is #disks*4KB
> in size. The formula for calculating memory consumed by the stripe
> cache is:
>
> (num_of_disks * 4KB) * stripe_cache_size
>
> In your case this would be
>
> (3 * 4KB) * 32768 = 384MB

I'm actually seeing a bit more memory difference: 401-402 MB when going
from 256 to 32768, on a mostly idle system, so maybe there's something
else coming into play. Still, your formula does make more sense.

Apparently the idea of the value being KB is a common misconception,
possibly perpetuated by this:

https://raid.wiki.kernel.org/index.php/Performance
---
# Set stripe-cache_size for RAID5.
echo "Setting stripe_cache_size to 16 MiB for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size
---

Is 256 really a reasonable default?
Given what I've been seeing, it appears that either 256 is unreasonably
low or I have something else wrong.

>> mkfs.xfs /dev/m3
>> direct: 89.8 MB/s not direct: 90.0 MB/s
>
> You didn't align XFS. Though with large streaming writes it won't
> matter much as md and the block layer will fill the stripes. However,
> XFS' big advantage is parallel IO and you're testing serial IO. Fire up
> 4 O_DIRECT threads/processes and compare to EXT4 w/4 write threads. The
> throughput gap will increase until you run out of hardware.

This will be something to test next time I rebuild my "real" array.

Thanks,
Corey
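
P.S. For anyone who wants to check the arithmetic, here is a rough sketch of Stan's formula as a shell function. It assumes 4 KB per disk per cache entry, exactly as the formula states; the name stripe_cache_mib is made up for illustration and is not part of md or mdadm.

```shell
#!/bin/sh
# Estimate stripe cache memory using the formula quoted above:
#   (num_of_disks * 4KB) * stripe_cache_size
# stripe_cache_mib is a hypothetical helper, not an md/mdadm tool.
stripe_cache_mib() {
    disks=$1      # number of member disks in the array
    entries=$2    # value written to stripe_cache_size
    page_kib=4    # assumption: 4 KB per disk per entry, per the formula
    # Total in KiB, converted to MiB by integer division.
    echo $(( disks * page_kib * entries / 1024 ))
}

stripe_cache_mib 3 32768   # prints 384
stripe_cache_mib 3 256     # prints 3
```

By this estimate, going from 256 to 32768 on three disks should account for about 381 MB, somewhat less than the 401-402 MB I measured.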