Message-ID: <4F83E293.8000509@hardwarefreak.com>
Date: Tue, 10 Apr 2012 02:34:43 -0500
From: Stan Hoeppner
Reply-To: stan@hardwarefreak.com
To: xfs@oss.sgi.com
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
List-Id: XFS Filesystem from SGI

On 4/9/2012 6:52 AM, Stefan Ring wrote:
> Whatever the problem with the controller may be, it usually behaves
> quite nicely. It seems clear, though, that regardless of the storage
> technology, it cannot be a good idea to schedule tiny blocks in the
> order that XFS schedules them in my case.
>
> This:
>
>   AG0 *  *  *
>   AG1  *  *  *
>   AG2   *  *  *
>   AG3    *  *  *
>
> cannot be better than this:
>
>   AG0 ***
>   AG1    ***
>   AG2       ***
>   AG3          ***

With 4 AGs this must represent the RAID6 or RAID10 case. Those don't
seem to show any overlapping concurrency.
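The intuition behind those two diagrams can be sketched with a toy
seek-distance model (my own illustration, not from the thread: it
assumes 4 AGs laid out as contiguous regions on a single spindle, with
the head travelling linearly between write targets):

```python
# Toy model: 4 allocation groups, each a contiguous 1000-block region
# on one disk. Each AG receives 3 small writes; compare total head
# travel when the writes are interleaved across AGs (first diagram)
# vs. batched per AG (second diagram).

AGS, WRITES, AG_SIZE = 4, 3, 1000

def seek_distance(order):
    """Sum of absolute jumps between consecutive write positions."""
    return sum(abs(b - a) for a, b in zip(order, order[1:]))

def pos(g, i):
    """Block address of the i-th write inside AG g."""
    return g * AG_SIZE + i

# Interleaved: one write to each AG in turn, then the next round.
interleaved = [pos(g, i) for i in range(WRITES) for g in range(AGS)]
# Batched: all writes to AG0, then all to AG1, and so on.
batched     = [pos(g, i) for g in range(AGS) for i in range(WRITES)]

print("interleaved:", seek_distance(interleaved))  # long cross-AG jumps
print("batched:    ", seek_distance(batched))      # mostly short hops
```

The absolute numbers are artificial, but the batched ordering's total
travel comes out at a small fraction of the interleaved ordering's,
which is exactly the point of the diagrams for a single seek-bound
device.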
Maybe I'm missing something, but it should look more like this, at
least in the concat case:

  AG0 ***
  AG1 ***
  AG2 ***

> Yes, in theory, a good cache controller should be able to sort this
> out. But at least this particular controller is not able to do so and
> could use a little help.

Is the cache in write-through or write-back mode? The latter should
allow aggressive reordering; the former very little, if any. And is all
of it dedicated to writes, or is it split? If split, dedicate it all to
writes. Linux is going to cache block reads anyway, so it makes little
sense to cache them in the controller as well.

> Also, a single consumer-grade drive is certainly not helped by this
> write ordering.

Are you referring to the Mushkin SSD I mentioned? The SandForce 2281
onboard the Enhanced Chronos Deluxe is capable of a *sustained* 20,000
4KB random write IOPS, 60,000 peak. Mushkin states 90,000, which may be
due to their use of Toggle Mode NAND instead of ONFI, and/or they're
simply fudging. Regardless, 20K real write IOPS is enough to make
scheduling/ordering mostly irrelevant, I'd think. Just format with 8
AGs to be on the safe side for DLP (directory level parallelism), and
you're off to the races.

The features of the SF2000 series make MLC SSDs based on it much more
like 'enterprise' SLC SSDs in most respects. The line between
"consumer" and "enterprise" SSDs has already been blurred, as many
vendors have been selling "enterprise" MLC SSDs for a while now,
including Intel, Kingston, OCZ, PNY, and Seagate. All are based on the
same SandForce 2281 as in this Mushkin, or on the 2282, which is
required for devices over 512GB.

-- 
Stan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs