From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29])
	by oss.sgi.com (Postfix) with ESMTP id 663BF7F54
	for ; Thu, 20 Jun 2013 14:31:29 -0500 (CDT)
Message-ID: <51C35891.7000501@sgi.com>
Date: Thu, 20 Jun 2013 14:31:29 -0500
From: Mark Tinguely
MIME-Version: 1.0
Subject: Re: [PATCH 04/60] xfs: don't use speculative prealloc for small files
References: <1371617468-32559-1-git-send-email-david@fromorbit.com>
	<1371617468-32559-5-git-send-email-david@fromorbit.com>
In-Reply-To: <1371617468-32559-5-git-send-email-david@fromorbit.com>
List-Id: XFS Filesystem from SGI
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: Dave Chinner
Cc: xfs@oss.sgi.com

On 06/18/13 23:50, Dave Chinner wrote:
> From: Dave Chinner
>
> Dedicated small file workloads have been seeing significant free
> space fragmentation causing premature inode allocation failure
> when large inode sizes are in use. A particular test case showed
> that a workload that runs to a real ENOSPC on 256 byte inodes would
> fail inode allocation with ENOSPC at about 80% full with 512 byte
> inodes, and at about 50% full with 1024 byte inodes.
>
> The same workload, when run with -o allocsize=4096 on 1024 byte
> inodes would run to being 100% full before giving ENOSPC. That is,
> no freespace fragmentation at all.
>
> The issue was caused by the specific IO pattern the application had
> - the framework it was using did not support direct IO, and so it
> was emulating it by using fadvise(DONT_NEED). The result was that
> the data was getting written back before the speculative prealloc
> had been trimmed from memory by the close(), and so small single
> block files were being allocated with 2 blocks, and then having one
> truncated away. The result was lots of small 4k free space extents,
> and hence each new 8k allocation would take another 8k from
> contiguous free space and turn it into 4k of allocated space and 4k
> of free space.
>
> Hence inode allocation, which requires contiguous, aligned
> allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
> (1024 byte inodes) can fail to find sufficiently large freespace and
> hence fail while there is still lots of free space available.
>
> There's a simple fix for this, and one that has precedent in the
> allocator code already - don't do speculative allocation unless the
> size of the file is larger than a certain size. In this case, that
> size is the minimum default preallocation size:
> mp->m_writeio_blocks. And to keep with the concept of being nice to
> people when the files are still relatively small, cap the prealloc
> to mp->m_writeio_blocks until the file goes over a stripe unit in
> size, at which point we'll fall back to the current behaviour based
> on the last extent size.
>
> This will effectively turn off speculative prealloc for very small
> files, keep preallocation low for small files, and behave as it
> currently does for any file larger than a stripe unit. This
> completely avoids the freespace fragmentation problem this
> particular IO pattern was causing.
>
> Signed-off-by: Dave Chinner
> ---

I agree with Brian, it looks good.

Reviewed-by: Mark Tinguely

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
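
For reference, the three-tier sizing rule described in the quoted commit
message can be sketched as below. This is a minimal standalone
illustration, not the code from the patch: struct geom, prealloc_blocks()
and inode_chunk_bytes() are hypothetical names, the exact handling of the
boundary cases is an assumption, and the figure of 64 inodes per
allocation chunk is inferred from the 16k/32k/64k sizes quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mount geometry the commit message refers to. */
struct geom {
	uint64_t block_size;      /* filesystem block size, in bytes */
	uint64_t writeio_blocks;  /* minimum default prealloc, in blocks
				     (plays the role of mp->m_writeio_blocks) */
	uint64_t stripe_unit;     /* stripe unit, in bytes */
};

/*
 * Return the speculative preallocation, in blocks, for a file of isize
 * bytes. last_extent_blocks stands in for whatever the existing
 * "based on the last extent size" heuristic would have chosen.
 */
static uint64_t prealloc_blocks(const struct geom *g, uint64_t isize,
				uint64_t last_extent_blocks)
{
	assert(g->block_size > 0);

	/* Very small file: no speculative prealloc at all. */
	if (isize <= g->writeio_blocks * g->block_size)
		return 0;

	/* Small file: cap the prealloc at the minimum default size. */
	if (isize <= g->stripe_unit)
		return g->writeio_blocks;

	/* Larger than a stripe unit: unchanged behaviour. */
	return last_extent_blocks;
}

/*
 * Contiguous space needed for one inode allocation, assuming 64 inodes
 * per chunk (256 byte inodes -> 16k, 1024 byte inodes -> 64k).
 */
static uint64_t inode_chunk_bytes(uint64_t inode_size)
{
	return 64 * inode_size;
}
```

With a hypothetical geometry of 4k blocks, a 16 block minimum prealloc
and a 256k stripe unit, a 4k file gets no speculative prealloc, a 128k
file gets 16 blocks, and a 1M file gets whatever the last-extent
heuristic returns. A single block file is therefore never allocated 2
blocks and trimmed back, so it leaves no 4k hole behind, and the 16k to
64k contiguous regions that inode allocation needs stay available.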