public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 04/27] xfs: don't use speculative prealloc for small files
Date: Wed, 12 Jun 2013 20:22:24 +1000
Message-ID: <1371032567-21772-5-git-send-email-david@fromorbit.com>
In-Reply-To: <1371032567-21772-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Dedicated small file workloads have been seeing significant free
space fragmentation causing premature inode allocation failure
when large inode sizes are in use. A particular test case showed
that a workload that runs to a real ENOSPC on 256 byte inodes would
fail inode allocation with ENOSPC at about 80% full with 512 byte
inodes, and at about 50% full with 1024 byte inodes.

The same workload, when run with -o allocsize=4096 on 1024 byte
inodes would run to being 100% full before giving ENOSPC. That is,
no freespace fragmentation at all.

The issue was caused by the specific IO pattern the application had
- the framework it was using did not support direct IO, and so it
was emulating it with posix_fadvise(POSIX_FADV_DONTNEED). The result
was that the data was getting written back before the speculative
prealloc had been trimmed from memory by the close(), and so small
single block files were being allocated with 2 blocks, and then
having one truncated away. The result was lots of small 4k free
space extents, and hence each new 8k allocation would take another
8k from contiguous free space and turn it into 4k of allocated space
and 4k of free space.
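
The arithmetic above can be checked with a toy model. This is only a
sketch under simplifying assumptions, not the XFS allocator: free space
is treated as one contiguous run, every small file takes 2 blocks from
that run and gives 1 back as an isolated hole, and holes are never
reused. The names frag_state and write_small_files are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: one contiguous free run plus stranded holes. */
struct frag_state {
	unsigned long contig_free;	/* blocks left in the contiguous run */
	unsigned long stranded_free;	/* isolated single free blocks */
	unsigned long used;		/* blocks holding file data */
};

/*
 * Write up to nfiles single-block files. Each file is allocated 2
 * blocks from the contiguous run (1 data + 1 speculative prealloc)
 * and the prealloc block is then truncated away, leaving a hole.
 * Returns the number of files written before the contiguous run
 * can no longer supply 2 blocks.
 */
unsigned long write_small_files(struct frag_state *fs, unsigned long nfiles)
{
	unsigned long written = 0;

	assert(fs != NULL);
	while (written < nfiles && fs->contig_free >= 2) {
		fs->contig_free -= 2;	/* 8k taken from contiguous space */
		fs->used += 1;		/* 4k stays allocated */
		fs->stranded_free += 1;	/* 4k returned as a hole */
		written++;
	}
	return written;
}
```

Starting from 1000 contiguous blocks, this model exhausts the
contiguous run after 500 files, with half of the blocks still free but
stranded as single-block holes.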

Hence inode allocation, which requires contiguous, aligned
allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
(1024 byte inodes) can fail to find sufficiently large freespace and
hence fail while there is still lots of free space available.
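
The chunk sizes quoted above follow from XFS allocating inodes in
chunks of 64. A minimal sketch of that arithmetic, with hypothetical
helper names and with alignment deliberately ignored:

```c
#include <assert.h>
#include <stddef.h>

/* XFS allocates inodes in chunks of 64 inodes. */
#define INODES_PER_CHUNK 64

/* Bytes of contiguous free space one inode chunk needs. */
size_t inode_chunk_bytes(size_t inode_size)
{
	assert(inode_size > 0);
	return inode_size * INODES_PER_CHUNK;
}

/*
 * Could a free extent of this length hold an inode chunk?
 * Assumption: alignment requirements are ignored here, so this is
 * an optimistic check.
 */
int extent_fits_chunk(size_t extent_bytes, size_t inode_size)
{
	return extent_bytes >= inode_chunk_bytes(inode_size);
}
```

With 4k holes scattered through free space, extent_fits_chunk(4096,
inode_size) is false for all three inode sizes, which is why the larger
inode sizes hit ENOSPC first as contiguous space shrinks.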

There's a simple fix for this, and one that has precedent in the
allocator code already - don't do speculative allocation unless the
size of the file is larger than a certain size. In this case, that
size is the minimum default preallocation size:
mp->m_writeio_blocks. And to keep with the concept of being nice to
people when the files are still relatively small, cap the prealloc
to mp->m_writeio_blocks until the file goes over a stripe unit in
size, at which point we'll fall back to the current behaviour based
on the last extent size.

This will effectively turn off speculative prealloc for very small
files, keep preallocation low for small files, and behave as it
currently does for any file larger than a stripe unit. This
completely avoids the freespace fragmentation problem this
particular IO pattern was causing.
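
The three tiers described above can be summarised as a standalone
userspace sketch. It is not the kernel code: the enum and function
names are hypothetical, all sizes are passed in bytes for simplicity,
and it assumes dynamic preallocation is in use (no allocsize= mount
option).

```c
#include <stdint.h>

enum prealloc_policy {
	PREALLOC_NONE,		/* very small file: no speculative prealloc */
	PREALLOC_MINIMUM,	/* small file: cap at the minimum prealloc */
	PREALLOC_DYNAMIC	/* otherwise: size from the last extent */
};

/*
 * Pick the preallocation tier for a file of file_bytes.
 * min_prealloc_bytes stands in for the minimum default prealloc size
 * and stripe_unit_bytes for the stripe unit; both are assumptions of
 * this sketch and are expressed in bytes.
 */
enum prealloc_policy choose_prealloc(uint64_t file_bytes,
				     uint64_t min_prealloc_bytes,
				     uint64_t stripe_unit_bytes)
{
	if (file_bytes < min_prealloc_bytes)
		return PREALLOC_NONE;
	if (file_bytes < stripe_unit_bytes)
		return PREALLOC_MINIMUM;
	return PREALLOC_DYNAMIC;
}
```

For example, with a 64k minimum and a hypothetical 256k stripe unit, a
4k file gets no prealloc, a 128k file gets the minimum, and a 1M file
gets the existing dynamic behaviour.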

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iomap.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8f8aaee..14be676 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -284,6 +284,15 @@ xfs_iomap_eof_want_preallocate(
 		return 0;
 
 	/*
+	 * If the file is smaller than the minimum prealloc and we are using
+	 * dynamic preallocation, don't do any preallocation at all as it is
+	 * likely this is the only write to the file that is going to be done.
+	 */
+	if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) &&
+	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_writeio_blocks))
+		return 0;
+
+	/*
 	 * If there are any real blocks past eof, then don't
 	 * do any speculative allocation.
 	 */
@@ -345,6 +354,10 @@ xfs_iomap_eof_prealloc_initial_size(
 	if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)
 		return 0;
 
+	/* If the file is small, then use the minimum prealloc */
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign))
+		return 0;
+
 	/*
 	 * As we write multiple pages, the offset will always align to the
 	 * start of a page and hence point to a hole at EOF. i.e. if the size is
-- 
1.7.10.4
