From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 04/16] xfs: dynamic speculative EOF preallocation
Date: Mon, 8 Nov 2010 19:55:07 +1100
Message-ID: <1289206519-18377-5-git-send-email-david@fromorbit.com>
In-Reply-To: <1289206519-18377-1-git-send-email-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
Currently the size of the speculative preallocation during delayed
allocation is fixed by either the allocsize mount option or a
default size. We are seeing a lot of cases where we need to
recommend using the allocsize mount option to prevent fragmentation
when buffered writes land in the same AG.

Rather than using a fixed preallocation size by default (up to 64k),
make it dynamic by basing it on the current inode size. That way the
EOF preallocation will increase as the file size increases. Hence
for streaming writes we are much more likely to get large
preallocations exactly when we need them to reduce fragmentation.

For default settings, the size of the initial extents is determined
by the number of parallel writers and the amount of memory in the
machine. For 4GB RAM and 4 concurrent 32GB file writes:
  EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
    0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
    1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
    2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
    3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
    4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
    5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
    6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
    7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
and for 16 concurrent 16GB file writes:
  EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
    0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
    1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
    2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
    3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
    4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
    5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
    6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
    7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
The allocsize mount option still controls the minimum preallocation
size, so the smallest extent size can still be bounded in situations
where this behaviour is not sufficient.
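
For illustration only, the sizing rule described above can be sketched as a
standalone userspace function. This is not the kernel code: the sketch_*
names, the parameter list and the explicit handling of an empty file are
assumptions made here, and the 21-bit extent length limit stands in for
MAXEXTLEN. It follows the hunk in xfs_iomap_write_delay() below: without a
user-specified allocsize, round the file size in blocks down to a power of
two and cap it at one extent, then never go below the write iosize.

```c
#include <stdint.h>

/* Stand-in for MAXEXTLEN: largest extent length in blocks (21 bits). */
#define SKETCH_MAXEXTLEN ((uint64_t)0x001fffff)

/*
 * Largest power of two <= n. Returns 0 for n == 0, which is a choice made
 * for this sketch so that an empty file is well-defined.
 */
static uint64_t sketch_rounddown_pow2(uint64_t n)
{
	uint64_t p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical helper: how many blocks to preallocate beyond EOF.
 *
 * isize_bytes    - current file size in bytes
 * block_bytes    - filesystem block size in bytes
 * writeio_blocks - write iosize in blocks (default or from allocsize)
 * user_allocsize - non-zero if allocsize was given on the mount command line
 */
uint64_t sketch_prealloc_blocks(uint64_t isize_bytes, uint32_t block_bytes,
				uint64_t writeio_blocks, int user_allocsize)
{
	uint64_t blocks = 0;

	if (!user_allocsize) {
		/* Bytes to blocks, rounding up. */
		blocks = (isize_bytes + block_bytes - 1) / block_bytes;
		blocks = sketch_rounddown_pow2(blocks);
		if (blocks > SKETCH_MAXEXTLEN)
			blocks = SKETCH_MAXEXTLEN;
	}

	/* The write iosize is always the floor. */
	if (blocks < writeio_blocks)
		blocks = writeio_blocks;
	return blocks;
}
```

With 4k blocks and a 16 block (64k) floor, a 3MB file gives
sketch_prealloc_blocks(3 << 20, 4096, 16, 0) == 512 blocks, i.e. 2MB of
preallocation, while a new, empty file falls back to the 16 block floor.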
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_iomap.c | 53 ++++++++++++++++++++++++++++++++++++++++++---------
1 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2057614..0227ac1 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -389,6 +389,9 @@ error_out:
* If the caller is doing a write at the end of the file, then extend the
* allocation out to the file system's write iosize. We clean up any extra
* space left over when the file is closed in xfs_inactive().
+ *
+ * If we find we already have delalloc preallocation beyond EOF, don't do more
+ * preallocation as it is not needed.
*/
STATIC int
xfs_iomap_eof_want_preallocate(
@@ -405,6 +408,7 @@ xfs_iomap_eof_want_preallocate(
xfs_filblks_t count_fsb;
xfs_fsblock_t firstblock;
int n, error, imaps;
+ int found_delalloc = 0;
*prealloc = 0;
if ((offset + count) <= ip->i_size)
@@ -427,11 +431,16 @@ xfs_iomap_eof_want_preallocate(
if ((imap[n].br_startblock != HOLESTARTBLOCK) &&
(imap[n].br_startblock != DELAYSTARTBLOCK))
return 0;
+
start_fsb += imap[n].br_blockcount;
count_fsb -= imap[n].br_blockcount;
+
+ if (imap[n].br_startblock == DELAYSTARTBLOCK)
+ found_delalloc = 1;
}
}
- *prealloc = 1;
+ if (!found_delalloc)
+ *prealloc = 1;
return 0;
}
@@ -469,6 +478,7 @@ xfs_iomap_write_delay(
extsz = xfs_get_extsz_hint(ip);
offset_fsb = XFS_B_TO_FSBT(mp, offset);
+
error = xfs_iomap_eof_want_preallocate(mp, ip, offset, count,
ioflag, imap, XFS_WRITE_IMAPS, &prealloc);
if (error)
@@ -476,9 +486,23 @@ xfs_iomap_write_delay(
retry:
if (prealloc) {
+ xfs_fileoff_t alloc_blocks = 0;
+ /*
+ * If we don't have a user specified preallocation size, dynamically
+ * increase the preallocation size as the size of the file
+ * grows. Cap the maximum size at a single extent.
+ */
+ if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)) {
+ alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size);
+ alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
+ rounddown_pow_of_two(alloc_blocks));
+ }
+ if (alloc_blocks < mp->m_writeio_blocks)
+ alloc_blocks = mp->m_writeio_blocks;
+
aligned_offset = XFS_WRITEIO_ALIGN(mp, (offset + count - 1));
ioalign = XFS_B_TO_FSBT(mp, aligned_offset);
- last_fsb = ioalign + mp->m_writeio_blocks;
+ last_fsb = ioalign + alloc_blocks;
} else {
last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
}
@@ -496,22 +520,31 @@ retry:
XFS_BMAPI_DELAY | XFS_BMAPI_WRITE |
XFS_BMAPI_ENTIRE, &firstblock, 1, imap,
&nimaps, NULL);
- if (error && (error != ENOSPC))
+ switch (error) {
+ case 0:
+ case ENOSPC:
+ case EDQUOT:
+ break;
+ default:
return XFS_ERROR(error);
+ }
/*
- * If bmapi returned us nothing, and if we didn't get back EDQUOT,
- * then we must have run out of space - flush all other inodes with
- * delalloc blocks and retry without EOF preallocation.
+ * If bmapi returned us nothing, we got either ENOSPC or EDQUOT. For
+ * ENOSPC, flush all other inodes with delalloc blocks to free up
+ * some of the excess reserved metadata space. For both cases, retry
+ * without EOF preallocation.
*/
if (nimaps == 0) {
trace_xfs_delalloc_enospc(ip, offset, count);
if (flushed)
- return XFS_ERROR(ENOSPC);
+ return XFS_ERROR(error ? error : ENOSPC);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- xfs_flush_inodes(ip);
- xfs_ilock(ip, XFS_ILOCK_EXCL);
+ if (error == ENOSPC) {
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_flush_inodes(ip);
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ }
flushed = 1;
error = 0;
--
1.7.2.3