From: Alex Elder <aelder@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 06/34] xfs: dynamic speculative EOF preallocation
Date: Mon, 27 Dec 2010 09:00:56 -0600 [thread overview]
Message-ID: <1293462056.3851.1.camel@doink> (raw)
In-Reply-To: <1292916570-25015-7-git-send-email-david@fromorbit.com>
On Tue, 2010-12-21 at 18:29 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Currently the size of the speculative preallocation during delayed
> allocation is fixed by either the allocsize mount option of a
> default size. We are seeing a lot of cases where we need to
> recommend using the allocsize mount option to prevent fragmentation
> when buffered writes land in the same AG.
This looks good. Logarithmic reduction in the size
as the file system gets close to full seems like a
reasonable heuristic. A few more minor comments
below, which you can address at your option.
In any case:
Reviewed-by: Alex Elder <aelder@sgi.com>
>
> Rather than using a fixed preallocation size by default (up to 64k),
> make it dynamic by basing it on the current inode size. That way the
> EOF preallocation will increase as the file size increases. Hence
> for streaming writes we are much more likely to get large
> preallocations exactly when we need it to reduce fragementation.
>
> For default settings, the size of the initial extents is determined
> by the number of parallel writers and the amount of memory in the
> machine. For 4GB RAM and 4 concurrent 32GB file writes:
>
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
> 0: [0..1048575]: 1048672..2097247 0 (1048672..2097247) 1048576
> 1: [1048576..2097151]: 5242976..6291551 0 (5242976..6291551) 1048576
> 2: [2097152..4194303]: 12583008..14680159 0 (12583008..14680159) 2097152
> 3: [4194304..8388607]: 25165920..29360223 0 (25165920..29360223) 4194304
> 4: [8388608..16777215]: 58720352..67108959 0 (58720352..67108959) 8388608
> 5: [16777216..33554423]: 117440584..134217791 0 (117440584..134217791) 16777208
> 6: [33554424..50331511]: 184549056..201326143 0 (184549056..201326143) 16777088
> 7: [50331512..67108599]: 251657408..268434495 0 (251657408..268434495) 16777088
>
> and for 16 concurrent 16GB file writes:
>
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
> 0: [0..262143]: 2490472..2752615 0 (2490472..2752615) 262144
> 1: [262144..524287]: 6291560..6553703 0 (6291560..6553703) 262144
> 2: [524288..1048575]: 13631592..14155879 0 (13631592..14155879) 524288
> 3: [1048576..2097151]: 30408808..31457383 0 (30408808..31457383) 1048576
> 4: [2097152..4194303]: 52428904..54526055 0 (52428904..54526055) 2097152
> 5: [4194304..8388607]: 104857704..109052007 0 (104857704..109052007) 4194304
> 6: [8388608..16777215]: 209715304..218103911 0 (209715304..218103911) 8388608
> 7: [16777216..33554423]: 452984848..469762055 0 (452984848..469762055) 16777208
>
> Because it is hard to take back specualtive preallocation, cases
> where there are large slow growing log files on a nearly full
> filesystem may cause premature ENOSPC. Hence as the filesystem nears
> full, the maximum dynamic prealloc size іs reduced according to this
> table (based on 4k block size):
>
> freespace max prealloc size
> >5% full extent (8GB)
> 4-5% 2GB (8GB >> 2)
> 3-4% 1GB (8GB >> 3)
> 2-3% 512MB (8GB >> 4)
> 1-2% 256MB (8GB >> 5)
> <1% 128MB (8GB >> 6)
On really small filesystems this might be excessive.
On the other hand, you're already basing the size on
the file size so that should limit things just fine.
> This should reduce the amount of space held in speculative
> preallocation for such cases.
>
> The allocsize mount option turns off the dynamic behaviour and fixes
> the prealloc size to whatever the mount option specifies. i.e. the
> behaviour is unchanged.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_fsops.c | 1 +
> fs/xfs/xfs_iomap.c | 84 +++++++++++++++++++++++++++++++++++++++++++++------
> fs/xfs/xfs_mount.c | 21 +++++++++++++
> fs/xfs/xfs_mount.h | 14 ++++++++
> 4 files changed, 110 insertions(+), 10 deletions(-)
>
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index be34ff2..6d17206 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -374,6 +374,7 @@ xfs_growfs_data_private(
> mp->m_maxicount = icount << mp->m_sb.sb_inopblog;
> } else
> mp->m_maxicount = 0;
> + xfs_set_low_space_thresholds(mp);
>
> /* update secondary superblocks. */
> for (agno = 1; agno < nagcount; agno++) {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 22b62a1..f36d2c8 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -267,6 +267,9 @@ error_out:
> * If the caller is doing a write at the end of the file, then extend the
> * allocation out to the file system's write iosize. We clean up any extra
> * space left over when the file is closed in xfs_inactive().
> + *
> + * If we find we already have delalloc preallocation beyond EOF, don't do more
> + * preallocation as it it not needed.
> */
> STATIC int
> xfs_iomap_eof_want_preallocate(
> @@ -282,6 +285,7 @@ xfs_iomap_eof_want_preallocate(
> xfs_filblks_t count_fsb;
> xfs_fsblock_t firstblock;
> int n, error, imaps;
> + int found_delalloc = 0;
>
> *prealloc = 0;
> if ((offset + count) <= ip->i_size)
> @@ -306,12 +310,60 @@ xfs_iomap_eof_want_preallocate(
> return 0;
> start_fsb += imap[n].br_blockcount;
> count_fsb -= imap[n].br_blockcount;
> +
> + if (imap[n].br_startblock == DELAYSTARTBLOCK)
> + found_delalloc = 1;
> }
> }
> - *prealloc = 1;
> + if (!found_delalloc)
> + *prealloc = 1;
Isn't this a separate change, possibly worthy of its
own commit?
> return 0;
> }
>
> +/*
> + * If we don't have a user specified preallocation size, dynamically increase
> + * the preallocation size as the size of the file grows. Cap the maximum size
> + * at a single extent or less if the filesystem is near full. The closer the
> + * filesystem is to full, the smaller the maximum prealocation.
> + */
> +STATIC xfs_fsblock_t
> +xfs_iomap_prealloc_size(
> + struct xfs_mount *mp,
> + struct xfs_inode *ip)
> +{
> + xfs_fsblock_t alloc_blocks = 0;
> +
> + if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)) {
> + int shift = 0;
> + int64_t freesp;
> +
> + alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size);
Why not i_size_read()? Only matters for 32-bit, but
you're using do_div() below so you do seem to care
about it sometimes.
> + alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
> + rounddown_pow_of_two(alloc_blocks));
> +
> + freesp = percpu_counter_read_positive(
> + &mp->m_icsb[XFS_ICSB_FDBLOCKS]);
> + if (freesp < mp->m_low_space[XFS_LOWSP_5_PCNT]) {
> + shift = 2;
> + if (freesp < mp->m_low_space[XFS_LOWSP_4_PCNT])
> + shift++;
> + if (freesp < mp->m_low_space[XFS_LOWSP_3_PCNT])
> + shift++;
> + if (freesp < mp->m_low_space[XFS_LOWSP_2_PCNT])
> + shift++;
> + if (freesp < mp->m_low_space[XFS_LOWSP_1_PCNT])
> + shift++;
> + }
> + if (shift)
> + alloc_blocks >>= shift;
> + }
> +
> + if (alloc_blocks < mp->m_writeio_blocks)
> + alloc_blocks = mp->m_writeio_blocks;
> +
> + return alloc_blocks;
> +}
> +
> int
> xfs_iomap_write_delay(
> xfs_inode_t *ip,
> @@ -344,6 +396,7 @@ xfs_iomap_write_delay(
> extsz = xfs_get_extsz_hint(ip);
> offset_fsb = XFS_B_TO_FSBT(mp, offset);
>
> +
Extra blank line.
> error = xfs_iomap_eof_want_preallocate(mp, ip, offset, count,
> imap, XFS_WRITE_IMAPS, &prealloc);
> if (error)
> @@ -351,9 +404,11 @@ xfs_iomap_write_delay(
>
> retry:
> if (prealloc) {
> + xfs_fsblock_t alloc_blocks = xfs_iomap_prealloc_size(mp, ip);
> +
> aligned_offset = XFS_WRITEIO_ALIGN(mp, (offset + count - 1));
> ioalign = XFS_B_TO_FSBT(mp, aligned_offset);
> - last_fsb = ioalign + mp->m_writeio_blocks;
> + last_fsb = ioalign + alloc_blocks;
> } else {
> last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
> }
> @@ -371,22 +426,31 @@ retry:
> XFS_BMAPI_DELAY | XFS_BMAPI_WRITE |
> XFS_BMAPI_ENTIRE, &firstblock, 1, imap,
> &nimaps, NULL);
> - if (error && (error != ENOSPC))
> + switch (error) {
> + case 0:
> + case ENOSPC:
> + case EDQUOT:
> + break;
This (above and below, in this hunk) is also a separate
change, possibly worthy of its own small commit.
> + default:
> return XFS_ERROR(error);
> + }
>
> /*
> - * If bmapi returned us nothing, and if we didn't get back EDQUOT,
> - * then we must have run out of space - flush all other inodes with
> - * delalloc blocks and retry without EOF preallocation.
> + * If bmapi returned us nothing, we got either ENOSPC or EDQUOT. For
> + * ENOSPC, * flush all other inodes with delalloc blocks to free up
> + * some of the excess reserved metadata space. For both cases, retry
> + * without EOF preallocation.
> */
> if (nimaps == 0) {
> trace_xfs_delalloc_enospc(ip, offset, count);
> if (flushed)
> - return XFS_ERROR(ENOSPC);
> + return XFS_ERROR(error ? error : ENOSPC);
>
> - xfs_iunlock(ip, XFS_ILOCK_EXCL);
> - xfs_flush_inodes(ip);
> - xfs_ilock(ip, XFS_ILOCK_EXCL);
> + if (error == ENOSPC) {
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> + xfs_flush_inodes(ip);
> + xfs_ilock(ip, XFS_ILOCK_EXCL);
> + }
>
> flushed = 1;
> error = 0;
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index d5710232..f1b094d 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1101,6 +1101,24 @@ xfs_set_rw_sizes(xfs_mount_t *mp)
> }
>
> /*
> + * precalculate the low space thresholds for dynamic speculative preallocation.
> + */
> +void
> +xfs_set_low_space_thresholds(
> + struct xfs_mount *mp)
> +{
> + int i;
> +
> + for (i = 0; i < XFS_LOWSP_MAX; i++) {
> + __uint64_t space = mp->m_sb.sb_dblocks;
> +
> + do_div(space, 100);
How about computing this once, outside the loop?
> + mp->m_low_space[i] = space * (i + 1);
> + }
> +}
> +
> +
> +/*
> * Set whether we're using inode alignment.
> */
> STATIC void
> @@ -1322,6 +1340,9 @@ xfs_mountfs(
> */
> xfs_set_rw_sizes(mp);
>
> + /* set the low space thresholds for dynamic preallocation */
> + xfs_set_low_space_thresholds(mp);
> +
> /*
> * Set the inode cluster size.
> * This may still be overridden by the file system
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 03ad25c6..7b42e04 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -75,6 +75,16 @@ enum {
> XFS_ICSB_MAX,
> };
>
> +/* dynamic preallocation free space thresholds, 5% down to 1% */
> +enum {
> + XFS_LOWSP_1_PCNT = 0,
Why not make 1% be 1, 2% be 2, etc.?
> + XFS_LOWSP_2_PCNT,
> + XFS_LOWSP_3_PCNT,
> + XFS_LOWSP_4_PCNT,
> + XFS_LOWSP_5_PCNT,
> + XFS_LOWSP_MAX,
> +};
> +
> typedef struct xfs_mount {
> struct super_block *m_super;
> xfs_tid_t m_tid; /* next unused tid for fs */
> @@ -169,6 +179,8 @@ typedef struct xfs_mount {
> on the next remount,rw */
> struct shrinker m_inode_shrink; /* inode reclaim shrinker */
> struct percpu_counter m_icsb[XFS_ICSB_MAX];
> + int64_t m_low_space[XFS_LOWSP_MAX];
> + /* low free space thresholds */
> } xfs_mount_t;
>
> /*
> @@ -333,6 +345,8 @@ extern void xfs_icsb_sync_counters(struct xfs_mount *);
> extern int xfs_icsb_modify_inodes(struct xfs_mount *, int, int64_t);
> extern int xfs_icsb_modify_free_blocks(struct xfs_mount *, int64_t, int);
>
> +extern void xfs_set_low_space_thresholds(struct xfs_mount *);
> +
> #endif /* __KERNEL__ */
>
> extern void xfs_mod_sb(struct xfs_trans *, __int64_t);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2010-12-27 14:59 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-21 7:28 [PATCH 00/34] xfs: scalability patchset for 2.6.38 Dave Chinner
2010-12-21 7:28 ` [PATCH 01/34] xfs: provide a inode iolock lockdep class Dave Chinner
2010-12-21 15:15 ` Christoph Hellwig
2010-12-21 7:28 ` [PATCH 02/34] xfs: use KM_NOFS for allocations during attribute list operations Dave Chinner
2010-12-21 15:16 ` Christoph Hellwig
2010-12-21 7:28 ` [PATCH 03/34] lib: percpu counter add unless less than functionality Dave Chinner
2010-12-22 2:20 ` Alex Elder
2010-12-22 3:46 ` Dave Chinner
2010-12-21 7:29 ` [PATCH 04/34] xfs: use generic per-cpu counter infrastructure Dave Chinner
2010-12-21 7:29 ` [PATCH 05/34] xfs: demultiplex xfs_icsb_modify_counters() Dave Chinner
2010-12-21 7:29 ` [PATCH 06/34] xfs: dynamic speculative EOF preallocation Dave Chinner
2010-12-21 15:15 ` Christoph Hellwig
2010-12-21 21:42 ` Dave Chinner
2010-12-21 23:44 ` Dave Chinner
2010-12-22 2:29 ` Alex Elder
2010-12-29 12:56 ` Christoph Hellwig
2010-12-27 14:57 ` Alex Elder
2010-12-27 15:00 ` Alex Elder [this message]
2011-01-06 18:16 ` Christoph Hellwig
2010-12-21 7:29 ` [PATCH 07/34] xfs: don't truncate prealloc from frequently accessed inodes Dave Chinner
2010-12-21 16:45 ` Christoph Hellwig
2010-12-21 7:29 ` [PATCH 08/34] xfs: rcu free inodes Dave Chinner
2010-12-21 7:29 ` [PATCH 09/34] xfs: convert inode cache lookups to use RCU locking Dave Chinner
2010-12-21 7:29 ` [PATCH 10/34] xfs: convert pag_ici_lock to a spin lock Dave Chinner
2010-12-21 7:29 ` [PATCH 11/34] xfs: convert xfsbud shrinker to a per-buftarg shrinker Dave Chinner
2010-12-21 7:29 ` [PATCH 12/34] xfs: add a lru to the XFS buffer cache Dave Chinner
2010-12-21 7:29 ` [PATCH 13/34] xfs: connect up buffer reclaim priority hooks Dave Chinner
2010-12-21 7:29 ` [PATCH 14/34] xfs: fix EFI transaction cancellation Dave Chinner
2010-12-21 7:29 ` [PATCH 15/34] xfs: Pull EFI/EFD handling out from under the AIL lock Dave Chinner
2010-12-21 7:29 ` [PATCH 16/34] xfs: clean up xfs_ail_delete() Dave Chinner
2010-12-21 7:29 ` [PATCH 17/34] xfs: bulk AIL insertion during transaction commit Dave Chinner
2010-12-21 7:29 ` [PATCH 18/34] xfs: reduce the number of AIL push wakeups Dave Chinner
2010-12-21 7:29 ` [PATCH 19/34] xfs: consume iodone callback items on buffers as they are processed Dave Chinner
2010-12-21 7:29 ` [PATCH 20/34] xfs: remove all the inodes on a buffer from the AIL in bulk Dave Chinner
2010-12-22 2:20 ` Alex Elder
2010-12-22 3:49 ` Dave Chinner
2010-12-21 7:29 ` [PATCH 22/34] xfs: use AIL bulk delete function to implement single delete Dave Chinner
2010-12-21 7:29 ` [PATCH 23/34] xfs: convert log grant ticket queues to list heads Dave Chinner
2010-12-21 7:29 ` [PATCH 24/34] xfs: fact out common grant head/log tail verification code Dave Chinner
2010-12-21 7:29 ` [PATCH 25/34] xfs: rework log grant space calculations Dave Chinner
2010-12-21 7:29 ` [PATCH 26/34] xfs: combine grant heads into a single 64 bit integer Dave Chinner
2010-12-21 7:29 ` [PATCH 27/34] xfs: use wait queues directly for the log wait queues Dave Chinner
2010-12-21 7:29 ` [PATCH 28/34] xfs: make AIL tail pushing independent of the grant lock Dave Chinner
2010-12-21 7:29 ` [PATCH 29/34] xfs: convert l_last_sync_lsn to an atomic variable Dave Chinner
2010-12-21 7:29 ` [PATCH 30/34] xfs: convert l_tail_lsn " Dave Chinner
2010-12-29 12:52 ` Christoph Hellwig
2010-12-29 15:49 ` Alex Elder
2010-12-21 7:29 ` [PATCH 31/34] xfs: convert log grant heads to atomic variables Dave Chinner
2010-12-21 7:29 ` [PATCH 32/34] xfs: introduce new locks for the log grant ticket wait queues Dave Chinner
2010-12-21 7:29 ` [PATCH 33/34] xfs: convert grant head manipulations to lockless algorithm Dave Chinner
2010-12-21 7:29 ` [PATCH 34/34] xfs: kill useless spinlock_destroy macro Dave Chinner
2010-12-23 1:15 ` [PATCH 00/34] xfs: scalability patchset for 2.6.38 Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1293462056.3851.1.camel@doink \
--to=aelder@sgi.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox