public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Peter Vajgel <pv@fb.com>
Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: allocsize mount option
Date: Fri, 21 Jan 2011 11:48:16 +1100
Message-ID: <20110121004816.GW16267@dastard>
In-Reply-To: <3F5ACD12257C714E9C0535D0A83917180288FB@SC-MBX02-2.TheFacebook.com>

On Thu, Jan 20, 2011 at 08:41:19PM +0000, Peter Vajgel wrote:
> We write about 100 100GB files into a single 10TB volume with xfs.
> We are using allocsize=1g to limit the fragmentation with great
> success. We also need to reserve some space (~200GB) on each
> filesystem for processing the files and writing new versions of
> the files. Once we have only 200GB available we stop writing to
> the files. However with allocsize it's not that easy - we see +/-
> 100GB added or taken depending on whether writes are still in
> progress and whether the file was reopened ... Is there a way to
> programmatically disable allocsize speculative preallocation once
> we exceed a certain threshold and also return the current
> speculative preallocation back to the free space (without closing
> the file)?

No and no.

However, if you take a look at the new dynamic speculative
allocation code in 2.6.38-rc1, it scales back the preallocation as
ENOSPC is approached but doesn't do any reclaiming of existing
preallocation. It will also preallocate much larger extents, so it
may not be ideal for you, either. I've appended the commit message
below.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

commit 055388a3188f56676c21e92962fc366ac8b5cb72
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jan 4 11:35:03 2011 +1100

    xfs: dynamic speculative EOF preallocation
    
    Currently the size of the speculative preallocation during delayed
    allocation is fixed by either the allocsize mount option or a
    default size. We are seeing a lot of cases where we need to
    recommend using the allocsize mount option to prevent fragmentation
    when buffered writes land in the same AG.
    
    Rather than using a fixed preallocation size by default (up to 64k),
    make it dynamic by basing it on the current inode size. That way the
    EOF preallocation will increase as the file size increases.  Hence
    for streaming writes we are much more likely to get large
    preallocations exactly when we need them to reduce fragmentation.
    
    For default settings, the size of the initial extents is determined
    by the number of parallel writers and the amount of memory in the
    machine. For 4GB RAM and 4 concurrent 32GB file writes:
    
    EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)       1048576
       1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)       1048576
       2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)     2097152
       3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)     4194304
       4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)     8388608
       5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791)  16777208
       6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143)  16777088
       7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495)  16777088
    
    and for 16 concurrent 16GB file writes:
    
    EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..262143]:          2490472..2752615      0 (2490472..2752615)        262144
       1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)        262144
       2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)      524288
       3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)     1048576
       4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)     2097152
       5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)   4194304
       6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)   8388608
       7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055)  16777208
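
To make the growth pattern in these listings easier to see, here is a
minimal sketch that converts the FILE-OFFSET ranges of the 4-writer
listing into extent sizes. It assumes the listing is xfs_bmap -v style
output, where offsets are in 512-byte units; the names SECTOR_BYTES,
FILE_OFFSETS, extent_bytes and extent_sizes_mib are made up for this
illustration and are not part of any XFS tool.

```python
# Assumption: offsets are in 512-byte units, as in xfs_bmap -v output.
SECTOR_BYTES = 512

# FILE-OFFSET ranges copied from the 4-writer listing above.
FILE_OFFSETS = [
    (0, 1048575),
    (1048576, 2097151),
    (2097152, 4194303),
    (4194304, 8388607),
    (8388608, 16777215),
    (16777216, 33554423),
    (33554424, 50331511),
    (50331512, 67108599),
]


def extent_bytes(start, end):
    """Length in bytes of an inclusive [start..end] range of 512-byte units."""
    return (end - start + 1) * SECTOR_BYTES


def extent_sizes_mib(ranges):
    """Extent sizes in MiB, in listing order."""
    return [extent_bytes(start, end) / 2**20 for start, end in ranges]


for index, mib in enumerate(extent_sizes_mib(FILE_OFFSETS)):
    print(f"extent {index}: {mib:.2f} MiB")
```

Under that assumption the early extents double in size as the file
grows, and the later ones level off just under 8GB, which matches the
"full extent (8GB)" ceiling quoted in the throttling table.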
    
    Because it is hard to take back speculative preallocation, cases
    where there are large, slow-growing log files on a nearly full
    filesystem may cause premature ENOSPC. Hence as the filesystem nears
    full, the maximum dynamic prealloc size is reduced according to this
    table (based on 4k block size):
    
    freespace       max prealloc size
      >5%             full extent (8GB)
      4-5%             2GB (8GB >> 2)
      3-4%             1GB (8GB >> 3)
      2-3%           512MB (8GB >> 4)
      1-2%           256MB (8GB >> 5)
      <1%            128MB (8GB >> 6)
    
    This should reduce the amount of space held in speculative
    preallocation for such cases.
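
The table can be read as a right shift applied to the 8GB ceiling. The
following is a rough sketch of that lookup, not the kernel code: the
names FULL_EXTENT, THROTTLE_STEPS and max_prealloc_bytes are
hypothetical, and how exact boundary values (for example exactly 5%
free) are treated is an assumption, since the table does not say.

```python
# 8GB ceiling from the table (stated for a 4k block size).
FULL_EXTENT = 8 * 1024**3

# (free-space threshold in percent, right shift applied below it),
# ordered from the highest threshold to the lowest.
THROTTLE_STEPS = [(5, 2), (4, 3), (3, 4), (2, 5), (1, 6)]


def max_prealloc_bytes(free_pct):
    """Maximum speculative preallocation for a given free-space percentage.

    Assumption: a step applies when free space is strictly below its
    threshold, so exactly 5% free still gets the full extent.
    """
    shift = 0
    for threshold, step_shift in THROTTLE_STEPS:
        if free_pct < threshold:
            shift = step_shift  # later (lower) thresholds override earlier ones
    return FULL_EXTENT >> shift


for pct in (10, 4.5, 3.5, 2.5, 1.5, 0.5):
    print(f"{pct}% free: {max_prealloc_bytes(pct) // 2**20} MiB")
```

Note that the first step skips a shift of 1: the limit drops straight
from 8GB to 2GB once free space falls below 5%.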
    
    The allocsize mount option turns off the dynamic behaviour and fixes
    the prealloc size to whatever the mount option specifies. i.e. the
    behaviour is unchanged.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
