public inbox for linux-xfs@vger.kernel.org
From: Brian Candler <B.Candler@pobox.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: xfs@oss.sgi.com
Subject: Re: df bigger than ls?
Date: Thu, 8 Mar 2012 08:50:35 +0000
Message-ID: <20120308085035.GA23992@nsrc.org>
In-Reply-To: <4F57A32A.5010704@sandeen.net>

On Wed, Mar 07, 2012 at 12:04:26PM -0600, Eric Sandeen wrote:
> XFS speculatively preallocates space off the end of a file.  The amount of
> space allocated depends on the present size of the file, and the amount of
> available free space.  This can be overridden
> with mount -o allocsize=64k (or other size for example)

Aha.  This may well be what is screwing up gluster's disk usage on a striped
volume - I believe XFS is preallocating space which is actually going to end
up being a hole!
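
One way to see the effect per file is to compare the length that ls shows with the space actually allocated. The following is a minimal sketch, assuming Linux, where st_blocks is counted in 512-byte units; the function name size_report is made up for illustration.

```python
import os
import sys


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    st = os.stat(path)
    # st_size is the length that ls -l shows. st_blocks counts the 512-byte
    # units actually allocated (assumption: Linux semantics), which is what
    # du reports and what adds up to the usage seen in df.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    for p in sys.argv[1:]:
        apparent, allocated = size_report(p)
        print(f"{p}: apparent={apparent} allocated={allocated}")
```

For a sparse stripe file the allocated figure should be well below the apparent one; if preallocation has filled the holes, the two end up close together.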

Here are the extent maps for two of the twelve files in my stripe:

root@storage1:~# xfs_bmap /disk1/scratch2/work/PRSRA1/PRSRA1.1.0.bff 
/disk1/scratch2/work/PRSRA1/PRSRA1.1.0.bff:
	0: [0..255]: 2933325744..2933325999
	1: [256..3071]: hole
	2: [3072..3327]: 2933326000..2933326255
	3: [3328..6143]: hole
	4: [6144..8191]: 2933326472..2933328519
	5: [8192..9215]: hole
	6: [9216..13311]: 2933369480..2933373575
	7: [13312..15359]: hole
	8: [15360..23551]: 2933375624..2933383815
	9: [23552..24575]: hole
	10: [24576..40959]: 2933587168..2933603551
	11: [40960..43007]: hole
	12: [43008..75775]: 2933623008..2933655775
	13: [75776..76799]: hole
	14: [76800..142335]: 2933656800..2933722335
	15: [142336..144383]: hole
	16: [144384..275455]: 2933724384..2933855455
	17: [275456..276479]: hole
	18: [276480..538623]: 2935019808..2935281951
	19: [538624..540671]: hole
	20: [540672..1064959]: 2935284000..2935808287
	21: [1064960..1065983]: hole
	22: [1065984..2114559]: 2935809312..2936857887
	23: [2114560..2116607]: hole
	24: [2116608..2119935]: 2943037984..2943041311
root@storage1:~# xfs_bmap /disk2/scratch2/work/PRSRA1/PRSRA1.1.0.bff 
/disk2/scratch2/work/PRSRA1/PRSRA1.1.0.bff:
	0: [0..255]: hole
	1: [256..511]: 2933194944..2933195199
	2: [512..3327]: hole
	3: [3328..3839]: 2933195200..2933195711
	4: [3840..6399]: hole
	5: [6400..8447]: 2933204416..2933206463
	6: [8448..9471]: hole
	7: [9472..13567]: 2933328792..2933332887
	8: [13568..15615]: hole
	9: [15616..23807]: 2933334936..2933343127
	10: [23808..24831]: hole
	11: [24832..41215]: 2933344152..2933360535
	12: [41216..43263]: hole
	13: [43264..76031]: 2934672032..2934704799
	14: [76032..77055]: hole
	15: [77056..142591]: 2934705824..2934771359
	16: [142592..144639]: hole
	17: [144640..275711]: 2934773408..2934904479
	18: [275712..276735]: hole
	19: [276736..538879]: 2934343328..2934605471
	20: [538880..540927]: hole
	21: [540928..1065215]: 2935498152..2936022439
	22: [1065216..1066239]: hole
	23: [1066240..2114815]: 2936023464..2937072039
	24: [2114816..2116863]: hole
	25: [2116864..2120191]: 2937074088..2937077415

You can see that at the start it works fine. There is a stripe size of
256 blocks, so:

* disk 1:    data for 1 x 256 blocks     <-- stripe 0, chunk 0
             hole for 11 x 256 blocks
             data for 1 x 256 blocks     <-- stripe 0, chunk 1
             ...

* disk 2:    hole for 1 x 256 blocks
             data for 1 x 256 blocks     <-- stripe 1, chunk 0
             hole for 11 x 256 blocks
             data for 1 x 256 blocks     <-- stripe 1, chunk 1
             ...
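
The layout sketched above can be written down as a small calculation. This is only an illustration, assuming that chunk k of the logical file lands on brick k modulo 12 at the same offset in that brick's sparse file; the function name and parameters are hypothetical, and the numbers 12 and 256 are taken from this message.

```python
def expected_data_ranges(brick, bricks=12, chunk=256, count=3):
    """First `count` block ranges that should hold data on one brick.

    Assumption: chunk k of the logical file is written to brick k % bricks
    at the same offset in that brick's file, leaving holes elsewhere.
    """
    ranges = []
    for stripe in range(count):
        start = (stripe * bricks + brick) * chunk
        ranges.append((start, start + chunk - 1))
    return ranges


print(expected_data_ranges(0))  # [(0, 255), (3072, 3327), (6144, 6399)]
print(expected_data_ranges(1))  # [(256, 511), (3328, 3583), (6400, 6655)]
```

Against the maps above, disk 1 already deviates at its third data extent (6144..8191 instead of 6144..6399) and disk 2 at its second (3328..3839 instead of 3328..3583).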

But after four chunks it goes wrong. By the end the files are almost
entirely allocated extents, with hardly any holes.  The extent sizes roughly
double each time (2048, 4096, 8192, ... blocks), which seems to match the
speculative preallocation algorithm.
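
To put numbers on this, the xfs_bmap output can be totalled up. A rough sketch follows, assuming the plain (non-verbose) xfs_bmap format shown above; summarize_bmap and SAMPLE are names invented here, and SAMPLE holds only the first five lines of the disk 1 map.

```python
import re

# Matches lines such as "4: [6144..8191]: 2933326472..2933328519"
# and "1: [256..3071]: hole".
EXTENT_RE = re.compile(r"^\s*\d+:\s*\[(\d+)\.\.(\d+)\]:\s*(\S+)")


def summarize_bmap(text):
    """Return (data_blocks, hole_blocks, data_extent_lengths)."""
    data = holes = 0
    lengths = []
    for line in text.splitlines():
        m = EXTENT_RE.match(line)
        if not m:
            continue  # e.g. the leading "filename:" line
        length = int(m.group(2)) - int(m.group(1)) + 1
        if m.group(3) == "hole":
            holes += length
        else:
            data += length
            lengths.append(length)
    return data, holes, lengths


SAMPLE = """\
/disk1/scratch2/work/PRSRA1/PRSRA1.1.0.bff:
    0: [0..255]: 2933325744..2933325999
    1: [256..3071]: hole
    2: [3072..3327]: 2933326000..2933326255
    3: [3328..6143]: hole
    4: [6144..8191]: 2933326472..2933328519
"""

print(summarize_bmap(SAMPLE))  # (2560, 5632, [256, 256, 2048])
```

Run over the whole disk 1 map above, the totals come to 2098944 allocated blocks out of 2119936, about 99%, where a clean 12-way stripe would leave roughly one twelfth allocated.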

I think this ought to be fixable. For example, if you seek *into* the
preallocated area and start writing, could you move the preallocation so
that it starts at this location, leaving a hole before it?

(Would that mess up some 'seeky' workloads like databases? I doubt it: on
filesystems which don't have preallocation the same write pattern would
already create holes, so they presumably don't write that way.)

Or for a more sledgehammer approach: if a file already contains any holes
then you could just disable preallocation for it completely.

Regards,

Brian.


