public inbox for linux-xfs@vger.kernel.org
From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com, Brian Candler <B.Candler@pobox.com>
Subject: Re: df bigger than ls?
Date: Wed, 07 Mar 2012 20:17:58 -0600
Message-ID: <4F5816D6.80801@sandeen.net> (raw)
In-Reply-To: <20120308021054.GM3592@dastard>

On 3/7/12 8:10 PM, Dave Chinner wrote:
> On Wed, Mar 07, 2012 at 12:04:26PM -0600, Eric Sandeen wrote:

...

>> # du -hc bigfile*
>> 2.0G	bigfile1
>> 1.1G	bigfile2
>> 907M	bigfile3
>>
>> Dave, is this working as intended?
> 
> Yes. Your problem is that you have a very small filesystem, which is
> not the case that we optimise XFS for. :/

Well, sure ;)

>> I know the speculative
>> preallocation amount for new files is supposed to go down as the
>> fs fills, but is there no way to discard prealloc space to avoid
>> ENOSPC on other files?
> 
> We don't track what files have current active preallocations, we
> only reduce the preallocation size as the filesystem nears ENOSPC.
> This generally works just fine in situations where the filesystem
> size is significantly greater than the maximum extent size. i.e. the
> common case.
> 
> The problem you are tripping over here is that the maximum extent
> size is greater than the filesystem size, so the preallocation size
> is also greater than the filesystem size and hence can contribute
> significantly to premature ENOSPC. I see two possible ways to
> minimise this problem:

FWIW I'm not overly worried about my 4G filesystem, that was just
an obviously extreme case to test the behavior.

I'd be more interested to know if the behavior was causing any
issues for Brian's case, or if it was just confusing.  :)
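FWIW, the mismatch is easy to spot even without XFS-specific tools: st_blocks
covers the speculative preallocation while st_size stops at EOF, which is why
du (blocks) runs ahead of ls -l (apparent size). A quick sketch, with a
hypothetical path - on a filesystem without preallocation the two should
simply agree for a fully written file:

```shell
# Compare apparent size (st_size) with allocated bytes (st_blocks * 512).
# On XFS with active speculative preallocation, allocated > apparent;
# here we just write a plain 1 MiB file, so the two should line up.
f=/tmp/prealloc_demo
dd if=/dev/zero of="$f" bs=1M count=1 2>/dev/null
apparent=$(stat -c %s "$f")
allocated=$(( $(stat -c %b "$f") * 512 ))
echo "apparent=$apparent allocated=$allocated"
```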

> 	1. reduce the maximum speculative preallocation size based
> 	on filesystem size at mount time.
> 
> 	2. track inodes with active speculative preallocation and
> 	have an enospc based trigger that can find them and truncate
> 	away excess idle speculative preallocation.
> 
> The first is relatively easy to do, but will only reduce the
> incidence of your problem - we still need to allow significant
> preallocation sizes (e.g. 64MB) to avoid the fragmentation problems.

Might be worth it; even though a 4G fs is obviously not a design point,
it's good not to have diabolical behavior.  Although I suppose if xfstests
doesn't hit it in everything it does, it's probably not a big deal.
ISTR you did have to fix one thing up, though.
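As a stopgap for small filesystems, the existing allocsize mount option can
already cap the EOF preallocation size by hand, at the cost of giving up the
dynamic sizing - something like (device and mountpoint are placeholders):

```shell
# Cap speculative EOF preallocation at a fixed 64 MiB; a fixed
# allocsize overrides the dynamic preallocation heuristic.
mount -o allocsize=64m /dev/sdX /mnt/scratch
```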

> The second is needed to reclaim the space we've already preallocated
> but is not being used. That's more complex to do - probably a radix
> tree bit and a periodic background scan to reduce the time window
> the preallocation sits around from cache lifetime to "idle for some
> time" along with an on-demand, synchronous ENOSPC scan. This will
> need some more thought as to how to do it effectively, but isn't
> impossible to do....

It seems worth thinking about.  I guess I'm still a little concerned
about the ENOSPC case; it could lead to some confusion - I could imagine
several hundred gigs under preallocation, with a reasonable-sized
filesystem returning ENOSPC quite early.
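In the meantime, a crude userspace approximation of that scan - totalling
space allocated beyond the apparent size across a tree - might look like the
following (directory and file names are hypothetical; sparse files skew the
per-file delta negative, so they're skipped, making this an upper-bound hint
at best):

```shell
# Rough estimate of space held beyond apparent file size in a tree.
# On XFS, post-EOF speculative preallocation shows up as allocated
# bytes exceeding the apparent size.
dir=/tmp/prealloc_scan
mkdir -p "$dir"
dd if=/dev/zero of="$dir/sample" bs=64K count=4 2>/dev/null
total=0
for f in "$dir"/*; do
    apparent=$(stat -c %s "$f")
    allocated=$(( $(stat -c %b "$f") * 512 ))
    extra=$(( allocated - apparent ))
    # Only count files whose allocation runs past EOF.
    if [ "$extra" -gt 0 ]; then
        total=$(( total + extra ))
    fi
done
echo "bytes allocated beyond EOF: $total"
```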

-Eric

> Cheers,
> 
> Dave.
>>
>> -Eric
>>
>>> root@storage1:~# du -h /disk*/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk10/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk11/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk12/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk1/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk2/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk3/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk4/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk5/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk6/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk7/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk8/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 2.0G	/disk9/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> root@storage1:~# echo 3 >/proc/sys/vm/drop_caches 
>>> root@storage1:~# du -h /disk*/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk10/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk11/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk12/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk1/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk2/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk3/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk4/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk5/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk6/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk7/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk8/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> 1.1G	/disk9/scratch2/work/PRSRA1/PRSRA1.1.0.bff
>>> root@storage1:~# 
>>>
>>> Very odd, but not really a major problem other than the confusion it causes.
>>>
>>> Regards,
>>>
>>> Brian.
>>>
>>> _______________________________________________
>>> xfs mailing list
>>> xfs@oss.sgi.com
>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>
>>
> 


Thread overview: 16+ messages
2012-03-07 15:54 df bigger than ls? Brian Candler
2012-03-07 17:16 ` Brian Candler
2012-03-07 18:04   ` Eric Sandeen
2012-03-08  2:10     ` Dave Chinner
2012-03-08  2:17       ` Eric Sandeen [this message]
2012-03-08  9:10         ` Brian Candler
2012-03-08  9:28           ` Dave Chinner
2012-03-08 16:23       ` Ben Myers
2012-03-09  0:17         ` Dave Chinner
2012-03-09  1:56           ` Ben Myers
2012-03-09  2:57             ` Dave Chinner
2012-03-08  8:04     ` Arkadiusz Miśkiewicz
2012-03-08 10:03       ` Dave Chinner
2012-03-08  8:50     ` Brian Candler
2012-03-08  9:59       ` Brian Candler
2012-03-08 10:22         ` Dave Chinner
