From: "Pádraig Brady" <P@draigBrady.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Theodore Tso <tytso@MIT.EDU>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: fallocate vs ENOSPC
Date: Tue, 29 Nov 2011 14:11:48 +0000
Message-ID: <4ED4E824.4030107@draigBrady.com>
In-Reply-To: <20111129002432.GA2386@dastard>

On 11/29/2011 12:24 AM, Dave Chinner wrote:
> On Mon, Nov 28, 2011 at 08:55:02AM +0000, Pádraig Brady wrote:
>> On 11/28/2011 05:10 AM, Dave Chinner wrote:
>>> Quite frankly, if system utilities like cp and tar start to abuse
>>> fallocate() by default so they can get "upfront ENOSPC detection",
>>> then I will seriously consider making XFS use delayed allocation for
>>> fallocate rather than unwritten extents so we don't lose the past 15
>>> years worth of IO and aging optimisations that delayed allocation
>>> provides us with....
>>
>> For the record I was considering fallocate() for these reasons.
>>
>>   1. Improved file layout for subsequent access
>>   2. Immediate indication of ENOSPC
>>   3. Efficient writing of NUL portions
>>
>> You lucidly detailed issues with 1. which I suppose could be somewhat
>> mitigated by not fallocating < say 1MB, though I suppose file systems
>> could be smarter here and not preallocate small chunks (or when
>> otherwise not appropriate).
> 
> When you consider that some high end filesystem deployments have alignment
> characteristics over 50MB (e.g. so each uncompressed 4k resolution
> video frame is located on a different set of non-overlapping disks),
> arbitrary "don't fallocate below this amount" heuristics will always
> have unforeseen failure cases...

Regarding this alignment policy: I don't understand the issues, so I'm guessing here.
You say delalloc packs files, while fallocate() on XFS aligns them according to
the stripe config. Does packing assume that files written together are
more likely to be read together than independently?
That's a big assumption if true. The converse is also a big assumption: that
fallocate()d files should be aligned because they're more likely to be read independently.

> In short: leave optimising general allocation strategies to the
> filesystems and their developers - there is no One True Solution for
> optimal file layout in a given filesystem, let alone across
> different filesystems. In fact, I don't even want to think about the
> mess fallocate() on everything would make of btrfs because of its
> COW structure - it seems to me to guarantee worse fragmentation than
> using delayed allocation...
> 
>> We can already get ENOSPC from a write()
>> after an fallocate() in certain edge cases, so it would probably make
>> sense to expand those cases.
> 
> fallocate is for preallocation, not for ENOSPC detection. If you
> want efficient and effective ENOSPC detection before writing
> anything, then you really want a space -reservation- extension to
> fallocate. Filesystems that use delayed allocation already have a
> space reservation subsystem - it is how they account for space that is
> reserved by delayed allocation prior to the real allocation being
> done. IMO, allowing userspace some level of access to those
> reservations would be more appropriate for early detection of ENOSPC
> than using preallocation for everything...

Fair enough, so fallocate() would be a superset of reserve(),
though I'm having a hard time thinking of why one might ever need to
fallocate() then.
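
To make the "immediate indication of ENOSPC" use concrete, here is a minimal
sketch of what a copy utility might do today in the absence of a reservation
interface. It assumes Python's os.posix_fallocate() as a stand-in for the
fallocate() call; the helper name preallocate_or_fail is hypothetical and
not taken from cp or any other tool.

```python
import errno
import os


def preallocate_or_fail(path, size):
    """Hypothetical helper: allocate `size` bytes for `path` up front.

    Returns True if the space was allocated, False if the filesystem
    reported ENOSPC before any data was written. Any other error is
    re-raised unchanged.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        if size > 0:
            # posix_fallocate() rejects a zero length, so skip empty files.
            os.posix_fallocate(fd, 0, size)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False
        raise
    finally:
        os.close(fd)
```

Note this really allocates blocks rather than just reserving space, which is
exactly the layout side effect objected to above, and a later write() may
still hit ENOSPC in the edge cases already mentioned.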

> As to efficient writing of NULL ranges - that's what sparse files
> are for - you do not need to write or even preallocate NULL ranges
> when copying files. Indeed, the most efficient way of dealing with
> NULL ranges is to punch a hole and let the filesystem deal with
> it.....

Well, not for `cp --sparse=never`, which might be used
so that later processing of the copy will not result in ENOSPC.
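
The difference between the two copy modes can be sketched as follows. This
is an illustration only, not how cp is implemented: copy_file and BLOCK are
hypothetical names, and whether seeking over a range actually leaves a hole
depends on the filesystem.

```python
import os

BLOCK = 64 * 1024  # arbitrary chunk size chosen for this sketch


def copy_file(src, dst, sparse):
    """Hypothetical helper contrasting the two copy strategies.

    sparse=False mimics the intent of --sparse=never: every byte,
    NULs included, is written to the destination.
    sparse=True seeks over all-NUL chunks instead of writing them,
    so the filesystem may leave holes there.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(BLOCK)
            if not buf:
                break
            if sparse and buf.count(0) == len(buf):
                # Skip the NUL chunk; nothing is written for this range.
                fout.seek(len(buf), os.SEEK_CUR)
            else:
                fout.write(buf)
        # If the source ended in skipped NULs, extend dst to full size.
        fout.truncate(fin.tell())
```

Both modes produce identical file contents; only the sparse=False copy has
all of its blocks written at copy time.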

I'm also linking here to a related discussion.
http://oss.sgi.com/archives/xfs/2011-06/msg00064.html

Note also that the gold linker does fallocate() on output files by default.

cheers,
Pádraig.