linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: "Pádraig Brady" <P@draigBrady.com>
Cc: Theodore Tso <tytso@MIT.EDU>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: fallocate vs ENOSPC
Date: Tue, 29 Nov 2011 11:24:32 +1100
Message-ID: <20111129002432.GA2386@dastard>
In-Reply-To: <4ED34C66.8050300@draigBrady.com>

On Mon, Nov 28, 2011 at 08:55:02AM +0000, Pádraig Brady wrote:
> On 11/28/2011 05:10 AM, Dave Chinner wrote:
> > Quite frankly, if system utilities like cp and tar start to abuse
> > fallocate() by default so they can get "upfront ENOSPC detection",
> > then I will seriously consider making XFS use delayed allocation for
> > fallocate rather than unwritten extents so we don't lose the past 15
> > years worth of IO and aging optimisations that delayed allocation
> > provides us with....
> 
> For the record I was considering fallocate() for these reasons.
> 
>   1. Improved file layout for subsequent access
>   2. Immediate indication of ENOSPC
>   3. Efficient writing of NUL portions
> 
> You lucidly detailed issues with 1. which I suppose could be somewhat
> mitigated by not fallocating < say 1MB, though I suppose file systems
> could be smarter here and not preallocate small chunks (or when
> otherwise not appropriate).

When you consider that some high end filesystem deployments have alignment
characteristics over 50MB (e.g. so each uncompressed 4k resolution
video frame is located on a different set of non-overlapping disks),
arbitrary "don't fallocate below this amount" heuristics will always
have unforeseen failure cases...

In short: leave optimising general allocation strategies to the
filesystems and their developers - there is no One True Solution for
optimal file layout in a given filesystem, let alone across
different filesystems. In fact, I don't even want to think about the
mess fallocate() on everything would make of btrfs because of its
COW structure - it seems to me to guarantee worse fragmentation than
using delayed allocation...

> We can already get ENOSPC from a write()
> after an fallocate() in certain edge cases, so it would probably make
> sense to expand those cases.

fallocate is for preallocation, not for ENOSPC detection. If you
want efficient and effective ENOSPC detection before writing
anything, then you really want a space -reservation- extension to
fallocate. Filesystems that use delayed allocation already have a
space reservation subsystem - it is how they account for space that is
reserved by delayed allocation prior to the real allocation being
done. IMO, allowing userspace some level of access to those
reservations would be more appropriate for early detection of ENOSPC
than using preallocation for everything...
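
To make the distinction above concrete, here is a minimal Python sketch
of what preallocation does today: it asks the filesystem to allocate
blocks up front, so ENOSPC appears only as a side effect of a real
allocation, not through a reservation. The helper name `preallocate`
and the 1 MiB size are hypothetical, chosen for illustration.
`os.posix_fallocate` wraps posix_fallocate(3), which may fall back to
writing zeros on filesystems without native support.

```python
import errno
import os
import tempfile


def preallocate(path, size):
    """Try to allocate `size` bytes for `path`; return False on ENOSPC."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # This performs a real allocation, not a space reservation.
        os.posix_fallocate(fd, 0, size)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False
        raise
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "prealloc.bin")
    ok = preallocate(path, 1 << 20)
    st = os.stat(path)
    size_after = st.st_size
    # Blocks actually charged to the file; the value is filesystem dependent.
    allocated_bytes = st.st_blocks * 512
```

Even when this call succeeds, a later write() can still fail in the
edge cases mentioned in the quoted text, so it should not be read as a
guarantee.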

As to efficient writing of NUL ranges - that's what sparse files
are for - you do not need to write or even preallocate NUL ranges
when copying files. Indeed, the most efficient way of dealing with
NUL ranges is to punch a hole and let the filesystem deal with
it.....
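
As an illustration of the sparse-file point, the following Python
sketch copies only the data segments of a file and leaves the holes
unwritten in the destination. It is a sketch under stated assumptions,
not how cp or tar do it: the helper name `sparse_copy` and the buffer
size are hypothetical, and it skips holes with lseek(2) SEEK_DATA and
SEEK_HOLE rather than punching them with fallocate(), which the Python
`os` module does not expose. On a filesystem that does not report
holes, the whole file is treated as data and the copy is still correct.

```python
import errno
import os
import tempfile


def sparse_copy(src_path, dst_path, bufsize=1 << 16):
    """Copy only the data segments of src, leaving holes unwritten in dst."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        size = os.fstat(src).st_size
        offset = 0
        while offset < size:
            try:
                data_start = os.lseek(src, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains before EOF
                    break
                raise
            data_end = os.lseek(src, data_start, os.SEEK_HOLE)
            pos = data_start
            while pos < data_end:
                chunk = os.pread(src, min(bufsize, data_end - pos), pos)
                if not chunk:
                    break
                # pwrite may write fewer bytes than asked; advance by the
                # amount actually written and re-read from there.
                pos += os.pwrite(dst, chunk, pos)
            offset = data_end
        # Extend dst to the source size so a trailing hole is preserved.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.bin")
    dst = os.path.join(d, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"head")
        f.seek(1 << 20)  # skip ahead without writing: a hole if supported
        f.write(b"tail")
    sparse_copy(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same = a.read() == b.read()
    dst_size = os.stat(dst).st_size
```

The destination is never preallocated and no zeros are written; the
filesystem decides how to represent the unwritten ranges.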

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
