linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Ted Ts'o <tytso@mit.edu>
Cc: "Pádraig Brady" <P@draigbrady.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: fallocate vs ENOSPC
Date: Thu, 1 Dec 2011 10:39:43 +1100
Message-ID: <20111130233943.GV7046@dastard>
In-Reply-To: <20111130170116.GA6154@thunk.org>

On Wed, Nov 30, 2011 at 12:01:16PM -0500, Ted Ts'o wrote:
> What Dave was talking about is something different.  He's suggesting a
> new call which reserves space, but which does not actually make the
> block allocation decision until the time of the write.  He suggested
> tieing it to the file descriptor, but I wonder if it's actually more
> functional to tie it to the process --- that is, the process says,
> "guarantee that I will be able to write 5MB", and writes made by that
> process get counted against that 5MB reservation.  When the process
> exits, any reservation made by that process evaporates.

It needs to be tied to the inode in some way - delayed allocation
reservations also have to include a per-inode metadata reservation
to take into account the potential need to allocate extent tree
blocks as well. If we don't do this, then we'll get ENOSPC reported
for writes during writeback that should have succeeded. And that is
a Bad Thing.

Further, you need to track all the ranges that have space reserved
like a special type of delayed allocation extent. That way, when the
write() comes along into the reserved range, you don't account for
it a second time as delayed allocation as the space usage has
already been accounted for.
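
To make the double-accounting point concrete, here is a toy user-space
model of it. It is only a sketch, not kernel code: the class and method
names are made up for illustration, and a real filesystem would track
this in its extent/delalloc structures rather than a Python list.

```python
class ReservedRanges:
    """Toy model: byte ranges of one inode whose space is already charged."""

    def __init__(self):
        self.ranges = []  # sorted, non-overlapping (start, end) pairs

    def _overlap(self, start, end):
        # Bytes of [start, end) that fall inside an already-charged range.
        return sum(max(0, min(end, e) - max(start, s)) for s, e in self.ranges)

    def reserve(self, offset, length):
        """Reserve [offset, offset+length); return the bytes newly charged."""
        start, end = offset, offset + length
        charged = length - self._overlap(start, end)
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))      # disjoint, keep as-is
            else:
                start, end = min(start, s), max(end, e)  # absorb into new range
        merged.append((start, end))
        self.ranges = sorted(merged)
        return charged

    def write(self, offset, length):
        """Return the bytes of this write that still need delalloc accounting."""
        return length - self._overlap(offset, offset + length)
```

A write that lands entirely inside a reserved range reports zero bytes
to account; a write that straddles the end of the reservation is only
charged for the part outside it.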

And then there is the problem of freeing space that you don't use.
Close the fd and you automatically terminate the reservation. fiemap
can be used to find unused reserved ranges. You could probably even
release them by punching the range.

If you have a per-process pool, how do you only use it for the
write() calls you want, on the file you want, over the range you
wanted reserved? And when you have finished writing to that file,
how do you release any unused reservation? How do you know that
you've got reservations remaining?

Then the interesting questions start - how does per-process
reservation interact with quotas? The quota needs to be checked
when the reservation is made, and without knowing what file it is
being made for this cannot be done sanely. Especially for project
quotas....
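
A rough sketch of why the file matters for the quota check. Everything
here (the `Inode` type, `project_id`, `ProjectQuotas`) is hypothetical
and only models the idea that the inode tells you which quota to charge;
a per-process reservation has no inode to look that up from.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    project_id: int  # hypothetical: the project this file is charged to


class QuotaExceeded(Exception):
    pass


class ProjectQuotas:
    def __init__(self, limits):
        self.limits = dict(limits)              # project_id -> byte limit
        self.charged = {p: 0 for p in limits}   # bytes written or reserved

    def reserve(self, inode, nbytes):
        # The inode identifies the quota, so the check can happen
        # at reservation time rather than at write time.
        pid = inode.project_id
        if self.charged[pid] + nbytes > self.limits[pid]:
            raise QuotaExceeded(f"project {pid} over quota")
        self.charged[pid] += nbytes
```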

Also, per-process reservation pools can't really be managed through
existing APIs, so we'd need new ones. And then we'd be asking
application developers to use two different models for almost
identical functionality, which means they'll just use the one that
is most effective for their purpose (i.e. fallocate() because they
already have a fd open on the file they are going to write to).
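
For reference, the existing fd-based model looks something like this
from user space. This is a minimal sketch using Python's
os.posix_fallocate(), which wraps posix_fallocate(3); note that it
allocates blocks up front, which is not the same as the delayed
reservation discussed above. The helper name `preallocate` is made up.

```python
import errno
import os


def preallocate(path, size):
    """Try to allocate `size` bytes for `path`; return False on ENOSPC."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Allocate [0, size) on the already-open fd.
        os.posix_fallocate(fd, 0, size)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False  # caller learns about ENOSPC before writing any data
        raise
    finally:
        os.close(fd)
```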

IOWs, all I see from an implementation perspective of per-process
reservation pools is complexity and nasty corner cases. And from the
user perspective, an API that doesn't match up with the operations
at hand, i.e. that of writing a file....

> Whether we tie this space reservation to a fd or a process, we also
> would need to decide up front whether this space shows up as "missing"
> by statfs(2)/df or not.

IMO, reserved space is used space - it's not free for just anyone to
use anymore, and it has to be checked and accounted against quotas
even before it gets used....
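
As a toy illustration of "reserved space is used space" (a sketch only,
with made-up names, counting in abstract blocks): free space drops when
the reservation is made, does not move when reserved blocks are later
written, and comes back when an unused reservation is released.

```python
import errno
import os


class ToyFs:
    def __init__(self, total_blocks):
        self.total = total_blocks
        self.written = 0
        self.reserved = 0

    def free_blocks(self):
        # What statfs(2)/df would report in this model.
        return self.total - self.written - self.reserved

    def reserve(self, n):
        if n > self.free_blocks():
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.reserved += n

    def write_reserved(self, n):
        # Turning reserved blocks into written blocks leaves free space alone.
        if n > self.reserved:
            raise ValueError("write exceeds reservation")
        self.reserved -= n
        self.written += n

    def release(self):
        # e.g. when the reservation is terminated: unused blocks become free.
        self.reserved = 0
```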

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

