From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	xfs@oss.sgi.com, linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-api@vger.kernel.org
Subject: Re: fallocate mode flag for "unshare blocks"?
Date: Thu, 31 Mar 2016 07:13:50 -0400
Message-ID: <56FD066E.4080204@gmail.com>
In-Reply-To: <20160331075801.GC4209@infradead.org>

On 2016-03-31 03:58, Christoph Hellwig wrote:
> On Wed, Mar 30, 2016 at 02:58:38PM -0400, Austin S. Hemmelgarn wrote:
>> Nothing that I can find in the man-pages or API documentation for Linux's
>> fallocate explicitly says that it will be fast.  There are bits that say it
>> should be efficient, but that is not itself well defined (given context, I
>> would assume it to mean that it doesn't use as much I/O as writing out that
>> many bytes of zero data, not necessarily that it will return quickly).
>
> And that's pretty much as narrow a definition as we get.  But apparently
> gfs2 already breaks that expectation :(
GFS2 breaks other expectations as well (mostly stuff with locking) in 
arguably more significant ways, so I would not personally consider it to 
be precedent for breaking this on other filesystems.
>
>>> delalloc system is careful enough to check that there are enough free
>>> blocks to handle both the allocation and the metadata updates.  The
>>> only gap in this scheme that I can see is if we fallocate, crash, and
>>> upon restart the program then tries to write without retrying the
>>> fallocate.  Can we trade some performance for the added requirement
>>> that we must fallocate -> write -> fsync, and retry the trio if we
>>> crash before the fsync returns?  I think that's already an implicit
>>> requirement, so we might be ok here.
>> Most of the software I've seen that doesn't use fallocate like this is
>> either doing odd things otherwise, or is just making sure it has space for
>> temporary files, so I think it is probably safe to require this.
>
> posix_fallocate guarantees you that you don't get ENOSPC from the write,
> and there is plenty of software relying on that, or crashing / causing
> data integrity problems otherwise.
>
posix_fallocate is not the same thing as the fallocate syscall.  It's 
there for compatibility, it has less functionality, and most 
importantly, it _can_ be slow (because at least glibc will emulate it if 
the underlying FS doesn't support fallocate, which means it's no faster 
than just using dd).
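
For illustration, here is a minimal sketch of the fallocate -> write ->
fsync sequence quoted above, written in Python for brevity.  The helper
name reserve_write_sync is hypothetical.  os.posix_fallocate wraps the
libc posix_fallocate(), not the raw fallocate syscall, so on a filesystem
without native support the reservation step may fall back to the slow
emulation.  Whether the reservation still rules out ENOSPC once blocks
are shared is exactly what this thread is debating, so treat that as an
assumption rather than a guarantee.

```python
import os
import tempfile


def reserve_write_sync(path, data, offset=0):
    """Reserve space, write data, then fsync (hypothetical helper).

    After a crash, the caller is assumed to rerun the whole sequence
    instead of resuming at the write step.
    """
    if not data:
        raise ValueError("posix_fallocate rejects a zero length")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the byte range first, so a lack of space is reported
        # here rather than partway through the write.
        os.posix_fallocate(fd, offset, len(data))
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than requested, so loop.
            written += os.pwrite(fd, data[written:], offset + written)
        # The data is only considered durable once fsync returns.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        demo_path = os.path.join(tmpdir, "demo.bin")
        print(reserve_write_sync(demo_path, b"x" * 4096))
```

If the process dies before fsync returns, nothing about the file should
be assumed, and the simplest recovery is to call the helper again from
the top.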
