From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
xfs@oss.sgi.com, linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
linux-api@vger.kernel.org
Subject: Re: fallocate mode flag for "unshare blocks"?
Date: Thu, 31 Mar 2016 22:18:50 +1100
Message-ID: <20160331111850.GP11812@dastard>
In-Reply-To: <20160331075440.GA4209@infradead.org>
On Thu, Mar 31, 2016 at 12:54:40AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 31, 2016 at 12:18:13PM +1100, Dave Chinner wrote:
> > On Wed, Mar 30, 2016 at 11:27:55AM -0700, Darrick J. Wong wrote:
> > > Or is it ok that fallocate could block, potentially for a long time as
> > > we stream cows through the page cache (or however unshare works
> > > internally)? Those same programs might not be expecting fallocate to
> > > take a long time.
> >
> > Yes, it's perfectly fine for fallocate to block for long periods of
> > time. See what gfs2 does during preallocation of blocks - it ends up
> > calling sb_issue_zeroout() because it doesn't have unwritten
> > extents, and hence can block for long periods of time....
>
> gfs2 fallocate is an implementation that will cause all but the most
> trivial users real pain. Even the initial XFS implementation just
> marking the transactions synchronous made it unusable for all kinds
> of applications, and this is much worse. E.g. an NFS ALLOCATE operation
> to gfs2 will probably hang your connection for extended periods of
> time.
>
> If we need to support something like what gfs2 does we should have a
> separate flag for it.

Using fallocate() for preallocation was always intended to
be a faster, more efficient method of allocating zeroed space
than having userspace write blocks of data. Faster and more efficient
does not mean instantaneous, and gfs2 using sb_issue_zeroout() means
that if the hardware has zeroing offloads (deterministic trim, write
same, etc.) it will use them, and that will be much faster than
writing zeros from userspace.
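To make the comparison concrete, here is a minimal userspace sketch of
the two paths, not a definitive implementation. The helper names
preallocate() and zero_fill() are made up for illustration; only
fallocate(2) and pwrite(2) are real interfaces. It assumes the range
being allocated holds no data yet.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for fallocate() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Slow path (hypothetical helper): userspace pushes every zero byte
 * through pwrite(2). This overwrites whatever is in the range, so it
 * is only a valid substitute when the range holds no data yet.
 */
static int zero_fill(int fd, off_t offset, off_t len)
{
    char buf[65536];

    memset(buf, 0, sizeof(buf));
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        if (n == 0)
            return -1;          /* no progress, give up rather than spin */
        offset += n;
        len -= n;
    }
    return 0;
}

/*
 * Preferred path (hypothetical helper): ask the filesystem for zeroed
 * space. The call may block for a long time, depending on how the
 * filesystem implements it. Falls back to zero_fill() only when the
 * filesystem or kernel does not support fallocate at all.
 */
int preallocate(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;
    return zero_fill(fd, offset, len);
}
```

Either path can take a while on a large range. The point of the first
one is that the filesystem gets to choose how the zeroing is done.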

IMO, what gfs2 does is definitely within the intended usage of
fallocate() for accelerating the preallocation of blocks.

Yes, it may not be optimal for things like NFS servers which haven't
considered that a fallocate-based offload operation might take some
time to execute, but that's not a problem with fallocate. Rather,
that's a problem with the NFS server ALLOCATE implementation not
being prepared to return NFSERR_JUKEBOX to prevent client side hangs
and timeouts while the operation is run....
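
As a rough illustration of what "being prepared to return
NFSERR_JUKEBOX" could look like, here is a small state-machine sketch.
It is not the Linux NFS server's code: struct alloc_op,
allocate_reply() and allocate_complete() are hypothetical names, and
the background worker that would actually run the allocation is left
out. The value 10008 is the one the NFS protocol specs assign to
NFS3ERR_JUKEBOX and NFS4ERR_DELAY.

```c
#define NFS_OK          0
#define NFSERR_JUKEBOX  10008   /* "try again later" status */

/* Lifecycle of one long-running allocation request (hypothetical). */
enum alloc_state { ALLOC_NEW, ALLOC_RUNNING, ALLOC_DONE };

struct alloc_op {
    enum alloc_state state;
    int status;                 /* final status, valid once ALLOC_DONE */
};

/*
 * Decide what to send back to the client. While the allocation is
 * still in flight the server answers "try later" instead of holding
 * the request open, so the client retries rather than timing out.
 */
int allocate_reply(struct alloc_op *op)
{
    switch (op->state) {
    case ALLOC_NEW:
        /* Assumption: a real server would queue the fallocate work
         * to a background worker at this point. */
        op->state = ALLOC_RUNNING;
        return NFSERR_JUKEBOX;
    case ALLOC_RUNNING:
        return NFSERR_JUKEBOX;
    case ALLOC_DONE:
        break;
    }
    return op->status;
}

/* Called by the (omitted) worker once the allocation has finished. */
void allocate_complete(struct alloc_op *op, int status)
{
    op->status = status;
    op->state = ALLOC_DONE;
}
```

A client retrying the same request would see NFSERR_JUKEBOX until the
worker calls allocate_complete(), and the final status after that.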
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com