From: Liu Bo <bo.li.liu@oracle.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>,
xfs@oss.sgi.com, linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
linux-api@vger.kernel.org
Subject: Re: fallocate mode flag for "unshare blocks"?
Date: Wed, 30 Mar 2016 17:32:42 -0700
Message-ID: <20160331003242.GA5813@localhost.localdomain>
In-Reply-To: <20160330182755.GC2236@birch.djwong.org>
On Wed, Mar 30, 2016 at 11:27:55AM -0700, Darrick J. Wong wrote:
> Hi all,
>
> Christoph and I have been working on adding reflink and CoW support to
> XFS recently. Since the purpose of (mode 0) fallocate is to make sure
> that future file writes cannot ENOSPC, I extended the XFS fallocate
> handler to unshare any shared blocks via the copy on write mechanism I
> built for it. However, Christoph shared the following concerns with
> me about that interpretation:
>
> > I know that I suggested unsharing blocks on fallocate, but it turns out
> > this is causing problems. Applications expect falloc to be a fast
> > metadata operation, and copying a potentially large number of blocks
> > is against that expectation. This is especially bad for the NFS
> > server, which should not be blocked for a long time in a synchronous
> > operation.
> >
> > I think we'll have to remove the unshare and just fail the fallocate
> > for a reflinked region for now. I still think it makes sense to expose
> > an unshare operation, and we probably should make that another
> > fallocate mode.
I'm expecting fallocate to be fast, too.
Well, btrfs fallocate doesn't allocate space for a shared extent,
because it considers that space already allocated. So a later overwrite
of this shared extent may hit ENOSPC.
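To make the btrfs case concrete, here is a minimal sketch of what a
careful caller ends up doing: treat a successful mode-0 fallocate as a
hint rather than a guarantee, and still check the overwrite and the
fsync for ENOSPC. The helper name reserve_write_sync() is made up for
illustration and is not taken from any filesystem or library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate, overwrite, then fsync.
 * Returns 0 on success or a negative errno value.
 */
static int reserve_write_sync(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;
	size_t done = 0;

	if (len == 0)
		return 0;	/* fallocate() rejects a zero length */

	/* Mode 0: ask the filesystem to reserve space for the range. */
	if (fallocate(fd, 0, off, (off_t)len) < 0)
		return -errno;

	while (done < len) {
		ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			/* May still be -ENOSPC if the extent was shared. */
			return -errno;
		}
		if (n == 0)
			return -EIO;	/* no progress; avoid looping forever */
		done += (size_t)n;
	}

	/* Delayed allocation can defer the failure until writeback. */
	if (fsync(fd) < 0)
		return -errno;
	return 0;
}
```

A caller that gets -ENOSPC back from this helper on a reflinked file is
seeing exactly the gap described above.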
>
> With that in mind, how do you all think we ought to resolve this?
> Should we add a new fallocate mode flag that means "unshare the shared
> blocks"? Obviously, this unshare flag cannot be used in conjunction
> with hole punching, zero range, insert range, or collapse range. This
> breaks the expectation that writing to a file after fallocate won't
> ENOSPC.
>
> Or is it ok that fallocate could block, potentially for a long time as
> we stream cows through the page cache (or however unshare works
> internally)? Those same programs might not be expecting fallocate to
> take a long time.
>
> Can we do better than either solution? It occurs to me that XFS does
> unshare by reading the file data into the pagecache, marking the pages
> dirty, and flushing the dirty pages; performance could be improved by
> skipping the flush at the end. We won't ENOSPC, because the XFS
> delalloc system is careful enough to check that there are enough free
> blocks to handle both the allocation and the metadata updates. The
> only gap in this scheme that I can see is if we fallocate, crash, and
> upon restart the program then tries to write without retrying the
> fallocate. Can we trade some performance for the added requirement
> that we must fallocate -> write -> fsync, and retry the trio if we
> crash before the fsync returns? I think that's already an implicit
> requirement, so we might be ok here.
>
> Opinions? I rather like the last option, though I've only just
> thought of it and have not had time to examine it thoroughly, and it's
> specific to XFS. :)
I'd vote for another mode for 'unshare the shared blocks'.
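A minimal userspace sketch of what such a mode could look like follows.
Kernels newer than this thread define FALLOC_FL_UNSHARE_RANGE in
<linux/falloc.h>, and the fallback value 0x40 below matches that header.
The two helper names are hypothetical, and the validity check only
mirrors the restriction stated in the quoted proposal (no combination
with hole punching, zero range, insert range, or collapse range); it is
not a copy of any kernel's actual check.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

#ifndef FALLOC_FL_UNSHARE_RANGE
#define FALLOC_FL_UNSHARE_RANGE 0x40	/* value from newer <linux/falloc.h> */
#endif

/* Modes the proposal says must not be combined with unshare. */
#define UNSHARE_EXCLUDED_MODES \
	(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE | \
	 FALLOC_FL_INSERT_RANGE | FALLOC_FL_COLLAPSE_RANGE)

/* Hypothetical check: 1 if mode asks for unshare without an excluded mode. */
static int unshare_mode_is_valid(int mode)
{
	if (!(mode & FALLOC_FL_UNSHARE_RANGE))
		return 0;
	return (mode & UNSHARE_EXCLUDED_MODES) == 0;
}

/* Hypothetical wrapper: unshare [off, off + len); 0 or a negative errno. */
static int unshare_range(int fd, off_t off, off_t len)
{
	if (fallocate(fd, FALLOC_FL_UNSHARE_RANGE, off, len) < 0)
		return -errno;	/* e.g. -EOPNOTSUPP where unsupported */
	return 0;
}
```

With a separate mode, plain mode-0 fallocate can stay a fast metadata
operation, and only callers that explicitly ask for unsharing pay for
the data copy.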
Thanks,
-liubo