From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Nick Piggin <npiggin@suse.de>, Josef Bacik <josef@redhat.com>,
linux-fsdevel@vger.kernel.org, chris.mason@oracle.com,
hch@infradead.org, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] new ->perform_write fop
Date: Sat, 22 May 2010 10:27:59 +1000
Message-ID: <20100522002759.GA8120@dastard>
In-Reply-To: <20100521185846.GD10149@quack.suse.cz>

On Fri, May 21, 2010 at 08:58:46PM +0200, Jan Kara wrote:
> On Fri 21-05-10 09:05:24, Dave Chinner wrote:
> > On Thu, May 20, 2010 at 10:12:32PM +0200, Jan Kara wrote:
> > > b) E.g. ext4 can do even without hole punching. It can allocate extent
> > > as 'unwritten' and when something during the write fails, it just
> > > leaves the extent allocated and the 'unwritten' flag makes sure that
> > > any read will see zeros. I suppose that other filesystems that care
> > > about multipage writes are able to do similar things (e.g. btrfs can
> > > do the same as far as I remember, I'm not sure about gfs2).
> >
> > Allocating multipage writes as unwritten extents turns off delayed
> > allocation and hence we'd lose all the benefits that this gives...
> Ah, sorry. That was a short-circuit in my brain. But when we do delayed
> allocation, I don't see why we should actually do any hole punching... The
> write needs
> to:
> a) reserve enough blocks for the write - I don't know about other
> filesystems but for ext4 this means just incrementing a counter.
> b) copy data page by page
> c) release part of reservation (i.e. decrement counter) if we actually
> copied less than we originally thought.
>
> Am I missing something?

Possibly. Delayed allocation is made up of two parts: space
reservation, and recording the regions of delayed allocation in an
extent tree, in page/bufferhead state, or in both.

In XFS, these two steps happen in the same get_blocks call, and the
result is that we have to truncate/punch delayed allocation extents
out just like normal extents if we are not going to use them. A
reserve/allocate interface allows us to split the operation:
reserve ensures we have space for the delayed allocation, and allocate
inserts the delayed extents into the inode extent tree for later
real allocation during writeback. The unreserve call can then
simply be accounting. It has no requirement to punch out delayed
extents that may have already been allocated; it only has to adjust
counters.
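
To make the split concrete, here is a rough user-space sketch of the
accounting only. It is a toy model, not XFS code: struct fs_model and
the fs_reserve/fs_allocate/fs_unreserve names are made up for
illustration, and real extent tree insertion is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are real kernel APIs. */
struct fs_model {
	long free_blocks;	/* blocks not promised to anyone */
	long reserved_blocks;	/* blocks promised to in-flight writes */
	long delalloc_blocks;	/* blocks recorded as delalloc extents */
};

/* reserve: pure accounting; fails with no side effects if space is short */
static bool fs_reserve(struct fs_model *fs, long nblocks)
{
	if (nblocks < 0 || fs->free_blocks < nblocks)
		return false;
	fs->free_blocks -= nblocks;
	fs->reserved_blocks += nblocks;
	return true;
}

/*
 * allocate: record delalloc extents for data that was actually copied.
 * In a real filesystem this is where the inode extent tree would be
 * updated; the model only moves blocks between counters.
 */
static void fs_allocate(struct fs_model *fs, long nblocks)
{
	assert(nblocks >= 0 && nblocks <= fs->reserved_blocks);
	fs->reserved_blocks -= nblocks;
	fs->delalloc_blocks += nblocks;
}

/* unreserve: hand back unused reservation; counters only, no punching */
static void fs_unreserve(struct fs_model *fs, long nblocks)
{
	assert(nblocks >= 0 && nblocks <= fs->reserved_blocks);
	fs->reserved_blocks -= nblocks;
	fs->free_blocks += nblocks;
}
```

Because nothing is inserted into the extent records until fs_allocate()
runs, a failed or short copy never leaves delalloc extents behind that
would need to be punched out.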

btrfs already has this split design: it reserves space, does the
copy, then marks the extent ranges as delalloc once the copy has
succeeded; otherwise it simply unreserves the unused space.
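
The ordering described above (reserve, copy, mark, unreserve the
remainder) could be sketched roughly like this. This is not btrfs
code: model_write(), struct space and struct inode_model are
hypothetical, the copy is simulated by a caller-supplied count of
blocks that made it, and the space backing a marked range is assumed
to stay reserved until writeback.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANGES 16

/* Hypothetical types for illustration only. */
struct space { long free_blocks; long reserved; };
struct range { long start; long len; };
struct inode_model { struct range delalloc[MAX_RANGES]; size_t nr; };

/*
 * Model of one write of 'want' blocks at 'start' where only 'copied'
 * blocks were actually copied. Returns blocks written, or -1 when the
 * reservation cannot be satisfied.
 */
static long model_write(struct space *sp, struct inode_model *ino,
			long start, long want, long copied)
{
	assert(want >= 0 && copied >= 0 && copied <= want);

	/* 1. reserve space up front; fail before touching any state */
	if (sp->free_blocks < want)
		return -1;
	sp->free_blocks -= want;
	sp->reserved += want;

	/* 2. the data copy would happen here; 'copied' is its result */

	/* 3. mark only the range that was really copied as delalloc */
	if (copied > 0) {
		assert(ino->nr < MAX_RANGES);
		ino->delalloc[ino->nr].start = start;
		ino->delalloc[ino->nr].len = copied;
		ino->nr++;
	}

	/* 4. unreserve the unused tail; this is counter work only */
	sp->reserved -= want - copied;
	sp->free_blocks += want - copied;
	return copied;
}
```

A short copy (copied < want) leaves exactly one delalloc range that
covers the copied blocks, and the error path never has to undo an
extent insertion.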

Once again, I don't know whether ext4 does this internal delayed
allocation extent tracking or just uses page state to
track those extents, but it would probably still have to use the
allocate call to mark all the pages/bufferheads as delalloc so
that unreserve doesn't have to do any extra work.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com