From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: bpm@sgi.com, xfs@oss.sgi.com
Subject: Re: Issues with delalloc->real extent allocation
Date: Thu, 20 Jan 2011 00:31:47 +1100 [thread overview]
Message-ID: <20110119133147.GN16267@dastard> (raw)
In-Reply-To: <20110119120321.GC12941@infradead.org>
On Wed, Jan 19, 2011 at 07:03:21AM -0500, Christoph Hellwig wrote:
> On Wed, Jan 19, 2011 at 10:18:32AM +1100, Dave Chinner wrote:
> > Except for the fact we use the delalloc state from the buffer to
> > trigger allocation after mapping. We could probably just use
> > isnullstartblock() for this - only a mapping from a buffer over a
> > delalloc range should return a null startblock.
>
> isnullstartblock returns true for both DELAYSTARTBLOCK and
> HOLESTARTBLOCK, so we want to be explicit we can check for
> br_startblock == DELAYSTARTBLOCK.
True.
>
> Note that we also need to explicitly check for br_state ==
> XFS_EXT_UNWRITTEN to set the correct type for the ioend structure.
Yes. As i mentioned on IRC I hacked a quick prototype together to
test this out and did exactly this. ;)
>
> > XFS_BMAPI_IGSTATE does a couple of things. Firstly, It prevents
> > xfs_bmapi() from allocating new extents (turns off write mode).
> > This isn't an issue where it is used because neither of the call
> > sites set XFS_BMAPI_WRITE.
>
> I've been down enough to understand what it does. But yes, the
> single large I/O mapping might explain why we want it. The next
> version of xfs_map_blocks will get a comment for it..
>
> > In fact, we probably should be setting this for normal written
> > extents as well, so that the case:
> >
> > A B C
> > +---------------+-----------------+
> > written unwritten
> >
> > is also handled with the same optimisation. That makes handling
> > unwritten and overwrites identical, with only delalloc being
> > different. If we assume delalloc when we get a null startblock,
> > then we don't need to look at the buffer state at all for the
> > initial mapping.
>
> With the current xfs_bmapi code that won't work. When merging a second
> extent into the first we only add up the br_blockcount. So for the
> case above we'd get an extent returned that's not marked as unwrittent
> and we wouldn't mark the ioend as unwrittent and thus perform not
> extent conversion after I/O completion. Just adding XFS_BMAPI_IGSTATE
> blindly for the delalloc case on the other hand is fine, as the
> merging of delayed extents is handled in a different if clause that
> totally ignores XFS_BMAPI_IGSTATE.
>
> The potention fix for this is to always set br_state if one of the
> merged extents is an unwrittent extent. The only other caller is
> xfs_getbmap which does report the unwrittent state to userspace,
> but already is sloppy for merging the other way if BMV_IF_PREALLOC
> is not set, so I can't see how beening sloppy this way to makes any
> difference.
Yup:
@@ -4827,6 +4827,18 @@ xfs_bmapi(
ASSERT(mval->br_startoff ==
mval[-1].br_startoff + mval[-1].br_blockcount);
mval[-1].br_blockcount += mval->br_blockcount;
+ /*
+ * if one of the extent types is unwritten, make sure
+ * the extent is reported as unwritten so the caller
+ * always takes the correct action for unwritten
+ * extents. This means we always return consistent
+ * state regardless of whether we find a written or
+ * unwritten extent first.
+ */
+ if (mval[-1].br_state != XFS_EXT_UNWRITTEN &&
+ mval->br_state == XFS_EXT_UNWRITTEN) {
+ mval[-1].br_state = XFS_EXT_UNWRITTEN;
+ }
} else if (n > 0 &&
mval->br_startblock == DELAYSTARTBLOCK &&
mval[-1].br_startblock == DELAYSTARTBLOCK &&
> > It seems to me that with such modifications, the only thing that we
> > are using the bufferhead for is the buffer_uptodate() flag to
> > determine if we should write the block or not. If we can find some
> > other way of holding this state then we don't need bufferheads in
> > the write path at all, right?
>
> There's really two steps. First we can stop needing buffers headers
> for the space allocation / mapping. This is doable with the slight
> tweak of XFS_BMAPI_IGSTATE semantics.
>
> We still do need to set BH_Delay or BH_Unwrittent for use in
> __block_write_begin and block_truncate_page, but they become completely
> interchangeable at that point.
>
> If we want to completely get rid of buffers heads things are a bit
> more complicated. It's doable as shown by the _nobh aops, but we'll
> use quite a bit of per-block state that needs to be replaced by per-page
> state,
Sure, or use a similar method to btrfs which stores dirty state bits
in a separate extent tree. Worst case memory usage is still much
less than a bufferhead per block...
> and we'll lose the way to cache the block number in the buffer
> head. While we don't make use of that in writepage we do so in
> the write path, although I'm not sure how important it is. If we
> get your multi-page write work in it probably won't matter any more.
The only place we use bh->b_blocknr is for ioend manipulation. Am I
missing something else?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-01-19 13:29 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-14 0:29 Issues with delalloc->real extent allocation Dave Chinner
2011-01-14 16:40 ` Geoffrey Wehrman
2011-01-14 22:59 ` Dave Chinner
2011-01-15 4:16 ` Geoffrey Wehrman
2011-01-17 5:18 ` Dave Chinner
2011-01-17 14:37 ` Geoffrey Wehrman
2011-01-18 0:24 ` Dave Chinner
2011-01-18 14:30 ` Geoffrey Wehrman
2011-01-18 20:40 ` Christoph Hellwig
2011-01-18 22:03 ` Dave Chinner
2011-01-14 21:43 ` bpm
2011-01-14 23:32 ` bpm
2011-01-14 23:50 ` bpm
2011-01-14 23:55 ` Dave Chinner
2011-01-17 20:12 ` bpm
2011-01-18 1:44 ` Dave Chinner
2011-01-18 20:47 ` Christoph Hellwig
2011-01-18 23:18 ` Dave Chinner
2011-01-19 12:03 ` Christoph Hellwig
2011-01-19 13:31 ` Dave Chinner [this message]
2011-01-19 13:55 ` Christoph Hellwig
2011-01-20 1:33 ` Dave Chinner
2011-01-20 11:16 ` Christoph Hellwig
2011-01-21 1:59 ` Dave Chinner
2011-01-20 14:45 ` Geoffrey Wehrman
2011-01-21 2:51 ` Dave Chinner
2011-01-21 14:41 ` Geoffrey Wehrman
2011-01-23 23:26 ` Dave Chinner
2011-01-17 0:28 ` Lachlan McIlroy
2011-01-17 4:37 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110119133147.GN16267@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=hch@infradead.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox