From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: bpm@sgi.com, xfs@oss.sgi.com
Subject: Re: Issues with delalloc->real extent allocation
Date: Thu, 20 Jan 2011 00:31:47 +1100 [thread overview]
Message-ID: <20110119133147.GN16267@dastard> (raw)
In-Reply-To: <20110119120321.GC12941@infradead.org>
On Wed, Jan 19, 2011 at 07:03:21AM -0500, Christoph Hellwig wrote:
> On Wed, Jan 19, 2011 at 10:18:32AM +1100, Dave Chinner wrote:
> > Except for the fact we use the delalloc state from the buffer to
> > trigger allocation after mapping. We could probably just use
> > isnullstartblock() for this - only a mapping from a buffer over a
> > delalloc range should return a null startblock.
>
> isnullstartblock returns true for both DELAYSTARTBLOCK and
> HOLESTARTBLOCK, so we want to be explicit we can check for
> br_startblock == DELAYSTARTBLOCK.
True.
>
> Note that we also need to explicitly check for br_state ==
> XFS_EXT_UNWRITTEN to set the correct type for the ioend structure.
Yes. As i mentioned on IRC I hacked a quick prototype together to
test this out and did exactly this. ;)
>
> > XFS_BMAPI_IGSTATE does a couple of things. Firstly, It prevents
> > xfs_bmapi() from allocating new extents (turns off write mode).
> > This isn't an issue where it is used because neither of the call
> > sites set XFS_BMAPI_WRITE.
>
> I've been down enough to understand what it does. But yes, the
> single large I/O mapping might explain why we want it. The next
> version of xfs_map_blocks will get a comment for it..
>
> > In fact, we probably should be setting this for normal written
> > extents as well, so that the case:
> >
> > A B C
> > +---------------+-----------------+
> > written unwritten
> >
> > is also handled with the same optimisation. That makes handling
> > unwritten and overwrites identical, with only delalloc being
> > different. If we assume delalloc when we get a null startblock,
> > then we don't need to look at the buffer state at all for the
> > initial mapping.
>
> With the current xfs_bmapi code that won't work. When merging a second
> extent into the first we only add up the br_blockcount. So for the
> case above we'd get an extent returned that's not marked as unwrittent
> and we wouldn't mark the ioend as unwrittent and thus perform not
> extent conversion after I/O completion. Just adding XFS_BMAPI_IGSTATE
> blindly for the delalloc case on the other hand is fine, as the
> merging of delayed extents is handled in a different if clause that
> totally ignores XFS_BMAPI_IGSTATE.
>
> The potention fix for this is to always set br_state if one of the
> merged extents is an unwrittent extent. The only other caller is
> xfs_getbmap which does report the unwrittent state to userspace,
> but already is sloppy for merging the other way if BMV_IF_PREALLOC
> is not set, so I can't see how beening sloppy this way to makes any
> difference.
Yup:
@@ -4827,6 +4827,18 @@ xfs_bmapi(
ASSERT(mval->br_startoff ==
mval[-1].br_startoff + mval[-1].br_blockcount);
mval[-1].br_blockcount += mval->br_blockcount;
+ /*
+ * if one of the extent types is unwritten, make sure
+ * the extent is reported as unwritten so the caller
+ * always takes the correct action for unwritten
+ * extents. This means we always return consistent
+ * state regardless of whether we find a written or
+ * unwritten extent first.
+ */
+ if (mval[-1].br_state != XFS_EXT_UNWRITTEN &&
+ mval->br_state == XFS_EXT_UNWRITTEN) {
+ mval[-1].br_state = XFS_EXT_UNWRITTEN;
+ }
} else if (n > 0 &&
mval->br_startblock == DELAYSTARTBLOCK &&
mval[-1].br_startblock == DELAYSTARTBLOCK &&
> > It seems to me that with such modifications, the only thing that we
> > are using the bufferhead for is the buffer_uptodate() flag to
> > determine if we should write the block or not. If we can find some
> > other way of holding this state then we don't need bufferheads in
> > the write path at all, right?
>
> There's really two steps. First we can stop needing buffers headers
> for the space allocation / mapping. This is doable with the slight
> tweak of XFS_BMAPI_IGSTATE semantics.
>
> We still do need to set BH_Delay or BH_Unwrittent for use in
> __block_write_begin and block_truncate_page, but they become completely
> interchangeable at that point.
>
> If we want to completely get rid of buffers heads things are a bit
> more complicated. It's doable as shown by the _nobh aops, but we'll
> use quite a bit of per-block state that needs to be replaced by per-page
> state,
Sure, or use a similar method to btrfs which stores dirty state bits
in a separate extent tree. Worst case memory usage is still much
less than a bufferhead per block...
> and we'll lose the way to cache the block number in the buffer
> head. While we don't make use of that in writepage we do so in
> the write path, although I'm not sure how important it is. If we
> get your multi-page write work in it probably won't matter any more.
The only place we use bh->b_blocknr is for ioend manipulation. Am I
missing something else?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-01-19 13:29 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-14 0:29 Issues with delalloc->real extent allocation Dave Chinner
2011-01-14 16:40 ` Geoffrey Wehrman
2011-01-14 22:59 ` Dave Chinner
2011-01-15 4:16 ` Geoffrey Wehrman
2011-01-17 5:18 ` Dave Chinner
2011-01-17 14:37 ` Geoffrey Wehrman
2011-01-18 0:24 ` Dave Chinner
2011-01-18 14:30 ` Geoffrey Wehrman
2011-01-18 20:40 ` Christoph Hellwig
2011-01-18 22:03 ` Dave Chinner
2011-01-14 21:43 ` bpm
2011-01-14 23:32 ` bpm
2011-01-14 23:50 ` bpm
2011-01-14 23:55 ` Dave Chinner
2011-01-17 20:12 ` bpm
2011-01-18 1:44 ` Dave Chinner
2011-01-18 20:47 ` Christoph Hellwig
2011-01-18 23:18 ` Dave Chinner
2011-01-19 12:03 ` Christoph Hellwig
2011-01-19 13:31 ` Dave Chinner [this message]
2011-01-19 13:55 ` Christoph Hellwig
2011-01-20 1:33 ` Dave Chinner
2011-01-20 11:16 ` Christoph Hellwig
2011-01-21 1:59 ` Dave Chinner
2011-01-20 14:45 ` Geoffrey Wehrman
2011-01-21 2:51 ` Dave Chinner
2011-01-21 14:41 ` Geoffrey Wehrman
2011-01-23 23:26 ` Dave Chinner
2011-01-17 0:28 ` Lachlan McIlroy
2011-01-17 4:37 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110119133147.GN16267@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=hch@infradead.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.