From: Christoph Hellwig <hch@lst.de>
To: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org, darrick.wong@oracle.com
Subject: Re: [PATCH v2] xfs: fix COW writeback race
Date: Tue, 17 Jan 2017 15:37:21 +0100 [thread overview]
Message-ID: <20170117143721.GA32641@lst.de> (raw)
In-Reply-To: <20170117134421.GB12426@bfoster.bfoster>
On Tue, Jan 17, 2017 at 08:44:21AM -0500, Brian Foster wrote:
> Any reason we don't try to address the core race rather than shake up
> the affected code to accommodate it?
I think there are two aspects to the whole thing. One is the way
xfs_bmapi_write currently works is fundamentally wrong - if the
caller only needs a conversion from delalloc to real space trying
to allocate space is always wrong and we should catch it early.
The second is if we should do the eager conversion of the whole
found extent for either the data and/or the COW fork.
> I ask for a couple reasons: 1.) I'm
> not quite following the specific race from the description and 2.) I
> considered doing the exact same thing at first for the eofblocks i_size
> issue, but more digging rooted out the problem in the eofblocks code.
> This one may not be as straightforward a fix, of course... (but if not,
> the commit log should probably explain why).
My hope was that the long commit message explained the issue, but
I guess I need to go into even more details.
The writeback code (both COW and real) works like this
take ilock
read in an extent at offset O
drop ilock
if extent is delalloc:
while !done with the whole extent
take ilock
convert partial extent to a real allocation
drop ilock
But if multiple threads are doing writeback on pages next to
each other another thread might have already converted parts
of all of the extent found in the beginning to a real allocation.
That on it's own is not a problem because xfs_bmapi_write
handles a call to allocate an already real allocation as a no-op.
But with the COW code moving real extents from the COW fork to
the data fork after I/O completion this can become a problem,
because we now might not only have delalloc or real extents in
the area covered by extent returned from the inital xfs_bmapi_read
call, but also a hole in the COW work. At which point it blows up.
As for why we're doing the eager conversion: at least for the data
fork this was initentional to get better allocation patterns, I
remember a discussion with Dave on that a long time ago. Maybe we
shouldn't do it for the COW for to avoid these sorts of races, but
then again without xfs_bmapi_write being stupid the race is harmless.
> What happens in this case if eof is true? It looks like got could be
> bogus, yet we still carry on using it in the post-allocation part of the
> loop.
For an initial EOF lookup it could indeed be bogus. To properly
work it would need something like the trick xfs_bmapi_read uses
for this case.
> The fact that the allocation code breaks out of the loop if
> allocation doesn't occur is a bit of a red flag that the post-allocation
> code may very well expect to always have an allocated mapping.
The post-allocation cleanup code bust handle xfs_bmapi_allocate
returning an error before doing anything, and because of that it's
full of conditionals for everything that could or could not have
happened.
> That aside... if we do want to do something like this, I wonder whether
> it's more cleanly handled by the caller.
I don't see how it could be done in the caller - the caller wants
the bmap code to convert a delayed allocation and not allocate
entirely new blocks. The best way to do so is to tell the bmapi
code not do allocate new blocks. Now if you mean splitting up
xfs_bmapi_write into different functions for allocating real blocks,
converting delalloc blocks or just flipping the unwritten bit: that's
something I'd like to look into - the current interface is just
too confusing.
next prev parent reply other threads:[~2017-01-17 14:37 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-17 7:48 [PATCH v2] xfs: fix COW writeback race Christoph Hellwig
2017-01-17 13:44 ` Brian Foster
2017-01-17 14:37 ` Christoph Hellwig [this message]
2017-01-17 17:14 ` Brian Foster
2017-01-17 18:39 ` Darrick J. Wong
2017-01-17 18:58 ` Brian Foster
2017-01-17 20:02 ` Darrick J. Wong
2017-01-18 8:45 ` Christoph Hellwig
2017-01-18 8:49 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170117143721.GA32641@lst.de \
--to=hch@lst.de \
--cc=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).